Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation • Paper 2410.17589 • Published Oct 23, 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation • Paper 2406.02032 • Published Jun 4, 2024
Audio-Image Cross-Modal Retrieval with Onomatopoeic Images • Paper 2605.17509 • Published 10 days ago