Publications

Publication date: November 5, 2025

DRAGON: Distributional Rewards Optimize Diffusion Generative Models

Transactions on Machine Learning Research

Yatong Bai, Jonah Casebeer, Somayeh Sojoudi, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio

Publication date: October 13, 2025

Learning to Upsample and Upmix Audio in the Latent Domain

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2025)

Dimitrios Bralios, Paris Smaragdis, Jonah Casebeer
  • AI & Machine Learning
  • Audio

Publication date: October 12, 2025

SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Sonal Kumar, Prem Seetharaman, Justin Salamon, Oriol Nieto
  • Audio

Publication date: August 31, 2025

Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders

IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2025)

Dimitrios Bralios, Jonah Casebeer, Paris Smaragdis
Best Paper
  • AI & Machine Learning
  • Audio

Publication date: May 8, 2025

FLAM: Frame-Wise Language-Audio Modeling

International Conference on Machine Learning (ICML)

Yusong Wu, Christos Tsirigotis, Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, Oriol Nieto, Prem Seetharaman, Justin Salamon
  • AI & Machine Learning
  • Audio

Publication date: April 26, 2025

SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation

CHI 2025

Stephen Brade, Sam Anderson, Rithesh Kumar, Zeyu Jin, Anh Truong
  • Audio
  • Human Computer Interaction

Publication date: April 24, 2025

Presto! Distilling Steps and Layers for Accelerating Music Generation

International Conference on Learning Representations (ICLR 2025)

Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
(Spotlight, top 5%)
  • AI & Machine Learning
  • Audio

Publication date: April 21, 2025

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

International Conference on Learning Representations (ICLR)

Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha
  • AI & Machine Learning
  • Audio

Publication date: April 7, 2025

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: April 7, 2025

Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman
  • AI & Machine Learning
  • Audio

Publication date: April 6, 2025

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
  • Audio

Publication date: March 24, 2025

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

International Conference on Learning Representations (ICLR)

S Sakshi, Utkarsh Tyagi, Sonal Kumar, Ashish Seth, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh, Dinesh Manocha
Oral Paper (Top 5%)
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: November 16, 2024

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Empirical Methods in Natural Language Processing Conference (EMNLP)

Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Oral Paper (Top 5%)
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: November 10, 2024

Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning

International Society for Music Information Retrieval Conference (ISMIR)

Ilaria Manco, Justin Salamon, Oriol Nieto
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: November 10, 2024

DITTO-2: Distilled Diffusion Inference Time T-Optimization for Music Generation

International Society for Music Information Retrieval Conference (ISMIR)

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio

Publication date: September 5, 2024

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Interspeech 2024

Minh Van Nguyen, Franck Dernoncourt, David Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan A. Rossi, Quan Hung Tran, Trung Bui, Thien Nguyen
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: August 28, 2024

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

Interspeech 2024

Ke Chen, Jiaqi Su, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Zeyu Jin
  • AI & Machine Learning
  • Audio

Publication date: July 27, 2024

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

International Conference on Machine Learning (ICML)

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
(Oral, top 1.5%)
  • AI & Machine Learning
  • Audio

Publication date: July 23, 2024

MusicHiFi: Fast High-Fidelity Stereo Vocoding

IEEE Signal Processing Letters, vol. 31, pp. 2365-2369

Ge Zhu, Juan-Pablo Caceres, Zhiyao Duan, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio

Publication date: May 24, 2024

Music ControlNet: Multiple Time-varying Controls for Music Generation

IEEE Transactions on Audio, Speech, and Language Processing (TASLP)

Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio