Publications

Publication date: November 10, 2024

DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

International Society for Music Information Retrieval (ISMIR)

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio

Publication date: September 5, 2024

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Interspeech 2024

Minh Van Nguyen, Franck Dernoncourt, David Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan A. Rossi, Quan Hung Tran, Trung Bui, Thien Nguyen
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: August 28, 2024

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

Interspeech 2024

Ke Chen, Jiaqi Su, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Zeyu Jin
  • AI & Machine Learning
  • Audio

Publication date: July 27, 2024

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

International Conference on Machine Learning (ICML)

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
(Oral, top 1.5%)
  • AI & Machine Learning
  • Audio

Publication date: July 23, 2024

MusicHiFi: Fast High-Fidelity Stereo Vocoding

IEEE Signal Processing Letters, vol. 31, pp. 2365-2369

Ge Zhu, Juan-Pablo Caceres, Zhiyao Duan, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio

Publication date: May 24, 2024

Music ControlNet: Multiple Time-varying Controls for Music Generation

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Shih-Lu Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan
  • AI & Machine Learning
  • Audio

Publication date: May 7, 2024

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

International Conference on Learning Representations (ICLR)

Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
  • AI & Machine Learning
  • Audio
  • Natural Language Processing

Publication date: April 14, 2024

MDX-GAN: Enhancing Perceptual Quality in Multi-Class Source Separation Via Adversarial Training

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Ke Chen, Jiaqi Su, Zeyu Jin
  • AI & Machine Learning
  • Audio

Publication date: April 14, 2024

GR0: Self-Supervised Global Representation Learning for Zero-Shot Voice Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Yunyun Wang, Jiaqi Su, Adam Finkelstein, Zeyu Jin
  • AI & Machine Learning
  • Audio

Publication date: October 22, 2023

Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto
  • AI & Machine Learning
  • Audio
  • Computer Vision, Imaging & Video

Publication date: August 24, 2023

Efficient Spoken Language Recognition Via Multilabel Classification

Interspeech 2023

Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon
  • AI & Machine Learning
  • Audio

Publication date: June 18, 2023

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

Conference on Computer Vision and Pattern Recognition (CVPR)

Reuben Tan, Arijit Ray, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
Highlight Paper (Top 10%)
  • AI & Machine Learning
  • Audio
  • Computer Vision, Imaging & Video

Publication date: June 4, 2023

Audio-Text Models Do Not Yet Leverage Natural Language

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Ho-Hsiang Wu, Oriol Nieto, Juan Pablo Bello, Justin Salamon
  • AI & Machine Learning
  • Audio

Publication date: June 4, 2023

Transcription Free Filler Word Detection with Neural Semi-CRFs

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Ge Zhu, Yujia Yan, Juan-Pablo Caceres, Zhiyao Duan
  • AI & Machine Learning
  • Audio

Publication date: December 14, 2022

Automated Acoustic Monitoring Captures Timing and Intensity of Bird Migration

Journal of Applied Ecology

Benjamin M. Van Doren, Vincent Lostanlen, Aurora Cramer, Justin Salamon, Adriaan Dokter, Steve Kelling, Juan Pablo Bello, Andrew Farnsworth
  • AI & Machine Learning
  • Audio

Publication date: November 23, 2022

Meta-AF: Meta-Learning for Adaptive Filters

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP 2022)

Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis
  • AI & Machine Learning
  • Audio

Publication date: November 2, 2022

CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding

UIST 2022

Xingyu "Bruce" Liu, Ruolin Wang, Dingzeyu Li, Xiang "Anthony" Chen, Amy Pavel
Best Paper Award
  • AI & Machine Learning
  • Audio
  • Computer Vision, Imaging & Video
  • Human Computer Interaction

Publication date: October 26, 2022

Beyond Subtitles: Captioning and Visualizing Non-speech Sounds to Improve Accessibility of User-Generated Videos

ASSETS 2022

Oliver Alonzo, Hijung Valentina Shin, Dingzeyu Li
  • AI & Machine Learning
  • Audio
  • Graphics (2D & 3D)
  • Human Computer Interaction

Publication date: September 18, 2022

Audio Similarity is Unreliable as a Proxy for Audio Quality

Interspeech 2022

Pranay Manocha, Zeyu Jin, Adam Finkelstein
  • Audio

Publication date: September 18, 2022

Filler Word Detection and Classification: A Dataset and Benchmark

23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022)

Ge Zhu, Juan-Pablo Caceres, Justin Salamon
  • AI & Machine Learning
  • Audio