Oriol Nieto

Senior Research Engineer II

San Francisco

Oriol (Uri) Nieto (he/they) is a Senior Research Engineer II in the Sound Design AI group (SODA) in San Francisco. Uri works on human-centered AI for audio creativity, spanning music, audiobooks, video editing, and sound design. He has transferred several technologies to Premiere Pro, including Generative Extend and Audio Category Tagging. He holds a PhD in Music Technology from New York University, a Master’s in Music, Science, and Technology from Stanford University, and a Master’s in Information Technologies from Pompeu Fabra University. Highly involved with the Music Information Retrieval (MIR) community, he was one of the three General Chairs for ISMIR 2024 in San Francisco. Uri has helped develop open-source MIR packages such as librosa, mir_eval, and MSAF, and has contributed to PyTorch. In his spare time, he plays guitar, violin, and cajón, and sings (and screams).

For more information, please visit Uri’s personal website.

Publications

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Ghosh, Sreyan., Evuru, Chandra., Kumar, Sonal., Tyagi, Utkarsh., Nieto, Oriol., Jin, Zeyu., Manocha, Dinesh. (Apr. 21, 2025)

International Conference on Learning Representations (ICLR)

Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

García, Hugo., Nieto, Oriol., Salamon, Justin., Pardo, Bryan., Seetharaman, Prem. (Apr. 7, 2025)

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Sakshi, S., Tyagi, Utkarsh., Kumar, Sonal., Seth, Ashish., Selvakumar, Ramaneswaran., Nieto, Oriol., Duraiswami, Ramani., Ghosh, Sreyan., Manocha, Dinesh. (Mar. 24, 2025)

Oral Paper (Top 5%)

International Conference on Learning Representations (ICLR)

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Ghosh, Sreyan., Kumar, Sonal., Seth, Ashish., Evuru, Chandra., Tyagi, Utkarsh., Sakshi, S., Nieto, Oriol., Duraiswami, Ramani., Manocha, Dinesh. (Nov. 16, 2024)

Oral Paper (Top 5%)

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning

Manco, Ilaria., Salamon, Justin., Nieto, Oriol. (Nov. 10, 2024)

International Society for Music Information Retrieval Conference (ISMIR)

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Ghosh, Sreyan., Seth, Ashish., Kumar, Sonal., Tyagi, Utkarsh., Evuru, Chandra., Ramaneswaran, S., Sakshi, S., Nieto, Oriol., Duraiswami, Ramani., Manocha, Dinesh. (May 7, 2024)

International Conference on Learning Representations (ICLR)

Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

Wilkins, Julia., Salamon, Justin., Fuentes, Magdalena., Bello, Juan., Nieto, Oriol. (Oct. 22, 2023)

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Efficient Spoken Language Recognition Via Multilabel Classification

Nieto, Oriol., Jin, Zeyu., Dernoncourt, Franck., Salamon, Justin. (Aug. 24, 2023)

Interspeech 2023

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

Tan, Reuben., Ray, Arijit., Plummer, Bryan., Salamon, Justin., Nieto, Oriol., Russell, Bryan., Saenko, Kate. (Jun. 18, 2023)

Highlight Paper (Top 10%)

Conference on Computer Vision and Pattern Recognition (CVPR)

Audio-Text Models Do Not Yet Leverage Natural Language

Wu, Ho-Hsiang., Nieto, Oriol., Bello, Juan., Salamon, Justin. (Jun. 4, 2023)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Music Enhancement via Image Translation and Vocoding

Kandpal, Nikhil., Nieto, Oriol., Jin, Zeyu. (May 8, 2022)

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Deep Embeddings and Section Fusion Improve Music Segmentation

Salamon, Justin., Nieto, Oriol., Bryan, Nicholas. (Nov. 8, 2021)

International Society for Music Information Retrieval Conference (ISMIR)

Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications

Nieto, Oriol., Mysore, Gautham., Wang, Cheng-i., Smith, Jordan., Schlüter, Jan., Grill, Thomas., McFee, Brian. (Dec. 11, 2020)

Transactions of the International Society for Music Information Retrieval (TISMIR)
