Publications

MDX-GAN: Enhancing Perceptual Quality in Multi-Class Source Separation via Adversarial Training

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Publication date: April 14, 2024

Ke Chen, Jiaqi Su, Zeyu Jin

Audio source separation aims to extract individual sound sources from an audio mixture. Recent studies on source separation focus primarily on minimizing signal-level distance, typically measured by source-to-distortion ratio (SDR). However, scant attention has been given to the perceptual quality of the separated tracks. In this paper, we propose MDX-GAN, an efficient and high-fidelity audio source separator based on MDX-Net for multiple sound classes. We leverage different training objectives to enhance the perceptual quality of audio source separation. Specifically, we adopt perceptually motivated loss functions on top of the waveform loss, including multi-resolution STFT and Mel-spectrogram losses, and employ the adversarial training paradigm with multi-domain and multi-scale discriminators to refine the perceptual quality of separation. Additionally, we extend the model to support multiple sound classes within a single network via feature-wise linear modulation (FiLM). We conduct both objective and subjective experiments to evaluate MDX-GAN in real-world settings, and assess the impact of each design component on perceptual quality and SDR scores. Results demonstrate that MDX-GAN accurately separates sound sources and achieves superior perceptual quality.
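The multi-resolution STFT loss mentioned in the abstract is a standard construction in neural audio synthesis: the same spectral-convergence and log-magnitude terms are computed at several FFT sizes and averaged, so that errors at different time-frequency trade-offs are all penalized. The paper does not publish its exact loss code; the sketch below is a minimal NumPy illustration of the general technique, with the specific resolutions `(512, 1024, 2048)` chosen as an assumption, not taken from the paper.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT via a sliding Hann window (no external dependencies)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_res_stft_loss(pred, target,
                        resolutions=((512, 128), (1024, 256), (2048, 512))):
    """Average of spectral-convergence and log-magnitude L1 terms
    over several (n_fft, hop) resolutions. Resolutions are illustrative."""
    total = 0.0
    for n_fft, hop in resolutions:
        P = stft_mag(pred, n_fft, hop)
        T = stft_mag(target, n_fft, hop)
        sc = np.linalg.norm(T - P) / (np.linalg.norm(T) + 1e-8)    # spectral convergence
        lm = np.mean(np.abs(np.log(T + 1e-8) - np.log(P + 1e-8)))  # log-magnitude L1
        total += sc + lm
    return total / len(resolutions)
```

In a training loop this term would be added to the waveform loss; because each resolution sees the signal at a different time-frequency trade-off, the combined loss is harder to "game" than a single-resolution spectral distance.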
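Feature-wise linear modulation (FiLM), which the paper uses to serve multiple sound classes from one network, conditions intermediate feature maps with a per-class scale and shift. The sketch below shows the general mechanism only; the class count, channel count, and placement of the FiLM layers inside MDX-GAN are not specified here and are purely illustrative.

```python
import numpy as np

class FiLM:
    """Feature-wise linear modulation: each sound class selects a learned
    (gamma, beta) pair that scales and shifts the feature channels, so a
    single separator network can be steered toward different classes."""

    def __init__(self, num_classes, num_channels, seed=0):
        rng = np.random.default_rng(seed)
        # In a real model these are trained parameters (often produced by a
        # small conditioning network); random values stand in here.
        self.gamma = rng.standard_normal((num_classes, num_channels))
        self.beta = rng.standard_normal((num_classes, num_channels))

    def __call__(self, features, class_id):
        # features: (channels, time); broadcast gamma/beta across time
        g = self.gamma[class_id][:, None]
        b = self.beta[class_id][:, None]
        return g * features + b
```

Conditioning this way adds only two vectors per class, which is why FiLM is a cheap route to multi-class support compared with training one separator per class.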


Research Areas: AI & Machine Learning, Audio