Combining Modeling of Singing Voice and Background Music for Automatic Separation of Musical Mixtures

Musical mixtures can be modeled as being composed of two characteristic sources: singing voice and background music. Many music/voice separation techniques focus on modeling one source; the residual is then used to explain the other source. In such cases, separation performance is often unsatisfactory for the source that has not been explicitly modeled. In this work, we propose to combine a method that explicitly models singing voice with a method that explicitly models background music, to address separation performance from the point of view of both sources. One method learns a singer-independent model of voice from singing examples using a Non-negative Matrix Factorization (NMF) based technique, while the other method derives a model of music by identifying and extracting repeating patterns using a similarity matrix and a median filter. Since the model of voice is singer-independent and the model of music does not require training data, the proposed method requires no training data from the user once deployed. Evaluation on a data set of 1,000 song clips showed that combining modeling of both sources can improve separation performance compared with modeling only one of the sources, and also compared with two other state-of-the-art methods.
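As a rough illustration of the two ingredients named in the abstract, the following sketch runs on a tiny synthetic magnitude spectrogram. It is not the paper's implementation. The function names (`nmf`, `repeating_background`), the toy data, the plain Euclidean multiplicative-update NMF, and in particular the soft-mask rule used to combine the two estimates are all assumptions made for this example; the abstract does not specify how the two models are combined.

```python
import numpy as np

EPS = 1e-9  # small constant to avoid division by zero

def nmf(V, rank, W=None, n_iter=100, seed=0):
    """Euclidean NMF with multiplicative updates, V ~= W @ H.
    If W is given it is kept fixed and only the activations H are fitted."""
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def repeating_background(V, n_similar=3):
    """For each frame, take the bin-wise median over its most similar frames
    (cosine similarity matrix), then cap the result at the mixture itself."""
    unit = V / (np.linalg.norm(V, axis=0, keepdims=True) + EPS)
    S = unit.T @ unit                            # frame-by-frame similarity
    idx = np.argsort(-S, axis=1)[:, :n_similar]  # top matches (includes self)
    B = np.median(V[:, idx], axis=2)             # V[:, idx] is (bins, frames, n_similar)
    return np.minimum(B, V)

# Toy data (hypothetical): 6 frequency bins, 16 frames.
pattern = 2.0 * np.eye(6)[:, :4]      # background repeats every 4 frames
Bg = np.tile(pattern, (1, 4))
Vc = np.zeros_like(Bg)
Vc[4:, [5, 10]] = 1.0                 # "voice" occupies bins 4-5 in two frames
V = Bg + Vc                           # observed mixture

# Voice model: dictionary learned from separate voice-only examples.
voice_examples = np.outer([0, 0, 0, 0, 1, 1], np.linspace(0.5, 1.5, 8))
W_voice, _ = nmf(voice_examples, rank=1)
_, H_voice = nmf(V, rank=1, W=W_voice)  # fit activations on the mixture
V_voice = W_voice @ H_voice

# Music model: repeating structure, no training data needed.
B_est = repeating_background(V)

# Assumed combination rule: a Wiener-like soft mask from both estimates.
mask = V_voice / (V_voice + B_est + EPS)
voice_out = mask * V
music_out = V - voice_out
```

On this toy input the median over similar frames suppresses the two non-repeating voice frames, while the fixed voice dictionary only explains energy in the bins it was trained on, so each model covers what the other leaves as residual. Real spectrograms are far less cleanly separable, and the number of similar frames and the NMF rank would need tuning.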

Publications

Combining Modeling of Singing Voice and Background Music for Automatic Separation of Musical Mixtures

International Society for Music Information Retrieval Conference (ISMIR)

Publication date: November 4, 2013

Zafar Rafii, Francois Germain, Dennis Sun, Gautham Mysore

Research Areas: AI & Machine Learning, Audio