Sparse Overcomplete Decomposition for Single Channel Speaker Separation

In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Published November 17, 2007

M. Shashanka, B. Raj, Paris Smaragdis

We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis [6]. The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose the mixed signals as linear combinations of these learned bases. In other words, their model extracts a compact code of basis functions that can explain the space spanned by spectral vectors of a speaker. In our model, we generate a sparse-distributed code where we have more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in data and hence leads to better separation.
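
The general idea of learning per-speaker bases and then explaining a mixture as a sparse combination of them can be illustrated with a small sketch. This is not the probabilistic model from the paper: it swaps in non-negative factorization with KL-divergence multiplicative updates and an L1 penalty on the weights as a stand-in for the sparsity mechanism. The function names (learn_bases, fit_weights, separate), the sparsity parameter, and the random matrices used in place of magnitude spectrograms are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def learn_bases(V, n_bases, n_iter=50, sparsity=0.1, seed=0):
    """Learn non-negative bases W (freq x n_bases) from training data V (freq x frames).

    n_bases may exceed the number of frequency bins (an overcomplete code).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_bases)) + EPS
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_bases, n_frames)) + EPS
    for _ in range(n_iter):
        # Weight update; columns of W sum to 1, so the usual W^T 1 term is 1.
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (1.0 + sparsity)
        # Basis update, then renormalize columns and rescale H so W @ H is unchanged.
        R = V / (W @ H + EPS)
        W *= (R @ H.T) / (H.sum(axis=1) + EPS)
        scale = W.sum(axis=0, keepdims=True) + EPS
        W /= scale
        H *= scale.T
    return W


def fit_weights(V, W, n_iter=50, sparsity=0.1, seed=0):
    """Estimate sparse non-negative weights H for fixed bases W."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (1.0 + sparsity)
    return H


def separate(V_mix, bases_per_speaker, **kwargs):
    """Split a mixture into one estimate per speaker using soft masks."""
    W = np.hstack(bases_per_speaker)
    H = fit_weights(V_mix, W, **kwargs)
    total = W @ H + EPS
    estimates, start = [], 0
    for W_k in bases_per_speaker:
        stop = start + W_k.shape[1]
        part = W_k @ H[start:stop]  # this speaker's share of the model
        estimates.append(V_mix * part / total)
        start = stop
    return estimates


# Synthetic stand-ins for two speakers' magnitude spectrograms (assumption).
rng = np.random.default_rng(1)
n_freq, n_frames = 16, 40
V_a = rng.random((n_freq, n_frames)) + 0.1
V_b = rng.random((n_freq, n_frames)) + 0.1

# 32 bases for a 16-dimensional space: more bases than dimensions.
W_a = learn_bases(V_a, n_bases=32, seed=0)
W_b = learn_bases(V_b, n_bases=32, seed=1)

V_mix = V_a[:, :20] + V_b[:, :20]
est_a, est_b = separate(V_mix, [W_a, W_b])
```

The soft mask in separate divides the mixture between speakers in proportion to each speaker's modeled contribution, so the estimates add back up to the mixture. With real recordings, V_a, V_b, and V_mix would be magnitude spectrograms, and the estimates would be combined with the mixture phase to resynthesize audio.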

Research Areas: AI & Machine Learning, Audio