Partial least squares based speaker recognition system

Snowbird Learning Workshop

Published April 13, 2011

Balaji Vasan Srinivasan, D. Zotkin, R. Duraiswami

Partial least squares (PLS) methods have been successfully applied to several pattern recognition tasks in chemometrics, computer vision and medical imaging. PLS can be used for regression, classification and supervised dimensionality reduction. In this work, we present an efficient PLS framework for text-independent speaker recognition. Speaker recognition deals with the task of verifying a speaker’s identity from the corresponding voice samples. The key challenge is to model speaker characteristics from limited target-speaker training data (ranging from a few seconds to a couple of minutes) that is affected by variability that is not speaker-specific (message, channel, noise). The features we use for discriminative speaker training combine features derived from mel-frequency cepstral coefficients (MFCCs) with cortical features; these are combined into a high-dimensional “supervector” based on Gaussian mixture models (GMMs). We use these supervectors to learn a speaker-specific PLS subspace in which the target speaker is well separated from non-target speakers. To handle the large datasets encountered in speaker recognition, we employ a novel extension of PLS to graphics processors (GPUs) that achieves a speedup of roughly 30x. The system was tested on NIST SRE 2008 data for various training and test conditions, and the results are promising.
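To make the idea of a speaker-specific PLS model concrete, here is a minimal sketch of single-response PLS (PLS1) fitted with the standard NIPALS iteration. It assumes, as a hypothetical setup not taken from the paper, that each utterance is already reduced to a supervector (a plain list of floats), that target-speaker supervectors are labeled +1 and non-target ones -1, and that a test utterance is scored by the PLS regression estimate of its label. The names fit_pls1 and score are illustrative only; this is not the authors' GPU implementation.

```python
from math import sqrt
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X: List[Vector], y: Vector, n_components: int):
    """Fit PLS1 with NIPALS (hypothetical helper).

    Returns (x_mean, y_mean, weights, loadings, coefs).
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    # Centered copies of the data; NIPALS deflates these in place.
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in supervector space with maximal
        # covariance with the (residual) target/non-target labels.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(d)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:
            break  # labels fully explained; no further components
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]  # training scores (latent variable)
        tt = dot(t, t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        c = dot(yc, t) / tt  # regression of the labels on the scores
        # Deflate X and y by the part explained by this component.
        for i in range(n):
            for j in range(d):
                Xc[i][j] -= t[i] * p[j]
            yc[i] -= c * t[i]
        weights.append(w)
        loadings.append(p)
        coefs.append(c)
    return x_mean, y_mean, weights, loadings, coefs


def score(model, x: Vector) -> float:
    """Project a test supervector onto the PLS subspace and predict its label."""
    x_mean, y_mean, weights, loadings, coefs = model
    xc = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, c in zip(weights, loadings, coefs):
        t = dot(xc, w)
        y_hat += c * t
        xc = [a - t * b for a, b in zip(xc, p)]
    return y_hat


if __name__ == "__main__":
    # Toy 3-dimensional "supervectors": two target, three non-target.
    X = [[2.0, 1.0, 0.0], [1.5, 1.2, 0.1],
         [-1.0, -0.8, 0.2], [-1.2, -1.1, -0.1], [-0.9, -1.3, 0.0]]
    y = [1.0, 1.0, -1.0, -1.0, -1.0]
    model = fit_pls1(X, y, n_components=2)
    print(score(model, [1.8, 1.0, 0.0]))   # positive: accept as target
    print(score(model, [-1.0, -1.0, 0.0]))  # negative: reject
```

A verification decision would then threshold this score. Because labels here are scalar, each component needs only a matrix-vector product and a rank-one deflation. Real supervectors have tens of thousands of dimensions, which is where the GPU implementation described above becomes relevant.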


Research Areas: AI & Machine Learning, Audio