Domain-specific adaptation of a partial least squares regression model for loan defaults prediction

5th International Workshop on Domain Driven Data Mining (DDDM) at the International Conference on Data Mining (ICDM)

Published December 11, 2011

Balaji Vasan Srinivasan, N. Gnanasambandam, S. Zhao, R. Minhas

Loan management agencies monitor several loan-related attributes to track the condition and quality of their financial portfolios. If the trend in loan status is well understood, the agency can act proactively to avoid prolonged delinquency and loan defaults. If an early warning system can predict the risk associated with a loan well ahead of time, the agency can take corrective measures to prevent the loan from defaulting. In this paper, we use partial least squares (PLS) regression to model the status of a loan, quantized to a non-linear scale of 0 to 100 (where the severity function is built with input from domain experts). We use the associated "Variable Influence on Projection" (VIP) scores to select the variables most useful for prediction. To address the imbalance in the categories of the observed records (typically, low-risk records far outnumber risky ones), we propose a multi-PLS model for loan prediction. We further enhance the model outputs based on certain domain-specific indicator variables. The resulting model shows improved predictive capacity compared with a direct application of the PLS model.
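To make the VIP-based variable selection concrete, here is a minimal sketch, not the authors' implementation. It assumes a single-response PLS fit (PLS1) via the NIPALS algorithm and the commonly used VIP formula, in which each variable's squared weight is averaged over components in proportion to the response variance that component explains. The function name `pls1_vip`, the synthetic data, and the 0 to 100 severity score generated here are all hypothetical and chosen only for illustration.

```python
import math
import random


def pls1_vip(X, y, n_components):
    """Fit PLS1 with NIPALS and return one VIP score per column of X."""
    n, p = len(X), len(X[0])
    # Center every column of X and center y.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Scores, X loadings, and the y loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # sum of squares of y explained

    total = sum(explained)
    if total == 0:
        raise ValueError("no component explains any variance in y")
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Synthetic data (an assumption): columns 0 and 1 drive a hypothetical
# severity score clipped to 0..100; columns 2 and 3 are pure noise.
random.seed(0)
X = [[random.random() for _ in range(4)] for _ in range(200)]
y = [min(100.0, max(0.0, 60 * r[0] + 30 * r[1] + random.gauss(0, 2))) for r in X]

vip = pls1_vip(X, y, n_components=2)
# A common rule of thumb (not taken from the paper) keeps variables with VIP >= 1.
selected = [j for j, v in enumerate(vip) if v >= 1.0]
print([round(v, 3) for v in vip], selected)
```

Because each weight vector has unit length, the squared VIP scores sum to the number of variables, so a score above 1 marks a variable with above-average influence. The cutoff of 1 used here is a widely used convention and may differ from the threshold the authors applied.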

Research Area: AI & Machine Learning