Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing

Innovative Applications of Artificial Intelligence (IAAI)

Published February 4, 2017

Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, Emma Brunskill

In this paper we consider the problem of evaluating one digi- tal marketing policy (or more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary—an assumption that is typically false, particularly for digital marketing appli- cations. This means that existing off-policy policy evaluation methods are reactive to nonstationarity, in that they slowly correct for changes after they occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction meth- ods, which we show results in a drastic reduction of mean squared error when evaluating policies using real digital mar- keting data set.

Learn More

Research Area:  AI & Machine Learning