Reinforcement Learning with a Dynamic Action Set

Workshop on Continual Learning (NeurIPS 2018)

Published December 7, 2018

Yash Chandak, Georgios Theocharous, James Kostas, Philip Thomas

Reinforcement learning has been successfully applied to many sequential decision-making problems in which the set of possible actions (possible decisions) is fixed. However, in many real-world settings, the set of possible actions can change over time. We present a model-free method to continually adapt to a dynamic set of possible actions. We show how a policy can be decomposed into an internal policy that acts in a space of action representations and a reward-independent component that transforms these representations into actual actions. These representations not only make the internal policy parameterization invariant to the cardinality of the action set, but also improve generalization by allowing the agent to infer the outcomes of actions that are similar to actions already taken. We provide an algorithm that autonomously adapts to this dynamic action set by exploiting structure in the space of actions using supervised learning, while learning the internal policy using policy gradient. The efficacy of the proposed method is demonstrated on large-scale real-world continual learning problems.
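
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DecomposedPolicy`, the linear internal policy, the hand-specified action embeddings, and the nearest-neighbour mapping from representations to actions are all assumptions made for illustration (the abstract states only that the action structure is learned with supervised learning and the internal policy with policy gradient). The sketch shows one property only: the internal policy's parameter shape does not change when a new action becomes available.

```python
import math
import random


class DecomposedPolicy:
    """Hypothetical sketch: internal policy in representation space
    followed by a reward-independent lookup of an actual action."""

    def __init__(self, state_dim, embed_dim, noise_std=0.1, seed=0):
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        # Internal policy parameters: an embed_dim x state_dim matrix.
        # The shape is independent of how many actions exist.
        self.weights = [[0.0] * state_dim for _ in range(embed_dim)]
        # Action name -> representation. Assumed given here; in the
        # paper's setting these would be learned, not hand-specified.
        self.embeddings = {}

    def add_action(self, name, embedding):
        """Register a newly available action; weights are untouched."""
        self.embeddings[name] = list(embedding)

    def internal_policy(self, state):
        """Sample a point in the action-representation space."""
        mean = [sum(w * s for w, s in zip(row, state)) for row in self.weights]
        return [m + self.rng.gauss(0.0, self.noise_std) for m in mean]

    def to_action(self, point):
        """Map a representation to the nearest available action
        (nearest neighbour is an assumption of this sketch)."""
        return min(
            self.embeddings,
            key=lambda a: math.dist(self.embeddings[a], point),
        )

    def act(self, state):
        return self.to_action(self.internal_policy(state))


# Usage: the action set grows from two actions to three.
policy = DecomposedPolicy(state_dim=2, embed_dim=2, noise_std=0.0)
policy.weights = [[1.0, 0.0], [0.0, 1.0]]  # identity map, for clarity
policy.add_action("left", [-1.0, 0.0])
policy.add_action("right", [1.0, 0.0])
first_choice = policy.act([0.1, 0.8])   # only "left"/"right" exist

policy.add_action("up", [0.0, 1.0])     # new action appears
second_choice = policy.act([0.1, 0.8])  # same state, same weights
```

In this sketch the same state and the same internal policy parameters yield a different concrete action once a closer representation is registered, which is the sense in which the parameterization is independent of the size of the action set.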