I am back at Adobe Research as a Principal Research Scientist. I was at Amazon from 2021 to 2024, at Google Research from 2018 to 2021, at Adobe Research from 2014 to 2018, at Technicolor’s Research Center from 2011 to 2014, and at Intel Research from 2006 to 2011. Before 2006, I was a graduate student in the Intelligent Systems Program at the University of Pittsburgh. My advisor was Milos Hauskrecht. My e-mail is kveton@adobe.com.
I propose, analyze, and apply algorithms that learn incrementally, run in real time, and converge to near-optimal solutions as the number of observations increases. Most of my recent work focuses on applying these ideas to modern generative models and human feedback.
Seamless interaction between humans and machines is the holy grail of artificial intelligence. This problem has traditionally been studied as learning to interact with an environment, with reinforcement learning and bandits being two prominent frameworks. A bandit is a framework for adaptive supervised learning, where the agent learns to act optimally, conditioned on context, through repeated interactions with the environment. I have made several fundamental contributions to this field. My earlier work focused on structured bandit problems with graphs, submodularity, semi-bandit feedback, and low-rank matrices. It culminated in my work on online learning to rank, where we designed bandit algorithms that handle exponentially large action spaces and partial feedback. These algorithms are simple, theoretically sound, robust, and remain the state of the art. My recent work has focused on making bandit algorithms more practical: exploration through randomization, which works well with neural networks, and reducing the statistical complexity of bandit algorithms through meta-, multi-task, and federated learning.
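To make the interaction loop above concrete, here is a minimal sketch of Thompson sampling in a Bernoulli bandit, a standard example of exploration through randomization. The environment, arm means, and horizon are hypothetical and chosen only for illustration; this is not a specific algorithm from my papers.

```python
# Minimal sketch of Thompson sampling for a Bernoulli bandit.
# All quantities below (arm means, horizon) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])  # hypothetical arm reward probabilities
K = len(true_means)
alpha, beta = np.ones(K), np.ones(K)    # Beta(1, 1) prior for each arm

for t in range(10_000):
    theta = rng.beta(alpha, beta)       # sample a plausible model of each arm (randomized exploration)
    arm = int(np.argmax(theta))         # act greedily with respect to the sampled model
    reward = float(rng.random() < true_means[arm])  # bandit feedback: only the chosen arm is observed
    alpha[arm] += reward                # update the posterior of the chosen arm
    beta[arm] += 1.0 - reward

print("posterior means:", alpha / (alpha + beta))
```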
Recent advances in machine learning have been powered by pre-trained models that excel at many human-level tasks and can adapt to new tasks in non-traditional ways, such as in-context learning. Despite this shift, the traditional problems of exploration and statistically efficient adaptation remain. For instance, fine-tuning of large language models is computationally costly. This cost can be reduced by fine-tuning on a smaller set of well-chosen, informative examples. The problem of choosing these examples can be formulated and solved as an optimal design. Another example is human evaluation of models. Since human feedback is costly to collect, it is natural to reuse previously collected feedback to evaluate new models. This problem can be formulated and solved as off-policy evaluation from logged human feedback.
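As an illustration of the optimal-design view of example selection, here is a minimal sketch that greedily chooses fine-tuning examples to maximize the log-determinant of the information matrix of their embeddings, a greedy D-optimal design. The embeddings, their dimension, and the budget are hypothetical placeholders, not a specific method from my papers.

```python
# Minimal sketch of greedy D-optimal design over example embeddings.
# The data and budget below are hypothetical.
import numpy as np

def greedy_d_optimal(X: np.ndarray, budget: int, reg: float = 1e-3) -> list[int]:
    """Greedily pick `budget` rows of X that maximize log det of the information matrix."""
    n, d = X.shape
    A = reg * np.eye(d)           # regularized information matrix
    chosen: list[int] = []
    for _ in range(budget):
        A_inv = np.linalg.inv(A)
        # Gain in log det from adding example x is log(1 + x^T A^{-1} x),
        # by the matrix determinant lemma; so it suffices to rank by x^T A^{-1} x.
        gains = np.einsum("ij,jk,ik->i", X, A_inv, X)
        gains[chosen] = -np.inf   # do not pick the same example twice
        best = int(np.argmax(gains))
        chosen.append(best)
        A += np.outer(X[best], X[best])
    return chosen

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 32))   # hypothetical example embeddings
subset = greedy_d_optimal(embeddings, budget=50)
print(subset[:10])
```

An analogous sketch for the second problem would reweight logged human feedback by the ratio of the new policy's and the logging policy's probabilities of the logged responses, which is the standard inverse propensity scoring estimator for off-policy evaluation.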
See my home page for the complete list of my publications.