Branislav Kveton

Principal Research Scientist

San Jose

I am back at Adobe Research as a Principal Research Scientist. I was at Amazon from 2021 to 2024, at Google Research from 2018 to 2021, at Adobe Research from 2014 to 2018, at Technicolor’s Research Center from 2011 to 2014, and at Intel Research from 2006 to 2011. Before 2006, I was a graduate student in the Intelligent Systems Program at the University of Pittsburgh. My advisor was Milos Hauskrecht. My e-mail is kveton@adobe.com.

I propose, analyze, and apply algorithms that learn incrementally, run in real time, and converge to near-optimal solutions as the number of observations grows. Most of my recent work applies these ideas to modern generative models and human feedback.

Seamless interaction between humans and machines is the holy grail of artificial intelligence. This problem has traditionally been studied as learning to interact with an environment, with reinforcement learning and bandits as two prominent frameworks. A bandit is a framework for adaptive supervised learning, where the agent learns to act optimally, conditioned on context, through repeated interactions with the environment. I have made several fundamental contributions to this field. My earlier work focused on structured bandit problems with graphs, submodularity, semi-bandit feedback, and low-rank matrices. It culminated in my work on online learning to rank, where we designed bandit algorithms that handle exponentially large action spaces and partial feedback. These algorithms are simple, theoretically sound, robust, and remain the state of the art. My recent work has focused on making bandit algorithms more practical: exploration through randomization, which works well with neural networks, and reducing the statistical complexity of bandit algorithms through meta-, multi-task, and federated learning.
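The exploration-through-randomization idea above can be illustrated with Thompson sampling on a Bernoulli bandit. This is a standard textbook sketch, not code from any of the papers below; the arm means and horizon are made-up values chosen only for illustration.

```python
import numpy as np

def run_thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson sampling: explore by acting greedily with
    respect to a random sample from each arm's Beta posterior."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.ones(k)   # Beta(1, 1) prior on each arm
    failures = np.ones(k)
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Randomized exploration: draw one posterior sample per arm,
        # then pull the arm whose sample is highest.
        arm = int(np.argmax(rng.beta(successes, failures)))
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        regret += best_mean - true_means[arm]
    return regret

# Two hypothetical arms with means 0.5 and 0.6. Playing uniformly at
# random would incur expected regret 0.05 * 5000 = 250 over this horizon;
# Thompson sampling concentrates on the better arm and does far better.
regret = run_thompson_sampling([0.5, 0.6], horizon=5000)
```

Because the posterior samples shrink toward the empirical means as data accumulates, exploration fades automatically, which is one reason this style of randomized exploration pairs well with neural network models.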

Recent advances in machine learning have been powered by pre-trained models that excel at many human-level tasks and adapt to new tasks in non-traditional ways, such as in-context learning. Despite this shift, the traditional problems of exploration and statistically efficient adaptation remain. For instance, fine-tuning a large language model is computationally costly. This cost can be reduced by fine-tuning on fewer well-chosen informative examples, and the problem of choosing these examples can be formulated and solved as an optimal design. Another example is human evaluation of models: since human feedback is costly to collect, it is natural to reuse previously collected feedback to evaluate new models. This problem can be formulated and solved as off-policy evaluation from logged human feedback.
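The off-policy evaluation idea above can be sketched with inverse propensity scoring (IPS), the standard reweighting estimator: each logged reward is weighted by the ratio of the new policy's action probability to the logging policy's. This is a generic illustration under made-up assumptions (uniform logging over two responses, deterministic ratings), not the specific method of any paper below.

```python
import numpy as np

def ips_estimate(logs, target_prob):
    """Estimate a new policy's value from logged feedback.

    logs: list of (context, action, reward, logging_prob) tuples.
    target_prob(context, action): probability the new policy takes
    that action in that context.
    """
    weights = np.array([target_prob(x, a) / p for x, a, _, p in logs])
    rewards = np.array([r for _, _, r, _ in logs])
    # Unbiased when the logging policy puts positive probability on
    # every action the target policy can take.
    return float(np.mean(weights * rewards))

# Toy simulation: the logging policy picks uniformly between two
# responses with hypothetical human ratings 0.2 and 0.8.
rng = np.random.default_rng(0)
ratings = [0.2, 0.8]
logs = []
for _ in range(10000):
    a = int(rng.integers(2))              # logging policy: uniform
    logs.append((None, a, ratings[a], 0.5))

# The target policy always picks response 1, so its true value is 0.8;
# the IPS estimate recovers it from the logs alone.
estimate = ips_estimate(logs, lambda x, a: 1.0 if a == 1 else 0.0)
```

The same logs can be reused to score any number of candidate policies, which is what makes logged human feedback attractive when fresh annotations are expensive.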

See my home page for the complete list of my publications.

Publications

Cascading Linear Submodular Bandits: Accounting for Position Bias and Diversity in Online Learning to Rank

Hiranandani, G., Singh, H., Gupta, P., Burhanuddin, I., Wen, Z., Kveton, B. (Jul. 22, 2019)

Conference on Uncertainty in Artificial Intelligence (UAI)

Predictive Analysis by Leveraging Temporal User Behavior

Chen, C., Kim, S., Bui, H., Rossi, R., Kveton, B., Koh, E., Bunescu, R. (Oct. 22, 2018)

ACM International Conference on Information and Knowledge Management (CIKM) 2018

Offline Evaluation of Ranking Policies with Click Models

Li, S., Abbasi-Yadkori, Y., Kveton, B., Muthukrishnan, S., Vinay, V., Wen, Z. (Aug. 19, 2018)

ACM International Conference on Knowledge Discovery & Data Mining (KDD)

Bernoulli Rank-1 Bandits for Click Feedback

Katariya, S., Kveton, B., Szepesvari, C., Vernade, C., Wen, Z. (Aug. 19, 2017)

Proceedings of 26th International Joint Conference on Artificial Intelligence (IJCAI)

Online Learning to Rank in Stochastic Click Models

Zoghi, M., Tunys, T., Ghavamzadeh, M., Kveton, B., Szepesvari, C., Wen, Z. (Aug. 6, 2017)

Proceedings of International Conference on Machine Learning (ICML) 2017

Model-Independent Online Learning for Influence Maximization

Vaswani, S., Kveton, B., Wen, Z., Ghavamzadeh, M., Lakshmanan, L., Schmidt, M. (Aug. 6, 2017)

Proceedings of International Conference on Machine Learning (ICML) 2017

Get to the Bottom: Causal Analysis for User Modeling

Zong, S., Kveton, B., Berkovsky, S., Ashkan, A., Wen, Z. (Jul. 9, 2017)

Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP)

Stochastic Rank-1 Bandits

Katariya, S., Kveton, B., Szepesvari, C., Vernade, C., Wen, Z. (Apr. 20, 2017)

20th International Conference on Artificial Intelligence and Statistics

Does Weather Matter? Causal Analysis of TV Logs

Zong, S., Kveton, B., Berkovsky, S., Ashkan, A., Vlassis, N., Wen, Z. (Apr. 3, 2017)

26th International World Wide Web Conference

Graphical Model Sketch

Kveton, B., Bui, H., Ghavamzadeh, M., Theocharous, G., Muthukrishnan, S., Sun, S. (Sep. 19, 2016)

European Conference on Machine Learning and Knowledge Discovery in Databases

Minimal Interaction Content Discovery in Recommender Systems

Kveton, B., Berkovsky, S. (Jul. 31, 2016)

ACM Transactions on Interactive Intelligent Systems 6

Practical Linear Models for Large-Scale One-Class Collaborative Filtering

Sedhain, S., Bui, H., Kawale, J., Vlassis, N., Kveton, B., Menon, A., Bui, T., Sanner, S. (Jul. 9, 2016)

25th International Joint Conference on Artificial Intelligence

Cascading Bandits for Large-Scale Recommendation Problems

Zong, S., Ni, H., Sung, K., Ke, N., Wen, Z., Kveton, B. (Jun. 25, 2016)

32nd Conference on Uncertainty in Artificial Intelligence

DCM Bandits: Learning to Rank with Multiple Clicks

Katariya, S., Kveton, B., Szepesvari, C., Wen, Z. (Jun. 19, 2016)

33rd International Conference on Machine Learning

Combinatorial Cascading Bandits

Kveton, B., Wen, Z., Ashkan, A., Szepesvari, C. (Dec. 7, 2015)

Advances in Neural Information Processing Systems 28

Efficient Thompson Sampling for Online Matrix-Factorization Recommendation

Kawale, J., Bui, H., Kveton, B., Tran-Thanh, L., Chawla, S. (Dec. 7, 2015)

Advances in Neural Information Processing Systems 28

Optimal Greedy Diversity for Recommendation

Ashkan, A., Kveton, B., Berkovsky, S., Wen, Z. (Jul. 25, 2015)

24th International Joint Conference on Artificial Intelligence

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

Wen, Z., Kveton, B., Ashkan, A. (Jul. 6, 2015)

32nd International Conference on Machine Learning

Cascading Bandits: Learning to Rank in the Cascade Model

Kveton, B., Szepesvari, C., Wen, Z., Ashkan, A. (Jul. 6, 2015)

32nd International Conference on Machine Learning

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

Kveton, B., Wen, Z., Ashkan, A., Szepesvari, C. (May 9, 2015)

18th International Conference on Artificial Intelligence and Statistics

Minimal Interaction Search in Recommender Systems

Kveton, B., Berkovsky, S. (Mar. 29, 2015)

20th ACM Conference on Intelligent User Interfaces

Structured Kernel-Based Reinforcement Learning

Kveton, B., Theocharous, G. (Jul. 14, 2013)

Association for the Advancement of Artificial Intelligence (AAAI) 2013

Kernel-Based Reinforcement Learning on Representative States

Kveton, B., Theocharous, G. (Jul. 14, 2012)

Association for the Advancement of Artificial Intelligence (AAAI) 2012