Publications

Evaluating LLMs in Experiential Context: Insights from a Survey of Recent CHI Publications

CHI 2025 - Workshop on Human-centered Evaluation and Auditing of Language Models

Publication date: April 26, 2025

Christine Dierk, Jennifer Healey, Doga Dogan

The rise of large language models (LLMs) has had far-reaching effects across multiple fields, requiring evaluation strategies to assess their impact. In contrast to the quantitative, benchmark-based evaluation frameworks typically used at AI conferences, evaluating LLMs for human-computer interaction (HCI) requires more nuanced consideration, as LLM "performance" in this arena is inherently human-centered and often bespoke to the experiential context. This paper provides a set of insights distilled from a survey of 23 papers recently published at CHI and suggests a lens through which to view HCI LLM evaluation strategies. We discuss the challenges of evaluating LLMs in HCI and provide suggestions to help increase interdisciplinary rigor.
