Publications

SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video

ACM Conference on Human Factors in Computing Systems (CHI 2026)

Publication date: April 13, 2026

Suhyeon Yoo, Adolfo Hernandez Santisteban, Prem Seetharaman, Justin Salamon, Oriol Nieto, Anh Truong

Sound effects (SFX) are critical to video storytelling: they immerse viewers, direct attention, and shape emotion. However, crafting an effective soundscape is difficult: creators must decide how to source, place, layer, and mix sounds to support the narrative. Generative text-to-SFX tools enable users to create custom sounds, but creators often struggle to describe sounds with words and lack control over individual stems in premixed outputs. We propose SoundStager, an AI-assisted tool for designing generative soundscapes for video. SoundStager analyzes the video narrative to create layered audio scenes (of keynote, signal, soundmark, and archetypal sounds) and supports iterative refinement through a combination of conversational and analog controls. SoundStager's design was informed by formative studies with six professional sound designers, six video creators, and insights from sound design literature. Our user evaluation with twelve video creators shows that SoundStager enables users to quickly create satisfactory soundscapes while retaining creative control.