Vidmento: Filling the gaps in your video story with generative AI 

April 3, 2026

Tags: Computer Vision, Imaging & Video, Conferences, Human Computer Interaction

Research by: Catherine Yeh (Harvard University), Anh Truong (Adobe Research), Mira Dontcheva (Adobe Research), Bryan Wang (Adobe Research)

Presented at: CHI 2026, Barcelona
This research was conducted during Catherine Yeh’s internship at Adobe Research. Note: This is academic research being presented at a peer-reviewed conference. It is not a product feature or capability.

Key points: 

  • Vidmento is an experimental AI-assisted video authoring tool from Adobe Research, accepted at CHI 2026, that helps creators build complete video stories by blending their own captured footage with contextually generated clips — preserving narrative continuity and creative intent throughout. 
  • The tool introduces “generative expansion,” a framework that analyzes existing footage, surfaces narrative gaps, and generates media that fits stylistically and narratively with the surrounding material. 
  • Creators work across two linked views — a canvas for visual sequencing and a script editor for voiceover narrative — giving high-level story overview and fine-grained control in a single environment. 
  • In a study with 12 video creators, Vidmento supported narrative development and exploration in ways participants described as difficult or impossible to achieve with existing workflows. 

Every video creator knows the feeling: you have footage from a trip, an event, a moment, but the story you want to tell has gaps. A transition that needs something more. A scene that doesn’t exist in your library. A shot you wish you’d taken. Vidmento, an experimental project from Adobe Research presented at CHI 2026, tackles exactly this problem. 

The core concept is what the researchers call “generative expansion” — a framework that treats captured footage and AI-generated clips as complementary rather than competing materials. Rather than asking creators to start from a text prompt, Vidmento starts from what they already have. It analyzes the content and context of existing shots, identifies narrative gaps in the developing story, and proactively suggests what could go there. The generated clips are designed to blend with the surrounding material in style and narrative context — so the join between real and generated is a deliberate creative choice, not a visible flaw. 
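The loop described above — analyze existing shots, surface narrative gaps, and propose clips the creator can accept or reject — can be sketched at a high level. Everything below (the `Shot` type, the scene-change gap heuristic, the `suggest_fill` helper) is a hypothetical illustration for intuition only, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Shot:
    scene: str                 # which scene this shot belongs to
    description: str           # what the shot depicts
    source: str = "captured"   # "captured" or "generated"

def find_gaps(timeline):
    """One simple stand-in for gap detection: flag transitions
    between different scenes that have no connecting shot."""
    gaps = []
    for i in range(len(timeline) - 1):
        a, b = timeline[i], timeline[i + 1]
        if a.scene != b.scene:
            gaps.append((i, f"transition from '{a.scene}' to '{b.scene}'"))
    return gaps

def suggest_fill(timeline, gap):
    """Propose a generated clip for a gap, conditioned on the
    surrounding material; the creator decides whether to use it."""
    index, need = gap
    return Shot(scene=timeline[index].scene,
                description=f"generated clip bridging {need}",
                source="generated")

# Toy storyboard with a missing transition between two scenes
timeline = [
    Shot("airport", "boarding the plane"),
    Shot("beach", "sunset over the water"),
]
gaps = find_gaps(timeline)
proposal = suggest_fill(timeline, gaps[0])
print(len(gaps), proposal.source)  # one gap found; the proposal is marked "generated"
```

The key design point the sketch tries to capture is that the output is a labeled *suggestion* (note the `source` field distinguishing generated from captured material), leaving the insert-or-discard decision to the creator.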

The interface is built around how stories are actually assembled. A canvas gives a spatial overview of the story as scenes and shots, and shows where there are gaps in captured material. In the canvas, creators can sequence, compare, and expand their video story by combining captured and generated images and video. A linked script editor supports voiceover writing with Socratic-style prompts, nudging creators to think about what the story is missing rather than simply filling it. Captured footage is visually distinguished from generated clips throughout, keeping the hybrid nature of the story transparent at every stage. 

A key design principle is that Vidmento scaffolds rather than automates. Every AI-powered feature — from scene grouping and script suggestions to story variations, visual sequencing, and shot generation — is framed as a suggestion. Creators accept, reject, or adapt each one. When participants in the 12-person study reflected on their experience, they consistently noted that working with Vidmento amplified their own creative voice rather than overriding it. Participants produced travel vlogs, wedding reels, and personal essays — videos they described as difficult or impossible to achieve with existing workflows. 

The research also surfaces the boundaries of this approach: where blending captured and generative footage works naturally, and where it requires more creator judgment to maintain coherence. Vidmento is designed as a tool for story construction — producing an initial rough cut that establishes narrative structure — leaving final refinement to the creator’s own process and preferred tools. 

Vidmento is a collaboration between Adobe Research and Harvard University. The paper will be presented at CHI 2026 in Barcelona. The full paper is available on arXiv at arxiv.org/abs/2601.22013.
