Adobe researchers present a powerful, unified approach to generative video editing at CVPR 2025

June 12, 2025

Tags: AI & Machine Learning, Computer Vision, Imaging & Video, Conferences

When artists edit an image, they only have one frame to change. But video editors often have to make painstaking, frame-by-frame edits to track and reflect changes to objects that move and morph. That’s where new work from Adobe Research comes in.  

In a groundbreaking paper presented at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR), researchers shared a novel generative video propagation framework that applies edits in the first frame of a video to all the following frames in a reliable and consistent manner, all while preserving areas the user hasn’t edited.  

“It’s much harder to achieve high quality in generative video editing models compared to image editing models, but our method finds a space between image and video editing—the user can simply edit the first frame as they would edit any image and then they can propagate it to the rest of the frames,” explains Soo Ye Kim, Research Scientist and one of the paper’s authors. “We discovered that we can achieve very high-quality video edits with this method.”  

The research behind generative video propagation 

It all began last summer when Adobe Research intern Shaoteng Liu kicked off a summer research project on video editing. Liu and several of his mentors, Kim and Research Scientists Tianyu (Steve) Wang and Jui-Hsien Wang—all of whom were once Adobe interns themselves—set a goal to develop technology that could automatically remove an object and its visual traces (such as shadows and reflections) from a video. 

The team began with an image-to-video generation model, a common component in modern video generation, that already had a strong understanding of how to create natural video from a single image, so it could propagate changes from the first frame throughout the video. The remaining challenge was to preserve the information in the rest of the original video, which the team addressed by adding a selective content encoder on top.  
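As a rough illustration, here is a minimal, hypothetical sketch of that two-part design in PyTorch: a stand-in image-to-video backbone propagates the edited first frame over time, while a selective content encoder feeds features of the original video back in so unedited regions can be preserved. Every module name, tensor shape, and the simple additive fusion used here are assumptions for illustration only, not the paper’s actual architecture.

    # Hypothetical sketch: edited first frame + original video -> edited video.
    # Module names, shapes, and the fusion scheme are illustrative assumptions.
    import torch
    import torch.nn as nn


    class SelectiveContentEncoder(nn.Module):
        """Encodes the original video so its unedited content can be re-injected."""

        def __init__(self, channels: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(3, channels, kernel_size=3, padding=1),
                nn.SiLU(),
                nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            )

        def forward(self, original_video: torch.Tensor) -> torch.Tensor:
            # original_video: (batch, 3, frames, height, width)
            return self.net(original_video)


    class ImageToVideoBackbone(nn.Module):
        """Stand-in for a pretrained image-to-video generator."""

        def __init__(self, channels: int = 64):
            super().__init__()
            self.embed = nn.Conv2d(3, channels, kernel_size=3, padding=1)
            self.decode = nn.Conv3d(channels, 3, kernel_size=3, padding=1)

        def forward(self, edited_first_frame: torch.Tensor,
                    injected: torch.Tensor) -> torch.Tensor:
            # Broadcast the edited first frame over time, then fuse it with the
            # selective-encoder features before decoding a full video.
            frames = injected.shape[2]
            first = self.embed(edited_first_frame).unsqueeze(2)   # (B, C, 1, H, W)
            first = first.expand(-1, -1, frames, -1, -1)          # (B, C, T, H, W)
            return self.decode(first + injected)


    class EditPropagator(nn.Module):
        """Combines the backbone and the selective content encoder."""

        def __init__(self):
            super().__init__()
            self.encoder = SelectiveContentEncoder()
            self.backbone = ImageToVideoBackbone()

        def forward(self, edited_first_frame, original_video):
            return self.backbone(edited_first_frame, self.encoder(original_video))


    if __name__ == "__main__":
        model = EditPropagator()
        edited = torch.randn(1, 3, 64, 64)        # user-edited first frame
        original = torch.randn(1, 3, 8, 64, 64)   # original 8-frame clip
        print(model(edited, original).shape)      # torch.Size([1, 3, 8, 64, 64])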

With these two elements in place, they developed a framework that lets a user, for example, remove a dog and its shadow from the first frame. Then the tool removes the dog and its shadow throughout, even as the dog moves, and even if the dog steps out of the frame and then returns.  

To train the model, the team created synthetic data. “We gave it pairs of videos. One was the original video, and for the other we’d do something like carve out a human from another video and paste it in. Then we’d let the model see the composited video and use the original one as the ground truth,” says Jui-Hsien Wang. “The surprising thing was that what we pasted was random. It didn’t have the same lighting as the original video, and it had artifacts on the edges, and the camera wasn’t consistent. All of these things made it very different from what a realistic video would look like if you pasted an object in—but the model was powerful enough to distinguish the task.” 
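To make that data-generation recipe concrete, the snippet below sketches one way such a training pair could be built: an object masked out of a donor video is pasted onto the original clip, and the untouched original serves as the ground truth the model must recover. The function name, array layout, and compositing details are illustrative assumptions rather than the team’s actual pipeline.

    # Hypothetical sketch of building a synthetic training pair by compositing
    # a masked object from a donor video onto an original video.
    import numpy as np


    def composite_training_pair(original_video: np.ndarray,
                                donor_video: np.ndarray,
                                donor_mask: np.ndarray):
        """Paste the masked region of donor_video onto original_video.

        original_video, donor_video: (frames, height, width, 3) uint8 arrays
        donor_mask: (frames, height, width) boolean array marking the object
        Returns (model_input, ground_truth), where ground_truth is the
        unmodified original video.
        """
        mask = donor_mask[..., None]  # broadcast the mask over the RGB channels
        composited = np.where(mask, donor_video, original_video)
        return composited, original_video


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        original = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
        donor = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
        mask = np.zeros((8, 64, 64), dtype=bool)
        mask[:, 16:48, 16:48] = True  # a crude stand-in for a carved-out object
        model_input, target = composite_training_pair(original, donor, mask)
        print(model_input.shape, target.shape)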

From the earliest experiments, the model surprised the team. “It learned that it needed to propagate the difference in the first frame and, from there, it was able to do all kinds of different tasks, and even combinations of tasks, even though they were not included in the training. For example, we could expand the video frame with outpainting, insert an object, and edit an object all at the same time. All of these are very complex tasks that are hard to achieve with previous methods. But the model could handle them all because it’s such a general framework,” explains Kim. “And it’s not just an improvement—it can do things that were not possible with traditional frame-by-frame methods.”  

The team credits their intern, Liu, for pushing the research beyond simply removing objects toward a framework that can propagate nearly any type of edit: removing an object and its visual traces, substantially changing the shape of an object, inserting objects with their own independent motion, and tracking objects together with their effects.  

“At the beginning, my vision was quite modest. I simply wanted to create a solid video editing tool, starting with object removal. But after using Adobe’s editing products, I realized that keyframe-based editing is one of the most user-friendly approaches. This inspired me,” remembers Liu.  

“Then, one of the most exciting moments came when we asked ourselves: what would happen if we applied arbitrary edits to an object in the first frame, such as painting it a solid red color? To our surprise, the model could propagate the solid color block across the entire video, effectively performing what is known as object tracking in computer vision. Even more excitingly, the model could also track shadows, reflections, and other contextual elements, allowing for highly flexible and creative video editing possibilities,” Liu explains. 

How generative video propagation could help Adobe users 

From the beginning of their research, the team was thinking about how their work might eventually impact Adobe products and their users.  

“For me, the most important goal was to reduce the burden and cost for content creators,” says Steve Wang. “Traditional editing takes a lot of time and manual work that can be very boring, but it’s crucial for achieving high-quality, seamless edits. Our work eases these challenges. It saves time, and you don’t have to deal with a lot of details to achieve good results.”  

Jui-Hsien Wang added, “In the past few years, Adobe has shipped great image features like Generative Fill in Photoshop, and it’s natural that our users are also thinking about videos, too—a lot of times they create both kinds of media. So one great thing about this work is that it leverages the power we have in the image editing model and brings that into a video model.” 

Authors of the paper: Shaoteng Liu, Steve (Tianyu) Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim*, and Jiaya Jia* (* co-corresponding authors)

Wondering what else is happening inside Adobe Research? Check out our latest news here. 
