With the new experimental technology MotionStream, video creators can interact with AI-generated video while it’s being created, directing the movement of objects and changing camera angles in real time using nothing more than a cursor and sliders. With reduced latency and more control, MotionStream enables a new level of intuitive AI video exploration for creative workflows.
Adobe researchers have published their work on MotionStream and now they’re offering a preview to the public.
“I see MotionStream as a big change in how people could control video in the future,” says Eli Shechtman, Senior Principal Scientist and one of the researchers behind MotionStream.
The MotionStream experience—quick and controllable with natural movement built in
With current generative AI video tools, a user enters a text prompt, clicks, and then waits tens of seconds, even a minute, for the tool to produce or edit a video clip. Then, each new generation means starting over, which makes it difficult to control details and experiment with changes. Not to mention how repeated pausing, waiting, and beginning again breaks creative flow.
MotionStream solves these challenges by providing immediate visual feedback: creators guide and refine a video as it’s being generated. They begin with a text prompt, and from there they can click and drag objects to control their movement and adjust the camera position. Users can even choose which elements should move and which should remain static. The results of their edits unfold in real time.
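To make the interaction concrete, here is a minimal, hypothetical sketch of the kind of per-frame control signal those cursor and slider actions could be packaged into. The structure and field names are illustrative assumptions, not Adobe’s actual interface.

```python
# Hypothetical sketch only: illustrative names, not MotionStream's real API.
from dataclasses import dataclass, field

@dataclass
class FrameControls:
    """Control signals a creator's interactions might supply for one frame."""
    # Drag handles: (src_x, src_y, dst_x, dst_y) for each point the user is moving.
    drag_points: list[tuple[float, float, float, float]] = field(default_factory=list)
    # Slider-driven camera adjustments.
    camera_pan: tuple[float, float] = (0.0, 0.0)
    camera_zoom: float = 1.0
    # Regions the user marked as static, e.g. as (x, y) pixel coordinates.
    static_pixels: list[tuple[int, int]] = field(default_factory=list)
```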
“There’s always this kind of joy when you’re interacting with this technology and seeing what it does,” says Senior Research Scientist and MotionStream collaborator Richard Zhang. “For example, you can slosh water around or take an object and rotate it in 3D by moving two control points at one time.”
The powerful model behind MotionStream also captures physics and natural movement in the world. “That’s where a lot of the magic happens—in the secondary effects that are really hard to control manually. If you want to move an elephant, for example, you can click and move its body, but it’s a lot of work to manually make those movements look natural. This currently requires specialized skills and software to rig and keyframe the animation, a process that typically takes hours, if not days, depending on scope. Instead, the underlying video generator behind MotionStream is basically simulating the world in real time. So, the elephant’s legs move naturally, and the ears flap naturally as the elephant moves. The model provides you with knowledge about the world and you can interact with it,” says Shechtman.
New paradigm offers new editing possibilities
The approach behind MotionStream represents a new paradigm for generative video, shifting from delayed rendering to real-time interaction, giving creative professionals additional speed, responsiveness, and control. Shechtman even thinks the technology could change how people edit images in the future.
“Once video becomes interactive, your canvas could be a video that’s always running. When you interact with it, you see a smooth video changing toward the edit you’ve specified. You can watch the transition, and you could even stop it in the middle if you like the intermediate result. There’s big promise here for both image and video.”
The research behind MotionStream
MotionStream grew out of years of work inside Adobe Research, where the team helps move cutting-edge technology forward and then translates their findings into new tools for creatives. In the case of MotionStream, the earliest work began with image generation.
“Early image generation was very slow, so we developed technology to speed it up. Instead of waiting seconds for an image, you could get the result in real time. That innovation has helped power our video generation work as well,” says Zhang.
To further speed up AI video generation, the team broke down the process for producing the videos. Early generation models created an entire video before serving it to users – each frame would look at every other frame, the future depending on the past, but the past also depending on the future. While this helped generation quality, “knowing both the past and future isn’t how the universe works. We removed that constraint,” explains Zhang.
The researchers developed a method that generates a video in pieces, where future frames depend only on what’s already been created, known as an “autoregressive” backbone. Maintaining generation quality with this limited context is difficult, but the culmination of innovations from previous years, along with new techniques, enabled the model to sustain high quality under the tighter constraint.
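As a rough illustration of the constraint Zhang describes, the sketch below contrasts a bidirectional attention mask, where every frame can see every other frame, with a causal, chunk-wise mask, where each frame sees only its own chunk and earlier ones. This is a generic example of the idea, not the MotionStream model’s actual code.

```python
# Illustrative sketch, not the MotionStream implementation: bidirectional
# attention (every frame sees past and future) vs. a causal, chunk-wise mask
# (each frame sees only its own chunk and earlier chunks).
import numpy as np

def bidirectional_mask(num_frames: int) -> np.ndarray:
    # Every frame attends to every other frame, past and future alike.
    return np.ones((num_frames, num_frames), dtype=bool)

def causal_chunk_mask(num_frames: int, chunk_size: int) -> np.ndarray:
    # Frame i may attend to frame j only if j's chunk is not in the future.
    chunk = np.arange(num_frames) // chunk_size
    return chunk[:, None] >= chunk[None, :]

print(causal_chunk_mask(num_frames=6, chunk_size=2).astype(int))
# [[1 1 0 0 0 0]
#  [1 1 0 0 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 1]]
```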
As users watch the first piece, the tool generates the second piece behind the scenes, making it possible to stream generated video to the user in real time.
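Here is a minimal sketch of that streaming pattern, assuming hypothetical generate_chunk and display_chunk functions: while one chunk plays, the next is produced in the background, so playback never waits on the full video.

```python
# Hypothetical sketch of chunked, streaming generation (not the actual tool).
import threading
import queue

def stream_video(generate_chunk, display_chunk, num_chunks):
    """Generate video in chunks and display each one while the next is made."""
    ready = queue.Queue(maxsize=1)  # small buffer: producer stays just ahead of playback

    def producer():
        history = []  # chunks generated so far; the future depends only on the past
        for i in range(num_chunks):
            chunk = generate_chunk(i, history)  # could also read the user's latest edits
            history.append(chunk)
            ready.put(chunk)

    threading.Thread(target=producer, daemon=True).start()
    for _ in range(num_chunks):
        display_chunk(ready.get())  # show each chunk as soon as it is ready
```

In a setup like this, keeping the buffer to a single chunk is what would let a user’s latest edits influence the very next piece of video rather than one far in the future.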
“The natural next step, once we started breaking videos into pieces,” says Zhang, “was to ask for feedback from users as the video is being generated. That’s what brought us to MotionStream. It’s the fruit of a long line of research.”
The future of AI-powered creative tools
Through advances like MotionStream, Adobe’s researchers continue to push the boundaries of AI while creating faster, more responsive tools for creative professionals.
Wondering what else is happening inside Adobe Research? Check out our latest news here.