An AI that Sees the Future

By Meredith Alexander Kunz, Adobe Research

New work by Adobe Research is a step forward in predicting the future.

Intelligent agents that use computer vision—such as self-driving cars or robots—could perform better if they knew what would happen next. Predicting movement or objects in the next frames of a video feed could potentially save lives or simply increase accuracy across many tasks.

Researchers have recently explored this topic on two levels—semantics (the objects in the video) and motion dynamics (how things move). A new investigation by Adobe Research scientists and university colleagues uses deep learning to unite these two kinds of predictions into a single whole—providing much more powerful results.

The work, presented in a paper at the Conference on Neural Information Processing Systems (NIPS), is the result of a collaboration between National University of Singapore scholars and Adobe Research’s Xiaohui Shen, senior research scientist, Jimei Yang, research scientist, and Zhe Lin, principal scientist.

Shen and his collaborators built two systems based on convolutional neural networks and trained them on a public video dataset of city street scenes. The first system takes the images from the previous four frames of a video as input and outputs the predicted motion in the fifth (future) frame. The second system takes the semantics of the previous four frames as input, including labeled objects such as buses, cars, people, roads, street lights, and sky. Its output is the predicted objects in the next frame.
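To make the "four frames in, fifth frame out" setup concrete, here is a minimal sketch of how a video could be cut into training samples. The function name `make_samples`, the `context` parameter, and the string stand-ins for frames are all hypothetical and chosen for illustration; the article does not describe the team's actual data pipeline.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_samples(
    frames: Sequence[Frame], context: int = 4
) -> List[Tuple[List[Frame], Frame]]:
    """Pair every run of `context` consecutive frames with the frame after it."""
    samples = []
    for start in range(len(frames) - context):
        history = list(frames[start:start + context])  # the previous four frames
        target = frames[start + context]               # the fifth (future) frame
        samples.append((history, target))
    return samples


# Strings stand in for real frames (images or per-pixel label maps).
video = ["f0", "f1", "f2", "f3", "f4", "f5"]
samples = make_samples(video)
```

With six frames and a context of four, this yields two samples: frames 0 to 3 paired with frame 4, and frames 1 to 4 paired with frame 5. The same windowing would apply to either system, whether the frames are raw images or semantic label maps.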

Here’s the novel element: Between the two systems is a bridge, or “transform layer,” where the two networks communicate and inform each other. This way, the system in charge of semantics can learn something about how items are moving through a scene, and the movement system can be informed by the semantic data.
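The idea of a bridge between the two networks can be sketched in a few lines. This is an illustration under stated assumptions, not the team's architecture: the article does not say how the transform layer is built, so the sketch assumes each network maps the other's features through a weight matrix and adds the result to its own. The names `exchange`, `sem_to_motion`, and `motion_to_sem` are hypothetical, and the weights are fixed by hand rather than learned.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(a: Vector, b: Vector) -> Vector:
    """Add two vectors of equal length element by element."""
    return [x + y for x, y in zip(a, b)]


def exchange(
    motion_feat: Vector,
    semantic_feat: Vector,
    sem_to_motion: Matrix,
    motion_to_sem: Matrix,
) -> Tuple[Vector, Vector]:
    """Let each branch absorb a transformed copy of the other's features.

    Assumption: the bridge is a linear map followed by addition.
    """
    new_motion = add(motion_feat, matvec(sem_to_motion, semantic_feat))
    new_semantic = add(semantic_feat, matvec(motion_to_sem, motion_feat))
    return new_motion, new_semantic


# Toy features: 2 motion values and 3 semantic values.
motion = [1.0, 2.0]
semantic = [0.5, 0.0, 1.0]
sem_to_motion = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]    # maps 3 values to 2
motion_to_sem = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]  # maps 2 values to 3

new_motion, new_semantic = exchange(motion, semantic, sem_to_motion, motion_to_sem)
```

Both updated vectors are computed from the original features, so neither branch sees the other's already-mixed output within a single exchange. In a trained network the two matrices would be learned together with the rest of the weights.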

The logic behind this is intuitive. “If you know the motion of a group of pixels, it can help tell you what the object is,” says Shen. “Likewise, if you know about a scene’s semantics—that a specific object is a car or person instead of a road or street sign—you know more about what those objects’ motion will be. Our contribution is to connect these two kinds of information to help the network learn to predict objects in the scene and motion of those objects together.”

Results were a significant improvement over current state-of-the-art predictions that rely on just one of these elements. “By fusing motion and semantics, you get a better outcome than you would if you looked at them separately,” says Shen.

Though this work is still early-stage, the approach could be a boon for those developing self-driving vehicles, and it has many other potential uses. Researchers imagine augmented or virtual reality applications, for instance, where future scene prediction could alleviate lags in real-time video displays.

These images show the prediction results (for motion and for objects) obtained from the team’s neural networks. These results are clearer and more accurate than the previous state of the art.

Contributors:

Xiaohui Shen, Jimei Yang, and Zhe Lin, Adobe Research

Xiaojie Jin, Huaxin Xiao, Jiashi Feng, and Shuicheng Yan, National University of Singapore

Zequn Jie, Tencent AI Lab

February 14, 2018
