By Meredith Alexander Kunz, Adobe Research
Remember those TV crime dramas where investigators “enhance” the surveillance video of a criminal, going from a grainy blur to a super-crisp image of a person’s face? What was once science fiction is now one step closer to reality with the work of Adobe Research’s Zhaowen Wang and collaborators.
“Super resolution” focuses on making a lower resolution image into a higher one. It works by adding more pixels to an image based on what’s already there. This approach can effectively turn the fuzzy results provided by prior methods into a cleaner, better-quality image.
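For readers who want to see what "adding more pixels based on what's already there" looks like in its simplest form, here is a minimal sketch of classic bilinear interpolation. It is a fixed-formula baseline, not the learned method described in this story, and the function name `upscale_bilinear` is made up for illustration. Learning-based super resolution aims to predict sharper detail than this kind of blending can provide.

```python
import numpy as np

def upscale_bilinear(image: np.ndarray, factor: int) -> np.ndarray:
    """Upscale a 2-D grayscale image by an integer factor (illustrative baseline)."""
    h, w = image.shape
    out_h, out_w = h * factor, w * factor
    # Map each output pixel centre back to fractional source coordinates,
    # clamping at the borders so edge pixels are simply repeated.
    ys = np.clip((np.arange(out_h) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical blend weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal blend weights, shape (1, out_w)
    img = image.astype(float)
    # Blend the four nearest source pixels for every output pixel.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Calling `upscale_bilinear(low_res, 2)` on a 2-D array returns an array with twice the height and width, where every new pixel is a weighted mix of its nearest original neighbors.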
Wang’s new work builds on image super resolution, advancing this method for upscaling not just still photos but complex videos. It’s described in the 2017 International Conference on Computer Vision paper “Robust Video Super-Resolution with Learned Temporal Dynamics,” authored by Wang with Ding Liu and Yuchen Fan from the University of Illinois and colleagues from Facebook, Texas A&M University, and IBM Research.
There’s a counter-intuitive breakthrough at the core of this research: Sometimes, with video, less is more.
“Our intuition tells us that more information is better—that the more we use, the better the result,” Wang points out. “So for video, you would think that the more frames you use, the higher quality result. But that is not always true, especially when you’re working with very noisy, blurry video.”
Wang explains that when the action in a video is unpredictable—say, a waving flag or a dancing child—single-frame super resolution actually provides better results than using multiple frames to try to improve video images.
The key, he says, is to adaptively determine what you’re looking at in a video, and after that, figure out the best approach to super-sizing it. For a randomly moving object, you could use a single frame; for a predictable, rigid moving object, like a car, you could use several frames; and for a static object, like a building, you could use a wider “neighborhood” of frames to support your upscaling. “We call it temporal modulation,” Wang says—and the researchers taught computers to be able to do it.
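The "less is more" effect can be reproduced with a toy experiment. The sketch below is an illustration under simple assumptions, not the researchers' code: plain averaging stands in for multi-frame fusion, the scene is a synthetic stripe pattern, and the function name `single_vs_averaged_error` is hypothetical. It compares the error of one noisy frame against the error of an unaligned average of neighboring frames.

```python
import numpy as np

def single_vs_averaged_error(shifts, noise_sigma=0.1, seed=0):
    """Return (MSE of the centre frame alone, MSE of the plain frame average).

    `shifts` lists the horizontal displacement of the scene in each frame;
    the middle entry is the frame being restored and should be 0.
    """
    rng = np.random.default_rng(seed)
    # Ground-truth scene: vertical stripes two pixels wide (values 0 and 1).
    clean = np.tile((np.arange(64) // 2) % 2, (64, 1)).astype(float)
    # Each frame is the shifted scene plus independent sensor noise.
    frames = np.stack([
        np.roll(clean, s, axis=1) + rng.normal(0.0, noise_sigma, clean.shape)
        for s in shifts
    ])
    centre = frames[len(shifts) // 2]
    averaged = frames.mean(axis=0)  # naive fusion with no motion alignment

    def mse(estimate):
        return float(np.mean((estimate - clean) ** 2))

    return mse(centre), mse(averaged)

static_scene = single_vs_averaged_error([0, 0, 0, 0, 0])
erratic_scene = single_vs_averaged_error([-3, 2, 0, 3, -2])
```

For the static scene, averaging cancels noise and the second number is smaller than the first. For the erratic scene, the unaligned frames smear the stripes together and the average is worse than the single frame. That is the trade-off an adaptive method has to weigh for each region of a video.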
The team used deep learning to analyze the video frames first and determine the best approach for each region of pixels. The neural network then routed the frames through separate, customized neural sub-networks, and fused their outputs back together into a single frame.
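A rough sketch of that final fusion step might look like the following. It assumes each sub-network has already produced its own upscaled frame and that a separate scoring stage has produced one score per pixel per sub-network. The softmax normalization, the array shapes, and the name `fuse_branches` are illustrative assumptions here, not details confirmed by this story.

```python
import numpy as np

def fuse_branches(branch_outputs: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Blend per-branch frames into one frame using per-pixel weights.

    branch_outputs: shape (B, H, W), one upscaled frame per sub-network.
    scores:         shape (B, H, W), unnormalized preference for each branch
                    at each pixel (in a real system these would be learned).
    """
    # Softmax over the branch axis, so the weights at every pixel sum to 1.
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Pixel-wise weighted sum of the candidate frames.
    return (weights * branch_outputs).sum(axis=0)
```

Where the scores strongly favor one branch, such as the single-frame branch for a waving flag, the fused pixel is taken almost entirely from that branch. Where the scores are equal, the result is a plain average of all candidates.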
In the end, the network figured out how to optimize upscaling, creating some near-HD videos.
The technique is promising not only for surveillance work. It could also be excellent for entertainment, including TV, film, and graphics rendering. So those police shows’ hugely enhanced images may be possible in the future, and with video super-resolution, you may be able to watch whole episodes of your favorite decades-old dramas in near-HD, too.
Zhaowen Wang (Adobe Research)
Ding Liu, Yuchen Fan, and Thomas Huang (University of Illinois at Urbana-Champaign)
Xianming Liu (Facebook)
Zhangyang Wang (Texas A&M University)
Shiyu Chang (IBM Research)