Research Scientist Jianming Zhang’s work has contributed to some of the most popular AI-powered tools in Photoshop and Lightroom, and now he’s helping to build the next generation of Adobe’s generative AI tool, Firefly. At the heart of his research is the quest to understand how humans see and understand images—knowledge that helps him build new technologies for editing them.
We talked to Zhang—who began his work at Adobe Research as an intern—about what first sparked his research interests, a few of the things he’s learned about turning research into product features, and where he thinks AI tools could be headed next.
How did you first get interested in solving problems with computer vision and machine learning?
When I started my PhD, I was working on saliency detection. Saliency is about what we pay attention to in an image—humans tend to focus on certain things, like the most colorful part of a scene. So my research and PhD thesis were about modeling this human behavior.
There’s a lot of ambiguity in this work, which is one of the reasons it interests me. You need to use humans as the ground truth to measure how well your algorithm works.
Saliency is also interesting because it has a lot of applications. For example, you can save computation by concentrating resources on the areas where the human eye will focus. And here at Adobe Research, I’ve used saliency detection to help build new image editing tools, such as automatic image cropping and foreground masking.
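To give a feel for how saliency can drive something like automatic cropping, here is a toy sketch in Python. The "saliency" here is just a crude color-contrast proxy, not the learned models Zhang describes; the crop-selection step (scoring every window with an integral image and keeping the most salient one) is the illustrative part.

```python
import numpy as np

def toy_saliency(image):
    """Crude saliency proxy: per-pixel distance from the mean color.

    Real systems learn saliency from human gaze and annotation data;
    this stand-in only illustrates the idea of a per-pixel importance map.
    """
    img = image.astype(np.float32)
    mean_color = img.reshape(-1, img.shape[-1]).mean(axis=0)
    return np.linalg.norm(img - mean_color, axis=-1)

def saliency_crop(image, crop_h, crop_w):
    """Return the crop window whose summed saliency is highest."""
    sal = toy_saliency(image)
    # An integral image lets us score every candidate window cheaply.
    integral = np.pad(sal, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best_yx, best_score = (0, 0), -np.inf
    for y in range(sal.shape[0] - crop_h + 1):
        for x in range(sal.shape[1] - crop_w + 1):
            score = (integral[y + crop_h, x + crop_w] - integral[y, x + crop_w]
                     - integral[y + crop_h, x] + integral[y, x])
            if score > best_score:
                best_yx, best_score = (y, x), score
    y, x = best_yx
    return image[y:y + crop_h, x:x + crop_w]

# Example: crop a synthetic "photo" to 128x128 around its most salient region.
photo = np.random.randint(0, 255, (256, 384, 3), dtype=np.uint8)
crop = saliency_crop(photo, 128, 128)
print(crop.shape)  # (128, 128, 3)
```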
Can you tell us more about what you’ve been working on at Adobe Research?
When I came to Adobe, I started working on problems related to segmentation and masking. The traditional way of doing segmentation involves a lot of manual effort; it’s very time consuming and doesn’t always work very well. So we developed tools that use deep learning to automate this process, including subject masking, sky masking, and object masking.
With this masking technology, we created some of the first AI-powered features in Adobe products, including Sky Replacement, Select Subject, and the Object Selection tool.
Can you tell us about the process behind building one of these tools? What was it like, as a researcher, to partner with a product team?
Sky Replacement, which started several years ago from a MAX sneak, is one of the most memorable projects I’ve worked on. It gave me a chance to learn how to work deeply with the product teams—because it’s not like you just hand over a paper to the team and they build everything. It’s about all of the details, from data collection to engineering to development and evaluation, and a lot of post-processing to get the best possible outputs from our algorithm.
In the process of creating Sky Replacement, I collaborated with the product’s applied research team on collecting data, training and evaluating the models, and devising post-processing algorithms based on product requirements. And, as the product team and customers evaluated the feature, I improved the models to meet their needs.
Any tech transfer can be a very long journey with challenges along the way. With Sky Replacement, whenever we hit difficulties, the whole team stayed motivated to work through the problems. I was so thankful for all of my collaborators—as I have been through every tech transfer I’ve been part of. A lot of credit goes to them. Ultimately, Sky Replacement became a very popular feature, so I’m proud of what we accomplished.
What’s one thing you think researchers need to know if they want to develop products?
One thing that really interests me, as a machine learning researcher, is that we always think we’re achieving very good quality based on academic benchmarks. But from the user’s perspective, our algorithms often need improvement beyond those benchmarks. To keep making things better, we need to understand users’ perspectives and workflows.
One way to do this is to use the products ourselves. Right now, I use Lightroom for my own photography. One new tool we’ve worked on that I’m personally excited about is Lens Blur—which allows you to blur the background of an image. Under the hood, it uses machine learning to estimate depth, along with a rendering algorithm to simulate a very expensive lens effect. It’s an early access feature that’s out now in Lightroom, and I think people will have a lot of fun playing with it. Using the feature for my own photo editing gives me a better sense of how well it works in a real-life editing workflow, and of where we should take the underlying technology next.
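As a rough illustration of the depth-driven blur idea Zhang describes, here is a minimal sketch: it assumes a per-pixel depth map (in a real pipeline this would come from a learned monocular depth estimator) and blurs pixels more the farther they are from a chosen focal plane. It is not Adobe’s Lens Blur implementation, which simulates real lens optics far more carefully.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_lens_blur(image, depth, focus_depth, max_sigma=8.0, layers=6):
    """Blur each pixel more the farther its depth is from the focal plane."""
    img = image.astype(np.float32)
    # Per-pixel blur strength grows with distance from the focal plane.
    sigma_map = np.abs(depth - focus_depth) * max_sigma
    out = np.zeros_like(img)
    edges = np.linspace(0.0, max_sigma, layers + 1)
    for i in range(layers):
        lo, hi = edges[i], edges[i + 1]
        if i == layers - 1:
            mask = (sigma_map >= lo) & (sigma_map <= hi)
        else:
            mask = (sigma_map >= lo) & (sigma_map < hi)
        if not mask.any():
            continue
        # Keep the in-focus layer sharp; blur the others with a mid-bin sigma.
        sigma = 0.5 * (lo + hi)
        blurred = img if i == 0 else gaussian_filter(img, sigma=(sigma, sigma, 0))
        out[mask] = blurred[mask]
    return out.clip(0, 255).astype(np.uint8)

# Example: synthetic image with a left-to-right depth ramp, focused on the left.
h, w = 128, 192
image = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
depth = np.tile(np.linspace(0.0, 1.0, w), (h, 1))
result = toy_lens_blur(image, depth, focus_depth=0.0)
print(result.shape)  # (128, 192, 3)
```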
You’ve been working on generative AI. Can you tell us about what you’re up to, and what you think we’ll see next?
Yes, currently I’m focused on image generation technology for Firefly, Adobe’s tool for generating images from text. I’m very excited about some new innovations that we’ve been working on for the next version, though I can’t share the details yet.
Over the longer term, I’m looking forward to the next wave, where I think we’ll be able to use different kinds of cues to specify what we want from generative AI. And in the future, we’ll be able to generate more modalities, including 3D assets and videos, too. To me, the really exciting part is connecting all these modalities together.
You got your start at Adobe Research as an intern. What has that journey been like?
During my Adobe internships, I worked on projects such as deep learning for image search. It was a great, productive experience because the culture here is very transparent and everyone welcomes collaborations and encourages publishing. During my internships, I published several papers that later became part of my thesis.
Now I mentor interns at Adobe. Working with them and seeing them grow is a very fulfilling experience. I’ve had many long-term working relationships with students, and it’s a pleasure to see them expand their skills and become more independent.
Interested in joining our innovative team? Adobe Research is hiring! Check out our openings for full-time roles and for internships.