Adobe researchers use puzzles to teach an AI model to retouch photos

August 12, 2025

Tags: AI & Machine Learning, Graphics (2D & 3D), Intelligent Agents & Assistants

We all take photos with our phones and cameras—and those photos often need a few touch-ups for lighting or other details. But getting a good result can be a challenge if you aren’t familiar with all of the features your editing tool has to offer. And if you give your image to a generative model, you might get something back that doesn’t even feel like your photo anymore.

So Adobe researchers, in collaboration with University College London, developed a new, data-efficient approach: using carefully crafted puzzles, they trained an AI agent to identify the things a user might want to fix in an image, explain each of the steps, and let the user choose which edits to make. This could help novice users retouch their images without sacrificing control, and it can be tailored to professionals’ own style so they can edit batches of images more efficiently.

The team, consisting of Principal Scientists Duygu Ceylan and Niloy Mitra along with first-year PhD student Niladri Dutt, presented this work in a new paper at this year’s prestigious SIGGRAPH conference.

The research behind a new, AI-assisted approach to photo retouching 

Ceylan and Mitra’s research began when they discovered that existing multimodal large language models (MLLMs) weren’t very good at retouching images. “They were always doing the same types of edits without taking things like the content of the image, or the tools they had access to, into account. I just couldn’t make the models do the things I wanted for my personal collection of photos,” remembers Mitra.  

The existing MLLMs had been trained on millions of images, but the team realized that these models had never learned how to perform basic photo retouching operations, or even which operations are available in a given editing application.

One way to solve the problem would have been to gather another enormous set of images to further train the MLLM, but the team wanted to try a different, more data-efficient approach. They decided to devise a series of puzzles that could teach the MLLM the steps to retouching an image.  

“The idea was to approach this project almost like we were teaching students,” says Mitra, who’s also a university professor. “We designed exercises to allow the MLLM to build up an understanding of the tools and methods. It was kind of like teaching a child how to add—it suffices to teach them how to add any two numbers, even though they’ll be using the skill in very specific scenarios later on.” 

Creating useful puzzles is as much of an art as it is a science—and the team had to do a lot of trial and error to find the right ones. In the end, they landed on two sets of puzzles. The first helped the model learn the individual operations available within an editing application, such as Photoshop. The second set was more advanced, teaching the model to tackle complex retouching projects using multiple steps and tools. 
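As a rough sketch of what such a two-level puzzle curriculum could look like—using Pillow’s simple enhancers as stand-in “tools,” with hypothetical names and structure rather than the paper’s actual implementation—the first puzzle set quizzes the model on single operations, while the second asks it to recover a whole sequence of edits:

```python
# Hypothetical illustration of the two puzzle sets described above.
# Pillow's enhancers stand in for a real editing tool's operations.
import random
from PIL import Image, ImageEnhance

# The "tool library" the model is allowed to reason about.
TOOLS = {
    "brightness": ImageEnhance.Brightness,
    "contrast": ImageEnhance.Contrast,
    "saturation": ImageEnhance.Color,
    "sharpness": ImageEnhance.Sharpness,
}

def single_op_puzzle(image: Image.Image):
    """Puzzle set 1: apply one known operation; the model must name the
    operation and estimate its strength from the before/after pair."""
    tool_name = random.choice(list(TOOLS))
    amount = round(random.uniform(0.5, 1.5), 2)      # 1.0 = no change
    edited = TOOLS[tool_name](image).enhance(amount)
    question = (image, edited)                        # shown to the model
    answer = {"tool": tool_name, "amount": amount}    # held-out ground truth
    return question, answer

def multi_step_puzzle(image: Image.Image, n_steps: int = 3):
    """Puzzle set 2: chain several operations; the model must recover the
    ordered sequence of edits that turns the input into the target."""
    answer, edited = [], image
    for _ in range(n_steps):
        tool_name = random.choice(list(TOOLS))
        amount = round(random.uniform(0.6, 1.4), 2)
        edited = TOOLS[tool_name](edited).enhance(amount)
        answer.append({"tool": tool_name, "amount": amount})
    return (image, edited), answer

if __name__ == "__main__":
    img = Image.new("RGB", (256, 256), color=(128, 100, 90))  # placeholder photo
    (_, _), truth = multi_step_puzzle(img)
    print("ground-truth edit sequence:", truth)
```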

“It was very exciting to see that we had to come up with visual puzzles to teach the MLLM how to enhance photos,” remembers Ceylan. “This was quite different than a standard machine learning training where you give the model an image and ask it to predict the enhancement parameters directly. Instead, we had to teach the MLLM to reason about the input image and what each enhancement tool would change.” 

One of the team’s priorities was to make sure users keep control of the photo retouching process. This is especially important because photo editing is very subjective—users’ stylistic preferences can vary quite a lot. To allow for a user’s personal style, the agent offers a list of suggested steps, and the user chooses whether or not to take them. And if the agent adjusts something too much, the user can tone it down and still run the rest of the edits.  
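A minimal sketch of that kind of user-in-the-loop flow is below; the data structure and function names are illustrative assumptions, not the product’s actual interface. The agent proposes a list of edits with reasons, and the user can accept, skip, or tone down each one before anything is applied:

```python
# Hypothetical sketch of the suggest/review loop described above.
from dataclasses import dataclass

@dataclass
class SuggestedEdit:
    tool: str        # e.g. "exposure", "contrast"
    amount: float    # agent's proposed strength
    reason: str      # agent's explanation for suggesting this edit

def review_edits(suggestions, ask_user):
    """Walk the user through each suggestion; keep only the edits they accept,
    scaled by whatever strength they chose."""
    approved = []
    for edit in suggestions:
        decision = ask_user(edit)          # "accept", "skip", or a 0..1 scale
        if decision == "skip":
            continue
        scale = 1.0 if decision == "accept" else float(decision)
        approved.append(SuggestedEdit(edit.tool, edit.amount * scale, edit.reason))
    return approved

if __name__ == "__main__":
    proposals = [
        SuggestedEdit("exposure", 0.3, "the foreground is underexposed"),
        SuggestedEdit("saturation", 0.5, "colors look washed out"),
    ]
    # Stand-in user: accept everything but halve the saturation boost.
    kept = review_edits(proposals, lambda e: 0.5 if e.tool == "saturation" else "accept")
    print(kept)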

The agent can also be trained on a user’s own photo collection, which allows for even deeper personalization. “This is possible because the method only needs thousands, not millions of example images. Once the model is trained, a professional photographer, for example, could use it to retouch all of the images from a large photo shoot, see the results, and have the option to go back and override any of the changes,” explains Mitra. If users choose to train the model on their own data, they can do it themselves without having to share the data with anyone else.  

In addition to control, the team also wanted to be sure their agent preserved the original identity of an image. With a standard generative approach, an algorithm can change everything about an image.  

“That’s too much freedom,” says Mitra. “Compare that to Photoshop, where you can only change things using a certain library of tools. Those tools have been highly optimized—people have thought a lot about them to make sure they are useful. So one way to constrain the edits and help avoid identity loss is to only allow the MLLM to use a specific set of tools, such as the ones inside Photoshop.” 
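One simple way to picture that constraint—purely as a sketch, with hypothetical tool names and parameter ranges rather than Photoshop’s actual parameter space—is to validate any edit plan the MLLM proposes against a whitelist of allowed tools before applying it:

```python
# Hypothetical sketch: restrict the agent's edit plan to a known tool library.
ALLOWED_TOOLS = {
    "exposure":   (-2.0, 2.0),
    "contrast":   (-1.0, 1.0),
    "saturation": (-1.0, 1.0),
    "whites":     (-1.0, 1.0),
}

def validate_plan(plan):
    """Keep only steps that use an allowed tool, clamping parameters to range,
    so the agent can never rewrite the image outside the tool library."""
    safe_plan = []
    for step in plan:
        tool, value = step["tool"], step["value"]
        if tool not in ALLOWED_TOOLS:
            continue                      # drop anything outside the library
        lo, hi = ALLOWED_TOOLS[tool]
        safe_plan.append({"tool": tool, "value": max(lo, min(hi, value))})
    return safe_plan

print(validate_plan([{"tool": "exposure", "value": 3.5},
                     {"tool": "repaint_subject", "value": 1.0}]))
```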

Testing the results—and looking forward 

Since good photo retouching is in the eye of the beholder, the team had to be thoughtful about how they tested their results. They conducted two separate user studies, one with novices and one with experts, asking participants to rate their retouching results compared to other methods.  

Users answered two types of questions: Does the model preserve the identity of the image (is it still the same image)? And do you like the result better than the original?

They found that both novices and experts ranked the new model’s results higher than all of the other commonly available options. 

“Our approach isn’t a method in itself—it’s only as good as the tool the model is using. But we’re very excited about what it can do. We hope that, when people come to our agent with their not-so-good images, they’ll be as pleased as we were when we got really good results,” says Mitra.

“Photo enhancement is one of the many creative tasks where we can leverage the power of MLLMs for reasoning and planning. I hope we can explore similar tech to help our users with other creative tasks,” adds Ceylan. 

Authors of the paper: Niladri Shekhar Dutt (University College London), Duygu Ceylan (Adobe Research), and Niloy J. Mitra (Adobe Research and University College London) 

Wondering what else is happening inside Adobe Research? Check out our latest news here. 
