Adobe Researchers are discovering the hidden knowledge inside AI image generation models 

October 21, 2025

Tags: AI & Machine Learning, Computer Vision, Imaging & Video, Conferences

When you type a prompt into an AI image generator, you get a batch of images back. But why those particular images? What does the model know about the visual world, what is it capable of creating, and how does it land on the variations you see?

These are just a few of the questions PhD student Rohit Gandikota wanted to explore during his Adobe Research internship with his mentor, Adobe Senior Research Engineer Nick Kolkin. In his studies, Gandikota had already been researching concept sliders, tools that allow you to choose specific qualities of an image you want to adjust. For example, a user could create a slider that makes a landscape image appear more or less like a specific style of art. But in order to think up useful sliders, users need a deep understanding of which kinds of attributes a model is capable of producing and changing. Gandikota and Kolkin wanted to explore tools that would be more useful for everyday users.

The research took Gandikota, Kolkin, and several other Adobe Researchers into new territory: they set out to map the visual knowledge hidden inside diffusion models (a common type of generative AI model)—and then use that information to let users control the output. The team published their findings in a recent paper for the International Conference on Computer Vision (ICCV) 2025.  

Figuring out what AI image generation models know about the visual world 

When Gandikota and the team began their research, they were thinking like humans. To help users control their generated images, they created sliders that matched the way people might draw an image. For example, when you draw a dog, you think about drawing two eyes, two ears, a snout and a tail, so they were developing sliders to adjust those kinds of attributes.  

But as the team looked at results from generative models, they realized that the models were “thinking” in an entirely different way. Instead of types of eyes and tails, their variables seemed more like whether a dog is golden retriever-ish or pit bull-y. So the team decided to explore the variations they were naturally getting from the models instead. 

“This happens a lot with research,” says Gandikota. “You start with a goal, and you don’t get it to work, but then you start seeing something else coming out of it and you explore that direction.”  

With this new focus, the team began to think about the tremendous amount of information we don’t know about how AI models understand and create images. “These models ingest huge amounts of data that no human can comprehend,” explains Kolkin. “We wanted to know what they think a dog is, for example, and all the different ways they can visually conceive of a dog.”  

On a practical level, the team also wanted to make the generative models more controllable. “Using diffusion models often feels like playing a slot machine. Instead, we want users to feel empowered by precise control of these amazing image generators,” Kolkin adds.  

The next step was to generate sets of images from existing models for narrow categories, like dogs and toys, as well as broader ones, like artistic styles. From there, they built a framework that could analyze the key visual variations a model is able to produce, such as whether a dog leans more toward a golden retriever or a pit bull. With that information, they enabled their framework to create simple sliders that control each of the key variations.
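To make that pipeline concrete, here is a minimal sketch of one way such a framework could work, assuming a Hugging Face diffusers pipeline, CLIP image embeddings, and PCA as the analysis step. The paper's actual method may differ, and the model names, prompt, and sample count below are purely illustrative.

```python
# Sketch (not the paper's implementation): sample many images for one concept,
# embed them, and use PCA to surface the model's dominant axes of visual variation.
import torch
from sklearn.decomposition import PCA
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt, n_samples = "a photo of a dog", 64          # illustrative concept and sample size
images = [pipe(prompt).images[0] for _ in range(n_samples)]

# Embed each generated image into CLIP space, where visual attributes behave roughly linearly.
inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    feats = clip.get_image_features(**inputs)
feats = feats / feats.norm(dim=-1, keepdim=True)

# The top principal components approximate the key variations the model produces for
# this concept (e.g., more golden-retriever-like vs. more pit-bull-like dogs).
pca = PCA(n_components=8)
pca.fit(feats.float().cpu().numpy())
slider_directions = pca.components_                  # each row is one candidate slider axis
print(pca.explained_variance_ratio_)                 # how much variation each slider captures
```

In a sketch like this, each recovered direction could then be exposed to users as a slider that pushes generations toward one end of that variation or the other.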

Increasing the controllability of generative AI—and boosting diversity 

The team sees several important uses for their work. First, it can help artists understand and control their generative AI tools. “Each model is trained with visual styles, but it usually takes users weeks or months to figure them out. It can be a huge pain to play with prompts and find the recipes that get the visual effects you want for your artistic style,” explains Kolkin. “One of the use cases where our paper was really successful was discovering these styles automatically and making them immediately accessible without weeks or months of prompt exploration.” 

The technology could also benefit novice users who would never think to spend weeks or months exploring a model. “Not every user knows exactly what they want, but now we can inspire them by showing the different directions that a powerful model is capable of,” adds Gandikota.  

The team’s research also has the potential to increase the diversity of representations that come from generative models. “Models might be biased in some of their representations—for example, producing more images of male doctors than female doctors. We know that during training, the model has seen diverse data, but just not in the correct proportions. So when we train our framework on doctors, we can up-weight these visual variations that were underrepresented in the original distribution of samples,” Kolkin explains. 
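The up-weighting idea Kolkin describes can be illustrated with a small, self-contained sketch. The embeddings, the `doctor_gender_axis` direction, and the 80/20 split below are hypothetical stand-ins, not the paper's data or method.

```python
# Sketch: once a slider direction is known, samples that land on the underrepresented
# side of it can be up-weighted when selecting or curating outputs.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))          # stand-in for embeddings of generated images
doctor_gender_axis = rng.normal(size=512)          # stand-in for a discovered slider direction
doctor_gender_axis /= np.linalg.norm(doctor_gender_axis)

scores = embeddings @ doctor_gender_axis           # where each sample lands on the slider
threshold = np.quantile(scores, 0.8)
underrepresented = scores > threshold              # suppose only ~20% land on this side

# Importance weights that even out the two sides of the slider.
weights = np.where(underrepresented,
                   0.5 / underrepresented.mean(),
                   0.5 / (~underrepresented).mean())
weights /= weights.sum()

balanced_idx = rng.choice(len(embeddings), size=500, replace=True, p=weights)
print((scores[balanced_idx] > threshold).mean())   # roughly 0.5 after reweighting
```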

This ability to increase diversity is especially promising for distilled diffusion models. These models are simplified to produce images more quickly, but that means they lean toward the most common outputs, such as more golden retrievers when people ask for dogs. Sliders could allow the people training these models to rebalance results, or let end users do it themselves.

How sliders could shape the future of AI image generation 

In the near term, the team plans to continue tweaking the framework to make it much faster. They imagine a future when users can type in a prompt and have helpful, custom sliders appear automatically. These could add controllability for individual users, and for developers who want to quickly correct biases in their models when they discover them.  

Beyond these practical applications, Gandikota is thinking about how the framework could change our relationship to the AI models we’ve created, but don’t fully understand. “We are building models that can do almost anything you throw at them, but we can probably extract more out of these models if we simply ask them to give us their knowledge,” he says. “My goal is to build an atlas of what a diffusion model knows and then be able to communicate with this atlas. In doing this I’m always thinking, ‘How do we enhance human capabilities with what we learn from these powerful models?’”  

Authors of the paper: Rohit Gandikota (Northeastern University, Adobe Research Intern), Zongze Wu (Adobe Research), Richard Zhang (Adobe Research), David Bau (Northeastern University), Eli Shechtman (Adobe Research), and Nick Kolkin (Adobe Research) 

Wondering what else is happening inside Adobe Research? Check out our latest news here. 
