Teaching an AI Agent to Retouch Photos: RetouchIQ at CVPR 2026

June 9, 2026

Tags: Computer Vision, Imaging & Video, Conferences, Content Intelligence, Intelligent Agents & Assistants

Note: This post describes academic research conducted in collaboration with Adobe Research. The system described here is experimental and does not represent a current Adobe product feature. 

Key Takeaways 

  • RetouchIQ is an experimental AI editing agent that understands what you mean when you describe how a photo should look and feel, then translates that intent into precise, executable adjustments in professional photo-editing software. 
  • A new generalist reward model evaluates edits through dynamically generated, case-specific quality metrics rather than comparing against a single reference image, better reflecting the subjective nature of creative retouching. 
  • On a new 300-image benchmark (RetouchEval), RetouchIQ achieved the highest scores in both semantic consistency and perceptual quality across quality enhancement, style transformation, and local retouching tasks, outperforming all tested baselines including general-purpose multimodal models, specialized editing agents, and diffusion-based systems. 
  • A training strategy called policy-guided reward training (PGRT) aligns the reward model with the editing agent’s actual outputs, yielding more stable and accurate reward signals during reinforcement learning. 

Research by Qiucheng Wu (UC Santa Barbara; work completed during an Adobe Research internship), Jing Shi (Adobe Research), Simon Jenni (Adobe Research), Kushal Kafle (Adobe Research), Tianyu Wang (Adobe Research), Shiyu Chang (UC Santa Barbara), Handong Zhao (Adobe Research, project lead) | CVPR 2026 

📄 Read the full paper on arXiv 

The subjectivity problem 

Professional photo retouching is inherently subjective. Ask five photographers to enhance the same sunset image, and you will get five different, equally valid results. That subjectivity creates a core challenge for AI-driven editing: standard training approaches reward a model for matching a single reference edit, penalizing creative alternatives that may be just as good. 

RetouchIQ, a new experimental framework from Adobe Research and UC Santa Barbara presented at CVPR 2026, addresses this problem directly. Instead of anchoring training to one “correct” output, the system introduces a generalist reward model that evaluates edits the way a human critic might: by generating a tailored set of quality criteria for each image and instruction, then scoring the result against those criteria. 

How RetouchIQ works 

The system is built around two cooperating models. A policy editing model, built on a 7-billion-parameter vision-language backbone, reads a user’s natural-language instruction (for example, “give this scene a warm vintage feel with depth and nostalgia”) and produces two outputs: a structured editing plan, and a precise set of parameter adjustments (exposure, contrast, color temperature, and more) that can be executed directly in Adobe Lightroom. 
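To make the two outputs concrete, here is a minimal sketch of how an editing plan and its parameter adjustments could be represented and sanity-checked before execution. The JSON layout, the `EditProposal` class, and the parameter ranges are assumptions made for illustration; they are not the format used in the paper or by Lightroom.

```python
import json
from dataclasses import dataclass, field

# Hypothetical slider ranges, chosen only for illustration.
PARAM_RANGES = {
    "exposure": (-5.0, 5.0),
    "contrast": (-100.0, 100.0),
    "temperature": (-100.0, 100.0),
    "saturation": (-100.0, 100.0),
}


@dataclass
class EditProposal:
    """One policy output: a readable plan plus executable adjustments."""
    instruction: str
    plan: list[str]
    adjustments: dict[str, float] = field(default_factory=dict)

    def validated(self) -> dict[str, float]:
        """Drop unknown parameters and clamp the rest into their range."""
        clean = {}
        for name, value in self.adjustments.items():
            if name not in PARAM_RANGES:
                continue  # ignore parameters the editor would not recognize
            low, high = PARAM_RANGES[name]
            clean[name] = max(low, min(high, float(value)))
        return clean


def parse_proposal(instruction: str, raw_json: str) -> EditProposal:
    """Parse a model response that is assumed to be a JSON object."""
    data = json.loads(raw_json)
    return EditProposal(
        instruction=instruction,
        plan=list(data.get("plan", [])),
        adjustments=dict(data.get("adjustments", {})),
    )
```

Keeping the plan separate from the numeric adjustments means the reasoning stays readable while the adjustments can be validated and applied mechanically.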

The second component, the generalist reward model (GRM), acts as a quality evaluator for the agent’s edits. Given the original image, the edited result, and the user’s instruction, the GRM first generates a set of evaluation metrics specific to that request. For a vintage-warmth instruction, those metrics might include color warmth, tonal consistency, and nostalgic atmosphere. It then scores the edit on each metric and produces an overall quality signal. This context-aware evaluation replaces fixed, pixel-level comparisons that fail to capture the nuance of creative work. 
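The generate-then-score flow described above can be sketched as follows. This is a toy illustration under stated assumptions: in RetouchIQ a learned model produces the metrics and scores, whereas here a keyword lookup (`KEYWORD_METRICS`) and a caller-supplied `score_fn` stand in for it, and averaging the per-metric scores is an assumed aggregation rule, not one taken from the paper.

```python
# Hypothetical keyword-to-metric table standing in for learned metric generation.
KEYWORD_METRICS = {
    "vintage": ["tonal consistency", "nostalgic atmosphere"],
    "warm": ["color warmth"],
}
DEFAULT_METRICS = ["overall fidelity to instruction"]


def propose_metrics(instruction):
    """Return evaluation metrics tailored to the instruction."""
    text = instruction.lower()
    metrics = []
    for keyword, names in KEYWORD_METRICS.items():
        if keyword in text:
            metrics.extend(names)
    return metrics or list(DEFAULT_METRICS)


def evaluate_edit(original, edited, instruction, score_fn):
    """Score an edit on each generated metric, then aggregate.

    score_fn(original, edited, instruction, metric) -> float is a
    placeholder for the reward model's per-metric judgment.
    """
    metrics = propose_metrics(instruction)
    per_metric = {
        metric: score_fn(original, edited, instruction, metric)
        for metric in metrics
    }
    overall = sum(per_metric.values()) / len(per_metric)  # assumed: simple mean
    return per_metric, overall
```

The point of the structure is that the criteria are chosen per request, so two different instructions applied to the same image are judged on different axes.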

Closing the distribution gap with PGRT 

Training reward models on synthetic “bad” edits (created by randomly perturbing parameters like exposure or temperature one at a time) introduces a subtle problem: those simple perturbations look nothing like the complex, combined edits the policy model actually generates. For example, a synthetic “bad” edit might only shift color temperature, while the policy model’s actual output adjusts exposure, saturation, and temperature together. The reward model learns to spot easy failures but struggles with the real outputs it needs to judge. 
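The mismatch is easy to see in a small sketch. The helper names and the toy parameter units below are assumptions for illustration; the paper's actual perturbation procedure may differ in its details.

```python
import random


def perturb_one(params, rng, scale=30.0):
    """Build a synthetic 'bad' edit by shifting exactly one parameter.

    Assumes all parameters share a toy -100..100 scale.
    """
    bad = dict(params)
    name = rng.choice(sorted(bad))
    shift = rng.choice([-1, 1]) * rng.uniform(0.5 * scale, scale)
    bad[name] += shift
    return bad, name


def changed_params(before, after, tol=1e-9):
    """List the parameters whose values differ between two edits."""
    return sorted(k for k in before if abs(before[k] - after[k]) > tol)


reference = {"exposure": 10.0, "saturation": 5.0, "temperature": 20.0}

# Synthetic negative: differs from the reference in a single parameter.
synthetic_bad, touched = perturb_one(reference, random.Random(0))

# Policy-like output: several parameters move together.
policy_like = {"exposure": 25.0, "saturation": -10.0, "temperature": 45.0}

print(changed_params(reference, synthetic_bad))  # one parameter name
print(changed_params(reference, policy_like))    # all three parameter names
```

A reward model that has only ever seen the first kind of difference has no training signal for the second kind, which is the gap the next section addresses.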

To address this, the team developed policy-guided reward training (PGRT). During reinforcement learning, the reward model trains on actual outputs from the policy model rather than synthetic perturbations, keeping both models aligned. The policy model and reward model are then updated in alternating rounds, progressively improving each other. In ablation experiments, PGRT improved the policy model’s overall quality score from 6.89 (off-the-shelf reward model) to 7.51. 
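The alternating schedule can be sketched as a short loop. Everything here is schematic: `sample_edits`, `update_reward`, and `update_policy` are hypothetical callables standing in for the real sampling and optimization steps, and how the reward model obtains supervision for on-policy edits is left abstract because this post does not specify it.

```python
def alternating_training(policy, reward_model, prompts, rounds,
                         sample_edits, update_reward, update_policy):
    """Alternate reward-model and policy updates on on-policy edits.

    All three callables are placeholders supplied by the caller:
      sample_edits(policy, prompt) -> edit
      update_reward(reward_model, rollouts) -> new reward model
      update_policy(policy, reward_model, rollouts) -> new policy
    """
    for _ in range(rounds):
        # 1. Collect edits from the *current* policy, not synthetic perturbations.
        rollouts = [(prompt, sample_edits(policy, prompt)) for prompt in prompts]

        # 2. Refresh the reward model on those on-policy edits.
        reward_model = update_reward(reward_model, rollouts)

        # 3. Improve the policy against the refreshed reward model.
        policy = update_policy(policy, reward_model, rollouts)
    return policy, reward_model
```

Because the rollouts are regenerated every round, the reward model keeps seeing the kind of edits the current policy produces rather than a fixed set of easy negatives.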

Results 

On RetouchEval, a new benchmark of 300 instruction-image pairs drawn from real user editing histories and covering a diverse range of quality enhancement and style-oriented retouching tasks, RetouchIQ outperformed every evaluated baseline. Those baselines included general-purpose multimodal large language models, specialized MLLM editing agents, and diffusion-based methods.

On the established MIT-Adobe5K benchmark, which focuses on general aesthetic enhancement without explicit user instructions, RetouchIQ also achieved the highest SSIM (0.86) and lowest LPIPS (0.16), demonstrating strong generalization. 
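For readers unfamiliar with SSIM, the sketch below computes a simplified version of it. It is an assumption-laden illustration: it applies the standard SSIM formula once over whole grayscale images given as flat lists of pixel values, whereas typical implementations average the score over local windows, so its numbers are not comparable to the results reported above. LPIPS relies on a pretrained network and is not sketched here.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length lists of pixel values.

    Simplification: statistics are computed over the whole image
    rather than over sliding local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Higher SSIM means the edited image is structurally closer to the reference, while lower LPIPS means it is perceptually closer, which is why the two are reported together.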

Qualitative comparisons revealed characteristic failure modes in other approaches: general-purpose models tended to over-edit images, diffusion-based methods struggled to preserve original image structure, and existing editing agents missed specific style requests. RetouchIQ consistently produced results that were more closely aligned with user intent while maintaining professional quality. 

This is one of more than 75 papers that Adobe presented at CVPR 2026. Check out more of our CVPR papers from this year and previous years here.
