
Research by: Sayan Nag, KJ Joseph, Koustava Goswami, Vlad I. Morariu, Balaji Vasan Srinivasan (Adobe Research) · Presented at: AAAI 2026, Singapore
Key points:
- Our Agentic Design Review System (Agentic-DRS) is an experimental multi-agent framework from Adobe Research, presented at AAAI 2026, for holistic graphic design evaluation. Specialized AI agents assess typography, color harmony, alignment, spacing, composition, and more, coordinated by a meta-agent that produces unified scores and actionable feedback.
- The framework introduces GRAD, a graph-based exemplar selection method that retrieves contextually relevant design examples using structural matching rather than global image similarity, making the agents’ analyses design-aware rather than generic.
- A new benchmark, DRS-Bench, provides standardized evaluation across four datasets and 15 design attributes (including typography, color palette, alignment, and grouping), along with a novel Actionable Insights Metric (AIM) for measuring the quality and usefulness of AI-generated feedback.
- Agentic-DRS outperforms GPT-4o by 12.4 percentage points on the Afixa dataset and correlates with human design ratings at 0.834 on GDE, demonstrating both accuracy and strong alignment with expert judgment.
When a graphic designer submits work for review, a good client doesn’t just say “this doesn’t work.” They explain why the typography creates visual noise, how the color palette undermines the hierarchy, and what specific changes would improve the piece. Getting that kind of structured, actionable design feedback from an AI system, however, is harder than it might appear. Agentic-DRS, an experimental framework from Adobe Research, is built to address exactly this.
The insight behind the system is borrowed from academic peer review. Just as a conference paper benefits from reviewers with different areas of expertise, a graphic design benefits from evaluation across multiple distinct dimensions simultaneously. Agentic-DRS operationalizes this idea through a multi-agent architecture: specialized agents assess typography, color harmony, alignment, spacing, composition, image-text alignment, and more. A meta-agent orchestrates the process and synthesizes individual findings into a unified score and a set of actionable recommendations.
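The paper does not ship reference code, but the coordination pattern it describes is easy to sketch. The snippet below is a minimal illustration of that reviewer/meta-reviewer structure, assuming a generic vision-language model behind a `call_llm` helper; the agent prompts and all names here are hypothetical stand-ins, not Adobe’s implementation.

```python
# Minimal sketch of the reviewer / meta-reviewer pattern described above.
# All names (call_llm, ATTRIBUTE_PROMPTS, etc.) are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class AgentReview:
    attribute: str   # e.g. "typography", "color_harmony"
    score: float     # normalized 0-1 rating for this attribute
    feedback: str    # natural-language critique

ATTRIBUTE_PROMPTS = {
    "typography": "Assess font choices, sizing, and readability.",
    "color_harmony": "Assess palette cohesion and contrast.",
    "alignment": "Assess grid alignment and edge consistency.",
}

def call_llm(prompt: str, image: bytes) -> tuple[float, str]:
    """Placeholder for a vision-language model call."""
    raise NotImplementedError

def review_design(image: bytes) -> dict:
    # Each specialized agent critiques exactly one design dimension.
    reviews = []
    for attribute, prompt in ATTRIBUTE_PROMPTS.items():
        score, feedback = call_llm(prompt, image)
        reviews.append(AgentReview(attribute, score, feedback))

    # The meta-agent synthesizes per-attribute findings into a
    # unified score plus a consolidated set of recommendations.
    unified_score = sum(r.score for r in reviews) / len(reviews)
    summary_prompt = "Merge these critiques into actionable feedback:\n" + \
        "\n".join(f"[{r.attribute} {r.score:.2f}] {r.feedback}" for r in reviews)
    _, recommendations = call_llm(summary_prompt, image)
    return {"score": unified_score, "recommendations": recommendations}
```

The key design choice, mirroring the peer-review analogy, is that each specialized agent sees only its own attribute prompt, while the meta-agent is the sole component that sees all critiques at once, much as an area chair synthesizes individual reviews.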
Two technical contributions make this work. The first is GRAD, a Graph-based Design exemplar selection method that retrieves contextually relevant examples to inform the agents’ analyses. Unlike approaches that rely on global visual similarity, GRAD constructs graph representations encoding semantic, spatial, and structural relationships between design elements, then uses Wasserstein and Gromov-Wasserstein distances to match both node-level features and edge-level topology. In practice, this means the system finds examples that share similar structural logic, not just a similar visual appearance. The second contribution is Structured Design Description (SDD), which grounds each agent’s analysis in explicit textual descriptions of design elements and their hierarchical relationships, reducing hallucinations and enabling more precise feedback.
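Combining a Wasserstein term over node features with a Gromov-Wasserstein term over edge structure is exactly the fused Gromov-Wasserstein (FGW) formulation, so GRAD-style retrieval can be sketched directly with the POT optimal-transport library. The graph encodings below (node feature matrices, intra-graph distance matrices) are illustrative assumptions rather than the paper’s exact construction.

```python
# Sketch: ranking exemplar designs by fused Gromov-Wasserstein distance.
# Requires: pip install pot numpy. The graph encodings are illustrative
# assumptions, not the paper's exact construction.
import numpy as np
import ot  # Python Optimal Transport (POT)

def fgw_distance(feat_a, adj_a, feat_b, adj_b, alpha=0.5):
    """FGW distance between two attributed design graphs.

    feat_*: (n, d) node feature matrices (e.g. element type + position).
    adj_*:  (n, n) intra-graph structure matrices (e.g. spatial distances).
    alpha:  trade-off between the feature cost (Wasserstein term) and
            the structural cost (Gromov-Wasserstein term).
    """
    # Pairwise cost between node features across the two graphs.
    M = ot.dist(feat_a, feat_b)  # squared Euclidean by default
    # Uniform mass over the nodes of each graph.
    p = np.full(len(feat_a), 1.0 / len(feat_a))
    q = np.full(len(feat_b), 1.0 / len(feat_b))
    # fused_gromov_wasserstein2 returns the FGW cost, jointly matching
    # node features (via M) and edge topology (via adj_a / adj_b).
    return ot.gromov.fused_gromov_wasserstein2(
        M, adj_a, adj_b, p, q, loss_fun="square_loss", alpha=alpha
    )

def rank_exemplars(query, exemplars, alpha=0.5):
    """Return exemplar indices sorted by structural similarity to the query."""
    dists = [fgw_distance(query[0], query[1], f, a, alpha) for f, a in exemplars]
    return np.argsort(dists)
```

In POT’s convention, `alpha` near 0 makes the ranking depend on node features alone, while `alpha` near 1 makes it depend purely on layout topology, which is what lets a GRAD-style retriever surface exemplars that share structural logic rather than surface appearance.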
To measure progress in this area, the team also introduces DRS-Bench, a new benchmark purpose-built for design review evaluation. DRS-Bench spans four datasets (GDE, Afixa, Infographic, and an internal design collection), assesses 15 design attributes, and introduces a novel Actionable Insights Metric (AIM) that evaluates not just whether a system correctly identifies a design issue, but whether its feedback is genuinely useful to a designer.
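The paper defines the benchmark’s exact schema and the AIM formula; neither is reproduced here. Purely as an illustration of what such a benchmark implies, a DRS-Bench-style entry might look like the following hypothetical structure.

```python
# Purely illustrative sketch of a DRS-Bench-style record; the actual
# schema is defined by the paper and is not reproduced here.
from dataclasses import dataclass, field

@dataclass
class DesignReviewRecord:
    design_id: str
    source_dataset: str              # "GDE", "Afixa", "Infographic", or internal
    image_path: str
    attribute_labels: dict[str, float] = field(default_factory=dict)
    # e.g. {"typography": 0.7, "color_palette": 0.4, "alignment": 0.9, ...}
    reference_feedback: list[str] = field(default_factory=list)
    # Expert-written critiques, the kind of reference against which a
    # metric like AIM could score a system's feedback for usefulness.
```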
The experimental results are strong across the board. Agentic-DRS outperforms GPT-4o by 12.4 percentage points on the Afixa benchmark, achieves 76.8% accuracy on an internal design dataset, and correlates with human ratings at 0.834 on GDE. Ablation experiments confirm that both GRAD and SDD contribute meaningfully, and that the multi-agent structure outperforms single-model approaches even when the same underlying model is used.
This research opens a new direction for automated design evaluation. For platforms and tools that handle graphic content at scale, it points toward the possibility of structured, expert-calibrated design feedback that is faster, more consistent, and more accessible than purely manual review.
Project details and a link to the paper are available here.
