By John Collomosse, Senior Principal Scientist, Adobe Research
This post describes academic research conducted in collaboration with Adobe Research. The system described here is experimental and does not represent a current Adobe product feature.
Generative AI is unlocking new ways to harness creativity. Type a few words, and an image appears: a robot painted like a Renaissance portrait, or a product rendered in a new setting. AI systems can synthesize ideas from vast collections of visual concepts and styles in seconds. This capability raises an important question: when AI systems compose many creative influences into one output, who should get credit?
This is the data attribution problem, a longstanding challenge in explainable AI (XAI). We address it in our CVPR 2026 paper, TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery. The work was developed by Li Zhang (UC San Diego, advised by Pengtao Xie) during a research internship mentored by Vishal Asnani, Shruti Agarwal, and John Collomosse (Adobe Research).

Tracing influence inside generative models
Modern generative AI systems learn from enormous datasets containing billions of images. During training, these examples are compressed into internal representations that shape how the model later generates content. Recovering those influences reliably has proven extremely difficult, particularly when a generated image reflects multiple concepts simultaneously: for example, a specific artistic style combined with particular objects, compositions, or characters.
TokenTrace introduces a new approach to this challenge. Rather than attempting to inspect the entire model directly, the method embeds lightweight watermarks into training tokens associated with visual concepts. When the model later generates an image, TokenTrace can recover traces of these watermarked signals, allowing researchers to identify which concepts most strongly contributed to the output.
Unlike prior attribution systems that identify single dominant influences, TokenTrace can recover compositional attribution – tracing multiple contributing concepts within the same generated result. This better reflects how modern generative AI systems actually synthesize content: by blending many learned influences together.
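To make the idea of compositional attribution concrete, here is a deliberately simplified sketch. It is not the TokenTrace method: it assumes each concept carries a fixed pseudo-random key, that a generated image yields a noisy sum of the keys of the concepts it used, and that a concept counts as present when its correlation with the recovered signal passes a threshold. The concept names, key length, noise level, and threshold are all hypothetical choices made for illustration.

```python
import random

DIM = 256  # length of each toy watermark key (an arbitrary choice)

def make_key(seed, dim=DIM):
    """Return a pseudo-random +/-1 key standing in for one concept's watermark."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]

def simulate_recovered_signal(keys, present, noise=0.5, seed=0):
    """Simulate what a decoder might recover from a generated image:
    the sum of the keys of the concepts used, plus Gaussian noise."""
    rng = random.Random(seed)
    dim = len(next(iter(keys.values())))
    signal = [rng.gauss(0.0, noise) for _ in range(dim)]
    for name in present:
        for i, k in enumerate(keys[name]):
            signal[i] += k
    return signal

def attribute(signal, keys, threshold=0.5):
    """Score every concept by normalized correlation with the signal
    and keep all concepts whose score exceeds the threshold."""
    scores = {
        name: sum(s * k for s, k in zip(signal, key)) / len(key)
        for name, key in keys.items()
    }
    detected = sorted(name for name, score in scores.items() if score > threshold)
    return detected, scores

# Hypothetical concept names; two of the four are "used" in the image.
concepts = ["style_a", "object_b", "character_c", "style_d"]
keys = {name: make_key(seed) for seed, name in enumerate(concepts, start=1)}
signal = simulate_recovered_signal(keys, present=["style_a", "object_b"])
detected, scores = attribute(signal, keys)
print(detected)
```

The point of the toy is that thresholding each concept independently can return several concepts at once, whereas taking only the single highest score would report one dominant influence.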
Synthetic media provenance
TokenTrace is the latest in a series of papers from Adobe Research exploring how provenance technologies can improve transparency and creator attribution in generative AI systems.
The provenance of a synthetic image ultimately lies in the AI model – and in the training data that shaped it. Understanding how those influences contribute to generated outputs is therefore an important part of the broader provenance puzzle.
At CVPR 2023, the team introduced EKILA, the first system to integrate data attribution into an end-to-end framework for automatically tracing and compensating creators for contributions to generative AI training. EKILA learned a visual fingerprinting embedding that matched generated images back to influential training examples, relying on visual similarity correlations to infer responsibility. Subsequent works moved toward more causative attribution methods. ProMark (CVPR 2024) and CustomMark (ICCV 2025) introduced approaches based on invisible watermarking of training data, enabling attribution signals to persist through the generative process itself.
These systems made it possible to identify which training examples most directly influenced generated outputs. TokenTrace extends this line of work by addressing compositional attribution – recovering multiple interacting influences simultaneously within a single generated image.
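The similarity-based matching described above for EKILA can be illustrated with a minimal sketch. This is a generic nearest-neighbor ranking over embeddings, not EKILA's actual fingerprinting model: the three-dimensional vectors and image names below are invented for illustration, and a real system would use a learned embedding with far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, training, top_k=2):
    """Return the top_k training items most similar to the query embedding."""
    scored = [(name, cosine(query, emb)) for name, emb in training.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings of three training images and one generated image.
training = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}
generated = [0.9, 0.1, 0.0]

top = rank_by_similarity(generated, training)
print([name for name, _ in top])
```

A ranking like this is correlational: it shows which training images look most like the output, not which ones caused it. That is the gap the watermark-based methods aim to close.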
Content authenticity and media provenance
This research also connects to Adobe’s broader leadership in content authenticity standards through the Content Authenticity Initiative (CAI), a cross-industry coalition that Adobe created to promote transparency and trust in digital media and that now includes more than 6,000 organizations.
Adobe has played a leading role in the development of the open C2PA standard for media provenance, now widely adopted across cameras, publishing platforms, and generative AI systems to attach verifiable provenance information to digital content. While C2PA helps establish provenance for media assets themselves, research projects such as TokenTrace explore a complementary challenge: tracing the provenance of the training influences that contribute to AI-generated content.
Authenticity and Provenance in the Age of Generative AI (APAI)
Adobe researchers will also engage more broadly on these topics at CVPR through the workshop Authenticity and Provenance in the Age of Generative AI (APAI), taking place on June 3. In addition to co-chairing the workshop, the team will deliver a tutorial on watermarking technologies on June 4, followed by a keynote on provenance research at the SPAR-3D workshop later that day.
At APAI, the team will also present related work on PRISM, a privacy-preserving search technology that enables users to selectively license their image content for AI use without losing control of their data.
These research prototypes point toward a future in which open provenance technologies underpin a decentralized creative supply chain for generative AI. Attribution methods such as TokenTrace, when paired with provenance standards like C2PA, could enable consent, rights expression, and compensation to flow along that supply chain.
Combined with emerging industry standards for content authenticity and provenance, this work contributes to a broader vision for generative AI that is not only powerful and creative, but also transparent about the origins of its outputs and able to attribute credit to the creators whose work helped make them possible.
This paper is one of over 75 papers Adobe is presenting at CVPR 2026. Check out more of those papers here.