Research by: Oindrila Saha (UMass Amherst), Vojtech Krs (Adobe Research), Radomir Mech (Adobe Research), Kevin Blackburn-Matzen (Adobe Research), Matheus Gadelha (Adobe Research), Subhransu Maji (UMass Amherst) · Published at: ICLR 2026, Rio de Janeiro · This research was conducted during Oindrila Saha’s internship at Adobe Research. Note: This is academic research being presented at a peer-reviewed conference. It is not a product feature or capability.
Key points:
- SIGMA-GEN is an experimental unified framework from Adobe Research, presented at ICLR 2026, for multi-identity-preserving image generation — placing multiple subjects into a scene in a single pass, with each subject’s appearance faithfully preserved.
- It is the first method to enable single-pass multi-subject generation guided simultaneously by both identity and spatial constraints, supporting inputs from coarse 2D or 3D bounding boxes to pixel-level segmentation masks and depth maps — all with one model.
- The team introduces SIGMA-SET27K, a new synthetic dataset purpose-built to provide the paired identity, structure, and spatial supervision the task requires — covering over 100,000 unique subjects across 27,000 images.
- SIGMA-GEN achieves state-of-the-art performance in identity preservation, image quality, and speed — and demonstrates emergent capabilities including subject reposing, style transfer via text, mixed-granularity control, and generalization to free-form masks unseen during training.
Generating an image of two specific people standing in a particular spot — one closer, one further back, both looking exactly as they do in reference photos — turns out to be a surprisingly hard problem. Existing tools handle it piecemeal: one pass for layout, another for identity, and even then the results often lose the fine details that make a face or object recognizable. SIGMA-GEN, an experimental technology from Adobe Research presented at ICLR 2026, takes a different approach: it does all of this in a single generation step.
The core challenge in multi-subject image generation is that structure and identity pull in different directions. Spatial constraints — “put subject A in the foreground, subject B behind and to the left” — require the model to understand scene geometry. Identity preservation — “make sure subject A still looks like subject A” — requires fine-grained attention to the reference image. SIGMA-GEN addresses both simultaneously. Identity is preserved by training the model to attend to the relevant reference images in the right places, allowing fine-grained feature alignment at the subject level — so the model doesn’t just place a subject, it anchors that subject’s appearance throughout the generation.
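To make the idea of subject-level attention concrete, here is a minimal illustrative sketch of spatially masked cross-attention: each generated image token attends only to the reference tokens of the subject whose spatial region covers it. This is not the paper's implementation — the function name, shapes, and masking scheme are assumptions chosen purely to illustrate the concept.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def identity_masked_attention(img_tokens, ref_tokens, subject_masks):
    """Illustrative sketch (not the SIGMA-GEN implementation).

    img_tokens:    (N, d)  latent tokens of the image being generated
    ref_tokens:    list of (M_i, d) arrays, one per subject's reference image
    subject_masks: (S, N) boolean; True where subject i may appear

    Each image token attends only to the reference tokens of subjects
    whose spatial mask covers it, so appearance features are injected
    exactly where that subject is placed.
    """
    N, d = img_tokens.shape
    out = np.zeros_like(img_tokens)
    for refs, mask in zip(ref_tokens, subject_masks):
        scores = img_tokens @ refs.T / np.sqrt(d)   # (N, M_i) similarity
        attn = softmax(scores, axis=-1)             # attention over ref tokens
        out += mask[:, None] * (attn @ refs)        # inject only inside the mask
    return out
```

Tokens outside every subject's mask receive no identity injection at all, which is one simple way a model could keep each reference's influence local to its assigned region.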
Flexibility is a deliberate design goal. Users can specify spatial layout at whatever level of precision they have available — from rough bounding boxes to pixel-level masks — and can even mix control types within a single generation, applying a precise mask to one subject and a coarse 3D box to another. One model handles it all.
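One way to picture mixed-granularity control is as a per-subject spec that accepts whatever precision the user has, then normalizes everything into a common spatial map the model can consume. The schema below is hypothetical — the field names and the normalization are illustration, not the paper's data format.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class SubjectControl:
    """Hypothetical per-subject control spec (field names are not from the paper).
    A subject can be placed with a pixel-precise mask, a coarse 2D box,
    or no spatial constraint at all."""
    ref_image: np.ndarray                 # identity reference, (H, W, 3)
    bbox: Optional[tuple] = None          # coarse box (x0, y0, x1, y1)
    mask: Optional[np.ndarray] = None     # pixel-precise (H, W) bool mask
    depth: Optional[np.ndarray] = None    # optional per-pixel depth hint

def to_occupancy(ctrl: SubjectControl, h: int, w: int) -> np.ndarray:
    """Normalize any control granularity into one (h, w) boolean map,
    so precise masks and coarse boxes can be mixed in a single generation."""
    if ctrl.mask is not None:             # most precise input wins
        return ctrl.mask.astype(bool)
    if ctrl.bbox is not None:             # rasterize a coarse box
        x0, y0, x1, y1 = ctrl.bbox
        occ = np.zeros((h, w), dtype=bool)
        occ[y0:y1, x0:x1] = True
        return occ
    return np.ones((h, w), dtype=bool)    # unconstrained placement
```

Under this framing, giving one subject a mask and another a box is just two entries in the same list, which is the kind of uniformity that lets a single model handle every control type.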
Enabling this required new training data. High-quality multi-subject datasets that pair identity images with structural and spatial annotations are scarce in the real world. The team built SIGMA-SET27K synthetically — generating paired identity, structure, and spatial annotations at a scale that real-world collection cannot easily provide — resulting in 27,000 images covering more than 100,000 unique subjects. This synthetic approach is what allows a single model to generalize across the full range of subject types and control modalities.
The results go beyond what the training explicitly targets. SIGMA-GEN handles free-form masks not seen during training, supports reposing subjects into new poses while preserving identity, enables text-driven style changes applied to placed subjects, and correctly interprets depth cues even when the depth information lies outside the subject mask area, a case it was never explicitly trained on. These emergent capabilities suggest the model has learned something more general about how subjects relate to spatial context, not just how to match a reference image.
SIGMA-GEN is a collaboration between Adobe Research and the University of Massachusetts Amherst, with the core research conducted during Oindrila Saha’s internship at Adobe Research. For creators working with generative AI tools, it points toward a future where multi-subject scene assembly is fast, controllable, and faithful — not a series of compromises. Learn more about this research on our project page.