Giving Image Editors Adjustable Control Over Editing Instructions: SliderEdit at CVPR 2026 

May 27, 2026

Tags: AI & Machine Learning, Computer Vision, Imaging & Video, Conferences, Content Intelligence, Intelligent Agents & Assistants

By Arman Zarei (University of Maryland), Samyadeep Basu (Adobe Research), Mobina Pournemat (University of Maryland), Sayan Nag (Adobe Research), Ryan Rossi (Adobe Research), Soheil Feizi (University of Maryland) | CVPR 2026 (Oral)  

Note: This post describes academic research conducted in collaboration with Adobe Research. The system described here is experimental and does not represent a current Adobe product feature. 
Learn more about the project | Read the full paper on arXiv 

Key Takeaways 

  • SliderEdit introduces continuous, adjustable control over individual editing instructions in instruction-based image editing models, turning each instruction into a smooth slider. 
  • A single lightweight low-rank adapter generalizes across diverse edits and compositional instructions without requiring per-attribute retraining. 
  • In both single- and multi-instruction evaluations, SliderEdit achieved stronger continuity metrics while maintaining strong identity preservation, compared with classifier-free guidance baselines and prior slider methods. 
  • The framework integrates with state-of-the-art image editing models, supporting applications ranging from subtle face retouching to multi-object scene transformations and iterative creative workflows. 

Today’s instruction-based image editors typically apply each edit at a fixed intensity. Ask for “gold dragon skin” or “more dramatic lighting,” and the model produces a single interpretation, with no direct way to make that edit subtler or stronger, or to adjust it independently of the other instructions in the same prompt. SliderEdit, an experimental research framework, addresses this problem by turning each editing instruction into a continuously controllable slider. 

From fixed edits to continuous sliders 

SliderEdit disentangles individual instructions within a prompt and exposes each as a smooth control. The key insight is that the transformer architectures behind modern image editors encode instruction semantics within localized token embeddings. By identifying those tokens and learning to modulate them using a low-rank update, SliderEdit can precisely control how strongly each instruction affects the final image. 
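To make the idea concrete, here is a minimal sketch of a token-selective low-rank update, written under our own simplifying assumptions: a single frozen projection `W`, low-rank factors `A` and `B`, and a boolean mask marking which prompt tokens belong to the target instruction. The function name `selective_token_lora` and this layout are hypothetical illustrations, not the paper's code; in a real editor the update would sit inside the transformer's layers and operate on much larger tensors.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def selective_token_lora(
    tokens: List[Vector],  # one embedding per prompt token (hypothetical layout)
    W: Matrix,             # frozen base projection, d_out x d_in
    A: Matrix,             # low-rank down-projection, r x d_in
    B: Matrix,             # low-rank up-projection, d_out x r
    mask: List[bool],      # True for tokens tied to the target instruction
    alpha: float,          # slider value scaling the low-rank update
) -> List[Vector]:
    """Apply W to every token, plus alpha * B(A e) only on masked tokens."""
    out = []
    for e, selected in zip(tokens, mask):
        h = matvec(W, e)  # the base model's projection, unchanged
        if selected:
            delta = matvec(B, matvec(A, e))  # rank-r correction B(Ae)
            h = [h_k + alpha * d_k for h_k, d_k in zip(h, delta)]
        out.append(h)
    return out
```

With an identity `W`, only the masked tokens are shifted, by `alpha` times their rank-r correction; every other token passes through exactly as in the base model, which is what keeps the rest of the prompt intact.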

The method relies on a novel training objective called Partial Prompt Suppression (PPS). A lightweight low-rank adapter learns to suppress the effect of a target instruction, approximating the output the model would generate if that instruction were absent. The authors found that a simplified single-instruction variant of the loss was sufficient to train adapters that generalize effectively to multi-instruction scenarios. 
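The PPS objective can be sketched as a regression: the adapted model, run on the full prompt with the slider at full suppression, should match the frozen base model run on the prompt with the target instruction removed. This is a minimal sketch under stated assumptions: prompts are modeled as lists of instruction strings, model outputs as flat lists of numbers compared with mean squared error, and the names `drop_instruction` and `pps_loss` are hypothetical. In an actual diffusion or flow-based editor the compared quantities would likely be the model's per-step predictions, which this sketch does not model.

```python
from typing import Callable, List, Sequence

Output = List[float]


def drop_instruction(instructions: Sequence[str], target: str) -> List[str]:
    """Build the partial prompt: every instruction except the one to suppress."""
    return [ins for ins in instructions if ins != target]


def pps_loss(
    adapted_model: Callable[[Output, List[str], float], Output],
    base_model: Callable[[Output, List[str]], Output],
    image: Output,
    instructions: Sequence[str],
    target: str,
) -> float:
    """Mean squared error between the adapted model on the full prompt
    (slider at 1.0, i.e. full suppression) and the frozen base model on
    the partial prompt without `target`."""
    full_prompt = list(instructions)
    partial_prompt = drop_instruction(instructions, target)
    # Regression target: what the frozen editor produces if `target` is absent.
    # During training this would be treated as a constant (no gradient).
    reference = base_model(image, partial_prompt)
    prediction = adapted_model(image, full_prompt, 1.0)
    return sum((p - r) ** 2 for p, r in zip(prediction, reference)) / len(reference)
```

When `instructions` holds a single entry, the partial prompt is empty and the target is simply the unedited behavior; this is presumably close in spirit to the simplified single-instruction variant the authors describe, though the exact form of their loss may differ.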

Once trained, the adapter’s scaling parameter becomes a natural slider: set it to zero for full edit strength, increase it toward one to suppress the edit, or push it below zero to amplify the effect beyond the model’s default edit strength. 
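The slider semantics above can be expressed as a simple mapping from a user-facing edit strength to the adapter scale, together with the effective weight W + alpha·BA at a given slider position. This is a minimal sketch: the names `slider_to_alpha` and `merged_weight`, and the choice of "strength 1.0 equals the model's default edit", are our own illustrative assumptions rather than part of the published method.

```python
from typing import List

Matrix = List[List[float]]


def slider_to_alpha(strength: float) -> float:
    """Map a user-facing edit strength to the adapter scale alpha.

    strength 1.0 -> alpha 0.0  (model's default edit strength)
    strength 0.0 -> alpha 1.0  (edit suppressed)
    strength > 1 -> alpha < 0  (edit amplified beyond the default)
    """
    return 1.0 - strength


def merged_weight(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Return W + alpha * (B @ A), the effective weight at one slider position."""
    r = len(A)
    return [
        [W[i][j] + alpha * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(len(W[0]))]
        for i in range(len(W))
    ]


# Sweep a few slider positions (hypothetical values) and print the adapter scale.
for s in (0.0, 0.5, 1.0, 1.5):
    print(f"strength={s:.1f} -> alpha={slider_to_alpha(s):+.1f}")
```

Folding the update into a single merged weight only makes sense when the low-rank update applies to every input of that layer; a token-selective variant has to apply the update per token instead.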

A single adapter for diverse edits 

Unlike prior approaches that require a separate adapter for each attribute or visual concept, SliderEdit learns a single set of low-rank matrices that generalize across diverse edits and unseen attributes. 

SliderEdit offers two adapter variants suited to different editing scenarios: 

  • GSTLoRA (Globally Selective Token LoRA): Applies updates across all token embeddings for especially smooth single-instruction transitions. 
  • STLoRA (Selective Token LoRA): Targets tokens tied to a specific instruction, enabling disentangled control across multiple simultaneous edits. 
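One way to picture the difference between the two variants is through the per-token scale that multiplies the shared low-rank update. In this hedged sketch, a GSTLoRA-style configuration gives every token the same slider value, while an STLoRA-style configuration gives each token the slider of the instruction it belongs to, so several instructions can be adjusted independently with one set of adapter weights. The functions `gst_token_scales` and `st_token_scales`, and the assumption that each token's owning instruction is already known, are hypothetical simplifications; how the method actually identifies instruction tokens is not shown here.

```python
from typing import Dict, List, Optional


def gst_token_scales(num_tokens: int, alpha: float) -> List[float]:
    """GSTLoRA-style: one slider value applied to every token embedding."""
    return [alpha] * num_tokens


def st_token_scales(token_owner: List[Optional[str]], sliders: Dict[str, float]) -> List[float]:
    """STLoRA-style: each token is scaled by the slider of its instruction.

    Tokens tied to no instruction (owner None), or to an instruction without
    a slider, get 0.0, i.e. they keep the base model's behavior.
    """
    return [sliders.get(owner, 0.0) if owner is not None else 0.0 for owner in token_owner]


# Two instructions in one prompt, each with its own slider (hypothetical values).
owners = ["gold skin", "gold skin", None, "dramatic lighting"]
print(st_token_scales(owners, {"gold skin": 1.0, "dramatic lighting": -0.5}))
```

Each scale would then multiply that token's low-rank correction, so suppressing one instruction leaves the tokens of the other instructions untouched.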

Training is lightweight: it requires only about 5k samples on average and a single NVIDIA H100 GPU, and both variants converge within a few hundred iterations. 

Evaluation highlights 

In single-instruction evaluations, GSTLoRA produced smoother transitions than classifier-free guidance baselines and prior slider methods while exhibiting lower identity drift. For example, a facial retouching edit can be adjusted gradually from subtle enhancement to dramatic transformation while preserving the subject’s identity. 

STLoRA extended these gains to multi-instruction scenarios, preserving disentangled control across two and three simultaneous edit directions. The authors also note that a trade-off among continuity, extrapolation, and disentanglement remains across configurations. 

Beyond benchmarks, the qualitative results span face editing, scene-level transformations, text-in-image styling, and zero-shot multi-subject personalization. SliderEdit also enables coherent image sequences that support visual storytelling workflows. 

Because fine-grained control over facial attributes carries responsible AI implications, the authors emphasize that SliderEdit is a research contribution and that real-world deployment would require careful consideration of potential misuse and responsible editing safeguards. 

Intuitive editing 

By enabling smooth, adjustable control over individual editing instructions, SliderEdit gives creators a more intuitive way to refine visual ideas without repeatedly rewriting prompts or regenerating images. Unlike prior approaches that required separate adapters for individual concepts, SliderEdit generalizes across diverse edits with a single lightweight framework, making controllable image editing more flexible and scalable. 

SliderEdit will be presented as an Oral paper at CVPR 2026.  
