Jing Shi is a research scientist at Adobe Research. His primary interests are visual perception and language-guided visual generation and manipulation. His recent work has focused on language-based image editing, scene understanding, and content authenticity. More broadly, he is interested in principled approaches to understanding representation learning and reinforcement learning.

Before joining Adobe, he received a Ph.D. in computer science from the University of Rochester in 2022 and a B.E. degree from the University of Electronic Science and Technology of China.

For more information, please visit his personal webpage.

Publications

ImProvShow: Multimodal Fusion for Image Provenance Summarization

Black, Alexander., Shi, Jing., Fan, Yifei., Collomosse, John. (Nov. 23, 2025)

British Machine Vision Conference (BMVC)

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

Di, Zonglin., Shi, Jing., Fan, Yifei., Tan, Hao., Black, Alexander., Collomosse, John., Liu, Yang. (Oct. 19, 2025)

International Conference on Computer Vision (ICCV 2025)

Improving Large Vision and Language Models by Learning from a Panel of Peers

Hernandez, Jefferson., Shi, Jing., Jenni, Simon., Ordonez, Vicente., Kafle, Kushal. (Oct. 19, 2025)

International Conference on Computer Vision (ICCV 2025)

Click, Type, Repeat: A Comprehensive Survey on GUI Agents

Nguyen, Dang., Chen, Jian., Wang, Yu., Wu, Gang., Park, Namyong., Hu, Zhengmian., Lyu, Hanjia., Wu, Junda., Aponte, Ryan., Xia, Yu., Li, Xintong., Shi, Jing., Chen, Hongjie., Lai, Viet., Xie, Zhouhang., Kim, Sungchul., Zhang, Ruiyi., Yu, Tong., Tanjim, Mehrab., Ahmed, Nesreen., Mathur, Puneet., Yoon, David., Yao, Lina., Kveton, Branislav., Kil, Jihyung., Nguyen, Thien., Bui, Trung., Zhou, Tianyi., Rossi, Ryan., Dernoncourt, Franck. (Aug. 1, 2025)

ACL 2025 Findings

Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage

Lee, Saehyung., Yoon, David., Bui, Trung., Shi, Jing., Yoon, Sungroh. (Jul. 19, 2025)

International Conference on Machine Learning (ICML 2025)

The Photographer’s Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers

Qi, Daiqing., Zhao, Handong., Shi, Jing., Jenni, Simon., Fan, Yifei., Dernoncourt, Franck., Cohen, Scott., Li, Sheng. (Jun. 15, 2025)

Conference on Computer Vision and Pattern Recognition (CVPR 2025)

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hua, Hang., Liu, Qing., Zhang, Lingzhi., Shi, Jing., Kim, Soo., Zhang, Zhifei., Wang, Yilin., Zhang, Jianming., Lin, Zhe., Luo, Jiebo. (Jun. 15, 2025)

Conference on Computer Vision and Pattern Recognition (CVPR 2025)

Visual Persona: Foundation Model for Full-Body Human Customization

Nam, Jisu., Son, Soowon., Xu, Zhan., Shi, Jing., Liu, Difan., Liu, Feng., Misraa, Aashish., Kim, Seungryong., Zhou, Yang. (Jun. 13, 2025)

Conference on Computer Vision and Pattern Recognition (CVPR 2025)

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

Khosla, Savya., Tiwari, Aditi., Kafle, Kushal., Jenni, Simon., Zhao, Handong., Collomosse, John., Shi, Jing. (Jun. 1, 2025)

Association for Computational Linguistics (ACL)

FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hua, Hang., Shi, Jing., Kafle, Kushal., Jenni, Simon., Zhang, Daoan., Collomosse, John., Cohen, Scott., Luo, Jiebo. (Oct. 1, 2024)

European Conference on Computer Vision (ECCV)

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Shi, Jing., Xiong, Wei., Lin, Zhe., Jung, Hyun. (Jun. 17, 2024)

Conference on Computer Vision and Pattern Recognition (CVPR)

VIXEN: Visual Text Comparison Network for Image Difference Captioning

Black, Alex., Shi, Jing., Fan, Yifei., Bui, Tu., Collomosse, John. (Jan. 14, 2024)

AAAI Conference on Artificial Intelligence (AAAI)