BlobGAN: Spatially Disentangled Scene Representations

European Conference on Computer Vision (ECCV'22)

Publication date: October 27, 2022

Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros


We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered “blobs” of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial uniformity of blobs and the locality inherent to convolution, our network learns to associate different blobs with different entities in a scene and to arrange these blobs to capture scene layout.
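As a rough illustration of the idea of placing depth-ordered blobs onto a feature grid, the sketch below composites a few blobs back to front using soft opacity maps. It is a minimal sketch under stated assumptions, not the authors' implementation: the isotropic sigmoid falloff, the `sharpness` parameter, the dictionary layout of a blob, and the function names `blob_opacity` and `splat_blobs` are all hypothetical choices made for this example, and the decoding of the grid into an image by the generative adversarial network is omitted.

```python
import math


def blob_opacity(cx, cy, scale, x, y, sharpness=10.0):
    """Soft opacity in (0, 1): high inside the blob, decaying smoothly outside.

    Assumption: an isotropic sigmoid falloff on Euclidean distance. It is
    smooth in cx, cy, and scale, which is what makes placement differentiable.
    """
    dist = math.hypot(x - cx, y - cy)
    return 1.0 / (1.0 + math.exp(-sharpness * (scale - dist)))


def splat_blobs(blobs, size, feature_dim):
    """Alpha-composite blobs onto a size x size grid of feature vectors.

    blobs: list of dicts with keys "cx", "cy", "scale", "feature".
    Earlier entries are treated as farther back (hypothetical depth ordering).
    """
    grid = [[[0.0] * feature_dim for _ in range(size)] for _ in range(size)]
    for blob in blobs:
        for row in range(size):
            for col in range(size):
                # Cell centers in normalized [0, 1] coordinates.
                x = (col + 0.5) / size
                y = (row + 0.5) / size
                a = blob_opacity(blob["cx"], blob["cy"], blob["scale"], x, y)
                cell = grid[row][col]
                for k in range(feature_dim):
                    # "Over" compositing: the nearer blob partly covers what is behind it.
                    cell[k] = (1.0 - a) * cell[k] + a * blob["feature"][k]
    return grid


# Two overlapping blobs with distinct feature vectors; the second is in front.
blobs = [
    {"cx": 0.3, "cy": 0.5, "scale": 0.2, "feature": [1.0, 0.0]},
    {"cx": 0.6, "cy": 0.5, "scale": 0.2, "feature": [0.0, 1.0]},
]
grid = splat_blobs(blobs, size=8, feature_dim=2)
```

In this toy setup, moving a blob's center or changing its scale shifts which grid cells carry its features, which loosely mirrors how rearranging blobs would change scene layout.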
