Publications

BlobGAN: Spatially Disentangled Scene Representations

European Conference on Computer Vision (ECCV'22)

Published October 27, 2022

Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros

We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered “blobs” of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial uniformity of blobs and the locality inherent to convolution, our network learns to associate different blobs with different entities in a scene and to arrange these blobs to capture scene layout.
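The blob placement step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `splat_blobs`, the isotropic blob shape, the sigmoid opacity falloff, and the `sharpness` parameter are all assumptions made for illustration. NumPy is used for readability; it does not compute gradients, but every operation here is smooth, so the same arithmetic written in an autodiff framework would let gradients reach blob positions and sizes.

```python
import numpy as np

def splat_blobs(centers, scales, features, grid_size=16, sharpness=20.0):
    """Composite K soft blobs onto a (grid_size, grid_size, D) feature grid.

    Hypothetical helper, for illustration only.
    centers:  (K, 2) blob centers as (x, y) in normalized [0, 1] coordinates
    scales:   (K,)   blob radii in the same normalized units
    features: (K, D) one feature vector per blob
    Blobs are composited in index order, so later blobs sit in front.
    """
    centers = np.asarray(centers, dtype=float)
    scales = np.asarray(scales, dtype=float)
    features = np.asarray(features, dtype=float)

    # Coordinates of grid cell centers; xs varies along columns, ys along rows.
    coords = (np.arange(grid_size) + 0.5) / grid_size
    xs, ys = np.meshgrid(coords, coords)

    grid = np.zeros((grid_size, grid_size, features.shape[1]))
    for (cx, cy), s, f in zip(centers, scales, features):
        # Soft opacity: close to 1 inside the blob radius, close to 0 outside.
        dist = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
        alpha = 1.0 / (1.0 + np.exp(-sharpness * (s - dist)))
        # Depth-ordered alpha compositing of this blob over what is behind it.
        grid = (1.0 - alpha[..., None]) * grid + alpha[..., None] * f
    return grid

# Two blobs side by side, each carrying a distinct 2-dimensional feature.
grid = splat_blobs(
    centers=[[0.25, 0.5], [0.75, 0.5]],
    scales=[0.2, 0.2],
    features=[[1.0, 0.0], [0.0, 1.0]],
)
print(grid.shape)  # (16, 16, 2)
```

In the model the abstract describes, a grid of this kind would then be decoded into an image by the generative adversarial network; the decoder is outside the scope of this sketch.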
