Contrastive Learning of Semantic Concepts for Open-set Cross-domain Retrieval

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Publication date: January 3, 2023

Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan, Biplab Banerjee

We consider the problem of image retrieval where query images at test time belong to classes and domains both unseen during training. This requires learning a feature space that generalizes jointly across classes and domains. To this end, we propose the semantic contrastive concept network (SCNNet), a new learning framework that takes a principled step towards joint class and domain generalization. Unlike existing methods that rely on global object representations, SCNNet learns local feature vectors to facilitate unseen-class generalization. Its key innovations are (a) a novel trainable local concept extraction module that learns an orthonormal set of basis vectors, and (b) a mechanism that computes local features for any unseen-class data as a linear combination of the learned basis set. Next, to enable unseen-domain generalization, SCNNet generates supervisory signals from an adjacent data modality, i.e., natural language, by mining freely available textual label information associated with images. SCNNet derives these signals from our novel trainable semantic ordinal distance constraints, which ensure semantic consistency between pairs of images sampled from different domains. Both proposed modules enable end-to-end training of SCNNet, yielding a model that establishes state-of-the-art performance on the standard DomainNet, PACS, and Sketchy benchmark datasets, with average Prec@200 improvements of 42.6%, 6.5%, and 13.6%, respectively, over the most recently reported results.
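The local concept extraction idea above can be illustrated with a minimal sketch: maintain a set of trainable concept vectors, orthonormalize them to obtain a basis, and express any feature, including one from an unseen class, as a linear combination of that basis. All names, dimensions, and the use of QR for orthonormalization here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
D, K = 16, 4  # feature dimension, number of concept vectors

# In the real model these would be trainable parameters; here random init.
concepts = rng.standard_normal((K, D))

# Orthonormalize the concept set; QR yields an orthonormal basis of its span.
Q, _ = np.linalg.qr(concepts.T)  # Q: (D, K) with orthonormal columns
basis = Q.T                      # (K, D): rows are orthonormal basis vectors

# For a feature from an unseen class, the coefficients are its projections
# onto the basis, and the local feature is their linear combination.
feature = rng.standard_normal(D)
coeffs = basis @ feature          # (K,) projection coefficients
local_feature = coeffs @ basis    # (D,) linear combination of the basis

# Sanity check: the learned basis is orthonormal.
assert np.allclose(basis @ basis.T, np.eye(K), atol=1e-6)
```

Because the basis is orthonormal, the coefficients are simple inner products, and the reconstruction is the orthogonal projection of the feature onto the concept subspace, which is what lets unseen-class features be described in terms of the learned concepts.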