Image Representations for Style Retrieval, Recognition and Background Replacement Tasks

Siddhartha Gairola

Abstract

Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. After replacement, it is often necessary to color correct the foreground to make it consistent with the new sky. Methods have been proposed to automate sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results.

Style similarity is an important measure for many applications such as style transfer, fashion search, and art exploration. However, computational modeling of style is difficult owing to its vague and subjective nature. Most methods for style-based retrieval use supervised training with a pre-defined categorization of images according to style. While this paradigm suits applications where style categories are well defined and curating large datasets according to such a categorization is feasible, in several other cases such a categorization is either ill-defined or does not exist.

In this thesis, we primarily study various image representations and their applications in understanding visual style and automatic background replacement. First, we propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the query image foreground. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 distance in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We used pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout, and natural illumination statistics, but observed color statistics to be the most effective for this task.
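
As a rough illustration of the retrieval step described above, the following Python sketch ranks database images by L2 distance to a query in feature space and then applies a simple greedy pass that favors diverse candidates. The image names, the toy 3-bin color histograms, and the helper functions `retrieve` and `diversify` are hypothetical. The greedy heuristic is an assumption, the realism criterion is not modeled, and the features and re-ranking procedure used in the thesis may differ.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database, k=3):
    """Return the names of the k database entries closest to the query."""
    ranked = sorted(database, key=lambda name: l2_distance(query, database[name]))
    return ranked[:k]

def diversify(candidates, database, n=2):
    """Greedy re-ranking (an assumed heuristic): keep the best match, then
    repeatedly add the candidate farthest from everything already selected."""
    selected = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining and len(selected) < n:
        best = max(
            remaining,
            key=lambda c: min(l2_distance(database[c], database[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy 3-bin color histograms (hypothetical values, not taken from the thesis).
database = {
    "sunset_a": [0.70, 0.20, 0.10],
    "sunset_b": [0.68, 0.22, 0.10],
    "overcast": [0.30, 0.35, 0.35],
    "blue_sky": [0.10, 0.30, 0.60],
}
query = [0.72, 0.18, 0.10]

top = retrieve(query, database, k=3)       # nearest neighbors by L2 distance
diverse = diversify(top, database, n=2)    # best match plus a dissimilar alternative
print(top, diverse)
```

In this toy setting the two near-duplicate sunsets rank first and second, and the diversity pass swaps the second one for a visibly different sky.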
We share our findings on feature selection and present qualitative results and a user-study-based evaluation to demonstrate the effectiveness of the proposed method.

Next, we propose an unsupervised protocol for learning a neural embedding of the visual style of images. Our protocol for learning style-based representations does not leverage categorical labels; instead, it uses a proxy measure to form triplets of anchor, similar, and dissimilar images. Using these triplets, we learn a compact style embedding that is useful for style-based search and retrieval. The learned embeddings outperform other unsupervised representations on the style-based image retrieval task on six datasets that capture different meanings of style. We also show that by fine-tuning the learned features with dataset-specific style labels, we obtain the best results on the image style recognition task on five of the six datasets. To the best of our knowledge, ours is the first work that provides a comprehensive review and evaluation of style representations in an unsupervised setting. Our findings, along with the curated outdoor scene database, should be useful to the community for future research on sky search and sky replacement.
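
The triplet-based protocol described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the abstract does not specify the proxy measure or the loss, so the example uses L2 distance between hypothetical proxy feature vectors to pick the similar and dissimilar images, and the standard margin-based triplet loss. The names `form_triplet`, `triplet_loss`, the margin value, and all numbers are illustrative only.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def form_triplet(anchor, proxy):
    """Pick the most and least similar images to the anchor under the
    proxy features (an assumed stand-in for the thesis's proxy measure)."""
    others = [name for name in proxy if name != anchor]
    positive = min(others, key=lambda n: l2(proxy[anchor], proxy[n]))
    negative = max(others, key=lambda n: l2(proxy[anchor], proxy[n]))
    return anchor, positive, negative

def triplet_loss(emb_a, emb_p, emb_n, margin=0.2):
    """Standard hinge-style triplet loss: zero once the negative is at
    least `margin` farther from the anchor than the positive."""
    return max(0.0, l2(emb_a, emb_p) - l2(emb_a, emb_n) + margin)

# Hypothetical proxy features used only to form the triplet (no labels).
proxy = {"img1": [0.0, 0.0], "img2": [0.1, 0.0], "img3": [1.0, 1.0]}
# Hypothetical embeddings, as a network being trained might produce them.
embedding = {"img1": [0.0, 0.0], "img2": [0.0, 1.0], "img3": [3.0, 4.0]}

triplet = form_triplet("img1", proxy)
a, p, n = (embedding[name] for name in triplet)
satisfied = triplet_loss(a, p, n)   # positive already closer: no penalty
violated = triplet_loss(a, n, p)    # roles swapped: large penalty
print(triplet, satisfied, violated)
```

During training, the loss would be minimized over many such triplets so that images judged similar by the proxy end up close together in the learned embedding space.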

Year of completion: April 2020
Advisor: P J Narayanan