Decomposing Bag of Words Histograms

Abstract


We aim to decompose a global histogram representation of an image into histograms of its associated objects and regions. This task is formulated as an optimization problem, given a set of linear classifiers, which can effectively discriminate the object categories present in the image. Our decomposition bypasses harder problems associated with accurately localizing and segmenting objects. We evaluate our method on a wide variety of composite histograms, and also compare it with MRF-based solutions. In addition to merely measuring the accuracy of decomposition, we also show the utility of the estimated object and background histograms for the task of image classification on the PASCAL VOC 2007 dataset.

People

Ankit Gandhi

Karteek Alahari

C. V. Jawahar

Motivation

An SVM classifier is often trained to recognize only a single class category. When multiple objects (or uncorrelated noise) are present in an image, its performance deteriorates. To better understand this issue, let us consider a split of the PASCAL VOC 2007 test data into images containing a single class category (PASCAL-S) and images containing multiple class categories (PASCAL-M). In this setting, the average precision (AP) of an SVM classifier trained on bag-of-words (BoW) histograms for the category “cat” is 0.589 on PASCAL-S, but only 0.189 on PASCAL-M. It has also been observed that BoW histograms of single isolated objects are relatively easy to classify. For example, accuracy as high as 77.78% is reported on the Caltech 101 dataset, while more complex images, which contain multiple objects and natural clutter, are harder to work with (e.g. 62.8% is still the best score on PASCAL VOC 2007). An important reason for this deterioration in performance is that a classifier trained on single objects often fails to recognize the object when the global image representation (BoW) is “corrupted” by additional objects and clutter present in the image. The question of interest to us is therefore the following: is it possible to filter out the clutter and classify only the signal?
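The “corruption” effect described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the method or data from the paper: the 4-word vocabulary, the counts, and the classifier weights are all hypothetical and chosen only to show how adding clutter counts to an object histogram can flip the sign of a linear classifier score.

```python
# Toy illustration (all numbers hypothetical, not taken from the paper).
# A BoW histogram of an image is the sum of the visual-word counts of its
# regions, so clutter counts add directly onto the object counts.

def l1_normalize(hist):
    """Scale a histogram of counts so that its entries sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]

def linear_score(weights, bias, hist):
    """Score of a linear classifier: w . h + b."""
    return sum(w * h for w, h in zip(weights, hist)) + bias

# Hypothetical visual-word counts over a 4-word vocabulary.
object_counts = [8, 2, 0, 0]    # counts from the object alone
clutter_counts = [0, 1, 6, 9]   # counts from background and other objects
composite_counts = [o + c for o, c in zip(object_counts, clutter_counts)]

# Hypothetical linear classifier for the object category.
weights = [1.0, 0.5, -1.0, -1.0]
bias = 0.0

object_score = linear_score(weights, bias, l1_normalize(object_counts))
composite_score = linear_score(weights, bias, l1_normalize(composite_counts))

print("score on isolated object histogram:", object_score)
print("score on composite image histogram:", composite_score)
```

With these made-up numbers the isolated object histogram receives a positive score while the composite histogram receives a negative one, which is the kind of failure that motivates recovering the object histogram from the global one.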

Paper

Ankit Gandhi, Karteek Alahari and C. V. Jawahar - Decomposing Bag of Words Histograms, Proceedings of the International Conference on Computer Vision, 1-8th Dec. 2013, Sydney, Australia. [PDF]

[poster] [bibtex]

Datasets

PASCAL VOC 2007 [Click here]
Flickr Multiple Object Dataset [Readme] [Download] [example images]
Composed Caltech [Readme]

Code

TBA

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.