Classification, Detection and Segmentation of Deformable Animals in Images
Omkar M. Parkhi
This thesis deals with the problem of classification, detection and segmentation of objects in images. We focus on classes of objects that undergo considerable deformation, taking cats and dogs as a case study. Many state-of-the-art methods that perform well at detecting rigid object categories, such as buses, airplanes and boats, perform poorly on these deformable animal categories. The well-known difficulty of automatically distinguishing between cats and dogs in images has been exploited in web security systems: Asirra, a system developed by Microsoft Research, requires users to correctly select all the cat images from 12 images of cats and dogs in order to gain access to a web service. Beyond this, classifying these animals into their breeds is challenging even for humans. Developing machine learning methods to solve these problems requires reliable training data. Here, the popularity of cats and dogs as pets provides an opportunity to collect this data from various sources on the internet, where they are often present in images and videos (together with people). As part of this work, we propose: a novel method for detecting cats and dogs in an image; a model for classifying images of cats and dogs according to species; and a dataset for fine-grained classification of pet breeds, together with models for classifying pet images according to breed. In the process we also segment these objects.
For detecting animals in an image, we propose a mechanism that combines a template-based object detector with a segmentation algorithm. The template-based detector of Felzenszwalb et al. is first used to detect the distinctive part of the object, and an iterative segmentation process then extracts the animal by minimizing an energy function defined over a conditional random field using graph cuts. We show quantitatively that our method works well and substantially outperforms whole-body template-based detectors for these highly deformable object categories, achieving accuracy comparable to the state of the art on the PASCAL VOC competition, which includes other models such as bag-of-words.
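The thesis's own segmentation procedure is not reproduced here, but the general idea of detection-seeded, iterative graph-cut segmentation can be sketched with OpenCV's GrabCut, which minimizes a similar colour-model energy with graph cuts. This is a minimal sketch under stated assumptions: the function name `segment_from_detection`, the `head_bbox` input and the `expand` heuristic are illustrative, not the thesis implementation.

```python
import numpy as np
import cv2


def segment_from_detection(image, head_bbox, expand=3.0, iters=5):
    """Segment the animal in `image` given a detected head box (x, y, w, h)."""
    h, w = image.shape[:2]
    x, y, bw, bh = head_bbox

    # Grow the head box into a rough whole-body region of interest;
    # pixels outside this rectangle start as definite background.
    cx, cy = x + bw / 2.0, y + bh / 2.0
    rx1 = max(0, int(cx - bw * expand / 2))
    ry1 = max(0, int(cy - bh * expand / 2))
    rx2 = min(w - 1, int(cx + bw * expand / 2))
    ry2 = min(h - 1, int(cy + bh * expand / 2))
    rect = (rx1, ry1, rx2 - rx1, ry2 - ry1)

    mask = np.zeros((h, w), np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # background colour model state
    fgd_model = np.zeros((1, 65), np.float64)  # foreground colour model state

    # First pass: initialise the segmentation from the rectangle.
    cv2.grabCut(image, mask, rect, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_RECT)

    # Clamp the detected head to definite foreground and iterate again,
    # alternating colour-model refitting with graph-cut minimisation.
    mask[y:y + bh, x:x + bw] = cv2.GC_FGD
    cv2.grabCut(image, mask, None, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_MASK)

    # Collapse the four GrabCut labels into a binary foreground mask.
    return ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)
```

For example, `segment_from_detection(cv2.imread("dog.jpg"), (120, 80, 60, 60))` would seed the segmentation from a hypothetical head detection; in the thesis the seed comes from a deformable-part detector rather than a hand-given box.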
For the task of subcategory classification, we introduce a novel dataset for pet breed discrimination, the IIIT-Oxford Pet dataset. The dataset contains 7,349 annotated images of cats and dogs of 37 different breeds, selected from various sources on the internet. In addition to the pet breed, the annotations include a pixel-level segmentation of the body of each animal and a bounding box marking its head. This dataset is the first of its kind for pet breed classification and should provide an important benchmark for researchers working on fine-grained classification. For the classification task, we propose a model that estimates the pet breed automatically from an image. The model combines shape, captured by a deformable part model detecting the pet face, and appearance, captured by a bag-of-words model describing the pet fur. Two classification approaches are discussed: a hierarchical approach, in which a pet is first classified as a cat or a dog and then into the corresponding breed; and a flat approach, in which the breed is obtained directly. For breed classification on our 37-class dataset, an average accuracy of about 60% was achieved, a very encouraging result given the difficulty of the problem. These models are also shown to improve the probability of breaking the challenging Asirra test by more than 30%, beating all previously published results.
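To illustrate the two decision schemes, the following sketch contrasts flat and hierarchical classification with linear SVMs over synthetic features. The data here (`X`, `species`, `breed`) are assumptions standing in for the shape and bag-of-words descriptors, not the thesis models or dataset.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical stand-in features: one shape+appearance descriptor per image.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))
species = rng.integers(0, 2, size=500)            # 0 = cat, 1 = dog
breed = np.where(species == 0,
                 rng.integers(0, 12, size=500),   # 12 cat breeds
                 rng.integers(12, 37, size=500))  # 25 dog breeds

# Flat approach: one multi-class SVM over all 37 breeds directly.
flat_clf = LinearSVC().fit(X, breed)

# Hierarchical approach: classify the species first, then the breed
# with a per-species classifier trained only on that species' images.
species_clf = LinearSVC().fit(X, species)
breed_clfs = {s: LinearSVC().fit(X[species == s], breed[species == s])
              for s in (0, 1)}


def predict_hierarchical(x):
    """Predict cat vs dog first, then the breed within that species."""
    s = int(species_clf.predict(x.reshape(1, -1))[0])
    return int(breed_clfs[s].predict(x.reshape(1, -1))[0])


print(flat_clf.predict(X[:1])[0], predict_hierarchical(X[0]))
```

The hierarchical route exploits the easier cat-vs-dog decision but propagates any species error into the breed stage, which is the trade-off the two approaches expose.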
Year of completion: July 2011
Advisors: Dr. Andrew Zisserman (University of Oxford) and Dr. C. V. Jawahar
Related Publications
Omkar M. Parkhi, Andrea Vedaldi, C. V. Jawahar and Andrew Zisserman, "The Truth about Cats and Dogs", Proceedings of the 13th International Conference on Computer Vision (ICCV 2011), 6-13 Nov. 2011, Barcelona, Spain, pp. 1427-1434, ISBN 978-1-4577-1101-5.