
Classification, Detection and Segmentation of Deformable Animals in Images


Omkar M Parkhi (homepage)

This thesis deals with the problem of classification, detection and segmentation of objects in images. We focus on object classes that undergo considerable deformation, with cats and dogs as a case study. Many state-of-the-art methods that perform well at detecting rigid object categories, such as buses, airplanes and boats, perform poorly on these deformable animal categories. The well-known difficulty of automatically distinguishing cats from dogs in images has been exploited in web security systems: Asirra, a system developed by Microsoft Research, requires users to correctly select all cat images from the 12 cat and dog images shown to them in order to gain access to a web service. Beyond this, classifying these animals into their breeds is challenging even for humans. Developing machine learning methods to solve these problems requires reliable training data. Here, the popularity of cats and dogs as pets provides an opportunity to collect this data from various sources on the internet, where they frequently appear in images and videos (often together with people). As part of this work, we propose: a novel method for detecting cats and dogs in an image; and a model for classifying images of cats and dogs according to species. We also introduce a dataset for fine-grained classification of pet breeds, and develop models to classify pet images according to breed. In the process we also segment these objects.

For detecting animals in an image, we propose a mechanism combining a template-based object detector with a segmentation algorithm. The template-based detector of Felzenszwalb et al. is first used to detect the distinctive part of the object; an iterative segmentation process then extracts the animal by minimizing an energy function defined over a conditional random field using GraphCuts. We show quantitatively that our method works well and substantially outperforms whole-body template-based detectors for these highly deformable object categories, achieving accuracy comparable to the state of the art on the PASCAL VOC competition, which includes other models such as bag-of-words.
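The segmentation step rests on an energy of the usual CRF form: a per-pixel data term plus a smoothness term over neighbouring pixels. The sketch below only evaluates such an energy on a toy 4-connected grid (the image values, weights and appearance model are made up for illustration; the thesis minimizes this kind of energy with GraphCuts rather than evaluating it directly):

```python
# Minimal sketch of a CRF segmentation energy on a 4-connected pixel grid:
#   E(x) = sum_i U_i(x_i) + lam * sum_{(i,j)} [x_i != x_j]
# where x_i is 0 (background) or 1 (foreground). All numbers are illustrative.

def unary(intensity, label):
    """Data term: cost of assigning `label` given a pixel intensity in [0, 1].
    Bright pixels are treated as likely foreground -- a stand-in for a real
    appearance model learned from the detected object part."""
    return (1.0 - intensity) if label == 1 else intensity

def pairwise(a, b):
    """Smoothness term: penalize label disagreement between neighbours."""
    return 1.0 if a != b else 0.0

def energy(image, labels, lam=0.5):
    """Total CRF energy of a binary labeling of a 2D image."""
    h, w = len(image), len(image[0])
    e = sum(unary(image[y][x], labels[y][x]) for y in range(h) for x in range(w))
    for y in range(h):
        for x in range(w):
            if x + 1 < w:                               # right neighbour
                e += lam * pairwise(labels[y][x], labels[y][x + 1])
            if y + 1 < h:                               # bottom neighbour
                e += lam * pairwise(labels[y][x], labels[y + 1][x])
    return e

image = [[0.9, 0.8], [0.1, 0.2]]   # toy 2x2 "image"
good = [[1, 1], [0, 0]]            # labeling that agrees with intensities
bad = [[0, 0], [1, 1]]             # labeling that contradicts them
print(energy(image, good), energy(image, bad))
```

A graph-cut solver finds the labeling of minimum energy exactly for binary labels with this kind of submodular pairwise term, which is why the iterative segmentation can alternate between updating the appearance model and re-solving the cut.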

For the task of subcategory classification, a novel dataset for pet breed discrimination, the IIIT-OXFORD PET dataset, is introduced. The dataset contains 7,349 annotated images of cats and dogs of 37 different breeds, selected from various sources on the internet. In addition to the pet breed, annotations include a pixel-level segmentation of the body of each animal and a bounding box marking its head. This dataset is the first of its kind for pet breed classification and should provide an important benchmark for researchers working on fine-grained classification. For the classification task, we propose a model that estimates a pet's breed automatically from an image. The model combines shape, captured by a deformable part model detecting the pet face, and appearance, captured by a bag-of-words model describing the pet fur. Two classification approaches are discussed: a hierarchical approach, in which a pet is first classified as a cat or a dog and then into the corresponding breed; and a flat one, in which the breed is obtained directly. On our 37-class dataset, breed classification achieves an average accuracy of about 60%, a very encouraging result given the difficulty of the problem. These models are also shown to improve the probability of breaking the challenging Asirra test by more than 30%, beating all previously published results.
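The hierarchical approach factors the breed posterior through the species decision, P(breed) = P(species) × P(breed | species), whereas the flat approach scores every breed directly. A minimal sketch of the hierarchical combination (the class names and probabilities below are invented; the real model derives them from deformable-part and bag-of-words scores):

```python
# Hierarchical breed classification sketched with toy probabilities:
#   P(breed) = P(species) * P(breed | species)
# All numbers and breed names here are illustrative placeholders.

species_probs = {"cat": 0.7, "dog": 0.3}            # toy species classifier
breed_given_species = {
    "cat": {"Abyssinian": 0.6, "Birman": 0.4},      # toy per-species breeds
    "dog": {"Beagle": 0.8, "Boxer": 0.2},
}

def hierarchical_breed_probs():
    """Combine species and per-species breed probabilities into one posterior."""
    probs = {}
    for species, p_species in species_probs.items():
        for breed, p_breed in breed_given_species[species].items():
            probs[breed] = p_species * p_breed
    return probs

probs = hierarchical_breed_probs()
best = max(probs, key=probs.get)
print(best)
```

A flat classifier would instead train one 37-way model over all breeds; the hierarchical factoring lets the species decision (an easier problem) constrain the harder breed decision.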

 

Year of completion:  July 2011
 Advisors : Dr. Andrew Zisserman (University of Oxford) and Dr. C. V. Jawahar

Related Publications

  • Omkar M. Parkhi, Andrea Vedaldi, C. V. Jawahar and Andrew Zisserman - The Truth about Cats and Dogs, Proceedings of the 13th International Conference on Computer Vision (ICCV 2011), 6-13 Nov. 2011, ISBN 978-1-14577-1101-5, pp. 1427-1434, Barcelona, Spain. [PDF]

 


Downloads

thesis

ppt

Raytracing Dynamic Scenes on the GPU


Sashidhar Guntury (homepage)

Raytracing dynamic scenes at interactive to realtime rates has received much attention recently. In this dissertation, we present strategies for high-performance raytracing on an off-the-shelf commodity Graphics Processing Unit (GPU), traditionally used to accelerate gaming and other graphics applications. We use the grid data structure to spatially arrange the triangles and raytrace efficiently. Grid construction reduces to sorting, which is fast on today's GPUs. Our results demonstrate that the grid acceleration structure is competitive with hierarchical acceleration data structures and can be considered the data structure of choice for dynamic scenes, where per-frame rebuilding is required. We advocate using an appropriate data structure for each stage of raytracing, resulting in multiple structure builds per frame. A perspective grid built for the camera achieves perfect coherence for primary rays. A perspective grid built with respect to each light source provides the best performance for shadow rays. We develop a model called spherical light grids to handle lights positioned inside the model space. However, since perspective grids are best suited for rays sharing a common direction, we fall back to uniform grids to trace arbitrarily directed reflection rays. Uniform grids are best for reflection and refraction rays with little coherence. We propose an Enforced Coherence method that brings coherence to such rays by rearranging the ray-to-voxel mapping using sorting; this gives the best performance on GPUs with only user-managed caches. We also propose a simple Independent Voxel Walk method, which performs best by exploiting the L1 and L2 caches on recent GPUs. We achieve over 10 fps of total rendering on the Conference model with one light source and one reflection bounce, while rebuilding the data structure for each stage.
The ideas presented here are likely to give high performance on future GPUs as well as on other manycore architectures. (more...)
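The core of the Enforced Coherence idea is reordering rays so that rays mapped to the same grid voxel become contiguous in memory and can be traced together. A minimal sketch of that reordering (the grid resolution, scene bounds and ray representation are invented for illustration; on the GPU the reordering is a parallel sort of (voxel id, ray index) pairs, not a CPU `sorted` call):

```python
# Sketch of Enforced Coherence: sort rays by the uniform-grid voxel containing
# their current position, so rays in one voxel are processed together.
# Grid resolution and scene bounds below are illustrative.

GRID_RES = 4                      # a 4 x 4 x 4 uniform grid
SCENE_MIN, SCENE_MAX = 0.0, 1.0   # axis-aligned scene bounds (per axis)

def voxel_id(point):
    """Flatten the 3D voxel coordinates of `point` into one integer sort key."""
    extent = SCENE_MAX - SCENE_MIN
    ids = []
    for c in point:
        i = int((c - SCENE_MIN) / extent * GRID_RES)
        ids.append(min(max(i, 0), GRID_RES - 1))   # clamp to valid cells
    x, y, z = ids
    return (z * GRID_RES + y) * GRID_RES + x

def enforce_coherence(rays):
    """Reorder rays so those in the same voxel become contiguous."""
    return sorted(rays, key=lambda r: voxel_id(r["origin"]))

rays = [
    {"origin": (0.90, 0.10, 0.1), "dir": (0, 0, 1)},
    {"origin": (0.10, 0.10, 0.1), "dir": (0, 1, 0)},
    {"origin": (0.95, 0.05, 0.1), "dir": (1, 0, 0)},
]
ordered = enforce_coherence(rays)   # rays sharing a voxel are now adjacent
```

After the sort, threads of one GPU warp work on rays from the same voxel and therefore touch the same triangles, which is what makes the method effective on hardware with only user-managed caches.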

 

Year of completion:  2011
 Advisor : P. J. Narayanan

Related Publications

  • Sashidhar Guntury and P. J. Narayanan - Raytracing Dynamic Scenes with Shadows on the GPU, in Eurographics Symposium on Parallel Graphics and Visualization, 2010
  • Sashidhar Guntury and P. J. Narayanan - Raytracing Dynamic Scenes on the GPU, in IEEE Transactions on Visualization and Computer Graphics, to appear.

Downloads

thesis

ppt

Image Based PTM Synthesis for Realistic Rendering of 3D Models


Pradeep Rajiv Nallaganchu (homepage)

Capturing the shape and texture of large structures such as monuments and statues at very high resolution is extremely expensive, both in terms of time and storage space. In many cases the fine details are generated by the surface properties of the material, and the appearance is statistically uniform. In this thesis, we present an approach to add surface details to a coarse 3D model of an object based on two additional sources of information: a set of images of the object, and a high-resolution model of the material the object is made of. The material model we employ is the Polynomial Texture Map (PTM), which captures the appearance of a surface under various illumination conditions. We use the observed images of the object as constraints to synthesize texture samples for each triangle of the object under any given illumination.

The primary challenge is to synthesize a polynomial model of the texture when the constraints arise in the image domain. We use knowledge of the object's illumination to map the texture models into image space and compute the optimal patch; the texture transfer then happens as a complete 3D texture model. We also account for pose, scale, reflectance and surface smoothness while carrying out the texture transfer. We synthesize the texture of an object on a per-triangle basis, applying operations such as normalization and blending to handle discontinuities at the edges. (more...)
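A Polynomial Texture Map stores, per texel, the six coefficients of a biquadratic polynomial in the projected light direction (lu, lv); evaluating the polynomial reconstructs the texel's luminance under any illumination. A minimal sketch of that evaluation (the coefficient values below are invented; a real PTM fits them per texel from photographs taken under many known light positions):

```python
# Evaluate one texel of a Polynomial Texture Map (Malzbender-style biquadratic):
#   L(lu, lv) = a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5
# (lu, lv) are the x/y components of the unit light direction in local
# texture coordinates. The coefficients here are hypothetical.

def ptm_luminance(coeffs, lu, lv):
    """Reconstruct texel luminance for light direction components (lu, lv)."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0*lu*lu + a1*lv*lv + a2*lu*lv + a3*lu + a4*lv + a5

texel = (-0.4, -0.4, 0.0, 0.2, 0.1, 0.8)   # made-up coefficients for one texel

# Under a head-on light (lu = lv = 0) only the constant term survives.
print(ptm_luminance(texel, 0.0, 0.0))   # 0.8
```

Synthesizing a texture patch under image-domain constraints, as described above, amounts to choosing coefficient sets whose evaluation under the object's known illumination matches the observed pixels.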

 

Year of completion:  July 2011
 Advisor : Anoop M. Namboodiri

Related Publications

  • Pradeep Rajiv, Anoop M. Namboodiri - Image based PTM Synthesis for Realistic Rendering of Low Resolution 3D Models, Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP'10), 12-15 Dec. 2010, Chennai, India. [PDF]


Downloads

thesis

ppt

Interactive visualization and tuning of multi-dimensional clusters for indexing


Dasari Pavan Kumar (homepage)

The automation of activities in all areas, including business, engineering, science, and government, produces an ever-increasing stream of data. In particular, the amount of multimedia content produced and made available on the internet, in both professional and personal collections, is growing rapidly, and with it the need for efficient and effective ways to manage it, since the collected data is believed to contain valuable information. Extracting such information or patterns is, however, an extremely difficult task, and this has led to a great amount of research into content-based retrieval and visual recognition. Recent retrieval systems extract low-level image features and conceptualize them into clusters. A conventional sequential scan over those image features would take hours even on a moderate collection; clustering and indexing therefore form the very crux of the solution. The state of the art uses the 128-dimensional SIFT descriptor as the low-level feature, and indexing even a moderate collection involves several million such vectors. Search performance depends on the quality of the indexing, and there is often a need to interactively tune the process for better accuracy. In this thesis, we propose a visualization-based framework, and a tool which adheres to it, to tune the indexing process for images and videos. We use a feature selection approach to improve the clustering of SIFT vectors: users can visualize the quality of clusters and interactively control the importance of individual or groups of feature dimensions. The results can be visualized quickly and the process repeated, and the user can employ either a filter or a wrapper model in our tool. We use input sampling, GPU-based processing, and visual tools for analyzing correlations to provide interactivity. We present results of tuning the indexing for a few standard datasets.
A few tuning iterations resulted in an improvement of over 5% in the final classification performance, which is significant. (more...)
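Controlling the importance of feature dimensions amounts to using a weighted distance when assigning vectors to cluster centres. The sketch below shows how down-weighting dimensions can change an assignment (the 4-dimensional vectors, centres and weights stand in for 128-dimensional SIFT descriptors and are purely illustrative):

```python
# Sketch of dimension-weighted cluster assignment: a per-dimension weight
# vector (tuned interactively in the thesis' tool) rescales the distance
# used to assign feature vectors to cluster centres. All values are toy data.

def weighted_sq_dist(v, c, w):
    """Squared Euclidean distance with per-dimension weights."""
    return sum(wi * (vi - ci) ** 2 for vi, ci, wi in zip(v, c, w))

def assign(vectors, centres, weights):
    """Assign each vector to its nearest centre under the weighted metric."""
    return [min(range(len(centres)),
                key=lambda k: weighted_sq_dist(v, centres[k], weights))
            for v in vectors]

centres = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 6.0, 6.0)]
vectors = [(0.9, 0.9, 5.0, 0.0)]   # dims 3-4 dominate the unweighted distance

uniform = assign(vectors, centres, (1, 1, 1, 1))   # all dims equally important
tuned = assign(vectors, centres, (1, 1, 0, 0))     # ignore dims 3-4
print(uniform, tuned)
```

With uniform weights the noisy dimensions pull the vector toward the first centre; zeroing their weights lets the first two dimensions decide, flipping the assignment. Visualizing cluster quality before and after such a reweighting is exactly the interaction loop the tool supports.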

 

Year of completion:  2012
 Advisor : P. J. Narayanan

Related Publications

  • Dasari Pavan Kumar and P. J. Narayanan - Interactive Visualization and Tuning of SIFT Indexing, in Proceedings of the Vision, Modelling and Visualization Workshop 2010, Siegen, Germany, pp. 97-105, Eurographics Association, 2010.

Downloads

thesis

ppt

Detection of diabetic retinopathy lesions in color retinal images


Keerthi Ram (homepage)

Advances in medical device technology have resulted in a plethora of devices that sense, record, transform and process digital data. Images are a key form of diagnostic data, and many devices have been designed and deployed to capture high-resolution in-vivo images in different parts of the spectrum. Computers have enabled complex, non-invasive reconstruction of cross-sectional/3D structure (and temporal data) by combining views from multiple projections. Images thus present valuable diagnostic information that can be used to make well-informed decisions.

Computer-aided diagnosis applies computers to the processing of digital data with the aim of assisting medical practitioners in interpreting diagnostic information. This thesis takes up a specific disease, diabetic retinopathy, whose visual characteristics manifest in different stages. Image analysis and pattern recognition are used to design systems that detect the disease and quantify its extent. The quantitative information can be used by the practitioner to stage the disease, plan treatment, track drug efficacy, or make decisions on the course of treatment or prognosis.

The generic task of image understanding is known to be computationally ill-posed. However, adding domain constraints and restricting the size of the problem make it possible to attempt useful solutions. Two basic tasks in image understanding are used: detection and segmentation. A system is designed to detect a type of vascular lesion called a microaneurysm, which appears in the retinal vasculature at the onset of diabetic retinopathy. As the disease progresses it manifests visually as exudative lesions, which are to be segmented, and a system has been developed for the same.

The developed systems are tested on image datasets in the public domain, as well as on a real dataset (from a local speciality hospital) collected during the course of the research, to compare performance indicators against the prior art and to better understand the factors and challenges involved in creating a system ready for clinical use. (more...)

     

Year of completion:  2012
 Advisor : Jayanthi Sivaswamy

Related Publications

  • Keerthi Ram, Gopal Datt Joshi and Jayanthi Sivaswamy - A Successive Clutter-Rejection based Approach for Early Detection of Diabetic Retinopathy, IEEE Transactions on Biomedical Engineering, 58(3), pp. 664-673, Mar. 2011. [PDF]

  • Keerthi Ram and Jayanthi Sivaswamy - Multi-Space Clustering for Segmentation of Exudates in Retinal Color Photographs, Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 09), 2-6 September 2009, Minneapolis, USA. [PDF]

  • Keerthi Ram, Yogesh Babu and Jayanthi Sivaswamy - Curvature Orientation Histograms for Detection and Matching of Vascular Landmarks in Retinal Images, Proceedings of SPIE Medical Imaging (SPIE 09), 7-12 February 2009, Florida, USA. [PDF]


Downloads

thesis

ppt
