Large Scale Character Classification


Neeba N.V. (homepage)

Large-scale pattern recognition systems are needed in many real-life problems such as object recognition, bioinformatics, character recognition, biometrics, and data mining. This thesis focuses on pattern classification issues associated with character recognition, with special emphasis on Malayalam. We propose an architecture for character classification and demonstrate the utility of the proposed method by validating it on a large dataset. The challenges in this work include: (i) classification in the presence of a large number of classes; (ii) efficient implementation of effective large-scale classification; (iii) performance analysis and learning on large datasets (of millions of examples). Throughout this work, we use examples of characters (and symbols) extracted from real-life Malayalam document images. We first address the problem of building a symbol-level annotated dataset from coarsely (say, word-level) annotated data, with the help of a dynamic programming based algorithm that aligns the text (in Unicode) with the image (in pixels). The algorithm is then generalized to handle common degradations in the form of cuts, merges, and other artifacts. As a byproduct, it also allows us to quantitatively estimate the quality of books, documents, and words. The resulting large dataset makes large-scale character classification experiments possible.
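
As an illustration of the alignment step, here is a minimal sketch of such a dynamic programming recurrence, allowing one symbol to map to two components (a cut) or two symbols to map to one component (a merge). The cost functions `match_cost`, `cut_cost`, and `merge_cost` are hypothetical stand-ins; the thesis's actual costs are not reproduced here.

```python
def align(symbols, components, match_cost, cut_cost, merge_cost):
    """Align n text symbols to m segmented image components.
    Returns the minimum alignment cost (float('inf') if no alignment exists)."""
    n, m = len(symbols), len(components)
    INF = float("inf")
    # dp[i][j] = best cost of aligning the first i symbols to the first j components
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == INF:
                continue
            if i < n and j < m:  # one symbol <-> one component
                c = dp[i][j] + match_cost(symbols[i], components[j])
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], c)
            if i < n and j + 1 < m:  # one symbol <-> two components (cut)
                c = dp[i][j] + cut_cost(symbols[i], components[j], components[j + 1])
                dp[i + 1][j + 2] = min(dp[i + 1][j + 2], c)
            if i + 1 < n and j < m:  # two symbols <-> one component (merge)
                c = dp[i][j] + merge_cost(symbols[i], symbols[i + 1], components[j])
                dp[i + 2][j + 1] = min(dp[i + 2][j + 1], c)
    return dp[n][m]
```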

We then conduct an empirical study of classifier and feature combinations to assess their suitability for character classification. The scope of this study includes: (a) applicability of a spectrum of classifiers and features; (b) scalability of classifiers; (c) sensitivity of features to degradation; (d) generalization across fonts; and (e) applicability across scripts. All of these aspects are important for solving the character classification problem. Our empirical studies and theoretical results provide convincing evidence for the utility of (multiple pairwise) SVM classifiers for this problem. However, a direct use of multiple SVM classifiers has certain disadvantages: (i) since there are n(n-1)/2 pairwise classifiers, the storage and computational complexity of the final classifier becomes high for many practical applications; (ii) they directly provide a class label but fail to provide an estimate of the posterior probability. We address these issues by efficiently designing a Decision Directed Acyclic Graph (DDAG) classifier and using an appropriate feature space. We also propose efficient methods to minimize the storage complexity of support vectors for classification, and extend our algebraic simplification method to simplify hierarchical classifier solutions. We use SVM pairwise classifiers in a DDAG architecture, with a linear kernel, since most classes in a large-class problem are linearly separable.
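
The DDAG evaluation itself is simple to state: starting with all classes as candidates, each pairwise SVM eliminates one class, so only n - 1 of the n(n-1)/2 classifiers are evaluated per sample. A minimal sketch, assuming a hypothetical `pairwise[(a, b)]` callable that returns the winning label for that pair:

```python
def ddag_predict(x, classes, pairwise):
    """Decision DAG over pairwise SVMs: each test eliminates one candidate
    class, so len(classes) - 1 binary classifiers are evaluated in total."""
    candidates = list(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        winner = pairwise[(a, b)](x)  # binary SVM trained on the pair (a, b)
        # the losing class can no longer be the final answer
        candidates.pop(-1 if winner == a else 0)
    return candidates[0]
```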

We carried out our classification experiments on a huge dataset, with more than 200 classes and 50 million examples, collected from 12 scanned Malayalam books. Quality labels are assigned to document image pages based on the number of cuts and merges detected, and the experiments are conducted on pages of varying quality. We achieve reasonably high accuracy on all the data considered, and present an extensive evaluation of performance on this dataset of more than 2000 pages.

In the presence of a large and diverse collection of examples, it becomes important to continuously learn and adapt. Such an approach is particularly significant when recognizing books. We extend our classifier system to continuously improve its performance by providing feedback and retraining the classifier.
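
A minimal sketch of this verify-and-retrain loop, with hypothetical `classifier` and `verify` helpers (the thesis's actual verification criteria are not reproduced here):

```python
def recognize_book(pages, classifier, verify):
    """Recognize a book page by page, feeding verified predictions back
    as book-specific training data so the classifier adapts to this
    book's font and degradations."""
    results = []
    for page in pages:
        labels = [classifier.predict(sym) for sym in page.symbols]
        # keep only predictions that pass verification (e.g. confidence
        # and consistency with the surrounding word hypothesis)
        trusted = [(s, l) for s, l in zip(page.symbols, labels) if verify(s, l)]
        classifier.retrain(trusted)  # adapt before the next page
        results.append(labels)
    return results
```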

To summarize, the major contributions of this work are:

1. A highly script-independent dynamic programming (DP) based method to build large datasets for training and testing character recognition systems.
2. Empirical studies on large datasets of various Indian languages to evaluate the performance of state-of-the-art classifiers and features.
3. A hierarchical method to reduce the computational complexity of SVM classifiers for large-class problems.
4. An efficient design and implementation of an SVM classifier to effectively handle large-class problems. The classifier module has been employed in an OCR system for Malayalam.
5. Performance evaluation of the above methods on a large dataset of twelve Malayalam books, comprising more than 2000 document pages.
6. A novel system for adapting a classifier to recognize the symbols in a book.


 

Year of completion: 2010
Advisor: C. V. Jawahar

Related Publications

 

  • Neeba N.V. and C. V. Jawahar - Empirical Evaluation of Character Classification Schemes, Proceedings of the 7th International Conference on Advances in Pattern Recognition (ICAPR 2009), Feb. 4-6, 2009, Kolkata, India. [PDF]

  • Ilayaraja Prabhakaran, Neeba N.V. and C.V. Jawahar - Efficient Implementation of SVM for Large Class Problems, Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), Dec. 8-11, 2008, Florida, USA. [PDF]

  • Neeba N.V. and C. V. Jawahar - Recognition of Books by Verification and Retraining, Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), Dec. 8-11, 2008, Florida, USA. [PDF]


Book Chapter

  • N.V. Neeba, Anoop Namboodiri, C.V. Jawahar and P. J. Narayanan - Recognition of Malayalam Documents, in Guide to OCR for Indic Scripts: Part 1: Document Recognition and Retrieval, 2010.

Demonstrations

  • Presentation at ICAPR-2009
  • Poster at ICPR-2008 (SVM)
  • Poster at ICPR-2008 (Recognition)

Downloads

thesis

ppt

Efficient SVM based object classification and detection


Sreekanth Vempati (homepage)

Over the last few years, the amount of image and video data on the internet, as well as in personal collections, has been increasing rapidly. As a result, the need to organize and search these vast collections efficiently has also grown, leading to active research in content-based retrieval from multimedia databases. A description of the content is built through recognition of the scenes/objects in visual data. Despite much research in these areas over the last few years, critical breakthroughs are still awaited. To be deployable, these solutions must be not only accurate but also efficient and scalable.

There are two fundamental tasks in visual recognition: feature extraction and learning. The feature extraction stage builds a representation for the image/video data, and the learning stage learns a function that can distinguish different classes of data. In this thesis, we focus on building efficient methods for visual content recognition and detection in images and videos, proposing new ideas mainly for the learning stage. We start from state-of-the-art techniques and then show how our proposed ideas affect computational time and performance.

First, we show the utility of state-of-the-art image representations and classification methods for large-scale semantic video retrieval, demonstrated on the TRECVID 2008 and TRECVID 2009 datasets, which contain videos for the retrieval of various scenes, objects, and actions. We use Support Vector Machines (SVMs) as classifiers, a popular choice for classification tasks in many fields, mainly because of their good generalization capability. For obtaining non-linear decision boundaries, SVMs use a kernel function, which helps find a linear classifier in some high-dimensional feature space without actually computing the high-dimensional vectors. In many situations, computationally expensive non-linear kernel functions are needed; the linear kernel, on the other hand, is computationally inexpensive but gives poor results in most cases.
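
For concreteness, here is a minimal scikit-learn sketch (not the thesis code) contrasting the two options on bag-of-words histograms; `X_train`, `y_train`, and `X_test` are assumed to be non-negative histogram features:

```python
from sklearn.svm import LinearSVC, SVC
from sklearn.metrics.pairwise import chi2_kernel

# Linear SVM: one dot product per test vector, cheap but often weaker.
linear = LinearSVC(C=1.0).fit(X_train, y_train)
y_lin = linear.predict(X_test)

# Non-linear SVM via a precomputed chi^2 kernel: test cost grows with the
# number of support vectors, hence the efficiency concern in the text.
K_train = chi2_kernel(X_train, X_train)
K_test = chi2_kernel(X_test, X_train)
nonlin = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
y_chi2 = nonlin.predict(K_test)
```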

Another contribution of this thesis is a method for improving the performance of computationally inexpensive classifiers such as the linear SVM. For this purpose, we explore the utility of sub-categories, the sub-groupings present in the feature space of each semantic class. We model these sub-categories using the structural SVM framework, analyze how the choice of groupings affects the results, and present a method to learn the optimal groupings. We evaluate our methods on various synthetic two-dimensional datasets and on real-world datasets, namely VOC 2007 and TRECVID 2008.

Non-linear kernel methods yield state-of-the-art performance for image classification and object detection, but large-scale problems require machine learning techniques of at most linear complexity, and these are usually limited to linear kernels. This unfortunately rules out gold-standard kernels such as the generalized RBF kernels (e.g. exponential-\chi^2). A non-linear kernel computes an inner product in a high-dimensional space different from the input space, avoiding the explicit computation of the high-dimensional vectors; the function that computes this high-dimensional feature vector is called the feature map, and in general it is hard to compute and very high-dimensional. In the literature, explicit finite-dimensional feature maps have been proposed to approximate the additive kernels (intersection, \chi^2) by linear ones, enabling the use of fast machine learning techniques in a non-linear context; an analogous technique was proposed for the translation-invariant RBF kernels. As part of this thesis, we complete the construction and combine the two techniques to obtain explicit feature maps for the generalized RBF kernels. Furthermore, we investigate a learning method using l1 regularization to encourage sparsity in the final vector representation and thus reduce its dimension. We evaluate this technique on the VOC 2007 detection challenge, showing when it can improve on fast additive kernels, and the trade-offs in complexity and accuracy.
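
In the same spirit, scikit-learn ships approximations of both ingredients, so the overall recipe can be sketched as follows. This is an illustration of the idea, not the thesis's exact construction; `X_train`, `y_train`, and `X_test` are assumed to be non-negative histogram features:

```python
from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import AdditiveChi2Sampler, RBFSampler
from sklearn.svm import LinearSVC

# Additive chi^2 map followed by random Fourier features approximates a
# generalized RBF (exp-chi^2) kernel; a fast linear SVM then operates in
# the explicitly lifted space.
model = make_pipeline(
    AdditiveChi2Sampler(sample_steps=2),        # explicit map for the additive chi^2 kernel
    RBFSampler(gamma=0.5, n_components=1024),   # random features for the RBF part
    LinearSVC(C=1.0),                           # linear machinery in the lifted space
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```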

Year of completion: December 2010
Advisors: Dr. Andrew Zisserman (University of Oxford) and Dr. C. V. Jawahar

Related Publications

  • Sreekanth Vempati, Andrea Vedaldi, Andrew Zisserman and C.V. Jawahar - Generalized RBF Feature Maps for Efficient Detection, Proceedings of the British Machine Vision Conference (BMVC 2010), 31 Aug. - 3 Sep. 2010, Aberystwyth, UK. [PDF]

  • Mihir Jain, Sreekanth Vempati, Chandrika Pulla and C.V. Jawahar - Example Based Video Filters, Proceedings of the 8th International Conference on Image and Video Retrieval (CIVR 2009), July 8-10, 2009, Santorini, Greece. [PDF]

  • Sreekanth Vempati, Mihir Jain, Omkar M. Parkhi, C. V. Jawahar, Andrea Vedaldi, Marcin Marszalek and Andrew Zisserman - Oxford/IIIT - TRECVID 2009 - Notebook paper, Proceedings of the TRECVID 2009 Workshop, Gaithersburg, MD, USA. [PDF]
  • James Philbin, Manuel Marin-Jimenez, Siddharth Srinivasan, Andrew Zisserman, Mihir Jain, Sreekanth Vempati, Pramod Sankar and C. V. Jawahar - Oxford/IIIT - TRECVID 2008 - Notebook paper, Proceedings of the TRECVID 2008 Workshop, Gaithersburg, MD, USA. [PDF]

Downloads

thesis

ppt

 

Classification, Detection and Segmentation of Deformable Animals in Images


Omkar M Parkhi (homepage)

This thesis deals with the problem of classification, detection, and segmentation of objects in images. We focus on classes of objects with considerable deformations, taking the categories of cats and dogs as a case study. Many state-of-the-art methods that perform well on the task of detecting rigid object categories, such as buses, airplanes, and boats, perform poorly on these deformable animal categories. The well-known difficulty of automatically distinguishing between cats and dogs in images has been exploited in web security systems: ASIRRA, a system developed by Microsoft Research, requires users to correctly select all cat images from the 12 images of cats and dogs shown to them to gain access to a web service. Beyond this, classifying these animals into their breeds is challenging even for humans. Developing machine learning methods for these problems requires reliable training data; the popularity of cats and dogs as pets makes it possible to collect such data from various sources on the internet, where they often appear in images and videos (together with people). As part of this work, we propose a novel method for detecting cats and dogs in images, and a model for classifying images of cats and dogs according to species. We also introduce a dataset for fine-grained classification of pet breeds and develop models for classifying pet images according to breed, segmenting the objects in the process.

For detecting animals in an image, we propose a mechanism that combines a template-based object detector with a segmentation algorithm. The template-based detector of Felzenszwalb et al. first detects the distinctive part of the object, and an iterative segmentation process then extracts the animal by minimizing an energy function defined over a conditional random field using GraphCuts. We show quantitatively that our method works well and substantially outperforms whole-body template-based detectors for these highly deformable object categories, achieving accuracy comparable to the state of the art on the PASCAL VOC competition, which includes other models such as bag-of-words.
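
The overall pattern (a detector box seeding an iterative GraphCut segmentation) can be sketched with OpenCV's grabCut as a stand-in energy minimizer. This is not the thesis's exact CRF formulation, and `head_box` is a hypothetical (x, y, w, h) rectangle returned by the part detector:

```python
import cv2
import numpy as np

def segment_from_detection(image, head_box, iters=5):
    """Seed the foreground model from the detected region, then run
    several rounds of graph-cut energy minimization."""
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)  # GMM parameters for background
    fgd = np.zeros((1, 65), np.float64)  # GMM parameters for foreground
    cv2.grabCut(image, mask, head_box, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    # pixels marked (probable) foreground form the extracted animal
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)
```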

For the task of subcategory classification, we introduce a novel dataset for pet breed discrimination, the IIIT-OXFORD PET dataset. It contains 7,349 annotated images of cats and dogs of 37 different breeds, selected from various sources on the internet. In addition to the pet breed, the annotations include a pixel-level segmentation of the body of each animal and a bounding box marking its head. This dataset is the first of its kind for pet breed classification and should provide an important benchmark for researchers working on fine-grained classification. For the classification task, we propose a model that estimates a pet's breed automatically from an image, combining shape, captured by a deformable part model detecting the pet face, and appearance, captured by a bag-of-words model describing the pet's fur. Two classification approaches are discussed: a hierarchical approach, in which a pet is first classified into the cat or dog species and then into its corresponding breed; and a flat one, in which the breed is obtained directly. On our 37-class dataset, breed classification achieves an average accuracy of about 60%, a very encouraging result given the difficulty of the problem. These models are also shown to improve the probability of breaking the challenging ASIRRA test by more than 30%, beating all previously published results.
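
The two strategies can be sketched with generic scikit-learn SVMs standing in for the thesis's shape-plus-appearance model; `species_of` is a hypothetical helper mapping a breed label to 'cat' or 'dog', and the features/labels are assumed to be NumPy arrays:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Flat: predict the breed directly from the combined feature vector.
flat = LinearSVC().fit(X_train, breed_train)

# Hierarchical: first cat vs dog, then a per-species breed classifier.
species_train = np.array([species_of(b) for b in breed_train])
species_clf = LinearSVC().fit(X_train, species_train)
breed_clf = {
    s: LinearSVC().fit(X_train[species_train == s], breed_train[species_train == s])
    for s in ("cat", "dog")
}

def predict_hierarchical(x):
    # route through the species decision before the breed decision
    s = species_clf.predict(x.reshape(1, -1))[0]
    return breed_clf[s].predict(x.reshape(1, -1))[0]
```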

 

Year of completion: July 2011
Advisors: Dr. Andrew Zisserman (University of Oxford) and Dr. C. V. Jawahar

Related Publications

  • Omkar M. Parkhi, Andrea Vedaldi, C.V. Jawahar and Andrew Zisserman - The Truth about Cats and Dogs, Proceedings of the 13th International Conference on Computer Vision (ICCV 2011), 6-13 Nov. 2011, pp. 1427-1434, Barcelona, Spain. [PDF]

 


Downloads

thesis

ppt

Raytracing Dynamic Scenes on the GPU


Sashidhar Guntury (homepage)

Raytracing dynamic scenes at interactive to realtime rates has received a lot of attention recently. In this dissertation, we present strategies for high-performance ray tracing on an off-the-shelf commodity Graphics Processing Unit (GPU), traditionally used for accelerating gaming and other graphics applications. We use the grid data structure to arrange the triangles spatially and raytrace efficiently. Grid construction requires sorting, which is fast on today's GPUs. Our results demonstrate that the grid acceleration structure is competitive with hierarchical acceleration data structures and can be considered the data structure of choice for dynamic scenes, where per-frame rebuilding is required. We advocate using an appropriate data structure for each stage of raytracing, resulting in multiple structure builds per frame. A perspective grid built for the camera achieves perfect coherence for primary rays, and a perspective grid built with respect to each light source provides the best performance for shadow rays. We develop a model called spherical light grids to handle lights positioned inside the model space. Since perspective grids are best suited for coherent rays, we revert to uniform grids to trace arbitrarily directed reflection rays; uniform grids are best for reflection and refraction rays, which exhibit little coherence. We propose an Enforced Coherence method that brings coherence to these rays by rearranging the ray-to-voxel mapping using sorting; this gives the best performance on GPUs with only user-managed caches. We also propose a simple Independent Voxel Walk method, which performs best by taking advantage of the L1 and L2 caches on recent GPUs. We achieve over 10 fps of total rendering on the Conference model with one light source and one reflection bounce, while rebuilding the data structure for each stage. The ideas presented here are likely to give high performance on future GPUs as well as on other manycore architectures.
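
The sorting-based grid build can be sketched as follows. This is a NumPy illustration of the idea; a real GPU build uses radix sort and inserts each triangle into every voxel it overlaps, not just the voxel of its centroid:

```python
import numpy as np

def build_grid(centroids, bbox_min, bbox_max, res):
    """Assign each triangle a flattened voxel key, then sort (key, triangle)
    pairs so triangles are grouped by voxel; per-voxel ranges are recovered
    with binary search over the sorted keys."""
    cell = (bbox_max - bbox_min) / res
    ijk = np.clip(((centroids - bbox_min) / cell).astype(int), 0, res - 1)
    keys = (ijk[:, 0] * res + ijk[:, 1]) * res + ijk[:, 2]  # flatten voxel index
    order = np.argsort(keys, kind="stable")   # the GPU would use radix sort here
    sorted_keys, sorted_tris = keys[order], order
    # cell_start[k] .. cell_end[k] delimit the triangles in voxel k
    cell_start = np.searchsorted(sorted_keys, np.arange(res ** 3), side="left")
    cell_end = np.searchsorted(sorted_keys, np.arange(res ** 3), side="right")
    return sorted_tris, cell_start, cell_end
```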

 

Year of completion: 2011
Advisor: P. J. Narayanan

Related Publications

  • Sashidhar Guntury and P. J. Narayanan - Raytracing Dynamic Scenes with Shadows on the GPU, Eurographics Symposium on Parallel Graphics and Visualization, 2010.
  • Sashidhar Guntury and P. J. Narayanan - Raytracing Dynamic Scenes on the GPU, IEEE Transactions on Visualization and Computer Graphics, to appear.

Downloads

thesis

ppt

Image Based PTM Synthesis for Realistic Rendering of 3D Models


Pradeep Rajiv Nallaganchu (homepage)

Capturing the shape and texture of large structures such as monuments and statues at very high resolution is extremely expensive, both in terms of time and storage space. In many cases the fine details are generated by the surface properties of the material, and the appearance is statistically uniform. In this thesis, we present an approach to add surface details to a coarse 3D model of an object based on two additional sources of information: a set of images of the object, and a high-resolution model of the material the object is made of. The material model we employ is the Polynomial Texture Map (PTM), which captures the appearance of a surface under various illumination conditions. We use the observed images of the object as constraints to synthesize texture samples for each triangle of the object under any given illumination.
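
For reference, the standard biquadratic PTM of Malzbender et al. stores six coefficients per texel and evaluates a second-order polynomial in the projected light direction (lu, lv); a minimal sketch of that evaluation (variable names are ours):

```python
import numpy as np

def evaluate_ptm(coeffs, lu, lv):
    """coeffs: (H, W, 6) per-texel polynomial coefficients.
    Returns the (H, W) luminance for light direction (lu, lv):
    L = a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5."""
    a0, a1, a2, a3, a4, a5 = np.moveaxis(coeffs, -1, 0)
    return a0 * lu * lu + a1 * lv * lv + a2 * lu * lv + a3 * lu + a4 * lv + a5
```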

The primary challenge is to synthesize a polynomial model of the texture when the constraints arise in the image domain. We use knowledge of the object's illumination to map the texture models into image space and compute the optimal patch; the texture transfer then happens as a complete 3D texture model. We also account for pose, scale, reflectance, and surface smoothness while carrying out the transfer. We synthesize the texture of an object on a per-triangle basis, performing operations such as normalization and blending to handle discontinuities at the edges.

 

Year of completion: July 2011
Advisor: Anoop M. Namboodiri

Related Publications

  • Pradeep Rajiv and Anoop M. Namboodiri - Image based PTM Synthesis for Realistic Rendering of Low Resolution 3D Models, Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2010), 12-15 Dec. 2010, Chennai, India. [PDF]


Downloads

thesis

ppt
