CVIT Home

Towards Efficient Methods for Word Image Retrieval


Raman Jain (homepage)

The economic feasibility of maintaining a large number of documents in digital image formats has created a tremendous demand for robust ways to access and manipulate the information these images contain. This requires research in document image understanding, specifically in document image recognition as well as document image retrieval. There have been many excellent attempts at building robust document analysis systems in industry, academia and research labs. One way to provide traditional database indexing and retrieval capabilities is to fully convert the document to an electronic representation that can be indexed automatically. Unfortunately, many factors prohibit complete conversion, including high cost, low document quality, and the non-availability of OCRs for non-European languages.

Word spotting has been adopted by various researchers as a technique complementary to Optical Character Recognition (OCR) for document analysis and retrieval. Applications of word spotting include document indexing, image retrieval and information filtering. The important factors in word spotting are pre-processing, the selection and extraction of suitable features, and the image matching algorithm. Euclidean distance-based matching is considered fast, and it has been used successfully in the word spotting literature to compare features extracted from word images. However, this approach requires the images to be interpolated to the same width, which loses useful information. Dynamic Time Warping (DTW) based matching aligns sequences of different lengths directly and is therefore preferred over Euclidean matching.
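To make the contrast concrete, here is a minimal Python sketch comparing the two matching strategies on 1-D feature profiles (for example, one value per image column). The function names and the resampling width of 64 are illustrative assumptions, not values from the thesis. `euclidean_resampled` interpolates both profiles to a common width before taking the Euclidean distance, while `dtw_distance` aligns the sequences directly with the standard dynamic-programming recurrence.

```python
import numpy as np

def euclidean_resampled(a, b, width=64):
    """Resample two 1-D profiles to a common width, then take the Euclidean distance."""
    grid = np.linspace(0.0, 1.0, width)
    ra = np.interp(grid, np.linspace(0.0, 1.0, len(a)), a)
    rb = np.interp(grid, np.linspace(0.0, 1.0, len(b)), b)
    return float(np.linalg.norm(ra - rb))

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

With a profile such as `[0, 1, 2, 1, 0]` and a horizontally stretched copy `[0, 1, 1, 2, 2, 1, 0]`, DTW finds a zero-cost alignment, whereas resampling followed by a Euclidean comparison still reports a non-zero distance. The price is the O(nm) cost of DTW versus O(n) for the Euclidean comparison.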

In this thesis, a new approach based on a Weighted Euclidean distance is used to compare two word images. Features such as projection profiles, word profiles and transitional features are extracted from the word images and compared using the Weighted Euclidean distance, with greater speed and higher accuracy. Experiments were conducted on large databases of printed English documents. The average precision achieved was 95.48%, and matching a pair of images took 0.03 milliseconds.
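The sketch below illustrates one way such column features and a weighted Euclidean distance could be computed from a binarized word image (1 = ink). The exact feature definitions, normalization, resampling width and weights used in the thesis are not given here; every parameter below, including the equal default weights, is a hypothetical choice for illustration.

```python
import numpy as np

def column_features(img):
    """Per-column features of a binary word image (1 = ink), shape (4, width):
    projection profile, upper profile, lower profile, ink transitions."""
    img = (np.asarray(img) > 0).astype(np.int8)
    h, w = img.shape
    proj = img.sum(axis=0) / h                      # fraction of ink pixels per column
    has_ink = img.any(axis=0)
    upper = np.where(has_ink, img.argmax(axis=0), h) / h        # first ink from top
    lower = np.where(has_ink, img[::-1].argmax(axis=0), h) / h  # first ink from bottom
    padded = np.vstack([np.zeros((1, w), dtype=np.int8), img])
    trans = (np.diff(padded, axis=0) == 1).sum(axis=0) / h      # background->ink changes
    return np.vstack([proj, upper, lower, trans]).astype(float)

def resample(feats, width):
    """Linearly interpolate every feature row to a fixed width."""
    src = np.linspace(0.0, 1.0, feats.shape[1])
    dst = np.linspace(0.0, 1.0, width)
    return np.vstack([np.interp(dst, src, row) for row in feats])

def weighted_euclidean(fa, fb, weights):
    """sqrt(sum_k w_k * sum_x (fa[k, x] - fb[k, x])^2)"""
    per_feature = ((fa - fb) ** 2).sum(axis=1)
    return float(np.sqrt(np.dot(weights, per_feature)))

def word_distance(img_a, img_b, weights=(1.0, 1.0, 1.0, 1.0), width=64):
    # hypothetical defaults: equal weights, 64 columns after resampling
    fa = resample(column_features(img_a), width)
    fb = resample(column_features(img_b), width)
    return weighted_euclidean(fa, fb, np.asarray(weights, dtype=float))
```

Setting a feature's weight to zero removes it from the comparison, which is one simple way to study how much each feature type contributes.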

A second line of research in this thesis considers keyword spotting for Hindi documents, i.e., the task of retrieving all instances of a given word or phrase from a collection of documents. A novel learning-based keyword spotting system using bidirectional long short-term memory (BLSTM) recurrent neural networks was developed. To the author's knowledge, this is the first time a recurrent neural network based method has been applied to documents in Indian languages. Experiments show its superior performance over state-of-the-art reference systems.

Finally, the above system is applied to exhaustive recognition. The main finding of this thesis is that supervised learning, in the form of training, can increase retrieval accuracy. The key to success lies in a well-balanced trade-off between data quality and data quantity when choosing the elements to be added to the training set. Performance evaluation on datasets from different languages shows the effectiveness of our approaches. Directions for future work that could further advance the state of the art in document image retrieval are also recommended.

 

Year of completion:  July 2015
 Advisor :

Prof. C. V. Jawahar


Related Publications

  • Raman Jain, Volkmar Frinken, C. V. Jawahar and R. Manmatha - BLSTM Neural Network based Word Retrieval for Hindi Documents, Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), 18-21 September 2011, Beijing, China. [PDF]

  • Raman Jain and C. V. Jawahar - Towards More Effective Distance Functions for Word Image Matching, Proceedings of the Ninth IAPR International Workshop on Document Analysis Systems (DAS'10), pp. 363-370, 9-11 June 2010, Boston, MA, USA. [PDF]


Downloads

thesis

 ppt

 

Efficient Ray Tracing of Parametric Surfaces for Advanced Effects


Rohit Nigam (homepage)

Ray tracing is one of the most important rendering techniques used in computer graphics. Ray-traced images are more accurate and photo-realistic than directly rendered images. Ray tracing was earlier considered impractical for rendering scenes at interactive rates because of its high computational cost. However, with advances in modern Graphics Processing Units (GPUs) and CPUs, ray tracing at interactive rates has now become possible.

Parametric patches have been widely used in many fields to describe a model accurately. They provide a compact and effective way of representing an object and remain curved, rather than faceted, when zoomed in. Ray tracing of parametric surfaces was long considered a non-interactive process because of the high complexity of the intersection algorithms. With advances in ray tracing techniques and high-compute-power devices, recent work on ray tracing parametric surfaces has reported near-interactive results.

In this dissertation, we present a scheme for interactive ray tracing of bicubic Bezier patches using Newton iteration. We use a mixed hierarchy as the acceleration structure: a bounding volume hierarchy above the patches and a fixed-depth subpatch tree below it. This reduces the number of ray-patch intersections that need to be evaluated and provides a good initialization for the iterative step, while keeping memory requirements low. Newton iteration is then applied in parallel to the generated list of ray-patch intersections. Our method exploits the cores of both the CPU and the GPU, using OpenMP on the CPU and CUDA on the GPU, and shares work between them according to their relative speeds. A data-parallel framework is used throughout: a list of rays is transformed into a list of ray-patch intersections by traversal, and then into intersections and a list of secondary rays by root finding. Using the mixed hierarchy model, we significantly outperform a multi-core CPU implementation and previous GPU implementations.
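A minimal sketch of the root-finding step for a single ray and a single bicubic Bezier patch is shown below. It uses one common formulation, in which the ray is written as the intersection of two planes so that Newton iteration solves a 2x2 system in the patch parameters (u, v); the thesis may formulate the iteration differently. The starting point (u0, v0) stands in for the initialization that the subpatch tree would provide, and the tolerance and iteration limit are assumptions.

```python
import numpy as np

def bernstein3(t):
    """Cubic Bernstein basis values and their derivatives at t."""
    s = 1.0 - t
    b = np.array([s**3, 3*t*s**2, 3*t**2*s, t**3])
    db = np.array([-3*s**2, 3*s**2 - 6*t*s, 6*t*s - 3*t**2, 3*t**2])
    return b, db

def eval_patch(P, u, v):
    """P: (4, 4, 3) control points. Returns S(u, v), dS/du, dS/dv."""
    bu, dbu = bernstein3(u)
    bv, dbv = bernstein3(v)
    S = np.einsum('i,j,ijk->k', bu, bv, P)
    Su = np.einsum('i,j,ijk->k', dbu, bv, P)
    Sv = np.einsum('i,j,ijk->k', bu, dbv, P)
    return S, Su, Sv

def ray_planes(o, d):
    """Two planes n.x + c = 0 whose intersection line is the ray's line."""
    d = d / np.linalg.norm(d)
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    n1 = np.cross(d, helper)
    n1 /= np.linalg.norm(n1)
    n2 = np.cross(d, n1)          # unit length: d and n1 are orthonormal
    return n1, -(n1 @ o), n2, -(n2 @ o)

def newton_ray_patch(P, o, d, u0, v0, max_iter=20, tol=1e-7):
    """Return (u, v, t) for a hit with t > 0 and (u, v) in [0, 1]^2, else None."""
    n1, c1, n2, c2 = ray_planes(o, d)
    u, v = u0, v0
    for _ in range(max_iter):
        S, Su, Sv = eval_patch(P, u, v)
        F = np.array([n1 @ S + c1, n2 @ S + c2])   # signed distances to both planes
        if np.abs(F).max() < tol:
            if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
                t = (S - o) @ d / (d @ d)            # ray parameter of the hit point
                return (u, v, t) if t > 0 else None
            return None
        J = np.array([[n1 @ Su, n1 @ Sv],
                      [n2 @ Su, n2 @ Sv]])
        if abs(np.linalg.det(J)) < 1e-12:            # singular Jacobian: give up
            return None
        du, dv = np.linalg.solve(J, F)
        u, v = u - du, v - dv
    return None
```

On a flat patch, whose control points lie on a regular grid in the plane z = 0, a ray cast straight down converges in a single Newton step; curved patches need a few more, which is why a good starting (u, v) from the hierarchy matters.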

3tp1lvlw

As a result, shadow and reflection rays can be handled in exactly the same manner: the secondary ray list is fed back to the start of the algorithm to perform mixed hierarchy traversal and intersection tests. We perform fixed-depth multiple-bounce ray tracing. We also show how our method extends easily to generate soft shadows using area light sources. These effects add realism to the ray-traced images.
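The soft-shadow idea can be sketched as follows: instead of a single shadow ray to a point light, several shadow rays are sent to stratified sample points on the area light, and the visible fraction scales the direct lighting. In this hypothetical sketch the occluders are spheres, used only to keep the example short; in the thesis the shadow rays would go through the same mixed hierarchy traversal and Newton intersection as any other ray. The parallelogram light description and the 4 x 4 sample grid are assumptions.

```python
import numpy as np

def ray_hits_sphere(o, d, center, radius, t_max):
    """True if o + t*d hits the sphere for some 1e-6 < t < t_max."""
    oc = o - center
    a = d @ d
    b = 2.0 * (oc @ d)
    c = oc @ oc - radius**2
    disc = b * b - 4 * a * c
    if disc < 0:
        return False
    sq = np.sqrt(disc)
    return any(1e-6 < t < t_max for t in ((-b - sq) / (2 * a), (-b + sq) / (2 * a)))

def soft_shadow(p, light_corner, edge_u, edge_v, occluders, n=4, rng=None):
    """Fraction of n x n stratified samples on a parallelogram light visible from p.
    occluders: list of (center, radius) spheres (a stand-in for real geometry)."""
    rng = np.random.default_rng(0) if rng is None else rng
    visible = 0
    for i in range(n):
        for j in range(n):
            # one jittered sample per cell of the n x n grid on the light
            su, sv = (i + rng.random()) / n, (j + rng.random()) / n
            q = light_corner + su * edge_u + sv * edge_v
            d = q - p                     # t in (0, 1) spans the segment p -> q
            if not any(ray_hits_sphere(p, d, c, r, 1.0) for c, r in occluders):
                visible += 1
    return visible / (n * n)
```

Points fully inside the umbra return 0, fully lit points return 1, and points in the penumbra get intermediate values, which is what produces the soft shadow edge.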

We render a one-megapixel image of the Teapot model at 125 fps for primary rays only, and at about 65 fps with one pass of shadow and reflection rays, on a system with an Intel i7 920 and an NVIDIA GTX 580. We are able to ray trace the Bigguy-in-a-box scene with multiple bounces at near-interactive rates. Our hybrid Newton's method implementation achieves a speed-up of about 5-30x over our optimized CPU implementation and about 20-50x over a previous GPU implementation of Kajiya's method for ray tracing Bezier surfaces. Traversing the mixed hierarchy is the most time-consuming step of the algorithm, and we expect better performance with a larger cache. The hybrid model would be optimal for systems whose CPU and GPU have equal compute power. The proposed model is suitable for parallel architectures, hybrid systems and multi-GPU systems.
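The CPU/GPU work sharing described earlier can be reasoned about with a simple proportional split. The sketch below assumes each device's throughput (items per second) is known, for example measured on a previous frame; this measurement strategy and the function names are assumptions, not the thesis's scheduler.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split n_items so both devices finish at about the same time.
    Rates are items per second (assumed to be measured beforehand)."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    return n_items - n_gpu, n_gpu

def frame_time(n_items, cpu_rate, gpu_rate):
    """Time for both devices to finish their shares; the slower one dominates."""
    n_cpu, n_gpu = split_work(n_items, cpu_rate, gpu_rate)
    return max(n_cpu / cpu_rate, n_gpu / gpu_rate)
```

With a proportional split, the combined throughput is the sum of the two rates, so the gain over the faster device alone is (c + g) / max(c, g). This ratio peaks at 2x when the rates are equal: `frame_time(1000, 2.0, 2.0)` is 250 time units versus 500 for either device alone, which is consistent with the observation that the hybrid model suits systems whose CPU and GPU have equal compute power.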

Global illumination effects have recently started gaining popularity with the progress in parallel architectures. We extend our algorithm to global illumination effects to demonstrate its capabilities; such effects have not previously been reported for parametric surfaces. We perform path tracing by tracing a large number of rays per pixel to a fixed depth; increasing the number of samples greatly improves the quality of the generated image. We also render more advanced effects such as ambient occlusion, depth of field, motion blur and glossy surfaces. We path trace a $512 \times 512$ image with 1000 samples per pixel in about 165 seconds, and we report timings for the other advanced effects. We find that ray coherence is essential for optimal performance when ray tracing Bezier surfaces on the GPU, while the size of the dataset plays only a small part in the overall rendering time. The work in this dissertation should serve as a starting point for optimally rendering Bezier surfaces with advanced global illumination techniques.
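Depth of field is commonly modeled with a thin-lens camera. The sketch below is an assumption about how such rays could be generated, not the thesis's code: it moves each primary ray's origin to a random point on a circular aperture and redirects it through the point where the original pinhole ray meets the plane of focus, so that points on that plane stay sharp while others blur. In a path tracer, many such samples per pixel (1000 in the timing quoted above) are averaged. The camera orientation (looking down -z) and parameter names are assumptions.

```python
import numpy as np

def thin_lens_ray(pinhole_dir, focal_dist, aperture_radius, xi1, xi2):
    """Camera at the origin looking down -z; pinhole_dir must have z < 0.
    xi1, xi2 in [0, 1) choose the lens sample. Returns (origin, unit direction)."""
    d = pinhole_dir / np.linalg.norm(pinhole_dir)
    focus = d * (focal_dist / -d[2])            # pinhole ray meets the plane z = -focal_dist
    r = aperture_radius * np.sqrt(xi1)          # sqrt gives uniform density over the disk
    theta = 2.0 * np.pi * xi2
    origin = np.array([r * np.cos(theta), r * np.sin(theta), 0.0])
    direction = focus - origin
    return origin, direction / np.linalg.norm(direction)
```

Every lens sample for a given pixel passes through the same focal point, so geometry at the focal distance is sharp; with a zero aperture the function reduces to a pinhole camera.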

 

Year of completion:  July 2015
 Advisor :

P. J. Narayanan


Related Publications

  • Rohit Nigam, P J Narayanan - Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy, Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 16-19 Dec. 2012, Bombay, India. [PDF]


Downloads

thesis

 ppt

 

 

A Hierarchical System Design for Detection of Glaucoma From Color Fundus Images


Madhulika Jain (homepage)

Glaucoma is an eye disorder that is prevalent in the aging population and causes irreversible loss of vision. Hence, computer-aided solutions are of interest for screening. Glaucoma is indicated by structural changes in the optic disc (OD), loss of nerve fibres and atrophy of the peripapillary region of the OD in the retina. In retinal images, most of these appear as subtle variations in appearance, so automated assessment of glaucoma from colour fundus images is a challenging problem. Prevalent approaches aim at detecting the primary indicator, namely the deformation of the optic cup relative to the disc, and use the ratio of the two diameters in the vertical direction (the vertical cup-to-disc ratio) to classify images as normal or glaucomatous.
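For reference, the vertical cup-to-disc ratio used by these prevalent approaches can be computed from binary segmentation masks of the optic cup and disc, as sketched below. Obtaining those masks is the hard part and is assumed to be done elsewhere; the function names are hypothetical, and no decision threshold is implied.

```python
import numpy as np

def vertical_extent(mask):
    """Number of rows spanned by the True pixels of a 2-D mask (0 if empty)."""
    rows = np.flatnonzero(np.asarray(mask).any(axis=1))
    return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup diameter divided by vertical disc diameter."""
    disc_h = vertical_extent(disc_mask)
    if disc_h == 0:
        raise ValueError("empty optic disc mask")
    return vertical_extent(cup_mask) / disc_h
```

Because this single number depends entirely on accurate cup and disc boundaries, subtle cases are hard to capture with it alone, which motivates the global image features explored below.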

We explore the use of global motion pattern-based features to detect glaucoma and propose an image representation that accentuates subtle indicators of the disease. These global image features are then used to identify normal cases effectively. The proposed method is demonstrated on a large dataset of 1845 images annotated by 3 medical experts. The global approach is then extended to detect atrophy, and two hierarchical system designs are proposed: the first uses only global analysis, while the second employs both global and local analysis.

In the first design, the first stage is based on features that mainly capture information about the primary indicators, while the second stage is based on features extracted for detecting atrophy (a secondary visual indicator).

The second design combines the strengths of global and local analysis of the OD region. Global features are used in the first stage to remove as many normal cases as possible, and local features are used in the second stage to perform a finer classification. This system has been tested on 1040 images with ground truth collected from 3 glaucoma experts. The results show that the hybrid approach offers a good solution for glaucoma screening from retinal images.
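The control flow of the second design can be sketched as a two-stage cascade. The score functions and thresholds below are hypothetical placeholders: in the thesis they would be classifiers trained on global and on local OD-region features, respectively.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class TwoStageScreener:
    global_score: Callable[[object], float]  # hypothetical: higher = more suspicious
    local_score: Callable[[object], float]   # hypothetical: higher = more suspicious
    t_global: float
    t_local: float

    def classify(self, image) -> str:
        # Stage 1: global features screen out clearly normal images
        if self.global_score(image) < self.t_global:
            return "normal"
        # Stage 2: local OD-region features make the finer decision
        return "glaucoma" if self.local_score(image) >= self.t_local else "normal"

    def run(self, images: Sequence[object]) -> list:
        return [self.classify(im) for im in images]
```

The design choice is that stage 1 only needs high sensitivity: every image it rejects skips the more detailed local analysis, so a permissive `t_global` trades speed for fewer missed cases.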

 

Year of completion:  July 2015
 Advisor :

Jayanthi Sivaswamy


Related Publications

  • K Sai Deepak, Madhulika Jain, Gopal Datt Joshi, Jayanthi Sivaswamy - Motion Pattern-Based Image Features for Glaucoma Detection from Retinal Images Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 16-19 Dec. 2012, Bombay, India. [PDF]

  • Madhulika Jain, Arunava Chakravarty, Gopal Datt Joshi and Jayanthi Sivaswamy - A Hierarchical and Multifactorial System Design for Detection of Glaucoma from Color Fundus Images, submitted to Medical Image Computing and Computer Assisted Intervention (MICCAI) 2014 (not accepted). [PDF]

Downloads

thesis

 ppt

 

Representation of Ballistic Strokes of Handwriting for Recognition and Verification


Prabhu Teja S (homepage)

The primary interfaces of computing devices are moving away from traditional keyboard and mouse input towards touch- and stylus-based interaction. To be effective, the interfaces for such devices should be robust, efficient and intuitive. Handwriting has long been one of the most natural ways for humans to communicate. Handwriting-based interfaces are more practical than keyboards, especially for scripts of the Indic and Han families, which have a large number of symbols. Pen-based computing serves four functions: (a) pointing input, (b) handwriting recognition, (c) direct manipulation and (d) gesture recognition. The second and fourth fall under the broad umbrella of pattern recognition problems. In this thesis, we focus on efficient representations for handwriting.

In the first part of this thesis, we propose a representation for online handwriting based on ballistic strokes. We first propose a technique to segment online handwriting into its constituent ballistic strokes based on the curvature profile, and argue that this segmentation is more robust to noise than the traditional segmentation at minima of the speed profile. We then propose to represent each segmented stroke as an arc of a circle, a representation validated by the Sigma-lognormal theory of handwriting generation. These features are encoded using a bag-of-words representation, which we call bag-of-strokes. This representation is shown to achieve state-of-the-art accuracies on various datasets.
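The sketch below illustrates the two steps on a sampled pen trajectory. The curvature profile is approximated by the signed turning angle between successive segments and, as an assumption made only for this example, a stroke boundary is placed wherever the curvature changes sign; the thesis's actual segmentation criterion may differ. Each resulting stroke is summarized as a circular arc by its chord length and the signed angle it subtends, obtained from the turning angle at its middle sample (by the inscribed-angle theorem, the arc through three points subtends twice that turning angle). Quantizing these arc features against a codebook and histogramming them would give a bag-of-strokes vector; that step is omitted.

```python
import numpy as np

def turning_angles(pts):
    """Signed turning angle at each interior point of a polyline of shape (N, 2)."""
    v = np.diff(pts, axis=0)
    ang = np.arctan2(v[:, 1], v[:, 0])
    return (np.diff(ang) + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)

def segment_by_curvature_sign(pts, eps=1e-6):
    """Split the trajectory where the curvature sign flips (hypothetical rule)."""
    k = turning_angles(pts)
    cuts, sign = [0], 0
    for i, a in enumerate(k, start=1):      # k[i - 1] is the turn at pts[i]
        s = 0 if abs(a) < eps else (1 if a > 0 else -1)
        if s != 0 and sign != 0 and s != sign:
            cuts.append(i)
        if s != 0:
            sign = s
    cuts.append(len(pts) - 1)
    # consecutive strokes share their boundary point
    return [pts[a:b + 1] for a, b in zip(cuts[:-1], cuts[1:])]

def arc_features(seg):
    """Arc through the first, middle and last points: (chord length, signed angle)."""
    p0, p1, p2 = seg[0], seg[len(seg) // 2], seg[-1]
    chord = float(np.linalg.norm(p2 - p0))
    a, b = p1 - p0, p2 - p1
    turn = np.arctan2(a[0] * b[1] - a[1] * b[0], a @ b)   # signed turn at p1
    return chord, float(2 * turn)                          # arc subtends twice the turn
```

For example, one period of a sine wave splits into two strokes at its inflection point, and a quarter circle of radius 2 yields a chord of 2√2 and a subtended angle of π/2.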

In the second part, we extend this representation to the problem of signature verification. A verification system requires a similarity metric. We propose a metric learning algorithm based on the Support Vector Machine (SVM) hyperplanes learned to separate the training data. This results in a very simple metric learning strategy that can be modified to accommodate an increasing number of users registered with the verification system. We evaluate this technique on the publicly available SVC-2004 database and show that it achieves accuracies suitable for practical scenarios.
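The abstract does not spell out how the SVM hyperplanes define the metric, so the sketch below is a hypothetical reading only: the unit normals of the learned hyperplanes are stacked into a matrix W, distances are measured as ||W(x - y)|| (a Mahalanobis-style metric with M = WᵀW), and registering a new user simply appends another normal. The class name, the acceptance rule and the threshold are all assumptions.

```python
import numpy as np

class HyperplaneMetric:
    """Hypothetical metric: distances measured only along learned SVM normals."""

    def __init__(self, dim):
        self.W = np.zeros((0, dim))          # one row per learned hyperplane

    def add_hyperplane(self, w):
        # e.g. the normal of a newly trained SVM when a user is registered
        w = np.asarray(w, dtype=float)
        self.W = np.vstack([self.W, w / np.linalg.norm(w)])

    def distance(self, x, y):
        diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return float(np.linalg.norm(self.W @ diff))

    def verify(self, query, references, threshold):
        """Accept if the query is close enough to any enrolled reference signature."""
        return min(self.distance(query, r) for r in references) <= threshold
```

Directions that no hyperplane normal spans contribute nothing to the distance, so variation the SVMs found irrelevant for separating users is ignored.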

 

Year of completion:  July 2015
 Advisor :

Anoop M. Namboodiri


Related Publications

  • Prabhu Teja S. and Anoop M Namboodiri - A Ballistic Stroke Representation of Online Handwriting for Recognition, Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), 25-28 Aug. 2013, Washington DC, USA. [PDF]


Downloads

thesis

 ppt

Optimizing Average Precision using Weakly Supervised Data


Aseem Behl (homepage)

Many tasks in computer vision, such as action classification and object detection, require us to rank a set of samples according to their relevance to a particular visual category. The performance of such tasks is often measured in terms of average precision (AP). Yet it is common practice to employ the support vector machine (SVM) classifier, which optimizes a surrogate of the 0-1 loss. The popularity of the SVM can be attributed to its empirical performance: in fully supervised settings, the SVM tends to provide accuracy similar to that of AP-SVM, which directly optimizes an AP-based loss. However, we hypothesize that in the significantly more challenging and practically useful setting of weakly supervised learning, it becomes crucial to optimize the right accuracy measure. To test this hypothesis, we propose a novel latent AP-SVM that minimizes a carefully designed upper bound on the AP-based loss function over weakly supervised samples. Using publicly available datasets, we demonstrate the advantage of our approach over standard loss-based learning frameworks on three challenging problems: action classification, character recognition and object detection.
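For readers unfamiliar with the measure, the sketch below computes the AP of a ranking and the corresponding loss 1 - AP that an AP-SVM style method targets. It only evaluates the loss; it does not implement the latent AP-SVM learning procedure or its upper bound. Function names are illustrative.

```python
def average_precision(scores, labels):
    """AP of the ranking induced by sorting scores in descending order.
    labels: 1 for relevant samples, 0 otherwise."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank        # precision at each relevant position
    return total / hits if hits else 0.0

def ap_loss(scores, labels):
    return 1.0 - average_precision(scores, labels)
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`, the relevant samples sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6. Ties in score are broken by input order here (Python's sort is stable); other tie-handling conventions can give slightly different values.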

 

Year of completion:  July 2015
 Advisor :

Prof. C.V. Jawahar and Dr. M. Pawan Kumar


Related Publications

  • Aseem Behl, Pritish Mohapatra, C. V. Jawahar, M. Pawan Kumar - Optimizing Average Precision using Weakly Supervised Data, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2015). [PDF]

  • Puneet K. Dokania, Aseem Behl, C.V. Jawahar, M. Pawan Kumar - Learning to Rank using High-Order Information, Proceedings of the European Conference on Computer Vision, 06-12 Sep 2014, Zurich. [PDF]

  • Aseem Behl, C.V. Jawahar and M. Pawan Kumar - Optimizing Average Precision using Weakly Supervised Data Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 23-28 June 2014, Columbus, Ohio, USA. [PDF]


Downloads

thesis

 ppt
