
Document Annotation and Retrieval Systems


A. Balasubramanian

Digital documents are now omnipresent, but techniques and algorithms to process and understand them are still evolving. This thesis focuses on non-textual representations of textual content; examples of this category are online handwritten documents and scanned printed books. Algorithms for accessing such documents at the content level are still missing, especially for Indian languages. This thesis addresses two fundamental problems in this area: annotation and retrieval. Annotated datasets of handwriting are a prerequisite for the design and training of handwriting recognition algorithms. Retrieval from annotated datasets is relatively straightforward; retrieval from unannotated datasets, however, is still an open problem. We explore algorithms which make these two tasks possible.

Annotation of large datasets is a tedious and expensive process, and the problem is compounded for handwritten documents, where each character corresponds to one or more strokes. We have developed a versatile, robust annotation tool for online handwriting data. The tool is aimed at supporting the emerging UPX/hwDataset schema, a promising successor of the UNIPEN format, and provides an easy-to-use annotation interface. The annotation, however, remains largely manual.

We then propose a novel, automated method for character-level annotation of online handwriting data, given a parallel corpus of online handwritten data and typed text. The method employs a model-based handwriting synthesis unit to map the two corpora to the same space. Annotation is then propagated to the word level and finally to the individual characters using elastic matching. The initial annotation results are used to improve the handwriting synthesis model for the user under consideration, which in turn refines the annotation. The method handles errors in the handwriting, such as spurious and missing strokes and characters. The output is stored in the UPX format.
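The elastic matching used to push word-level annotation down to individual characters is, at its core, a dynamic-programming alignment. The following is a minimal sketch of such an elastic (DTW-style) alignment between two hypothetical 1-D feature sequences, one standing in for the synthesized handwriting and one for the real trace; the feature values and function name are illustrative, not from the thesis:

```python
def dtw_align(a, b):
    """Dynamic time warping between two feature sequences. Returns the
    alignment cost and the warping path: (i, j) pairs matching a[i] to b[j]."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local feature distance
            cost[i][j] = d + min(cost[i - 1][j],      # element of a unmatched
                                 cost[i][j - 1],      # element of b unmatched
                                 cost[i - 1][j - 1])  # match the two elements
    # Backtrack from the end to recover the optimal alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return cost[n][m], list(reversed(path))

# Toy alignment of a "synthesized" trace against a slightly longer "real" one.
synth = [0.0, 1.0, 2.0, 3.0]
real = [0.1, 0.9, 1.1, 2.2, 2.9]
total, path = dtw_align(synth, real)
```

Once such a path is available, character boundaries known in the synthesized sequence can be transferred to the real trace through the matched index pairs.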

 

Year of completion: 2006
Advisor: C. V. Jawahar


Related Publications

  • Anand Kumar, A. Balasubramanian, Anoop M. Namboodiri and C. V. Jawahar - Model-Based Annotation of Online Handwritten Datasets, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, Centre de Congrès Atlantia, La Baule, France. [PDF]

  • C. V. Jawahar and A. Balasubramanian - Synthesis of Online Handwriting in Indian Languages, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, Centre de Congrès Atlantia, La Baule, France. [PDF]

  • A. Balasubramanian, Million Meshesha and C. V. Jawahar - Retrieval from Document Image Collections, Proceedings of the Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp. 1-12. [PDF]

  • Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, Indraneel Deb Sikdar, A. Balasubramanian and C. V. Jawahar - A Semi-Automatic Adaptive OCR for Digital Libraries, Proceedings of the Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp. 13-24. [PDF]

  • C. V. Jawahar, Million Meshesha and A. Balasubramanian - Searching in Document Images, Proceedings of the Indian Conference on Vision, Graphics and Image Processing (ICVGIP), Dec. 2004, Calcutta, India, pp. 622-627. [PDF]

  • A. Bhaskarbhatla, S. Madhvanath, M. Pavan Kumar, A. Balasubramanian and C. V. Jawahar - Representation and Annotation of Online Handwritten Data, Proceedings of the International Workshop on Frontiers in Handwriting Recognition (IWFHR), Oct. 2004, Tokyo, Japan, pp. 136-141. [PDF]

  • C. V. Jawahar, A. Balasubramanian and Million Meshesha - Word-Level Access to Document Image Datasets, Proceedings of the Workshop on Computer Vision, Graphics and Image Processing (WCVGIP), Feb. 2004, Gwalior, India, pp. 73-76. [PDF]


Downloads

thesis

 ppt

Towards Understanding Texture Processing


Gopal Datt Joshi (homepage)

A fundamental goal of texture research is to develop automated computational methods for retrieving visual information and understanding image content based on the textural properties of images. A synergy between biological and computer vision research in low-level vision can give substantial insight into the processes for extracting color, edge, motion, and spatial-frequency information from images. In this thesis, we seek to understand the texture processing that takes place in low-level human vision in order to develop new and effective methods for texture analysis in computer vision. Of interest are the different representations formed by the early stages of the human visual system (HVS) and the visual computations they carry out to handle various texture patterns. Such information is needed to identify the mechanisms that can be used in texture analysis tasks.

We examine two types of cells, the bar and grating cells, which have been identified in the literature as playing an important role in texture processing, and develop functional models for both. The model for the bar cell is based on the notion of surround inhibition and excitation, whereas the model for the grating cell is based on the fact that a grating cell receives direct inputs from the M-type ganglion cells. The representations derived by these cells are used to design solutions to two important texture problems: texture-based segmentation and classification. The former is addressed in the domain of natural image understanding and the latter in the domain of document image understanding.

Based on our work, we conclude that the early stages of the HVS effectively represent various texture patterns and provide ample information to solve higher-level texture analysis tasks. The richness of this information emerges from the capability of the HVS to extract global visual primitives from local features. The presented work is an initial attempt to integrate the current knowledge of HVS mechanisms with computational theories developed for texture analysis.
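The bar-cell model rests on surround inhibition: a feature's response is suppressed in proportion to the activity in its neighbourhood, so an isolated bar responds strongly while a bar embedded in dense texture is attenuated. The sketch below is a toy illustration of that principle only, not the thesis's actual functional model (whose filters and parameters are not reproduced here):

```python
import numpy as np

def surround_inhibited_response(feature_map, inner=1, outer=3, alpha=1.0):
    """Toy surround-inhibition operator: each location's response is its own
    feature value minus alpha times the mean value in a square annulus
    between radius `inner` (exclusive) and `outer` (inclusive)."""
    h, w = feature_map.shape
    out = np.zeros_like(feature_map, dtype=float)
    for y in range(h):
        for x in range(w):
            acc, cnt = 0.0, 0
            for dy in range(-outer, outer + 1):
                for dx in range(-outer, outer + 1):
                    if max(abs(dy), abs(dx)) <= inner:
                        continue          # skip the centre region
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += feature_map[yy, xx]
                        cnt += 1
            surround = acc / cnt if cnt else 0.0
            out[y, x] = max(0.0, feature_map[y, x] - alpha * surround)
    return out

# An isolated bar keeps its response; a bar inside a dense grating is suppressed.
img = np.zeros((9, 15))
img[4, 2] = 1.0          # isolated bar
img[4, 8:15:2] = 1.0     # bars of a grating at columns 8, 10, 12, 14
resp = surround_inhibited_response(img)
```

Running this on the toy image, the isolated bar at column 2 keeps its full response while the grating bars are reduced by their neighbours' activity, which is the qualitative behaviour attributed to bar cells.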

 

Year of completion: December 2006
Advisor: Jayanthi Sivaswamy


Related Publications

  • Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy - Script Identification from Indian Documents, Proceedings of the IAPR Workshop on Document Analysis Systems (DAS 2006), Nelson, New Zealand, pp. 255-267. [PDF]

  • Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy - A Generalised Framework for Script Identification, International Journal on Document Analysis and Recognition (IJDAR), 10(2), pp. 55-68, 2007. [PDF]

  • Gopal Datt Joshi and Jayanthi Sivaswamy - A Computational Model for Boundary Detection, 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Madurai, India, LNCS 4338 pp.172-183, 2006. [PDF]

  • Gopal Datt Joshi and Jayanthi Sivaswamy - A Simple Scheme for Contour Detection, Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP 2006), Setúbal, Portugal. [PDF]

  • Gopal Datt Joshi and Jayanthi Sivaswamy - A Multiscale Approach to Contour Detection, Proceedings of the International Conference on Cognition and Recognition (ICCR), pp. 183-193, Mysore, 2005. [PDF]


Downloads

thesis

 ppt

Layer Extraction, Removal and Completion of Indoor Videos: A Tracking Based Approach


Vardhman Jain (homepage)

Image segmentation and layer extraction in video refer to the process of segmenting an image or the frames of a video into their constituent objects. Fully automatic techniques are not always suitable, as the objective is often difficult to describe. With the advent of interactive techniques in the field, these algorithms can now be used to select an object of interest in an image or video precisely, with little effort. Object segmentation opens up further possibilities, such as cutting and pasting objects from one image or video to another. Object removal in images and videos is another application of interest: as the name suggests, the task is to eliminate an object from the image or video, which involves recovering the background information previously occluded by the object. Object removal has found interesting applications, especially in the entertainment industry, and the concept of filling in information from the surrounding region (for images) or surrounding frames (for videos) has been applied to recovering damaged images or clips.

This thesis presents two new approaches. The first is for object segmentation, or layer extraction, from a video. The method allows the segmentation of complex objects with difficult motion models. The algorithm integrates a robust point-tracking algorithm into a 3D graph cut formulation: tracking propagates the user-given seeds in the key frames to the intermediate frames, providing a better initialization for the graph cut optimization. The second is an approach for video completion in indoor scenes. We propose a novel method for video completion using multiview information, without requiring full-frame or complete motion segmentation. The heart of the algorithm is a method to partition the scene into regions supporting multiple homographies, based on a geometric formulation, thereby providing precise segmentation even at points where the actual scene information is missing due to the removal of the object. We demonstrate our algorithms on a number of representative videos and present a few directions for future work that extends the work presented here.
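The segmentation step can be pictured as a minimum-cut problem: tracked seeds attach pixels to the "object" and "background" terminals with large capacities, smoothness terms link neighbouring pixels, and the minimum cut yields the labelling. Below is a self-contained toy sketch on a 1-D row of pixels using Edmonds-Karp max-flow; the thesis builds a 3D graph over video frames, and the weights here are purely illustrative:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max-flow via BFS augmenting paths (Edmonds-Karp). Returns the set of
    nodes on the source side of the minimum cut, i.e. the 'object' label."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]

    def bfs():
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if parent[t] == -1:          # no augmenting path left: flow is maximal
            break
        path, v = [], t
        while v != s:                # recover the augmenting path s -> t
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
    parent = bfs()
    return {v for v in range(n) if parent[v] != -1}

# 5 pixels in a row (nodes 1..5); node 0 = object terminal, node 6 = background.
N, BIG = 7, 100
cap = [[0] * N for _ in range(N)]
cap[0][1] = BIG                  # a tracked seed marks pixel 1 as object
cap[5][6] = BIG                  # a tracked seed marks pixel 5 as background
smooth = [10, 10, 1, 10]         # neighbour weights; weak link between pixels 3 and 4
for i, w in enumerate(smooth, start=1):
    cap[i][i + 1] = cap[i + 1][i] = w
object_side = edmonds_karp(cap, 0, 6)   # the cut falls on the weak link
```

The cut severs the cheapest boundary (the weak link between pixels 3 and 4), so pixels 1-3 land on the object side; propagating seeds by tracking simply adds such terminal links in the intermediate frames.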

 

Year of completion: 2006
Advisor: P. J. Narayanan


Related Publications

  • Vardhman Jain and P.J. Narayanan - Layer Extraction Using Graph Cuts and Feature Tracking, The 3rd International Conference on Visual Information Engineering (VIE), 26-28 September 2006 in Bangalore, India. [PDF]

  • Vardhman Jain and P. J. Narayanan - Video Completion for Indoor Scenes, 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Madurai, India, LNCS 4338 pp.409-420, 2006. [PDF]


Downloads

thesis

 ppt

DGTk: A Data Generation Tool for CV and IBR


V. Vamsi Krishna

Computer Vision (CV) and Image Based Rendering (IBR) emerged, respectively, from the search for a means to make computers understand images the way humans do, and from the Computer Graphics community's pursuit of photorealistic rendering. Though the two fields deal with quite different problems, both CV and IBR algorithms require high-quality ground-truth information about the scenes they are applied to. Traditionally, research groups have spent large amounts of resources on creating data with high-resolution equipment for the qualitative analysis of CV and IBR algorithms, and such high-quality data provided a platform for comparing them. Over the past decade, however, developments in CV and IBR have outpaced the ability of these standard datasets to differentiate among the best-performing algorithms, leaving the resources invested in generating them wasted. To overcome this problem, researchers have resorted to creating synthetic datasets by extending existing 3D authoring tools, developing stand-alone tools for generating synthetic data, and devising novel acquisition methods for high-quality real-world data. The disadvantages of acquiring data with high-resolution equipment include (1) the time required to set up and configure the equipment, (2) errors in measuring devices due to physical limitations, and (3) poor repeatability of experiments due to uncontrollable parameters such as wind, fog and rain. Synthetic data is preferred for the early testing of algorithms, since it makes both qualitative and quantitative analysis possible, and the performance of an algorithm on synthetic data generally provides a good indication of its performance on real-world data.

 

Year of completion: 2006
Advisor: P. J. Narayanan

Related Publications

  • Vamsikrishna and P.J. Narayanan - Data Generation Toolkit for Image Based Rendering Algorithms, The 3rd International Conference on Visual Information Engineering (VIE), 26-28 September 2006 in Bangalore, India. [PDF]


Downloads

thesis

 ppt

Robust Registration For Video Mosaicing Using Camera Motion Properties


Pulkit Parikh (Home Page)

In recent years, video mosaicing has emerged as an important problem in the domains of computer vision and computer graphics. Video mosaics find applications in many popular arenas, including video compression, virtual environments and panoramic photography. Mosaicing is the process of generating a single, large, integrated image by combining the visual cues from multiple images. The composite image, called a mosaic, provides a high field of view without compromising image resolution. When the input images are the frames of a video, the process is called video mosaicing. Mosaicing primarily consists of two key components: image registration and image compositing. The focus of this work is on image registration - the process of estimating a transformation that relates the frames of the video. In addition to mosaicing, registration has a wide range of other applications in ortho-rectification, scene transfer, video in-painting, etc. We employ the homography, the general 2D projective transformation, for image registration. Typically, homography estimation is done from a set of matched feature points extracted from the frames of the input video.
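Homography estimation from matched feature points is commonly done with the Direct Linear Transform (DLT). The following is a minimal sketch of the standard DLT formulation; the thesis's exact estimator may differ, and the example points are synthetic:

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: estimate H (3x3, H[2,2] = 1) such that each
    dst point is the projection of the matching src point under H.
    Needs at least four correspondences in general position."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the entries of H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)   # right singular vector of the smallest singular value
    return H / H[2, 2]

# Toy check: four corners of a unit square shifted by (tx, ty) = (2, 3).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for x, y in src]
H = estimate_homography(src, dst)
```

For this pure translation the recovered H is the identity with (2, 3) in the last column; in practice the correspondences are noisy and the estimate is refined robustly (e.g. with RANSAC).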

Video mosaicing has traditionally been viewed as a problem of registering (and stitching) only successive video frames. While some recent global alignment approaches make use of information stemming from non-consecutive pairs of video frames, the registration of each frame pair is typically done independently; no real emphasis has been laid on how the imaging process and the camera motion relate the pair-wise homographies. Accurate registration, and in turn mosaicing, therefore remains challenging, especially in the face of poor feature point correspondence. For example, mosaicing a desert video whose frames have minimal texture to provide feature points (e.g. corners), or mosaicing in the presence of repetitive texture leading to many mismatched points, is highly error-prone.

The camera that captures the video frames to be mosaiced does not undergo arbitrary motion: its trajectory is almost always smooth. Examples of such motion include aerial videos taken by a camera mounted on an aircraft and videos captured by robots, automated vehicles, etc. We propose an approach which exploits this smoothness constraint to refine outlier homographies, i.e., homographies that are detected to be erroneous. In many scenarios, especially in machine-controlled environments, the camera motion not only is continuous but also follows a model (say, a linear motion model), giving us much more concrete and precise information. In this thesis, we derive relationships between the homographies in a video sequence, under many practical scenarios, for various camera motion models. In other words, we translate the camera motion model into a global homography model whose parameters characterize the homographies between every pair of frames, and we present a generic methodology which uses this global homography model for accurate registration.
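One simple way to picture the smoothness constraint: under near-constant camera motion the frame-to-frame homographies vary slowly, so an estimate that deviates sharply from its neighbours can be flagged as an outlier and replaced. The sketch below is an illustrative toy, not the thesis's derivation; the median-based detection, threshold, and neighbour interpolation are all assumptions made for the example:

```python
import numpy as np

def refine_outliers(homs, thresh=1.0):
    """Flag frame-to-frame homographies far from the element-wise median
    (assuming near-constant motion) and replace each outlier by the average
    of its nearest inlier neighbours."""
    homs = [h / h[2, 2] for h in homs]            # remove the scale ambiguity
    median = np.median(np.stack(homs), axis=0)
    inlier = [np.linalg.norm(h - median) <= thresh for h in homs]
    refined = []
    for i, h in enumerate(homs):
        if inlier[i]:
            refined.append(h)
            continue
        left = next((homs[j] for j in range(i - 1, -1, -1) if inlier[j]), None)
        right = next((homs[j] for j in range(i + 1, len(homs)) if inlier[j]), None)
        near = [m for m in (left, right) if m is not None]
        refined.append(sum(near) / len(near))     # interpolate the outlier
    return refined

# Translations of ~1 px/frame; the third pairwise estimate is grossly wrong.
T = lambda tx: np.array([[1.0, 0.0, tx], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
homs = [T(1.0), T(1.1), T(9.0), T(0.9), T(1.0)]
refined = refine_outliers(homs)
```

The grossly wrong estimate is pulled back to the motion implied by its neighbours, while the consistent estimates pass through untouched; a motion-model fit would play the role of the neighbour average in a more principled version.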

The above derivations and algorithms have been implemented, verified and tested on various datasets. The analysis and comparative results demonstrate significant improvement over existing approaches in terms of accuracy and robustness. Superior-quality mosaics have been generated with our algorithms in the presence of many commonly observed issues, such as texture-less frames and frames containing repetitive texture, with applicability to indoor (e.g., mosaicing a map) as well as outdoor (e.g., representing an aerial video as a mosaic) settings. For the quantitative evaluation of mosaics, we have devised a novel, computationally efficient quality measure whose results are completely in tune with visual perception, further testifying to the superiority of our approach. In a nutshell, the premise that the properties of the camera motion can serve as an important additional cue for improved video mosaicing has been empirically validated.

 

Year of completion: 2007
Advisor: C. V. Jawahar


Related Publications

  • Pulkit Parikh and C.V. Jawahar - Enhanced Video Mosaicing using Camera Motion Properties, Proceedings of the IEEE Workshop on Motion and Video Computing (WMVC 2007), Austin, Texas, USA, 2007. [PDF]

  • Pulkit Parikh and C.V. Jawahar - Motion Constraints for Video Mosaicing, The 3rd International Conference on Visual Information Engineering (VIE), 26-28 September 2006 in Bangalore, India. [PDF]

 


Downloads

thesis

 ppt
