Thesis Students

Geometric Grouping of Planar Patterns in a Perspective View

Kiran Varanasi (homepage)

When geometric primitives such as curves and patterns appear in repetition over a perspective image, they offer key information for recovering the real metric structure of the scene. This happens because multiplicity is equivalent to motion: a single image of such a scene is equivalent to multiple images taken from varying camera viewpoints. Multiplicity can manifest in the image in several forms: tiling (translational symmetry), reflection (bilateral symmetry), or rotation (point symmetry). In all these cases, it pays to group these patterns into geometrically meaningful sets, i.e., sets of patterns which produce a uniform geometric constraint such as a unique vanishing point. Each of these patterns is defined in terms of some interest points and contour segments. These points and contours may be poorly identified, which makes the task of geometric grouping all the more challenging. However, if the patterns are grouped robustly, the later tasks of 3D reconstruction and pose estimation can be handled with high accuracy.
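As an illustration of the kind of constraint a group of repeated patterns can supply, the sketch below computes a vanishing point as the intersection of two image lines that are assumed to be parallel in the scene. It uses the standard homogeneous-coordinate cross product; the function names and the edge coordinates are hypothetical and are not taken from the thesis.

```python
def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersect(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


# Two image edges of a repeated pattern (hypothetical coordinates),
# assumed to be parallel in the scene.
edge_a = line_through((0.0, 0.0), (4.0, 1.0))
edge_b = line_through((0.0, 2.0), (4.0, 2.0))
vanishing_point = intersect(edge_a, edge_b)
```

With noisy contours, many such pairwise intersections would have to agree before a group of patterns could be said to share a vanishing point; this sketch does not attempt that consensus step.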

Symmetrical patterns are commonplace in natural and man-made environments. In architectural scenes especially, these patterns abound in great variety. The research community has made several attempts to exploit this information, and success has been reported in several areas, particularly in image-based modeling and rendering (IBMR). In this thesis, we study the problem of geometric grouping of patterns in the context of an interactive IBMR application. We demonstrate how geometric grouping with minimal user interaction improves the robustness of structure recovery. We handle the problem of geometric grouping of planar patterns in all its generality: we do not presume a known period of repetition or a known template for the patterns. The only properties we use for grouping the patterns are geometric constraints (such as the vanishing line and the circular points), which are estimated from the patterns themselves. Our algorithms facilitate the aggregation of geometric information from multiple sources in the image, and thus make robust rectification possible.

The principal contributions of our work are on three fronts: (1) a method for identifying interest points on a set of poorly identified and badly fragmented image contours; (2) a greedy optimization approach for computing point correspondences by preserving the coherence of spatial information; and (3) ways to incorporate the information provided by user input into the optimization process. Below, we discuss each of these issues in brief.

In order to have a generic mechanism for describing patterns, we need a method for reliably detecting interest points on the patterns. The color and intensity information may not hold important cues for corner detection if the saliency of the required points is primarily geometric. A pattern in our case is represented by a collection of image contours, which could be badly fragmented and erroneous. We note that local shape properties such as derivatives at a point are badly damaged by perspective projection. Global shape properties, such as the relative distance of points from the center of mass of the contour, are affected less severely, though they are not guaranteed to be preserved. We use these global properties to compute a set of interest points conservatively. We then apply a neighborhood of a given size and suppress interest points which are not maximal in their neighborhoods. In the thesis, we demonstrate through experiments that this method identifies interest points reliably on a perspectively distorted image of the pattern. We compare our results with those provided by the method of Shi-Tomasi.
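The idea of scoring contour points by a global property and then suppressing non-maxima can be sketched as follows. This is a minimal illustration only: it assumes a closed contour given as an ordered list of (x, y) samples, uses plain distance to the centroid as the score, and the function name and the `radius` parameter are hypothetical. The detector used in the thesis may differ in all of these respects.

```python
import math


def centroid_distance_interest_points(contour, radius=2):
    """Return indices of contour points whose distance to the contour's
    centroid is a strict local maximum among `radius` neighbours on
    either side (the contour is treated as closed)."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]

    keep = []
    for i in range(n):
        neighbours = [dist[(i + k) % n]
                      for k in range(-radius, radius + 1) if k != 0]
        # Non-maximum suppression: keep a point only if it beats every neighbour.
        if all(dist[i] > d for d in neighbours):
            keep.append(i)
    return keep


# A square sampled at unit steps; its corners are farthest from the centroid.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3),
          (4, 4), (3, 4), (2, 4), (1, 4), (0, 4), (0, 3), (0, 2), (0, 1)]
corners = centroid_distance_interest_points(square, radius=2)
```

On this sampled square the retained indices are the four corners. For fragmented contours the wrap-around neighbourhood would have to be replaced by one that respects the fragment ends.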

After the detection of interest points, we study the problem of pairing two sets of points uniquely with each other. We provide an effective solution to this problem through an optimization approach which tries to maximize spatial coherence subject to a consensus on geometric constraints. Our method stands in contrast to classical methods which try to match points through the use of geometric invariants. Instead, we try to satisfy spatial coherence between the point matches, which means the conservation of Euclidean properties such as angles and ratios of lengths. Though these Euclidean properties are no longer valid after a projective transform, we show that they can be preserved in an approximate sense. Unlike invariant-based methods, our method has the added advantage of being robust to noise and outliers. We frame the optimization problem in a greedy setting and thus obtain a locally maximal solution. This produces a matching between the two point sets. When the replication (multiplicity) of the patterns is more than two, the solution is generalized to the entire set of points. The resulting model is solved using Levenberg-Marquardt optimization. This is similar to the bundle-adjustment algorithm in stereo.
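A greedy matching that favours spatial coherence can be sketched as below. The cost function here is an assumption made for illustration: distances are normalised by each set's mean pairwise distance so that length ratios can be compared, and a candidate pair is scored by how well its distances to the already matched points agree. The names `greedy_match`, `mean_pairwise_distance` and the seed pair are hypothetical, and the sketch covers neither the geometric-consensus constraint nor the Levenberg-Marquardt refinement.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Scale of a point set, used to compare length ratios across sets."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def greedy_match(set_a, set_b, seed):
    """Greedily extend `seed`, a non-empty list of (i, j) index pairs, by
    repeatedly adding the unmatched pair that is most coherent with the
    pairs already chosen. Each set needs at least two points."""
    scale_a = mean_pairwise_distance(set_a)
    scale_b = mean_pairwise_distance(set_b)
    matches = list(seed)
    free_a = set(range(len(set_a))) - {i for i, _ in matches}
    free_b = set(range(len(set_b))) - {j for _, j in matches}

    def incoherence(pair):
        i, j = pair
        return sum(
            abs(math.dist(set_a[i], set_a[a]) / scale_a
                - math.dist(set_b[j], set_b[b]) / scale_b)
            for a, b in matches
        )

    while free_a and free_b:
        candidates = [(i, j) for i in sorted(free_a) for j in sorted(free_b)]
        best = min(candidates, key=incoherence)
        matches.append(best)
        free_a.discard(best[0])
        free_b.discard(best[1])
    return matches


# set_b is set_a scaled by 2, translated by (10, 5) and reordered.
set_a = [(0, 0), (4, 0), (4, 1), (0, 3)]
set_b = [(18, 5), (10, 11), (10, 5), (18, 7)]
matches = greedy_match(set_a, set_b, seed=[(0, 2)])
```

The example uses a similarity transform, under which length ratios are preserved exactly. Under a projective transform they hold only approximately, so the greedy choice can settle on a locally best solution rather than the true correspondence.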

We study the problem of geometric grouping in the context of an interactive application. Two major concerns for such an application are minimizing the level of user interaction and tolerating errors at the micro level. Previous methods of IBMR through plane-based rectification have required the user to provide information at the pixel level, in the form of parallel and perpendicular line segments. Such input is not only error-prone but also extremely taxing for the user to provide. In our application, we provide new ways for the user to interact with the system and new means of incorporating user input into the optimization process. We demonstrate that the user input is indeed useful in reducing the combinatorial complexity of the algorithm by a large factor.

The method of geometric grouping of patterns has several applications. We discuss the direct application of an interactive image-based modeler. We provide references to the other applications (tracking, recognition, stereo, etc.) in the future work section.

Year of completion:	2006
Advisor :	P. J. Narayanan

Related Publications

Downloads

ppt

GSWall: A Scalable Tiled-Display Wall

Nirnimesh (homepage)

Tiled displays can provide high resolution and a large display area. Cluster-based tiled displays are cost-effective and scalable. Chromium is a popular software API used to build such displays; Chromium-based tiled displays tend to be network-limited, which affects their scalability in the number of nodes and their ability to handle large environment models. This thesis presents a tiled display wall setup based on a client-server architecture, in which the server manages all other aspects and leaves the clients with rendering as their sole responsibility. Our system uses off-the-shelf graphics hardware and a standard Ethernet network. The high-level scene structure and hierarchy of a scene graph (OpenSceneGraph) are used by a central server to minimize network load. The view frustums of the rendering nodes are treated hierarchically as well. A novel algorithm combines the object hierarchy of the scene graph with the hierarchy of frustums to determine the optimal order of unfolding the hierarchies so as to minimize the number of computations involved. Visible parts of the scene graph are transmitted to and cached by the clients to take advantage of temporal coherence. The server, following a push philosophy, is able to exploit the high degree of overlap in the computation space of the rendering nodes to avoid concurrent redundant computations. We use a multicast-oriented protocol for data transmission to the clients, making the system scalable. Pushing geometry from the server helps keep the clients in sync with one another and facilitates the pipelining of the constituent stages. Distributed rendering allows the display wall to render scenes which are otherwise too bulky for any of the individual rendering nodes. No node, including the server, needs to render the entire environment, making our system suitable for interactive rendering of massive models. We show performance measures for the different underlying aspects of our display wall.
The display wall application is implemented as a library-intercept mechanism to seamlessly render any OpenSceneGraph-based graphics application to a tiled display wall without the need for modification, recompilation or even relinking. This makes our display wall easy to use with many existing applications. Our studies show that the server and network loads grow sub-linearly with the number of tiles. This makes our scheme suitable for the construction of very large-resolution displays.
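The benefit of traversing an object hierarchy and a tile hierarchy together can be sketched in a heavily simplified form. The example below is an assumption-laden illustration: it works on 2D screen-space rectangles rather than 3D view frustums, it does not use the OpenSceneGraph API, and the rule for which hierarchy to unfold next (descend into whichever node covers the larger area) is a stand-in heuristic, not the thesis's algorithm. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    name: str
    bounds: Rect
    children: List["Node"] = field(default_factory=list)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def assign(obj: Node, tiles: Node, out: Dict[str, List[str]]) -> None:
    """Record which leaf objects fall on which leaf tiles. A single failed
    overlap test prunes a whole object subtree against a whole tile group."""
    if not overlaps(obj.bounds, tiles.bounds):
        return
    if not obj.children and not tiles.children:
        out.setdefault(tiles.name, []).append(obj.name)
    elif obj.children and (not tiles.children
                           or area(obj.bounds) >= area(tiles.bounds)):
        for child in obj.children:      # unfold the object hierarchy
            assign(child, tiles, out)
    else:
        for child in tiles.children:    # unfold the tile hierarchy
            assign(obj, child, out)


# A wall of two tiles and a scene with three objects (hypothetical layout).
wall = Node("wall", (0, 0, 2, 1), [
    Node("left", (0, 0, 1, 1)),
    Node("right", (1, 0, 2, 1)),
])
scene = Node("root", (0, 0, 2, 1), [
    Node("tree", (0.1, 0.1, 0.4, 0.6)),
    Node("house", (0.8, 0.2, 1.3, 0.9)),
    Node("car", (1.5, 0.1, 1.9, 0.4)),
])
visible: Dict[str, List[str]] = {}
assign(scene, wall, visible)
```

Here the house straddles both tiles and is therefore listed for each, while the tree and the car are each tested against the far tile only once before being pruned.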

Year of completion:	2006
Advisor :	P. J. Narayanan

Related Publications

Downloads

ppt

Document Annotation and Retrieval Systems

A. Balasubramanian

Digital documents are now omnipresent. Techniques and algorithms to process and understand these documents are still evolving. This thesis focuses on non-textual documents with textual content. Examples of this category are online handwritten documents and scanned printed books. Algorithms for accessing such documents at the content level are still missing, especially for Indian languages. This thesis addresses two fundamental problems in this area: annotation and retrieval. Annotated datasets of handwriting are a prerequisite for the design and training of handwriting recognition algorithms. Retrieval from annotated datasets is relatively straightforward. However, retrieval from unannotated datasets is still an open problem. We explore algorithms which make these two tasks possible. Annotation of large datasets is a tedious and expensive process. The problem is compounded for handwritten documents, where the characters correspond to one or more strokes. We have developed a versatile, robust annotation tool for online handwriting data. This tool is aimed at supporting the emerging UPX/hwDataset schema, a promising successor to UNIPEN. We provide an easy-to-use interface for the annotation tool. However, the annotation is still highly manual. We then propose a novel, automated method for annotation of online handwriting data at the character level, given a parallel corpus of online handwritten data and typed text. The method employs a model-based handwriting synthesis unit to map the two corpora to the same space. Annotation is then propagated to the word level and finally to the individual characters using elastic matching. The initial results of annotation are used to improve the handwriting synthesis model for the user under consideration, which in turn refines the annotation. The method takes care of errors in the handwriting such as spurious and missing strokes and characters. The output is stored in the UPX format.
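Elastic matching between a synthesized word and its handwritten counterpart can be illustrated with dynamic time warping (DTW), one common form of elastic matching. The sketch assumes each sample is an (x, y) pen position and that labels attached to the synthesized samples could be carried along the alignment path. The abstract does not say which elastic matching variant or features the thesis uses, so the function below is only a hypothetical stand-in.

```python
import math


def dtw(seq_a, seq_b):
    """Dynamic time warping between two sequences of (x, y) samples.
    Returns (total cost, alignment path as a list of (i, j) index pairs)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# The handwritten trace repeats its first sample; DTW absorbs the extra point.
synthesized = [(0, 0), (1, 0), (2, 0)]
handwritten = [(0, 0), (0, 0), (1, 0), (2, 0)]
total, alignment = dtw(synthesized, handwritten)
```

Each pair in `alignment` links a synthesized sample to a handwritten one, so a character label known for the former can be copied to the latter.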

Year of completion:	2006
Advisor :	C. V. Jawahar

Related Publications

Anand Kumar, A. Balasubramanian, Anoop M. Namboodiri and C. V. Jawahar - Model-Based Annotation of Online Handwritten Datasets, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, La Baule, Centre de Congrès Atlantia, France. [PDF]
C. V. Jawahar and A. Balasubramanian - Synthesis of Online Handwriting in Indian Languages, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, La Baule, Centre de Congrès Atlantia, France. [PDF]
A. Balasubramanian, Million Meshesha and C. V. Jawahar - Retrieval from Document Image Collections, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 1-12. [PDF]
Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, Indineel Deb Sikdar, A. Balasubramanian and C. V. Jawahar - A Semi-Automatic Adaptive OCR for Digital Libraries, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp 13-24. [PDF]
C. V. Jawahar, Million Meshesha and A. Balasubramanian - Searching in Document Images, Proceedings of the Indian Conference on Vision, Graphics and Image Processing (ICVGIP), Dec. 2004, Calcutta, India, pp. 622-627. [PDF]
A. Bhaskarbhatla, S. Madhavanath, M. Pavan Kumar, A. Balasubramanian and C. V. Jawahar - Representation and Annotation of Online Handwritten Data, Proceedings of the International Workshop on Frontiers in Handwriting Recognition (IWFHR), Oct. 2004, Tokyo, Japan, pp. 136-141. [PDF]
C. V. Jawahar, A. Balasubramanian and Million Meshesha - Word-Level Access to Document Image Datasets, Proceedings of the Workshop on Computer Vision, Graphics and Image Processing (WCVGIP), Feb. 2004, Gwalior, India, pp. 73-76. [PDF]

Downloads

ppt

Towards Understanding Texture Processing

Gopal Datt Joshi (homepage)

A fundamental goal of texture research is to develop automated computational methods for retrieving visual information and understanding image content based on textural properties in images. A synergy between biological and computer vision research in low-level vision can give substantial insights into the processes for extracting color, edge, motion, and spatial frequency information from images. In this thesis, we seek to understand the texture processing that takes place in low-level human vision in order to develop new and effective methods for texture analysis in computer vision. The different representations formed by the early stages of the human visual system (HVS), and the visual computations they carry out to handle various texture patterns, are of interest. Such information is needed to identify the mechanisms that can be used in texture analysis tasks. We examine two types of cells, namely the bar and grating cells, which have been identified in the literature as playing an important role in texture processing, and develop functional models for them. The model for the bar cell is based on the notion of surround inhibition and excitation, whereas the model for the grating cell is based on the fact that a grating cell receives direct inputs from the M-type ganglion cells. The representations derived by these cells are used to design solutions to two important problems of texture: texture-based segmentation and classification. The former is addressed in the domain of natural image understanding and the latter in the domain of document image understanding. Based on our work, we conclude that the early stages of the HVS effectively represent various texture patterns and also provide ample information to solve higher-level texture analysis tasks. The richness of information emerges from the capability of the HVS to extract global visual primitives from local features.
The presented work is an initial attempt to integrate the current knowledge of HVS mechanisms and computational theories developed for texture analysis.

Year of completion:	December 2006
Advisor :	Jayanthi Sivaswamy

Related Publications

Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy - Script Identification from Indian Documents, Proceedings of IAPR Workshop on Document Analysis Systems (DAS 2006), Nelson, pp. 255-267. [PDF]
Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy - A Generalised Framework for Script Identification, Proc. of International Journal for Document Analysis and Recognition (IJDAR), 10(2), pp. 55-68, 2007. [PDF]
Gopal Datt Joshi and Jayanthi Sivaswamy - A Computational Model for Boundary Detection, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 172-183, 2006. [PDF]
Gopal Datt Joshi and Jayanthi Sivaswamy - A Simple Scheme for Contour Detection, Proceedings of International Conference on Computer Vision and Applications (VISAP 2006), Setubal. [PDF]
Gopal Datt Joshi and Jayanthi Sivaswamy - A Multiscale Approach to Contour Detection, Proceedings of International Conference on Cognition and Recognition, pp. 183-193, Mysore, 2005. [PDF]

Downloads

ppt

Layer Extraction, Removal and Completion of Indoor Videos: A Tracking Based Approach

Vardhman Jain (homepage)

Image segmentation and layer extraction in video refer to the process of segmenting the image or video frames into their constituent objects. Automatic techniques for these are not always suitable, as the objective is often difficult to describe. With the advent of interactive techniques in the field, these algorithms can now be used to select an object of interest in an image or video precisely and with less effort. Object segmentation opens up various other possibilities, such as cutting and pasting objects from one image or video to another. Object removal in images and videos is another application of interest. As the name suggests, the task is to eliminate an object from the image or video. This involves recovering the information of the background previously occluded by the object. Object removal in both images and videos has found interesting applications, especially in the entertainment industry. The concept of filling in information from the surrounding region for images, and from surrounding frames for videos, has been applied to recovering damaged images or clips. This thesis presents two new approaches. The first is for object segmentation or layer extraction from a video. This method allows segmenting complex objects in videos, which can have a difficult motion model. The algorithm integrates a robust point tracking algorithm into a 3D graph cuts formulation. Tracking is used to propagate the user-given seeds in key frames to the intermediate frames, which helps provide a better initialization for the graph cuts optimization. The second is an approach for video completion in indoor scenes. We propose a novel method for video completion using multiview information without applying a full-frame or complete motion segmentation.
The heart of the algorithm is a method to partition the scene into regions supporting multiple homographies, based on a geometric formulation, thereby providing precise segmentation even at the points where the actual scene information is missing due to the removal of the object. We demonstrate our algorithms on a number of representative videos. We also present a few directions for future work that extend the work presented here.
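One simple way to think about partitioning a scene into regions that support different homographies is to label each tracked point with the homography that best predicts its position in a second frame. The sketch below does exactly that using the transfer error. It assumes the homographies are already known and given as 3x3 nested lists, and that no point maps to infinity; the function names and the example matrices are hypothetical. The geometric formulation used in the thesis is not reproduced here.

```python
import math


def apply_homography(h, p):
    """Map image point p = (x, y) through a 3x3 homography (nested lists)."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def label_tracks(tracks, homographies):
    """For each track (p in frame 1, q in frame 2), return the index of the
    homography with the smallest transfer error |H p - q|."""
    labels = []
    for p, q in tracks:
        errors = [math.dist(apply_homography(h, p), q) for h in homographies]
        labels.append(errors.index(min(errors)))
    return labels


# Two hypothetical planes: one whose image shifts by (5, 0) between frames,
# and one whose image is magnified by a factor of 2.
floor = [[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
wall = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
tracks = [((1.0, 1.0), (6.0, 1.0)),
          ((1.0, 1.0), (2.0, 2.0)),
          ((3.0, 2.0), (8.1, 2.0))]
labels = label_tracks(tracks, [floor, wall])
```

Once every pixel of a hole region has a plane label, the corresponding homography indicates where in neighbouring frames the missing background could be fetched from.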

Year of completion:	2006
Advisor :	P. J. Narayanan

Related Publications

Vardhman Jain and P. J. Narayanan - Layer Extraction Using Graph Cuts and Feature Tracking, The 3rd International Conference on Visual Information Engineering, 26-28 September 2006, Bangalore, India. [PDF]
Vardhman Jain and P. J. Narayanan - Video Completion for Indoor Scenes, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.409-420, 2006. [PDF]

Downloads

ppt
