
Depth Image Representation for Image Based Rendering


Sashi Kumar Penta

Conventional approaches to rendering a scene require a geometric description, such as polygonal meshes, and an appearance description, such as lights and material properties. Rendering high-quality images from such descriptions requires very accurate models and materials. Creating such accurate geometric models of real scenes is difficult and time consuming. Image Based Rendering (IBR) holds a lot of promise for navigating through a real-world scene without modeling it manually. Different representations have been proposed for IBR in the literature. In this thesis, we explore the Depth Image representation, consisting of depth maps and texture images from a number of viewpoints, as a rich and viable representation for IBR.


We discuss various aspects of the Depth Image representation, including its capture, representation, compression, and rendering. We present algorithms for efficient and smooth rendering of new views using the Depth Image representation, and show several examples of using it to model and render complex scenes. We also present a fast rendering algorithm using Depth Images on programmable GPUs. Compression of multiple images has attracted a lot of attention in the past, but compression of multiple depth maps of the same scene has not been explored in the literature. We propose a method for compressing multiple depth maps using a geometric proxy. Different rendering quality and compression ratios can be achieved by varying the parameters of the method. Experiments show the effectiveness of the compression technique on data from several models.
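To make the idea of rendering new views from a Depth Image concrete, here is a minimal sketch of forward warping: each source pixel is back-projected to 3D using its depth, moved into the target camera frame, and re-projected. This is a generic textbook formulation, not the thesis's algorithm. The pinhole intrinsics `f`, `cx`, `cy` (assumed shared by both views), the rigid transform `R`, `t`, and the function names are all assumptions made for illustration.

```python
def warp_pixel(u, v, depth, f, cx, cy, R, t):
    """Map source pixel (u, v) with known depth into a target view.

    f, cx, cy: pinhole intrinsics, assumed identical for both views.
    R (3x3 nested lists), t (length 3): source-to-target rigid transform.
    Returns (u', v', z') in the target view, or None if behind the camera.
    """
    # Back-project the pixel to a 3D point in the source camera frame.
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    p = (x, y, depth)
    # Move the point into the target camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        return None
    # Project into the target image plane.
    return (f * q[0] / q[2] + cx, f * q[1] / q[2] + cy, q[2])


def render_view(depth_map, texture, f, cx, cy, R, t):
    """Forward-warp every source pixel; a z-buffer keeps the nearest surface."""
    h, w = len(depth_map), len(depth_map[0])
    color = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            hit = warp_pixel(u, v, depth_map[v][u], f, cx, cy, R, t)
            if hit is None:
                continue
            tu, tv, z = int(round(hit[0])), int(round(hit[1])), hit[2]
            if 0 <= tu < w and 0 <= tv < h and z < zbuf[tv][tu]:
                zbuf[tv][tu] = z
                color[tv][tu] = texture[v][u]
    return color
```

Pixels that no source sample lands on stay `None` in this sketch. Combining several Depth Images from different viewpoints is one way such holes could be filled.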

 

Year of completion: 2005
Advisor: P. J. Narayanan


Related Publications

  • Sashi Kumar Penta and P. J. Narayanan - Compression of Multiple Depth-Maps for IBR, The Visual Computer, International Journal of Computer Graphics, Vol. 21, No. 8-10, September 2005, pp. 611--618. [PDF]

  • P. J. Narayanan, Sashi Kumar P and Sireesh Reddy K, Depth+Texture Representation for Image Based Rendering, Proceedings of the Indian Conference on Vision, Graphics and Image Processing(ICVGIP), Dec. 2004, Calcutta, India, pp. 113--118. [PDF]

  • Pooja Verlani, Aditi Goswami, P. J. Narayanan, Shekhar Dwivedi and Sashi Kumar Penta - Depth Images: Representations and Real-time Rendering in Symposium on 3D Data Processing, Visualization and Transmission, June 14-16, 2006 (3DPVT 2006).


Downloads

thesis

 ppt

Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification


MNSSK Pavan Kumar

Pattern recognition is an important area of research in information science, with widespread applications in image analysis, language processing, data mining, bioinformatics, etc. Most of the research in this area has centered on building efficient classifiers for recognition problems involving two classes of samples. However, most real-world pattern recognition problems are not simple binary problems; they have multiple classes to recognize. It is only in the recent past that researchers have started to focus on building classifiers for multiple-class classification. There are two popular directions for developing multiple-class classifiers: one is to extend the theory of binary classification directly to adapt to multiple classes, and the other is to build large multiclass systems with efficient binary classifiers as components. The latter paradigm has received much attention and popularity due to its empirical success in various applications such as Optical Character Recognition, Biometrics, Data Mining, etc. This thesis focuses on the classifier combination paradigm to design efficient and accurate classifiers for multiclass recognition over a large number of classes.

Decision Directed Acyclic Graph (DDAG) classifiers integrate a set of pairwise classifiers using a graph-based architecture. The accuracy of the DDAG can be improved by appropriate design of the individual nodes. An optimal feature selection scheme is employed to improve the performance of the nodes. This feature selection scheme extracts a separate set of features for each node, which increases the time taken to classify a sample. To overcome this problem, a novel approach is proposed for computing a condensed representation of the large set of features using LDA followed by PCA. Nodes with low performance are boosted using a popular boosting algorithm, AdaBoost. Improvement in performance is demonstrated on Character Recognition and other datasets.
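The way a DDAG combines pairwise classifiers can be sketched as list elimination: each node compares the first and last remaining candidate classes and discards the loser, so a sample reaches a label after N-1 pairwise decisions. This is the standard DDAG evaluation scheme. The `pairwise` dictionary and the toy nearest-centre classifiers below are assumptions for illustration, not the node classifiers designed in the thesis.

```python
def ddag_classify(x, classes, pairwise):
    """Evaluate a DDAG over an ordered list of classes.

    pairwise[(a, b)] is a callable returning a or b for sample x,
    defined for every pair with a before b in `classes`.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise[(a, b)](x)
        if winner == a:
            remaining.pop()       # b is eliminated
        else:
            remaining.pop(0)      # a is eliminated
    return remaining[0]


# Toy setup (hypothetical): 1-D samples, each class has a centre,
# and each pairwise classifier picks the nearer of its two centres.
centers = {"A": 0.0, "B": 5.0, "C": 10.0}
classes = sorted(centers)
pairwise = {}
for i, a in enumerate(classes):
    for b in classes[i + 1:]:
        pairwise[(a, b)] = (
            lambda x, a=a, b=b:
            a if abs(x - centers[a]) <= abs(x - centers[b]) else b
        )

label = ddag_classify(6.0, classes, pairwise)
```

The order of `classes` fixes which pair sits at each node, so reordering the list corresponds to rearranging the nodes of the graph.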

The arrangement of nodes in a DDAG is shown to affect the classification performance of the overall classifier. The popular DDAG algorithm is very accurate; however, it provides no more information than the class label. We modify this algorithm to handle a 'reject class' to improve performance. The design of this DDAG is posed as an optimization problem in terms of misclassification probabilities. Since the problem is NP-hard, approximate algorithms are proposed to design an improved DDAG. It is experimentally shown that the proposed algorithm provides a design close to the average-case performance of the DDAG.

Although DDAGs are an attractive way to build multiclass classifiers, they suffer from the large number of component classifiers they employ. A Binary Hierarchical Classifier (BHC) is an alternative approach which, unlike a DDAG, uses very few component classifiers. We propose a new scheme to divide the set of available classes into overlapping partitions so as to maximize the margin at each node of the BHC, thereby improving performance. BHCs and DDAGs have complementary advantages. A DDAG has high accuracy but also high storage and classification time complexity. A BHC is extremely efficient in storage, with reasonably high accuracy. We exploit the advantages of both classifiers by designing a new hybrid classifier. It employs binary partitions at most of the nodes, where the classifications are relatively easy, and uses DDAGs to classify complex subsets of classes wherever appropriate. The hybrid architecture is shown to perform better than the BHC, with performance close to that of the DDAG, but with a great reduction in the number of nodes compared to a complete DDAG.
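The difference in component count can be made concrete with a small calculation. A complete DDAG over N classes needs one pairwise classifier per pair of classes, N(N-1)/2 in total. A binary tree with N leaves has N-1 internal nodes, which gives the BHC count under the simplifying assumption of disjoint partitions (the overlapping partitions proposed in the thesis may change this number). The hybrid layout below, a tree that separates groups followed by a small DDAG inside each group, is a hypothetical arrangement chosen only for illustration.

```python
def ddag_nodes(n_classes):
    """One pairwise classifier per pair of classes."""
    return n_classes * (n_classes - 1) // 2


def bhc_nodes(n_classes):
    """Internal nodes of a binary tree with n_classes leaves
    (assumes disjoint partitions at every node)."""
    return n_classes - 1


def hybrid_nodes(group_sizes):
    """Hypothetical hybrid: a binary tree separates the groups,
    then a complete DDAG resolves the classes inside each group."""
    tree_part = len(group_sizes) - 1
    ddag_part = sum(ddag_nodes(g) for g in group_sizes)
    return tree_part + ddag_part


n = 100
counts = (ddag_nodes(n), bhc_nodes(n), hybrid_nodes([10] * 10))
```

For 100 classes split into ten groups of ten, this gives 4950 nodes for the complete DDAG, 99 for the tree, and 459 for the illustrative hybrid.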

This thesis presents a spectrum of classifier design algorithms for improving performance through feature selection, component classifier selection, and combination architecture design. The problems are formulated in their generality and analyzed, and the proposed solutions are empirically evaluated on popular datasets. Experimental results demonstrate that these techniques are promising.

 

Year of completion: 2005
Advisor: C. V. Jawahar


Related Publications

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Design of Hierarchical Classifier with Hybrid Architectures, Proceedings of First International Conference on Pattern Recognition and Machine Intelligence(PReMI 2005) Kolkata, India. December 2005, pp 276-279. [PDF]

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Configurable Hybrid Architectures for Character Recognition Applications, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 1199-1203. [PDF]

  • MNSSK Pavan Kumar and C. V. Jawahar - On Improving Design of Multiclass Classifiers, Proceedings of the International Conference on Advances in Pattern Recognition(ICAPR), Dec. 2003, Calcutta, India, pp. 109--112. [PDF]

  • C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition(ICDAR) Aug. 2003, Edinburgh, Scotland, pp. 408--413. [PDF]


Downloads

thesis

 ppt

 

Modelling and Recognition of Dynamic Events in Video


Karteek Alahari

Computer Vision algorithms, which mainly focussed on analyzing image data until the early 1980s, have now matured to handle video data more efficiently. In the past, computational barriers limited the complexity of video processing applications. As a consequence, most systems were either too slow to be practical, or succeeded by restricting themselves to very controlled situations. With the availability of faster computing resources over the past couple of decades, video processing applications have gained popularity in the computer vision research community. Moreover, advances in data capture, storage, and communication technologies have made vast amounts of video data available to consumer and enterprise applications. This has naturally created a demand for video analysis research.

Video sequences typically consist of long temporal objects, called events, which usually extend over tens or hundreds of frames. They provide useful cues for the analysis of video information, including event-based video indexing, browsing, retrieval, clustering, segmentation, recognition, summarization, etc. State-of-the-art techniques seldom use the event information inherent in videos for all these problems. They either simply recognize the events or use primitive features to address other video analysis issues. Furthermore, due to the large volume of video data, we need efficient models that capture the essential content of the events. This involves removing the acceptable statistical variability across all the videos. These requirements create the need for learning-based approaches to video analysis.

In this thesis, we aim to address video analysis problems by modelling and recognizing the dynamic events in videos. We propose a model to learn an efficient representation of events for analyzing continuous video sequences, and demonstrate its applicability to summarizing them. Further, we observe that not all parts of a video sequence are equally important for the classification task. Based on the characteristics of each part, we compute its potential to influence the decision criterion. Another observation we make is that a feature set appropriate for one event may be completely irrelevant for another. Hence, an adaptive feature selection scheme is essential. We present an approach to learn an optimal combination of spatial and temporal features based on the events being analyzed. Finally, we describe some of our work on an unsupervised framework for video analysis.

 

Year of completion: 2006
Advisor: C. V. Jawahar & P. J. Narayanan


Related Publications

  • Karteek Alahari, Satya Lahari Putrevu and C.V. Jawahar - Learning Mixtures of Offline and Online Features for Handwritten Stroke Recognition, Proc. 18th IEEE International Conference on Pattern Recognition(ICPR'06), Hong Kong, Aug 2006, Vol. III, pp.379-382. [PDF]

  • Karteek Alahari and C.V. Jawahar - Dynamic Events as Mixtures of Spatial and Temporal Features, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.540-551, 2006. [PDF]

  • Karteek Alahari and C.V. Jawahar - Discriminative Actions for Recognising Events, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.552-563, 2006. [PDF]

  • Karteek Alahari, Satya Lahari P and C. V. Jawahar - Discriminant Substrokes for Online Handwriting Recognition, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 499-503. [PDF]

  • S. S. Ravi Kiran, Karteek Alahari and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, Proceedings of the National Conference on Communications (NCC), Jan. 2005, Kharagpur, India, pp. 351-355. [PDF]

  • Karteek Alahari, Ravi Kiran Sarvadevabhatla, C. V. Jawahar - A Spatiotemporal Model for Recognizing Human Activities from Constituent Actions, in Pattern Recognition, Journal of Pattern Recognition Society. (submitted)

Downloads

thesis

 ppt

 

 

Analysis of Retinal Angiogram Images


B. R. Siva Chandra (homepage)

Diabetes is occurring in an ever-increasing percentage of the human population. Though generally non-fatal, it can lead to diseases of other vital organs of the human body. Diabetic Retinopathy (DR) is one such disease, which affects the human retina. If not treated in time, the affected patient can lose his/her sight. With a growing number of patients affected by diabetes, there is a need for fast, automatic computer-aided tools that can assist in the diagnosis of DR. Currently, DR is diagnosed by manual analysis of retinal angiogram images (RAIs). This process is tedious and depends on the subjective perception of the doctors and technicians. In this thesis, we propose a modular framework for computer-aided analysis of RAIs, which can be used to build analysis systems that automatically detect diseases like DR and assign an objective measure to the extent of the disease. The framework consists of four independent modules: 1) the Pre-processing Module, for rectifying the problems and defects affecting an RAI; 2) the Structure Analysis Module, for extracting the structure of the retina; 3) the Disease Analysis Module, for extracting the candidate regions affected by a particular disease; and 4) the Classification Module, for classifying the candidate ‘disease-regions’ into true positives and false positives. Depending on the desired output, one can choose to incorporate some or all of these modules into the analysis system.

Non-uniform illumination is a common problem affecting RAIs and needs to be addressed. A technique for correcting non-uniform illumination forms a part of the pre-processing module. In this thesis, a technique for illumination correction, which models the illumination effect as a multiplicative degradation, is presented.
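Under a multiplicative degradation model, the observed image is the true image multiplied by a slowly varying illumination field, so dividing by an estimate of that field undoes the effect. The sketch below illustrates only this division step. The box-filter estimate of the illumination is a stand-in chosen for brevity and is an assumption, not the estimator presented in the thesis; the function names are likewise hypothetical.

```python
def box_mean(img, radius):
    """Local mean over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out


def correct_illumination(img, radius=1, eps=1e-6):
    """Divide out a smooth illumination estimate (multiplicative model).

    Assumes observed = true * illumination, with illumination approximated
    by a local mean. The result is rescaled by the mean illumination so the
    overall brightness is preserved.
    """
    h, w = len(img), len(img[0])
    illum = box_mean(img, radius)
    mean_illum = sum(sum(row) for row in illum) / (h * w)
    return [[img[r][c] / max(illum[r][c], eps) * mean_illum for c in range(w)]
            for r in range(h)]
```

In practice the smoothing window would need to be much larger than the structures of interest, otherwise the division also flattens them.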

The most important of the structural features of the retina are the blood vessels. Blood vessels can be detected by modeling them as topographic ridges. In this thesis, a novel curvature estimation technique is presented, using which a ridge detection algorithm is formulated for single scale as well as multiple scales.
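For intuition on treating vessels as topographic ridges, the sketch below uses the textbook Hessian-based measure: on a bright ridge, the image intensity curves sharply downward across the ridge, so the smaller eigenvalue of the Hessian is strongly negative. This is a standard single-scale formulation with plain finite differences and is not the novel curvature estimator of the thesis. It assumes vessels appear brighter than the background, and the function name is hypothetical.

```python
import math


def ridge_strength(img, r, c):
    """Single-scale ridge measure at interior pixel (r, c).

    Uses central finite differences for the Hessian and returns the
    magnitude of its most negative eigenvalue (0 if none is negative).
    Assumes bright ridges on a darker background.
    """
    ixx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    iyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    ixy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    # Eigenvalues of the symmetric 2x2 matrix [[ixx, ixy], [ixy, iyy]].
    mean = (ixx + iyy) / 2.0
    spread = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    lambda_min = mean - spread
    return max(0.0, -lambda_min)


# A 5x5 test image with a bright vertical line in column 2.
line_img = [[1.0 if c == 2 else 0.0 for c in range(5)] for _ in range(5)]
on_ridge = ridge_strength(line_img, 2, 2)
off_ridge = ridge_strength(line_img, 2, 1)
```

A multi-scale variant would typically smooth the image at several scales before differentiating and keep the strongest response.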

DR leads to two different kinds of pathologies in the human retina: a) Microaneurysms (MAs) and b) Capillary Non-Perfusion (CNP). In this thesis, a novel curvature-based technique for the detection of MAs is presented. Likewise, a novel technique for segmenting regions of CNP from RAIs obtained using a laser camera is presented. This segmentation technique uses a special property of the images obtained using a laser camera.

To showcase the proposed framework, we present a tool called the ‘CNP Analyser’. This tool can detect the regions of CNP in RAIs obtained using a laser camera. The proposed illumination correction technique and the CNP segmentation technique are incorporated into this tool. A measure of the extent of CNP is derived from the percentage area of the regions of CNP.
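A percentage-area measure of this kind can be sketched directly from a binary segmentation mask. The mask layout (nested lists of 0/1) and the choice of the whole image as the denominator are assumptions; a real system might instead normalise by the visible retinal area.

```python
def cnp_extent_percent(mask):
    """Percentage of pixels flagged as CNP in a binary mask.

    mask: nested lists where a truthy value marks a CNP pixel.
    The denominator is the full image area (an assumption).
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    affected = sum(1 for row in mask for value in row if value)
    return 100.0 * affected / total
```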

 

Year of completion: 2005
Advisor: Jayanthi Sivaswamy


Downloads

thesis

 ppt

 

Geometric Grouping of Planar Patterns in a Perspective View


Kiran Varanasi (homepage)

When geometric primitives such as curves and patterns appear in repetition over a perspective image, they offer key information for recovering the real metric structure of the scene. This happens because multiplicity is equivalent to motion: a single image of such a scene is equivalent to multiple images taken from varying camera viewpoints. Multiplicity can manifest in the image in several forms: tiling (translational symmetry), reflection (bilateral symmetry), or rotation (point symmetry). In all these cases, it pays dividends to group these patterns into geometrically meaningful sets, i.e., into sets of patterns that produce a uniform geometric constraint, for example a unique vanishing point. Each of these patterns is defined in terms of some interest points and contour segments. It is conceivable that these points and contours are ill-identified, and this makes the task of geometric grouping all the more challenging. However, once the patterns are grouped robustly, the later tasks of 3D reconstruction and pose estimation can be handled with high accuracy.
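As a small worked example of a geometric constraint shared by a group, the vanishing point of two imaged parallel lines can be computed with homogeneous coordinates: the line through two points is their cross product, and the intersection of two lines is again a cross product. The sketch below is standard projective geometry, not the grouping algorithm of the thesis, and the function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through image points p = (x, y) and q = (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a, seg_b, eps=1e-12):
    """Intersection of the lines through two image segments.

    Each segment is a pair of (x, y) points. Returns None when the
    lines are parallel in the image (intersection at infinity).
    """
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)


vp = vanishing_point(((0, 0), (2, 1)), ((0, 3), (2, 2)))
```

Segments whose pairwise intersections agree on roughly the same point are candidates for one geometrically consistent group.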

Symmetrical patterns are commonplace in natural and man-made environments. Especially in architectural scenes, these patterns abound in great variety. Several attempts have been made by the research community to exploit this information. Success has been reported in several areas, particularly in the area of image based modeling and rendering (IBMR). In this thesis, we study the problem of geometric grouping of patterns in the context of an interactive IBMR application. We demonstrate how geometric grouping with minimal user interaction is useful for improving the robustness of structure recovery. We handle the problem of geometric grouping of planar patterns in all its generality: we do not presume a known period of repetition or a known template for the patterns. The only properties we use for grouping the patterns are the geometric constraints (such as the vanishing line and the circular points), which are estimated from the patterns themselves. Our algorithms facilitate the aggregation of geometric information from multiple sources in the image, and thus make robust rectification possible.

The principal contributions of our work are on three fronts: (1) a method for identifying interest points on a set of poorly identified and badly fragmented image contours; (2) a greedy optimization approach for computing point correspondences by preserving the coherence of spatial information; (3) ways to incorporate the information provided by user input into the optimization process. Below, we discuss each of these issues in brief.

In order to have a generic mechanism for describing patterns, we need a method for detecting interest points on the patterns reliably. The color/intensity information may not hold important cues for corner detection if the saliency of the required points is primarily geometric. A pattern in our case is represented using a collection of image contours, which could be badly fragmented and erroneous. We note that local shape properties, such as derivatives at a point, would be badly damaged by perspective projection. However, global shape properties, such as the relative distance of points from the center of mass of the contour, would not be badly affected, though they are not guaranteed to be preserved. We use these global properties to guardedly compute a set of interest points. We apply a neighborhood of a given size and suppress interest points that are not maximal in their neighborhoods. In the thesis, we demonstrate through experiments that this method identifies interest points reliably on a perspectively distorted image of the pattern. We compare our results with those provided by the Shi-Tomasi method.
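The use of a global shape property together with neighborhood suppression can be sketched as follows: compute each contour point's distance from the contour's center of mass, then keep only points whose distance is strictly larger than that of every neighbor within a window along the contour. The choice of raw centroid distance as the saliency score, the closed-contour assumption, and the window size are assumptions made for this illustration; the thesis's exact measure may differ.

```python
import math


def centroid_distance_profile(contour):
    """Distance of every contour point from the contour's center of mass."""
    n = len(contour)
    cx = sum(p[0] for p in contour) / n
    cy = sum(p[1] for p in contour) / n
    return [math.hypot(x - cx, y - cy) for x, y in contour]


def interest_points(contour, radius=2):
    """Indices of points that are strict local maxima of centroid distance.

    contour: ordered (x, y) points of a closed contour (assumption).
    radius: half-width, in samples, of the suppression neighborhood.
    """
    dist = centroid_distance_profile(contour)
    n = len(dist)
    keep = []
    for i in range(n):
        neighbours = [dist[(i + k) % n]
                      for k in range(-radius, radius + 1) if k != 0]
        if all(dist[i] > d for d in neighbours):
            keep.append(i)
    return keep
```

On a square sampled at unit spacing, for instance, only the four corners survive the suppression step.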

After the detection of interest points, we study the problem of pairing two sets of points uniquely with each other. We provide an effective solution to this problem through an optimization approach that tries to maximize spatial coherence subject to a consensus on geometric constraints. Our method stands in contrast to classical methods, which try to match points through the use of geometric invariants. Instead, we try to satisfy spatial coherence between the point matches, which means the conservation of Euclidean properties such as angles and ratios of lengths. Though it is true that these Euclidean properties are no longer valid after a projective transform, we show that they can be preserved in an approximate sense. Unlike invariant-based methods, our method has the added advantage of being robust to noise and outliers. We frame the optimization problem in a greedy setting and thus obtain a locally maximal solution. This produces a matching between the two point sets. When a pattern is replicated more than twice, we handle it by generalizing our solution to the entire set of points. The model is solved using Levenberg-Marquardt optimization. This is similar to the bundle-adjustment algorithm in stereo.

We study the problem of geometric grouping in the context of an interactive application. One of the major concerns for such an application is to minimize the level of user interaction while tolerating errors at the micro level. Previous methods of IBMR through plane-based rectification have required the user to provide information at the pixel level, in the form of parallel and perpendicular line segments. Such input is not only error-prone but also extremely taxing for the user to provide. In our application, we provide new ways for the user to interact with the system and new means of incorporating the user's input into the optimization process. We demonstrate that the user input is indeed useful in reducing the combinatorial complexity of the algorithm by a large factor.

The method of geometric grouping of patterns has several applications. We discuss its direct application to an interactive image-based modeler. We provide references to other applications, such as tracking, recognition, and stereo, in the future work section.

 

Year of completion: 2006
Advisor: P. J. Narayanan



Downloads

thesis

 ppt

 
