A Probabilistic Learning Approach for Modelling Human Activities


Ravi Kiran Sarvadevabhatla

Understanding how computers can be given an ability to learn, akin to humans, motivates one of the most active disciplines of computer science - Machine Learning. Machine Learning draws on concepts and results from many fields such as artificial intelligence, statistics and information theory. One discipline which has benefited from machine learning techniques is Computer Vision, which deals with the conversion of images into descriptions. Quite often, statistical methods are used to understand image data with models constructed with the aid of geometry, physics and learning theory.

Real-world images are often characterized by uncertainty ('visual noise') that accompanies the data. Probability theory provides a principled framework to model this uncertainty. Computer Vision systems can therefore be viewed as models which "learn" descriptions while addressing uncertainty using probability. Accordingly, machine learning algorithms rooted in probabilistic reasoning, such as Maximum-Likelihood estimation, Mixture Modelling and Expectation-Maximization, have become popular in Computer Vision. An emerging trend is towards graph-based representations called Graphical Models. These representations model the dependencies embedded among visual data and address uncertainty via probabilistic inference.

The automatic deduction of the structure of a possibly dynamic 3-D world from 2-D images is an important problem in Computer Vision, particularly when the 2-D images contain people. There has been considerable progress in areas of Computer Vision such as object recognition, image understanding and scene reconstruction from images. This progress, coupled with improvements in computational power, has prompted a new research focus on making machines that can see people, recognize them and interpret their activities. This has spawned applications in various domains including surveillance, sign language recognition and Human-Computer Interaction (HCI).

In this thesis, a new framework to solve the problem of recognizing dynamic activities is presented, in the spirit of the aforementioned probabilistic learning. An activity is defined as a finite spatio-temporal change in the state of an entity (e.g. a human being waving hands). Activities, in turn, are composed of spatio-temporal units called actions (e.g. 'sitting' and 'standing up' are two predominant actions within the activity 'squatting'). Many human activities contain common actions, which causes the image data to be highly correlated and redundant. Two-dimensional image redundancies are routinely exploited in image processing algorithms. In video, an additional temporal redundancy exists due to the smooth variation of the visual scene over time. Given a set of human activity videos, these redundancies are utilized to learn a compact representation for the various actions and, subsequently, the activities. The spatial correlations are captured with a Mixture of Factor Analyzers (MFA) model. This is essentially a reduced-dimensionality mixture-of-Gaussians graphical model which clusters the actions within the activity data in a low-dimensional space. The probabilistic structure underlying the activities is inferred using the Expectation-Maximization (EM) algorithm. The temporal aspect of each activity is stored as an action transition matrix. Given a hitherto unseen activity video, the embedded actions are extracted and recognition is performed using probabilistic inference.
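
To make the recipe concrete, the following is a minimal sketch of the overall idea, not the thesis implementation: frames are clustered into actions with a reduced-dimensionality mixture model, each activity keeps an action transition matrix, and a new clip is scored against every activity model. Scikit-learn has no Mixture of Factor Analyzers, so a PCA plus Gaussian mixture stands in for the MFA; the feature choice, dimensions and smoothing below are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

N_ACTIONS, N_DIMS = 8, 20   # assumed number of action clusters / latent dimensions

def fit_action_model(frames):
    """frames: (n_frames, n_pixels) array of vectorized silhouettes (assumed features)."""
    pca = PCA(n_components=N_DIMS).fit(frames)
    gmm = GaussianMixture(n_components=N_ACTIONS, covariance_type='diag')
    gmm.fit(pca.transform(frames))
    return pca, gmm

def transition_matrix(action_labels, n_actions=N_ACTIONS):
    """Row-normalized counts of action_i -> action_j transitions."""
    T = np.full((n_actions, n_actions), 1e-3)          # small smoothing prior
    for a, b in zip(action_labels[:-1], action_labels[1:]):
        T[a, b] += 1.0
    return T / T.sum(axis=1, keepdims=True)

def score_activity(frames, pca, gmm, T):
    """Log-likelihood of a new clip under one activity's action model."""
    z = pca.transform(frames)
    labels = gmm.predict(z)
    frame_ll = gmm.score_samples(z).sum()                   # spatial (appearance) term
    temporal_ll = np.log(T[labels[:-1], labels[1:]]).sum()  # temporal term
    return frame_ll + temporal_ll
```

Recognition then amounts to fitting one (pca, gmm, T) triple per activity from its training videos and assigning a test clip to the activity with the highest score.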

This new representation for human activities is intuitive, simple to learn, and offers computational and theoretical advantages. Results on the recognition of various whole-body human activities are presented in this thesis. These highlight the suitability of the developed framework for applications involving whole-body activity recognition, including real-time applications. However, the framework is not limited to whole-body activities alone and can be used for recognizing hand gestures and other human activities. A discussion of future directions for the proposed framework is also presented.

 

Year of completion: 2004
Advisor:

C. V. Jawahar


Related Publications

  • S. S. Ravi Kiran, Karteek Alahari and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, Proceedings of the National Conference on Communications (NCC), Jan. 2005, Kharagpur, India, pp. 351-355. [PDF]

  • C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Aug. 2003, Edinburgh, Scotland, pp. 408-413. [PDF]

  • MNSSK Pavan Kumar, S. S. Ravikiran, Abhishek Nayani, C. V. Jawahar and P. J. Narayanan - Tools for Developing OCRs for Indian Scripts, Proceedings of the Workshop on Document Image Analysis and Retrieval (DIAR: CVPR'03), Jun. 2003, Madison, WI. [PDF]

 


Downloads

thesis

 ppt

 

Depth Image Representation for Image Based Rendering


Sashi Kumar Penta

Conventional approaches to rendering a scene require geometric descriptions such as polygonal meshes and appearance descriptions such as lights and material properties. To render high quality images from such descriptions, we require very accurate models and materials. Creating such accurate geometric models of real scenes is a difficult and time-consuming problem. Image Based Rendering (IBR) holds a lot of promise for navigating through a real-world scene without modeling it manually. Different representations have been proposed for IBR in the literature. In this thesis, we explore the Depth Image representation, consisting of depth maps and texture images from a number of viewpoints, as a rich and viable representation for IBR.

We discuss various aspects of the Depth Image representation including its capture, representation, compression, and rendering. We present algorithms for efficient and smooth rendering of new views using the Depth Image representation, and show several examples of using it to model and render complex scenes. We present a fast rendering algorithm using Depth Images on programmable GPUs. Compression of multiple images has attracted a lot of attention in the past, but compression of multiple depth maps of the same scene has not been explored in the literature. In this thesis we propose a method for compressing multiple depth maps using a geometric proxy. Different rendering qualities and compression ratios can be achieved by varying the parameters. Experiments show the effectiveness of the compression technique on several model datasets.
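
As a rough illustration of the core operation behind rendering new views from a Depth Image, the sketch below forward-warps one depth-plus-texture image into a nearby target camera. The intrinsics K and the relative pose (R, t) are assumed given; splatting, hole filling, blending across views and the GPU implementation discussed in the thesis are all omitted.

```python
import numpy as np

def warp_depth_image(color, depth, K, R, t):
    """color: (H, W, 3) texture, depth: (H, W) depth map in the source camera's frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x HW homogeneous pixels

    # Back-project source pixels to 3-D, then move them into the target camera.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    pts_new = R @ pts + t.reshape(3, 1)

    # Project into the target image plane.
    proj = K @ pts_new
    z = proj[2]
    valid = z > 1e-6
    x = np.round(proj[0, valid] / z[valid]).astype(int)
    y = np.round(proj[1, valid] / z[valid]).astype(int)
    inside = (x >= 0) & (x < W) & (y >= 0) & (y < H)

    out = np.zeros_like(color)
    src = color.reshape(-1, 3)[valid][inside]
    out[y[inside], x[inside]] = src          # nearest-pixel splat, no z-buffer or hole filling
    return out
```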

 

Year of completion: 2005
Advisor:

P. J. Narayanan


Related Publications

  • Sashi Kumar Penta and P. J. Narayanan - Compression of Multiple Depth-Maps for IBR, The Visual Computer, International Journal of Computer Graphics, Vol. 21, No. 8-10, September 2005, pp. 611-618. [PDF]

  • P. J. Narayanan, Sashi Kumar P and Sireesh Reddy K - Depth+Texture Representation for Image Based Rendering, Proceedings of the Indian Conference on Vision, Graphics and Image Processing (ICVGIP), Dec. 2004, Calcutta, India, pp. 113-118. [PDF]

  • Pooja Verlani, Aditi Goswami, P. J. Narayanan, Shekhar Dwivedi and Sashi Kumar Penta - Depth Images: Representations and Real-time Rendering, Symposium on 3D Data Processing, Visualization and Transmission (3DPVT 2006), June 14-16, 2006.


Downloads

thesis

 ppt

Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification


MNSSK Pavan Kumar

Pattern recognition is an important area of research in information science, with widespread applications in image analysis, language processing, data mining, bioinformatics, etc. Most of the research in this area has centered around building efficient classifiers for recognition problems involving two classes of samples. However, most real-world pattern recognition problems are not simple binary problems; they have multiple classes to recognize. It is only in the recent past that researchers have started to focus on building classifiers for multiclass classification. There are two popular directions for developing multiclass classifiers: one is to extend the theory of binary classification directly to multiple classes, and the other is to build large multiclass systems with efficient binary classifiers as components. The latter paradigm has received much attention and popularity due to its empirical success in applications such as Optical Character Recognition, Biometrics and Data Mining. This thesis focuses on the classifier combination paradigm to design efficient and accurate classifiers for multiclass recognition with a large number of classes.

Decision Directed Acyclic Graph (DDAG) classifiers integrate a set of pairwise classifiers using a graph-based architecture. The accuracy of the DDAG can be improved by appropriate design of the individual nodes. An optimal feature selection scheme is employed to improve the performance of the nodes. This feature selection scheme extracts a separate set of features for each node, which increases the time taken to classify a sample. To overcome this problem, a novel approach is proposed for computing a condensed representation of the large set of features using LDA followed by PCA. Nodes with low performance are boosted using a popular boosting algorithm called AdaBoost. The improvement in performance is demonstrated on Character Recognition and other datasets.
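
As one plausible reading of the LDA-followed-by-PCA idea (a sketch under assumptions, not the thesis algorithm), the snippet below stacks the discriminant direction of every pairwise node and condenses them with PCA into a single shared projection, so each sample is transformed once rather than once per node.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def condensed_projection(X, y, n_keep=10):
    """X: (n_samples, n_features), y: label array. Stack pairwise LDA directions, condense with PCA."""
    directions = []
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        lda = LinearDiscriminantAnalysis(n_components=1).fit(X[mask], y[mask])
        directions.append(lda.scalings_[:, 0])    # discriminant direction in input space
    D = np.vstack(directions)                     # (n_pairs, n_features)
    pca = PCA(n_components=n_keep).fit(D)         # assumes n_keep <= min(n_pairs, n_features)
    return pca.components_                        # (n_keep, n_features) shared projection W

# Every node of the DDAG can then operate on X @ W.T instead of its own feature subset.
```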

The arrangement of nodes in a DDAG is shown to affect the classification performance of the overall classifier. The popular DDAG algorithm is very accurate; however, it provides no more information than the class label. We modify this algorithm to handle a 'reject class' for improving the performance. The design of this DDAG is posed as an optimization problem in terms of misclassification probabilities. Since the problem is NP-Hard, approximate algorithms are proposed to design an improved DDAG. It is experimentally shown that the proposed algorithm provides a design close to the average-case performance of the DDAG.

Although DDAGs are an attractive way to build multiclass classifiers, they suffer from the large number of component classifiers they employ. A Binary Hierarchical Classifier (BHC) is an alternative approach which, unlike a DDAG, uses very few component classifiers. We propose a new scheme to divide the set of available classes into overlapping partitions so as to maximize the margin at each node of the BHC, thereby improving the performance. BHCs and DDAGs are classifiers with complementary advantages: a DDAG has high accuracy but high storage and classification time complexity, while a BHC is extremely efficient in storage, with reasonably high accuracy. We exploit the advantages of both classifiers by designing a new hybrid classifier. This employs binary partitions at most of the nodes, where the classifications are relatively easy, and uses DDAGs for classifying complex subsets of classes wherever appropriate. The hybrid architecture is shown to perform better than the BHC, with performance close to that of the DDAG, but with a great reduction in the number of nodes compared to a complete DDAG.
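
For reference, this is how a DDAG combines its pairwise classifiers at test time: each node compares the first and last remaining classes and eliminates the loser, so k-1 binary evaluations decide among k classes. The linear SVMs and the toy interface are illustrative assumptions; the hybrid BHC/DDAG design of the thesis is not reproduced here.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def train_pairwise(X, y):
    """One binary classifier per unordered class pair."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = LinearSVC().fit(X[mask], y[mask])
    return models

def ddag_predict(models, classes, x):
    """Evaluate the DDAG on a single sample x of shape (n_features,)."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        key = (a, b) if (a, b) in models else (b, a)
        winner = models[key].predict(x.reshape(1, -1))[0]
        # Drop whichever endpoint the node voted against.
        remaining.pop(-1 if winner == a else 0)
    return remaining[0]
```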

This thesis presents a spectrum of classifier design algorithms for improving performance by feature selection, component classifier selection and combination architecture design. The problems are formulated in their generality, analyzed, and the proposed solutions are empirically evaluated on popular datasets. Experimental results demonstrate that these techniques are promising.

 

Year of completion: 2005
Advisor:

C. V. Jawahar


Related Publications

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Design of Hierarchical Classifier with Hybrid Architectures, Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence (PReMI 2005), Kolkata, India, December 2005, pp. 276-279. [PDF]

  • M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Configurable Hybrid Architectures for Character Recognition Applications, Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), Seoul, Korea, 2005, Vol. 1, pp. 1199-1203. [PDF]

  • MNSSK Pavan Kumar and C. V. Jawahar - On Improving Design of Multiclass Classifiers, Proceedings of the International Conference on Advances in Pattern Recognition (ICAPR), Dec. 2003, Calcutta, India, pp. 109-112. [PDF]

  • C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Aug. 2003, Edinburgh, Scotland, pp. 408-413. [PDF]


Downloads

thesis

 ppt

 

Modelling and Recognition of Dynamic Events in Video


Karteek Alahari

Computer Vision algorithms, which mainly focused on analyzing image data until the early 1980s, have now matured to handle video data more efficiently. In the past, computational barriers limited the complexity of video processing applications. As a consequence, most systems were either too slow to be practical, or succeeded only by restricting themselves to very controlled situations. With the availability of faster computing resources over the past couple of decades, video processing applications have gained popularity in the computer vision research community. Moreover, advances in data capturing, storage, and communication technologies have made vast amounts of video data available to consumer and enterprise applications. This has naturally created a demand for video analysis research.

Video sequences typically consist of long temporal objects - called events - which usually extend over tens or hundreds of frames. They provide useful cues for the analysis of video information, including event-based video indexing, browsing, retrieval, clustering, segmentation, recognition and summarization. State-of-the-art techniques seldom use the event information inherent in videos for all these problems; they either simply recognize the events or use primitive features to address other video analysis issues. Furthermore, due to the large volume of video data, we need efficient models to capture the essential content of the events. This involves removing the acceptable statistical variability across the videos. These requirements create the need for learning-based approaches to video analysis.

In this thesis, we aim to address video analysis problems by modelling and recognizing the dynamic events in them. We propose a model to learn an efficient representation of events for analyzing continuous video sequences and demonstrate its applicability for summarizing them. Further, we observe that not all parts of a video sequence are equally important for the classification task; based on the characteristics of each part, we compute its potential in influencing the decision criterion. Another observation we make is that a feature set appropriate for one event may be completely irrelevant for another, so an adaptive feature selection scheme is essential. We present an approach to learn an optimal combination of spatial and temporal features based on the events being analyzed. Finally, we describe some of our work on an unsupervised framework for video analysis.
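
A minimal sketch of the spatial-versus-temporal weighting idea, under assumed distance functions and a nearest-neighbour decision rule that are not the thesis formulation: a single mixing weight for a combined distance is selected on held-out clips.

```python
import numpy as np

def combined_distance(d_spatial, d_temporal, alpha):
    """Convex combination of precomputed spatial and temporal distance matrices."""
    return alpha * d_spatial + (1.0 - alpha) * d_temporal

def pick_alpha(ds_val, dt_val, y_train, y_val, grid=np.linspace(0.0, 1.0, 21)):
    """ds_val, dt_val: (n_val, n_train) distances from validation clips to training clips."""
    best_alpha, best_acc = 0.5, -1.0
    for alpha in grid:
        nearest = np.argmin(combined_distance(ds_val, dt_val, alpha), axis=1)
        acc = np.mean(np.asarray(y_train)[nearest] == np.asarray(y_val))
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha
```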

 

Year of completion: 2006
Advisor:

C. V. Jawahar & P. J. Narayanan


Related Publications

  • Karteek Alahari, Satya Lahari Putrevu and C.V. Jawahar - Learning Mixtures of Offline and Online Features for Handwritten Stroke Recognition, Proc. 18th IEEE International Conference on Pattern Recognition (ICPR'06), Hong Kong, Aug 2006, Vol. III, pp. 379-382. [PDF]

  • Karteek Alahari and C.V. Jawahar - Dynamic Events as Mixtures of Spatial and Temporal Features, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 540-551, 2006. [PDF]

  • Karteek Alahari and C.V. Jawahar - Discriminative Actions for Recognising Events, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 552-563, 2006. [PDF]

  • Karteek Alahari, Satya Lahari P and C. V. Jawahar - Discriminant Substrokes for Online Handwriting Recognition, Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), Seoul, Korea, 2005, Vol. 1, pp. 499-503. [PDF]

  • S. S. Ravi Kiran, Karteek Alahari and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, Proceedings of the National Conference on Communications (NCC), Jan. 2005, Kharagpur, India, pp. 351-355. [PDF]

  • Karteek Alahari, Ravi Kiran Sarvadevabhatla, C. V. Jawahar - A Spatiotemporal Model for Recognizing Human Activities from Constituent Actions, in Pattern Recognition, Journal of Pattern Recognition Society. (submitted)

Downloads

thesis

 ppt

 

 

Analysis of Retinal Angiogram Images


B. R. Siva Chandra (homepage)

Diabetes is occurring in an ever increasing percentage of the human population. Though generally non-fatal, it can lead to diseases of other vital organs of the human body. Diabetic Retinopathy (DR) is one such disease, affecting the human retina. If not treated in time, the affected patient can lose his or her sight. With a growing number of patients affected by diabetes, there is a need for fast and automatic computer-aided tools which can aid in the diagnosis of DR. Currently, DR is diagnosed by a manual analysis of retinal angiogram images (RAIs). This process is tedious and depends on the subjective perception of the doctors and technicians. In this thesis, we propose a modular framework for computer-aided analysis of RAIs which can be used to build analysis systems that automatically detect diseases like DR and assign an objective measure to the extent of the disease. The framework consists of four independent modules: 1) the Pre-processing Module, for rectification of the problems and defects affecting an RAI; 2) the Structure Analysis Module, for extraction of the structure of the retina; 3) the Disease Analysis Module, for extracting the candidate regions affected by a particular disease; and 4) the Classification Module, for classifying the candidate 'disease regions' into true positives and false positives. Depending on the desired output, one can choose to incorporate some or all of these modules into the analysis system.

Non-uniform illumination is a common problem affecting RAIs and needs to be addressed; a technique for correcting it therefore forms a part of the pre-processing module. In this thesis, an illumination correction technique which models the illumination effect as a multiplicative degradation is presented.
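
A minimal sketch of illumination correction under a multiplicative model I(x, y) = L(x, y) * R(x, y): the slowly varying illumination field L is estimated with a large Gaussian blur and divided out. The blur scale and the rescaling are illustrative assumptions, not the technique proposed in the thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_illumination(image, sigma=40.0):
    """image: 2-D float array (one angiogram channel); sigma is an assumed blur scale."""
    illumination = gaussian_filter(image, sigma) + 1e-6   # smooth estimate of L(x, y)
    reflectance = image / illumination                    # divide out the multiplicative degradation
    # Rescale to the original intensity range for display.
    reflectance -= reflectance.min()
    return reflectance / (reflectance.max() + 1e-6) * image.max()
```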

The most important structural features of the retina are the blood vessels, which can be detected by modeling them as topographic ridges. In this thesis, a novel curvature estimation technique is presented, from which a ridge detection algorithm is formulated for both single and multiple scales.
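
To illustrate the ridge view of vessels (a stand-in sketch; the thesis proposes its own curvature estimator), the snippet below uses the eigenvalues of the image Hessian at a few scales: bright, tube-like structures give a strongly negative principal curvature. The scales and threshold are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ridge_response(image, sigma):
    """Principal curvature of largest magnitude at one scale (negative on bright ridges)."""
    Ixx = gaussian_filter(image, sigma, order=(0, 2))
    Iyy = gaussian_filter(image, sigma, order=(2, 0))
    Ixy = gaussian_filter(image, sigma, order=(1, 1))
    # Eigenvalues of the 2x2 Hessian [[Ixx, Ixy], [Ixy, Iyy]] at every pixel.
    root = np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    lam1 = (Ixx + Iyy) / 2.0 + root
    lam2 = (Ixx + Iyy) / 2.0 - root
    return np.where(np.abs(lam1) > np.abs(lam2), lam1, lam2)

def detect_ridges(image, sigmas=(1.0, 2.0, 4.0), threshold=-0.02):
    """Multi-scale ridge map: keep pixels where some scale responds strongly."""
    response = np.min([s ** 2 * ridge_response(image, s) for s in sigmas], axis=0)
    return response < threshold
```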

DR leads to two different kinds of pathologies in the human retina: (a) Microaneurysms (MAs) and (b) Capillary Non-Perfusion (CNP). In this thesis, a novel curvature-based technique for the detection of MAs is presented. Likewise, a novel technique for the segmentation of regions of CNP, from RAIs obtained using a laser camera, is presented. This segmentation technique uses a special property of the images obtained with a laser camera.

To showcase the proposed framework, a tool called the 'CNP Analyser' was developed and is presented. This tool detects the regions of CNP from RAIs obtained using a laser camera. The proposed illumination correction technique and the CNP segmentation technique are incorporated into this tool. A measure of the extent of CNP is derived from the percentage area of the regions of CNP.
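
The extent measure can be as simple as the fraction of the imaged retina covered by segmented CNP regions; the sketch below assumes boolean masks produced by the earlier modules and illustrative names.

```python
import numpy as np

def cnp_extent(cnp_mask, retina_mask):
    """Percentage of the visible retina labelled as capillary non-perfusion."""
    retina_pixels = np.count_nonzero(retina_mask)
    if retina_pixels == 0:
        return 0.0
    return 100.0 * np.count_nonzero(cnp_mask & retina_mask) / retina_pixels
```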

 

Year of completion: 2005
Advisor:

Jayanthi Sivaswamy


Downloads

thesis

 ppt

 
