Thesis Students

Motion in Multiple Views

Sujit Kuthirummal

Mathematically, an image is the projection of the 3D world onto the 2D plane of the camera. This projection results in the loss of information present in the third dimension, popularly referred to as the depth or the z dimension. It is easy to see that a plurality of projections can compensate for this loss and this has led to the study of the geometry that underlies multiple views of the same scene.

Multiview analysis of scenes is an active area in Computer Vision today. The structure of points and lines as seen in two views attracted the attention of computer vision researchers like Longuet-Higgins and Olivier Faugeras in the eighties and early nineties. Similar studies on the underlying constraints in three or more views followed. The mathematical structure underlying multiple views has been analysed with respect to projective, affine, and Euclidean frameworks of the world with amazing results. These multiview relations have been used for visual recognition of 3D objects under changing view positions, object tracking by means of image stabilization, view synthesis of 3D objects from 2D views without recovering their 3D structure, and many other applications.

Multiple view situations in Computer Vision have been analyzed with two objectives: to derive scene-independent constraints relating multiple views and to derive view-independent constraints relating multiple scene points. While the first approach seeks to model the configuration of cameras, the second attempts to characterize the configuration of points in the 3D world. The focus of our work is related to the second approach -- to derive constraints on the configuration of the points being imaged in a manner that does not depend on the viewpoint or imaging parameters. Non-rigid motion is difficult to analyze in this scheme. The case of multiple objects moving with different velocities or accelerations can be considered very close to the case of non-rigid motion. We have developed constraints on the projections of such point configurations which can be categorized into two classes: constraints that are time-dependent -- which are functions of time, and constraints that are time-independent -- which hold at every time instant.

Bennet and Hoffman showed that polynomials to characterize a configuration of stationary points in a view-independent manner can be constructed from 2 views of 4 points under orthographic projection. This was extended by Carlsson to the case of scaled orthographic projection using 2 views of 5 points. Shashua and Levin generalized this to the case of an affine projection model for a time-dependent view-independent constraint on the projections of 5 points moving with different but constant velocities.

We show that the projection of a point moving with constant velocity in the world moves with constant velocity in the image when the camera is affine. This result is used to formulate a view-independent and time-independent constraint on the velocities of the projections of 4 points, whose computation needs 2 views. This is a significant theoretical contribution as it needs fewer points and accommodates a more general imaging model. In the same manner, points moving with constant acceleration in the world move with constant acceleration in the image as well. We derive time-dependent and time-independent constraints on the velocities and accelerations, respectively, of the projections of 4 points. These view-independent constraints can be used to recognize a configuration of 4 moving points and also to align frames of synchronized videos.
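The constant-velocity property can be checked numerically. The sketch below assumes the usual affine camera form x = M X + t (M a 2x3 matrix, t a 2-vector), so that a world point X0 + V*time projects to (M X0 + t) + (M V)*time. The values of M, t, X0 and V are arbitrary illustrations and do not come from the thesis.

```python
# Affine camera: x = M X + t, with M a 2x3 matrix and t a 2-vector.
# M, t, X0 and V are arbitrary illustrative values (assumptions).

def project_affine(M, t, X):
    """Project the 3D point X with the affine camera (M, t)."""
    return tuple(sum(M[r][c] * X[c] for c in range(3)) + t[r] for r in range(2))

def world_position(X0, V, time):
    """Position at `time` of a point starting at X0 with constant velocity V."""
    return tuple(x + v * time for x, v in zip(X0, V))

M = [[1.0, 0.5, -0.25],
     [0.0, 2.0, 0.75]]
t = (3.0, -1.0)
X0 = (1.0, 2.0, 4.0)
V = (0.5, -1.0, 2.0)

# Image positions at four equally spaced time instants.
images = [project_affine(M, t, world_position(X0, V, k)) for k in range(4)]

# Displacement between consecutive frames: the same for every pair of frames.
steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(images, images[1:])]

# The image velocity is M applied to the world velocity; t drops out.
image_velocity = tuple(sum(M[r][c] * V[c] for c in range(3)) for r in range(2))
```

Every entry of `steps` agrees with `image_velocity`, which is the linearity argument in numerical form. Under a perspective camera the division by depth breaks this linearity, so the same check would generally fail.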

Though points moving with independent uniform velocities or accelerations model many non-rigid motion conditions, it is desirable to have a technique that accommodates general non-linear motion. We make the observation that as a point moves in the world it traces out a contour in the world, and the trajectory traced out by its projection in an image is the projection of the world contour. Thus, the contours traced out in different views correspond. So the problem of analysing the non-rigid motion of a point can be transformed into the problem of analysing the contour traced out by its projections in various views. When the non-rigid motion in the world is restricted to a plane, that is, when the motion is planar, the contours traced out in the views can be thought of as projections of a planar shape, the shape being the trajectory in the world. We discuss how planar shape recognition techniques can be used to recognize and analyse the contours.

We then combine the motion constraints with properties of a contour to devise a mechanism for recognizing a deforming contour, points on whose boundary move with independent uniform linear velocity or acceleration. Given two views of the deforming contour in a reference view, we can now recognize the same contour in any view at any time instant. We have also derived novel view-dependent parameterizations of the motion of the projections of points moving with uniform linear motion in the world, which enable us to arrive at view-independent constraints on the projections of points in motion that are simpler than the ones reported in the literature.

Year of completion:	2003
Advisor :	C. V. Jawahar & P. J. Narayanan

Related Publications

Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Constraints on Coplanar Moving Points, Proceedings of the European Conference on Computer Vision (ECCV) LNCS 3024, May 2004, Prague, Czech Republic, pp. 168--179. [PDF]
Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan - Fourier Domain Representation of Planar Curves for Recognition in Multiple Views, Pattern Recognition, Vol. 37, No. 4, April 2004, pp. 739--754. [PDF]
Sujit Kuthirummal, C.V. Jawahar and P.J. Narayanan - Algebraic Constraints on Moving Points in Multiple Views, Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing(ICVGIP), Dec. 2002, Ahmedabad, India, pp. 311--316. [PDF]
Sujit Kuthirummal, C.V. Jawahar and P.J. Narayanan - Multiview Constraints for Recognition of Planar Curves in Fourier Domain, Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing(ICVGIP), Dec. 2002, Ahmedabad, India, pp. 323--328. [PDF]
Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan, Video Frame Alignment in Multiple Views, Proceedings of the International Conference on Image Processing(ICIP), Sep. 2002, Rochester, NY, Vol. 3, pp. 357--360. [PDF]
Sujit Kuthirummal, C. V. Jawahar and P. J. Narayanan, Planar Shape Recognition across Multiple Views, Proceedings of the International Conference on Pattern Recognition(ICPR), Aug. 2002, Quebec City, Canada, pp. 482--488. [PDF]
Sujit Kuthirummal, C. V. Jawahar and P.J. Narayanan - Frame Alignment using Multi View Constraints, Proceedings of the National Conference on Communications (NCC), Jan. 2002, Mumbai, India, pp. 523--527. [PDF]

Downloads

ppt

A Probabilistic Learning Approach for Modelling Human Activities

Ravi Kiran Sarvadevabhatla

An understanding of methods which impart computers with an ability to learn akin to humans forms the motivation for one of the most active disciplines of computer science - Machine Learning. Machine Learning draws on concepts and results from many fields such as artificial intelligence, statistics and information theory. One discipline which has benefited from machine learning techniques is Computer Vision. Computer Vision deals with the conversion of image(s) into descriptions. Quite often, statistical methods are used to understand image data using models constructed with the aid of geometry, physics and learning theory.

Real-world images are often characterized by uncertainty ('visual noise') that accompanies the data. Probability theory provides a proper framework to model this uncertainty. An attractive view of Computer Vision systems is to view them as models which "learn" descriptions while addressing uncertainty using probability. Therefore, machine learning algorithms such as Maximum-Likelihood estimation, Mixture Modelling and Expectation-Maximization, which are rooted in probabilistic reasoning, have become popular in Computer Vision. An emerging trend is towards graph-based representations called Graphical Models. These representations model the dependencies embedded among visual data and address uncertainty via probabilistic inference.

The automatic deduction of the structure of a possibly dynamic 3-D world from 2-D images is an important problem in Computer Vision, particularly when the 2-D images contain people. There has been considerable progress in the areas of Computer Vision such as object recognition, image understanding and scene reconstruction from image(s). This progress, coupled with the improvements in computational power, has prompted a new research focus of making machines that can see people, recognize them and interpret their activities. This has spawned applications in various domains including surveillance, sign language recognition and Human-Computer Interaction (HCI).

In this thesis, a new framework to solve the problem of recognizing dynamic activities is presented, in the spirit of the aforementioned probabilistic learning. An activity is defined as a finite spatio-temporal change in the state of an entity (e.g. a human being waving hands). Activities, in turn, are composed of spatio-temporal units called actions (e.g. 'sitting' and 'standing up' are two predominant actions within the activity 'squatting'). Many human activities contain common actions, which causes the image data to be highly correlated and to contain redundancies. Two-dimensional image redundancies are routinely exploited in image processing algorithms. In video, an additional temporal redundancy exists due to the smooth variation of the visual scene over time. Given a set of human activity videos, these redundancies are utilized to learn a compact representation for various actions and, subsequently, activities. The spatial correlations are captured with a Mixture of Factor Analyzers (MFA) model. This is essentially a reduced-dimensionality mixture-of-Gaussians graphical model which clusters the actions in the activity data in a low-dimensional space. The probabilistic structure underlying the activities is inferred using the Expectation-Maximization algorithm. The temporal aspect of each activity is stored as an action transition matrix. Given a hitherto unseen activity video, the embedded actions are extracted and recognition is performed using probabilistic inference.
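As an illustration of what an action transition matrix might look like, the sketch below builds one from a sequence of per-frame action labels. It assumes that each frame has already been assigned an action index (in the thesis this role would be played by the MFA clustering) and that the matrix holds row-normalised transition counts. Both are assumptions made here for illustration; the abstract does not specify the exact construction, and the label sequence is hypothetical.

```python
def transition_matrix(action_sequence, num_actions):
    """Row-normalised counts of transitions between consecutive action labels."""
    counts = [[0] * num_actions for _ in range(num_actions)]
    for current, following in zip(action_sequence, action_sequence[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for actions that never occur as a source are left as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical per-frame labels for a 'squatting' video:
# 0 = 'sitting', 1 = 'standing up'.
squatting = [0, 0, 1, 1, 1, 0, 0, 1]
T = transition_matrix(squatting, 2)
```

Each row of `T` estimates how likely one action is to be followed by another, so a new video's label sequence could be scored against the matrix of each known activity.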

This new representation for human activities is intuitive, simple to learn and presents computational and theoretical advantages. Results on the recognition of various human form activities have been presented in this thesis. These highlight the suitability of the developed framework for applications involving whole body activity recognition, including real-time applications. However, the framework is not limited to whole body activities alone and can be used for recognizing hand gestures and other human activities. A discussion on the future directions for the proposed framework is also presented in this thesis.

Year of completion:	2004
Advisor :	C. V. Jawahar

Related Publications

S. S. Ravi Kiran, Karteek Alahari and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, Proceedings of the National Conference on Communications (NCC), Jan. 2005, Kharagpur, India, pp. 351-355. [PDF]
C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition(ICDAR) Aug. 2003, Edinburgh, Scotland, pp. 408--413. [PDF]
MNSSK Pavan Kumar, S. S. Ravikiran, Abhishek Nayani, C. V. Jawahar and P. J. Narayanan - Tools for Developing OCRs for Indian Scripts, Proceedings of the Workshop on Document Image Analysis and Retrieval(DIAR:CVPR'03), Jun. 2003, Madison, WI. [PDF]

Downloads

ppt

Depth Image Representation for Image Based Rendering

Sashi Kumar Penta

Conventional approaches to rendering a scene require a geometric description, such as polygonal meshes, and appearance descriptions, such as lights and material properties. To render high-quality images from such descriptions, we require very accurate models and materials. Creating such accurate geometric models of real scenes is a difficult and time-consuming problem. Image Based Rendering (IBR) holds a lot of promise for navigating through a real-world scene without modeling it manually. Different representations have been proposed for IBR in the literature. In this thesis, we explore the Depth Image representation, consisting of depth maps and texture images from a number of viewpoints, as a rich and viable representation for IBR.

We discuss various aspects of this Depth Image representation including its capture, representation, compression, and rendering. We present algorithms for efficient and smooth rendering of new views using the Depth Image representation. We show several examples of using the representation to model and render complex scenes. We present a fast rendering algorithm using Depth Images on programmable GPUs. Compression of multiple images has attracted a lot of attention in the past. Compression of multiple depth maps of the same scene has not been explored in the literature. We propose a method for compressing multiple depth maps using a geometric proxy. Different rendering qualities and compression ratios can be achieved by varying the parameters of the method. Experiments show the effectiveness of the compression technique on several model data.
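To make the idea of rendering a new view from a depth map concrete, the sketch below warps a single pixel: it back-projects the pixel to a 3D point using its depth and then projects that point into a displaced camera. It assumes an ideal pinhole camera with focal length f and principal point (cx, cy), depth measured along the optical axis, and a new camera that is translated but not rotated. These simplifications and all numeric values are assumptions for illustration, not the rendering algorithm of the thesis.

```python
def backproject(u, v, depth, f, cx, cy):
    """Pixel (u, v) with depth along the optical axis -> 3D point in the camera frame."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)

def project(point, f, cx, cy):
    """3D point in the camera frame -> pixel coordinates (pinhole model)."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)

def warp_pixel(u, v, depth, f, cx, cy, translation):
    """Reproject a depth pixel into a camera displaced by `translation` (no rotation)."""
    x, y, z = backproject(u, v, depth, f, cx, cy)
    tx, ty, tz = translation
    # Express the point in the frame of the new camera, then project it.
    return project((x - tx, y - ty, z - tz), f, cx, cy)

# Illustrative values: a pixel 20 columns right of the image centre, at depth 4,
# seen again from a camera moved 0.4 units to the right.
new_u, new_v = warp_pixel(70.0, 50.0, 4.0, f=100.0, cx=50.0, cy=50.0,
                          translation=(0.4, 0.0, 0.0))
```

The pixel moves left by about f * 0.4 / depth = 10 columns, the familiar parallax effect: nearer points shift more than distant ones. A full renderer would repeat this for every pixel, resolve visibility, and fill the holes that appear.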

Year of completion:	2005
Advisor :	P. J. Narayanan

Related Publications

Sashi Kumar Penta and P. J. Narayanan - Compression of Multiple Depth-Maps for IBR, The Visual Computer, International Journal of Computer Graphics, Vol. 21, No. 8-10, September 2005, pp. 611--618. [PDF]
P. J. Narayanan, Sashi Kumar P and Sireesh Reddy K, Depth+Texture Representation for Image Based Rendering, Proceedings of the Indian Conference on Vision, Graphics and Image Processing(ICVGIP), Dec. 2004, Calcutta, India, pp. 113--118. [PDF]

Pooja Verlani, Aditi Goswami, P. J. Narayanan, Shekhar Dwivedi and Sashi Kumar Penta - Depth Images: Representations and Real-time Rendering in Symposium on 3D Data Processing, Visualization and Transmission, June 14-16, 2006 (3DPVT 2006).

Downloads

ppt

Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification

MNSSK Pavan Kumar

Pattern recognition is an important area of research in information science with widespread application in image analysis, language processing, data mining, bioinformatics, etc. Most of the research in this area was centered around building efficient classifiers for recognition problems involving two classes of samples. However, most real-world pattern recognition problems are not simple binary problems; they have multiple classes to recognize. It is only in the recent past that researchers have started to focus on building classifiers for multiple class classification. There are two popular directions for developing multiple class classifiers: one is to extend the theory of binary classification directly to adapt to multiple classes, and the other is to build large multiclass systems with efficient binary classifiers as components. The latter paradigm has received much attention and popularity due to its empirical success in various applications such as Optical Character Recognition, Biometrics, Data Mining, etc. This thesis focuses on the classifier combination paradigm to design efficient and accurate classifiers for multiclass recognition with a large number of classes.

Decision Directed Acyclic Graph (DDAG) classifiers integrate a set of pairwise classifiers using a graph-based architecture. The accuracy of the DDAG can be improved by appropriate design of the individual nodes. An optimal feature selection scheme is employed for improving the performance of the nodes. This feature selection scheme extracts a separate set of features for each node, which increases the time taken for a sample to be classified. A novel approach for computing a condensed representation of the large set of features using LDA followed by PCA is proposed to overcome this problem. Nodes with low performance are boosted using a popular boosting algorithm called AdaBoost. Improvement in performance is demonstrated on Character Recognition and other datasets.
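The way a DDAG combines pairwise classifiers can be sketched with the standard list-elimination view of DDAG evaluation: keep a list of candidate classes, compare the first against the last, discard the loser, and repeat until one class remains. The `pairwise` function below is a toy nearest-mean rule on 1-D samples standing in for trained pairwise classifiers; it and the class means are hypothetical and are not taken from the thesis.

```python
def ddag_classify(sample, classes, pairwise):
    """Evaluate a DDAG by list elimination.

    `pairwise(a, b, sample)` must return whichever of the classes a or b
    it prefers for the sample.
    """
    candidates = list(classes)
    evaluations = 0
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = pairwise(first, last, sample)
        evaluations += 1
        if winner == first:
            candidates.pop()       # the last class is rejected
        else:
            candidates.pop(0)      # the first class is rejected
    return candidates[0], evaluations

# Toy stand-in for trained pairwise classifiers: nearest class mean in 1-D.
means = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}

def nearest_mean(c1, c2, x):
    return c1 if abs(x - means[c1]) <= abs(x - means[c2]) else c2

label, evaluations = ddag_classify(2.2, ["a", "b", "c", "d"], nearest_mean)
```

With four classes the sample passes through three nodes, one fewer than the number of classes, even though the full graph contains a node for every pair of classes. The order of `classes` fixes which node is visited first, which is where the arrangement of nodes enters.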

The arrangement of nodes in a DDAG is shown to affect the classification performance of the overall classifier. The popular DDAG algorithm is very accurate; however, it provides no more information than the class label. We modify this algorithm to handle a 'reject-class' for improving the performance. The design of this DDAG is posed as an optimization problem in terms of misclassification probabilities. The problem being NP-Hard, approximate algorithms are proposed to design an improved DDAG. It is experimentally shown that the proposed algorithm provides a design close to the average case performance of the DDAG.

Although DDAGs are an attractive way to build multiclass classifiers, they suffer from the large number of component classifiers they employ. A Binary Hierarchical Classifier (BHC) is an alternate approach, which uses very few component classifiers, unlike a DDAG. We propose a new scheme to divide the set of available classes into overlapping partitions so as to maximize the margin at each node of the BHC, thereby improving the performance. BHCs and DDAGs are classifiers with complementary advantages. A DDAG has high accuracy but also high storage and classification time complexity. A BHC is extremely efficient in storage, with a reasonably high accuracy. We exploit the advantages of both these classifiers by designing a new hybrid classifier. This employs binary partitions at most of the nodes, where the classifications are relatively easy, and DDAGs for classifying a complex subset of classes wherever appropriate. The hybrid architecture is shown to perform better than the BHC, with a performance close to that of the DDAG, but with a great reduction in the number of nodes required compared to that of a complete DDAG.
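The difference in the number of component classifiers can be made concrete with a small count. A DDAG over N classes has one node per pair of classes, N(N-1)/2 in total. A binary tree that repeatedly splits the classes into two disjoint subsets until single classes remain has N-1 internal nodes. The second figure assumes non-overlapping partitions; the overlapping partitions proposed in the thesis may need more nodes, so it should be read as a lower bound for that scheme.

```python
def ddag_node_count(num_classes):
    """One pairwise classifier per unordered pair of classes."""
    return num_classes * (num_classes - 1) // 2

def binary_tree_node_count(num_classes):
    """Internal nodes of a binary tree with one leaf per class
    (assumes disjoint partitions at every node)."""
    return num_classes - 1

# Component classifiers needed for a few problem sizes.
counts = {n: (ddag_node_count(n), binary_tree_node_count(n)) for n in (10, 100, 1000)}
```

The DDAG count grows quadratically with the number of classes while the tree count grows linearly, which is the storage gap the hybrid architecture aims to narrow.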

This thesis presents a spectrum of classifier design algorithms for improving performance by feature selection, component classifier selection and combination architecture design. The problems are formulated in their generality and analyzed, and the proposed solutions are empirically evaluated on popular datasets. Experimental results demonstrate that these techniques are promising.

Year of completion:	2005
Advisor :	C. V. Jawahar

Related Publications

M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Design of Hierarchical Classifier with Hybrid Architectures, Proceedings of First International Conference on Pattern Recognition and Machine Intelligence(PReMI 2005) Kolkata, India. December 2005, pp 276-279. [PDF]
M. N. S. S. K. Pavan Kumar and C. V. Jawahar, Configurable Hybrid Architectures for Character Recognition Applications, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 1199-1203. [PDF]
MNSSK Pavan Kumar and C. V. Jawahar - On Improving Design of Multiclass Classifiers, Proceedings of the International Conference on Advances in Pattern Recognition(ICAPR), Dec. 2003, Calcutta, India, pp. 109--112. [PDF]
C. V. Jawahar, MNSSK Pavan Kumar and S. S. Ravikiran - A Bilingual OCR system for Hindi-Telugu Documents and its Applications, Proceedings of the International Conference on Document Analysis and Recognition(ICDAR) Aug. 2003, Edinburgh, Scotland, pp. 408--413. [PDF]

Downloads

Modelling and Recognition of Dynamic Events in Video

Karteek Alahari

Computer Vision algorithms, which mainly focussed on analyzing image data till the early 1980's, have now matured to handle video data more efficiently. In the past, computational barriers have limited the complexity of video processing applications. As a consequence, most systems were either too slow to be practical, or succeeded by restricting themselves to very controlled situations. With the availability of faster computing resources over the past couple of decades, video processing applications have gained popularity in the computer vision research community. Moreover, the advances in data capturing, storage, and communication technologies have made vast amounts of video data available to consumer and enterprise applications. This has naturally created a demand for video analysis research.

Video sequences typically consist of long temporal objects, called events, which usually extend over tens or hundreds of frames. They provide useful cues for the analysis of video information, including event-based video indexing, browsing, retrieval, clustering, segmentation, recognition, summarization, etc. The state-of-the-art techniques seldom use the event information inherent in videos for all these problems. They either simply recognize the events or use primitive features to address other video analysis issues. Furthermore, due to the large volume of video data, we need efficient models to capture the essential content in the events. This involves removing the acceptable statistical variability across all the videos. These requirements create the need for learning-based approaches for video analysis.

In this thesis, we aim to address video analysis problems by modelling and recognizing the dynamic events in videos. We propose a model to learn an efficient representation of events for analyzing continuous video sequences and demonstrate its applicability for summarizing them. Further, we observe that all parts of a video sequence may not be equally important for the classification task. Based on the characteristics of each part, we compute its potential in influencing the decision criterion. Another observation we make is that a feature set appropriate for one event may be completely irrelevant for another. Hence, an adaptive feature selection scheme is essential. We present an approach to learn an optimal combination of spatial and temporal features based on the events being analyzed. Finally, we describe some of our work on an unsupervised framework for video analysis.

Year of completion:	2006
Advisor :	C. V. Jawahar & P. J. Narayanan

Related Publications

Karteek Alahari, Satya Lahari Putrevu and C.V. Jawahar - Learning Mixtures of Offline and Online Features for Handwritten Stroke Recognition, Proc. 18th IEEE International Conference on Pattern Recognition(ICPR'06), Hong Kong, Aug 2006, Vol. III, pp.379-382. [PDF]
Karteek Alahari and C.V. Jawahar - Dynamic Events as Mixtures of Spatial and Temporal Features, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.540-551, 2006. [PDF]
Karteek Alahari and C.V. Jawahar - Discriminative Actions for Recognising Events, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338 pp.552-563, 2006. [PDF]
Karteek Alahari, Satya Lahari P and C. V. Jawahar - Discriminant Substrokes for Online Handwriting Recognition, Proceedings of Eighth International Conference on Document Analysis and Recognition(ICDAR), Seoul, Korea 2005, Vol 1, pp 499-503. [PDF]
S. S. Ravi Kiran, Karteek Alahari and C. V. Jawahar, Recognizing Human Activities from Constituent Actions, Proceedings of the National Conference on Communications (NCC), Jan. 2005, Kharagpur, India, pp. 351-355. [PDF]

Karteek Alahari, Ravi Kiran Sarvadevabhatla, C. V. Jawahar - A Spatiotemporal Model for Recognizing Human Activities from Constituent Actions, in Pattern Recognition, Journal of Pattern Recognition Society. (submitted)

Downloads

ppt
