
Kernel Methods and Factorization for Image and Video Analysis


Ranjeeth Kumar Dasineni (homepage)

Image and Video Analysis is one of the most active research areas in computer science, with a large number of applications in security, surveillance, broadcast video processing, etc. Until about two decades ago, the primary focus in this domain was on efficient processing of image and video data. However, with the increase in computational power and advances in Machine Learning, the focus has shifted to a wide range of other problems. Machine learning techniques are now widely used to perform higher-level tasks such as recognizing faces in images, facial expression analysis in videos, printed document recognition and video understanding, all of which require extensive analysis of the data.

The field of Machine Learning itself witnessed the evolution of Kernel Methods as a principled and efficient approach to analyzing nonlinear relationships in data. These algorithms are computationally efficient and statistically stable, in stark contrast with earlier methods for nonlinear problems, such as neural networks and decision trees, which often suffered from overfitting and computational expense. In addition, kernel methods provide a natural way to treat heterogeneous data (such as categorical data, graphs and sequences) under a unified framework. These advantages led to their immense popularity in many fields, including computer vision, data mining and bioinformatics. In computer vision, kernel methods such as the support vector machine, kernel principal component analysis and kernel discriminant analysis brought remarkable improvements in performance at tasks such as classification, recognition and feature extraction.

Like Kernel Methods, Factorization techniques enable elegant solutions to many problems in computer vision, such as eliminating redundancy in the representation of data and analyzing their generative processes. Structure from Motion and Eigenfaces for feature extraction are examples of successful applications of factorization in vision. So far, however, factorization has been applied to the traditional matrix representation of image collections and videos. This representation fails to fully exploit the structure in 2D images, as each image is flattened into a single 1D vector. Tensors are a more natural representation for such data and have recently gained wide attention in computer vision; factorization becomes an even more useful tool with such representations.


While Kernel Methods and Factorization both aid in the analysis of data and the detection of inherent regularities, they do so in orthogonal ways. The central idea in kernel methods is to work with new sets of features derived from the input features. Factorization, on the other hand, operates by eliminating redundant or irrelevant information. Thus, they form a complementary set of tools for analyzing data. This thesis addresses the problem of effectively manipulating the dimensionality of the representation of visual data, using these tools, for solving problems in image analysis. The purpose of this thesis is threefold: i) Demonstrating useful applications of kernel methods to problems in image analysis. New kernel algorithms are developed for feature selection and time series modeling, and are used for biometric authentication using weak features, planar shape recognition and handwritten character recognition. ii) Using the tensor representation and factorization of tensors to solve challenging problems in facial video analysis, leading to simple and efficient methods for expression transfer, expression recognition and face morphing. iii) Investigating and demonstrating the complementary nature of Kernelization and Factorization and how they can be used together for the analysis of data. (more...)
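The two views can be illustrated with a toy sketch (not the algorithms developed in the thesis): a kernel method replaces raw features with pairwise similarities computed by a kernel function, while a factorization such as the truncated SVD removes redundancy by keeping only a few components. The data and parameters below are arbitrary placeholders.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2): the 'derived
    features' a kernel method works with instead of the raw inputs."""
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def truncated_svd(X, rank=2):
    """Low-rank factorization X ~= A @ B that discards redundant components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

X = np.random.rand(6, 4)         # toy data: 6 samples, 4 features
K = rbf_kernel_matrix(X)         # kernel view: nonlinear pairwise similarities
A, B = truncated_svd(X, rank=2)  # factorization view: compact, redundancy-reduced factors
```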

 

Year of completion:  December 2007
 Advisor : C. V. Jawahar


Related Publications

  • Ranjeeth Kumar and C.V. Jawahar - Kernel Approach to Autoregressive Modeling, Proc. of the Thirteenth National Conference on Communications (NCC 2007), Kanpur, 2007. [PDF]

  • Ranjeeth Kumar and C.V. Jawahar - Class-Specific Kernel Selection for Verification Problems, Proc. of the Sixth International Conference on Advances in Pattern Recognition (ICAPR 2007), Kolkata, 2007. [PDF]

  • S. Manikandan, Ranjeeth Kumar and C.V. Jawahar - Tensorial Factorization Methods for Manipulation of Face Videos, The 3rd International Conference on Visual Information Engineering, 26-28 September 2006, Bangalore, India. [PDF]

  • Ranjeeth Kumar, S. Manikandan and C. V. Jawahar - Task Specific Factors for Video Characterization, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 376-387, 2006. [PDF]

  • Ranjeeth Kumar, S. Manikandan and C. V. Jawahar - Face Video Alteration Using Tensorial Methods, Pattern Recognition, Journal of the Pattern Recognition Society (submitted).

Downloads

thesis

 ppt

 

Imaging and Depth Estimation in an Optimization Framework


Avinash Kumar (homepage)

Computer Vision is the process of obtaining information about a scene by processing images of the scene. This inverse process can be mathematically formulated in terms of various unknown variables which, given the images, must satisfy certain constraints. For example, the variables could be the intensity values at each pixel location in an image, with the constraint that these values must lie between 0 and 255. An objective function can then be formed in terms of the variables and constraints, with the property that the set of variables minimizing it, i.e. its global minimum, is the desired solution. Often in Computer Vision the number of possible solutions is large, so the objective function has many local minima and an exhaustive search for the global minimum is computationally intensive. This makes it a Combinatorial Optimization problem. Recently, a number of techniques based on the Minimum Cut on Graphs have been proposed which are fast and efficient; they yield an approximately global minimum in polynomial time. On benchmark problems, the solutions obtained using these techniques have outperformed earlier optimization methods, which has led to renewed interest in Optimization in computer vision.

For a vision problem, obtaining a global minimum using efficient optimization methods is not enough if it does not correspond to the desired solution; the formulation of an accurate objective function is equally important. In this thesis, we first propose new objective functions for problems in Imaging and Depth estimation from images and then formulate them as optimization problems. The two main imaging problems addressed are Omnifocus Imaging and Background Subtraction. Omnifocus imaging generates an image with a large depth of field, in which everything being imaged is in focus; this is critical for many high-level vision problems, such as object recognition, that require sharp, high-quality images. The input to this problem is a set of images focused at different depths. These images, called multifocus images, are captured with a Non-Frontal Imaging Camera (NICAM). A new technique for calibrating this camera is also proposed, which helps in registering the input images from the NICAM. A focus measure in omnifocus imaging finds the best-focused image from the set of multifocus input images, and we propose a new Generative focus measure in this thesis. Background removal is another imaging technique, in which unwanted regions of an image are removed. We propose a new objective function for background removal for the machine vision problem of monitoring intermodal freight trains; the objective function is optimized using Graph Cuts based optimization. We have also developed another technique for background removal in such videos, based on finding edges in an image and Gaussian Mixture Modeling. For Depth estimation, the thesis proposes range estimation from images generated by a NICAM in an optimization framework. The constraint used in this framework is that nearby points in the three-dimensional world have the same depth, which allows accurate and smooth depth estimation in real-world scenes. We show results on real data sets.
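As an illustration of the kind of objective function described above (not the thesis's actual formulation or its graph-cut solver), the sketch below writes a pixel-labeling energy as a per-pixel data cost plus a smoothness penalty between neighbouring pixels, mirroring the constraint that nearby points should have the same depth; the costs and grid size are arbitrary placeholders.

```python
import numpy as np

def labeling_energy(labels, data_cost, smoothness_weight=1.0):
    """Energy of a pixel labeling: per-pixel data costs plus a Potts-style
    penalty whenever 4-connected neighbours take different labels.

    labels    : (H, W) integer label (e.g. depth level) per pixel
    data_cost : (H, W, L) cost of assigning each of the L labels to each pixel
    """
    H, W = labels.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    data_term = data_cost[rows, cols, labels].sum()
    smooth_term = (labels[:, 1:] != labels[:, :-1]).sum() \
                + (labels[1:, :] != labels[:-1, :]).sum()
    return data_term + smoothness_weight * smooth_term

# Toy example: 3 depth labels on a 4x4 grid; an optimizer such as graph cuts
# would search for the labeling that minimizes this energy.
data_cost = np.random.rand(4, 4, 3)
labels = data_cost.argmin(axis=2)   # greedy start that ignores smoothness
print(labeling_energy(labels, data_cost))
```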

 

Year of completion:  2007
 Advisor : C. V. Jawahar

Related Publications

  • Y.C. Lai, C.P.L. Barkan, J. Drapa, N. Ahuja, J.M. Hart, P. J. Narayanan, C. V. Jawahar, A. Kumar, L.R. Milhon and M. Stehly - Machine vision analysis of the energy efficiency of intermodal freight trains, Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit, ISSN 0954-4097, Volume 221, Number 3, pp. 353-364, 2007. [PDF]

  • Avinash Kumar, Narendra Ahuja, John M. Hart, U.K. Visesh, P.J. Narayanan and C.V. Jawahar - A Vision System for Monitoring Intermodal Freight Trains, Proc. of the IEEE Workshop on Applications of Computer Vision (WACV 2007), Austin, Texas, USA, 2007. [PDF]


Downloads

thesis

 ppt

Automatic Writer Identification and Verification using Online Handwriting


 Sachin Gupta (homepage)

Automatic person identification is one of the major concerns in this era of automation. The problem itself is not new: society has long adopted various ways to authenticate the identity of a person, such as signatures and the possession of documents. With the advent of electronic communication media (the Internet), interactions are becoming increasingly automatic, and the problem of identity theft has become even more severe. The traditional modes of person authentication, Possessions and Knowledge, are unable to solve this problem. Possessions include physical tokens such as keys, passports and smart cards. Knowledge is a piece of information that is memorized, such as a password, and is supposed to be kept secret. Knowledge- and possession-based methods focus on "what you know" or "what you possess" rather than "who you are". Due to their inability to address these security concerns, biometrics research has gained significant momentum in the last decade. Biometrics refers to the authentication of a person using physiological or behavioral traits that distinguish the individual from others. Biometric authentication has several advantages over knowledge- and possession-based identification, including ease of use and non-repudiation. In this thesis, we address the problem of handwriting biometrics. Handwriting is a behavioral biometric, as it is generated as the consequence of an action performed by a person. Handwriting identification also has a long history: the signature, a specific instance of handwriting, has long been used to authenticate legal documents.

This thesis addresses various problems related to automatic handwriting identification. Most writer identification work is still done manually, since much context-dependent information, such as the source of the documents and the nature of the handwriting, is difficult to model mathematically, although it can be readily analyzed by human experts. An automatic handwriting analysis system is nevertheless useful, as it can remove subjectivity from the process of handwriting identification and can provide expert advice in court cases. The final aim of this research is to design efficient algorithms for automatic feature extraction and recognition of the writer from a given handwritten document, with as little human intervention as possible.

Specifically, we propose efficient solutions to three different applications of handwriting identification. First, we look at the problem of determining the authorship of an arbitrary piece of online handwritten text. We then analyze the discriminative information in online handwriting to propose an efficient and accurate approach to text-dependent writer verification for practical, low-security applications. We also look at the problem of repudiation in handwritten documents for forensic document examination; after introducing the problem, we propose an algorithm for repudiation detection in handwritten documents. Handwriting identification is quite different from handwriting recognition, the other popular sub-field of automatic handwriting analysis: handwriting recognition tries to identify the content of a handwritten text while minimizing variations due to writing style, whereas handwriting identification seeks out exactly those style variations.
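The publications listed below describe the actual boosting-based approach; purely as a hypothetical illustration of the verification setting, the sketch below represents an online handwriting sample by a stroke-direction histogram and accepts a claimed identity when the distance to an enrolled template is small. The feature, distance and threshold are placeholders, not the thesis's method.

```python
import numpy as np

def direction_histogram(strokes, bins=8):
    """Normalized histogram of local pen directions; `strokes` is a list of
    (N_i, 2) arrays of online (x, y) pen coordinates."""
    angles = [np.arctan2(d[:, 1], d[:, 0])
              for d in (np.diff(pts, axis=0) for pts in strokes)]
    hist, _ = np.histogram(np.concatenate(angles), bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def verify_writer(sample_strokes, enrolled_hist, threshold=0.3):
    """Accept the claimed writer if the L1 feature distance is below a threshold."""
    h = direction_histogram(sample_strokes)
    return np.abs(h - enrolled_hist).sum() < threshold
```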

 

Year of completion:  February 2008
 Advisor : Anoop M. Namboodiri

Related Publications

  • Sachin Gupta and Anoop M. Namboodiri - Text-Dependent Writer Verification Using Boosting, Proceedings of the International Conference on Frontiers in Handwriting Recognition, Montreal, Canada, 2008. [PDF]

  • Sachin Gupta and Anoop M. Namboodiri - Repudiation Detection in Handwritten Documents, Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August 2007. [PDF]

  • Anoop M. Namboodiri and Sachin Gupta - Text Independent Writer Identification from Online Handwriting, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, La Baule, Centre de Congrès Atlantia, France. [PDF]


Downloads

thesis

 ppt

 

Word Hashing for Efficient Search In Document Image Collections


Anand Kumar

A large number of document image collections are now being scanned and made available over the Internet or in digital libraries. Effective access to such information sources is limited by the lack of efficient retrieval schemes. The use of text search methods requires efficient and robust optical character recognizers (OCRs), which are presently unavailable for Indian languages. Word spotting, i.e. word image matching, may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and/or nearest neighbor search, tend to be slow for large collections of books. Direct matching of images is inefficient due to the complexity of matching and is thus impractical for large databases. In general, indexing and retrieval methods for document images cluster similar words and build indexes from the representatives of the clusters. The time required to build such a clustering-based index is very high; these indexing methods are inefficient because of the complex image matching procedures required in the clustering step. This problem is solved by directly hashing word image representations.

An efficient mechanism for indexing and retrieval in large document image collections is presented in this thesis. First, document images are segmented into words. Features are then computed at the word level and indexed. Word retrieval is done very efficiently with content-sensitive hashing (CSH), which uses an approximate nearest neighbor search technique called locality sensitive hashing (LSH). The word images are hashed into hash tables using features computed at the word level. Content-sensitive hash functions are used so that the probability of grouping similar words under the same index of the hash table is high. The sub-linear time CSH scheme makes the search very fast without degrading accuracy. Experiments on a collection of Kalidasa's (the classical Indian poet of antiquity) books in Telugu demonstrate that word images can be searched in a few milliseconds. The approach thus makes searching document image collections practical; the search time is reduced significantly by hashing the words. (more...)
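The sketch below illustrates the hashing idea with a single random-projection LSH table: word-image feature vectors whose sign patterns under a set of random projections agree fall into the same bucket, so a query only has to be compared against that bucket. The feature extraction step and the use of multiple tables (which a practical CSH/LSH index would need) are omitted, and the dimensions are placeholders.

```python
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    """Minimal LSH index: each hash key is the sign pattern of `num_bits`
    random projections, so nearby feature vectors tend to share a bucket."""
    def __init__(self, dim, num_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(num_bits, dim))
        self.table = defaultdict(list)

    def _key(self, v):
        return tuple((self.planes @ v > 0).astype(int))

    def add(self, v, label):
        self.table[self._key(v)].append(label)

    def query(self, v):
        return self.table.get(self._key(v), [])

# Index word-image feature vectors (e.g. word profile features), then query.
index = RandomProjectionLSH(dim=64)
features = np.random.rand(1000, 64)
for i, f in enumerate(features):
    index.add(f, label=i)
print(index.query(features[0]))  # candidate word images from the same bucket
```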

 

Year of completion:  2008
 Advisor : C. V. Jawahar & R. Manmatha

Related Publications

  • Anand Kumar, C.V. Jawahar and R. Manmatha - Efficient Search in Document Image Collections, Proceedings of the 8th Asian Conference on Computer Vision (ACCV'07), Part I, LNCS 4843, pp. 586-595, Tokyo, Japan, 18-22 November 2007. [PDF]

  • C.V. Jawahar and Anand Kumar - Content-level Annotation of Large Collection of Printed Document Images, Proc. of the 9th International Conference on Document Analysis and Recognition, Brazil, 23-26 September 2007. [PDF]

  • Anand Kumar, A. Balasubramanian, Anoop M. Namboodiri and C.V. Jawahar - Model-Based Annotation of Online Handwritten Datasets, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, La Baule, Centre de Congrès Atlantia, France. [PDF]

 


Downloads

thesis

 ppt

Proxy Based Compression of Depth Movies


Pooja Verlani (homepage)

Sensors for 3D data are common today. These include multicamera systems, laser range scanners, etc. Some of them are suitable for the real-time capture of the shape and appearance of dynamic events. The 2-1/2D model of an aligned depth map and image, called a Depth Image, has been popular for Image Based Modeling and Rendering (IBMR). Capturing the 2-1/2D geometric structure and photometric appearance of dynamic scenes is possible today. Time-varying depth and image sequences, called Depth Movies, can extend IBMR to dynamic events. The captured event contains aligned sequences of depth maps and textures, which are often streamed to a distant location for immersive viewing. The applications of such systems include virtual-space tele-conferencing, remote 3D immersion, 3D entertainment, etc. We study a client-server model for tele-immersion, in which captured or stored depth movies are sent from a server to multiple remote clients on demand. Depth movies consist of dynamic depth maps and texture maps. Multiview image compression and video compression have been studied earlier, but dynamic depth map compression has not. This thesis contributes towards dynamic depth map compression for efficient transmission in a server-client 3D tele-immersive environment. Dynamic depth map data is heavy and needs efficient compression schemes: immersive applications require time-varying sequences of depth images from multiple cameras to be encoded and transmitted, and at the remote site the 3D scene is regenerated by rendering the whole scene. Thus, depth movies of a generic 3D scene from multiple cameras become very heavy to send over the network given the available bandwidth.


This thesis presents a scheme to compress depth movies of human actors using a parametric proxy model for the underlying action. We use a generic articulated human model as the proxy to represent the human in action, and the joint angles of the model parametrize the proxy at each time instant. The proxy represents a common prediction of the scene structure. The difference between the captured depth and the depth of the proxy, called the residue, is used to represent the scene, exploiting its spatial coherence. A few variations of this algorithm are presented in the thesis: we experimented with bit-wise compression of the residues and analyzed the quality of the generated 3D scene, and differences in residues across time are used to exploit temporal coherence. Intra-frame coded frames and difference-coded frames provide random access and high compression. We show results on several synthetic and real actions to demonstrate the compression ratio and the resulting quality using depth-based rendering of the decoded scene; the performance achieved is quite impressive. We present the articulation fitting tool, the compression module with different algorithms, and the server-client system with several variants for the user. The thesis first explains the concepts of 3D reconstruction by image based modeling and rendering, compression of such 3D representations, and tele-conferencing; it then proceeds to depth images and movies, followed by the main algorithms, examples, experiments and results.
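A minimal sketch of the residue-coding idea follows, assuming the proxy's depth maps have already been rendered and ignoring the quantization and bit-wise coding the thesis actually applies: keyframes store the full captured-minus-proxy residue (intra-coded, for random access), and the remaining frames store only the change in residue from the previous frame (difference-coded, exploiting temporal coherence). Function names and the keyframe interval are placeholders.

```python
import numpy as np

def encode_depth_movie(captured_depths, proxy_depths, keyframe_interval=10):
    """Proxy-based residue coding sketch.

    captured_depths, proxy_depths: lists of (H, W) depth maps, one per frame.
    """
    frames = []
    prev_residue = None
    for t, (captured, proxy) in enumerate(zip(captured_depths, proxy_depths)):
        residue = captured - proxy                  # proxy predicts most of the depth
        if t % keyframe_interval == 0 or prev_residue is None:
            frames.append(("intra", residue))       # random-access point
        else:
            frames.append(("diff", residue - prev_residue))  # temporal coherence
        prev_residue = residue
    return frames

def decode_depth_movie(frames, proxy_depths):
    decoded, residue = [], None
    for (kind, payload), proxy in zip(frames, proxy_depths):
        residue = payload if kind == "intra" else residue + payload
        decoded.append(proxy + residue)             # reconstructed depth map
    return decoded
```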

 

Year of completion:  2008
 Advisor : P. J. Narayanan

Related Publications

  • Pooja Verlani, P. J. Narayanan - Proxy-Based Compression of 2-1/2D Structure of Dynamic Events for Tele-immersive Systems, Proceedings of 3D Data Processing, Visualization and Transmission, June 18-20, 2008, Georgia Institute of Technology, Atlanta, GA, USA. [PDF]

  • Pooja Verlani, P. J. Narayanan - Parametric Proxy-Based Compression of Multiple Depth Movies of Humans, Proceedings of the Data Compression Conference 2008, March 25-27, 2008, Salt Lake City, Utah. [PDF]

  • Pooja Verlani, Aditi Goswami, P.J. Narayanan, Shekhar Dwivedi and Sashi Kumar Penta - Depth Images: Representations and Real-time Rendering, Third International Symposium on 3D Data Processing, Visualization and Transmission, Chapel Hill, North Carolina, June 14-16, 2006. [PDF]

 


Downloads

thesis

 ppt

 
