
Imaging and Depth Estimation in an Optimization Framework


Avinash Kumar (homepage)

Computer Vision is the process of obtaining information about a scene by processing images of the scene. This inverse process can be formulated mathematically in terms of unknown variables which, given the images, must satisfy certain constraints. For example, the variables could be the intensity values at each pixel location in an image, with the constraint that these values lie between 0 and 255. An objective function can then be formed over the variables and constraints, with the property that the set of variables minimizing it, i.e. its global minimum, is the desired solution.

Often in Computer Vision the number of possible solutions is so large that the objective function has many local minima, and an exhaustive search for the global minimum becomes computationally intractable; the task thus becomes a combinatorial optimization problem. Recently, a number of fast and efficient techniques based on minimum cuts on graphs have been proposed. They yield, in polynomial time, solutions that are approximately globally optimal, and on benchmark problems they have outperformed earlier optimization techniques. This has led to renewed interest in optimization in computer vision.

For a vision problem, obtaining a global minimum with an efficient optimization method is not enough if it does not correspond to the desired solution; formulating an accurate objective function is equally important. In this thesis, we first propose new objective functions for problems in imaging and depth estimation from images, and then formulate them as optimization problems. The two main imaging problems addressed are omnifocus imaging and background subtraction.
Omnifocus imaging generates an image with a large depth of field, in which everything in the scene is in focus. This is critical to many high-level vision problems, such as object recognition, which require sharp, high-quality images. The input is a set of images focused at different depths, called multifocus images, captured with a Non-Frontal Imaging Camera (NICAM). A new technique for calibrating this camera is also proposed, which helps in registering the input images from the NICAM. A focus measure in omnifocus imaging selects the best-focused image from the set of multifocus inputs; we propose a new generative focus measure in this thesis. Background removal is another imaging technique, in which unwanted regions of an image are discarded. We propose a new objective function for background removal in the machine vision problem of monitoring intermodal freight trains, and optimize it using graph-cut based optimization. We have also developed another technique, based on edge detection and Gaussian mixture modeling, for background removal in such videos. For depth estimation, the thesis proposes range estimation from NICAM images in an optimization framework, using the constraint that nearby points in the three-dimensional world have similar depths. The proposed framework thus allows accurate and smooth depth estimation in real-world scenes, and we show results on real data sets.
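To make the per-pixel selection behind omnifocus compositing concrete, here is a minimal sketch using the classical sum-modified-Laplacian focus measure — a common textbook choice, not the generative focus measure proposed in the thesis:

```python
import numpy as np

def modified_laplacian(img):
    """Sum-modified-Laplacian focus measure per pixel:
    |2I(x,y) - I(x-1,y) - I(x+1,y)| + |2I(x,y) - I(x,y-1) - I(x,y+1)|."""
    img = img.astype(np.float64)
    ml = np.zeros_like(img)
    ml[1:-1, :] += np.abs(2 * img[1:-1, :] - img[:-2, :] - img[2:, :])
    ml[:, 1:-1] += np.abs(2 * img[:, 1:-1] - img[:, :-2] - img[:, 2:])
    return ml

def omnifocus(stack):
    """Compose an all-in-focus image by picking, at each pixel,
    the multifocus frame with the highest focus measure."""
    measures = np.stack([modified_laplacian(f) for f in stack])  # (n, H, W)
    best = np.argmax(measures, axis=0)                           # per-pixel frame index
    frames = np.stack(stack)
    return np.take_along_axis(frames, best[None], axis=0)[0]
```

In practice the hard-argmax selection is exactly what the graph-cut formulation replaces: the labeling of pixels to frames is instead smoothed by a neighborhood consistency term.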

 

Year of completion:  2007
 Advisor : C. V. Jawahar

Related Publications

  • Y.C. Lai, C.P.L. Barkan, J. Drapa, N. Ahuja, J.M. Hart, P. J. Narayanan, C. V. Jawahar, A. Kumar, L.R. Milhon and M. Stehly - Machine vision analysis of the energy efficiency of intermodal freight trains, Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit, ISSN 0954-4097, Volume 221, Number 3, Pages 353-364, 2007. [PDF]

  • Avinash Kumar, Narendra Ahuja, John M. Hart, U.K. Visesh, P.J. Narayanan and C.V. Jawahar - A Vision System for Monitoring Intermodal Freight Trains, Proc. of the IEEE Workshop on Applications of Computer Vision (WACV 2007), Austin, Texas, USA, 2007. [PDF]


Downloads

thesis

 ppt

Automatic Writer Identification and Verification using Online Handwriting


 Sachin Gupta (homepage)

Automatic person identification is one of the major concerns in this era of automation. The problem itself is not new: society has long used several means to authenticate a person's identity, such as signatures and identity documents. With the advent of electronic communication media such as the Internet, interactions are becoming increasingly automated, and the problem of identity theft has become even more severe. The traditional modes of person authentication, possessions and knowledge, cannot solve this problem. Possessions are physical tokens such as keys, passports, and smart cards; knowledge is a memorized piece of information, such as a password, that is supposed to be kept secret. Both are focused on "what you know" or "what you possess" rather than "who you are". Owing to this inability of knowledge- and possession-based methods to meet growing security concerns, biometrics research has gained significant momentum in the last decade. Biometrics refers to authenticating a person using physiological or behavioral traits that distinguish the individual from others; it has various advantages over knowledge- and possession-based identification, including ease of use and non-repudiation. In this thesis, we address the problem of handwriting biometrics. Handwriting is a behavioral biometric, as it is generated as the consequence of an action performed by a person. Handwriting identification also has a long history: the signature, a specific instance of handwriting, has been used to authenticate legal documents for a long time.

This thesis addresses various problems related to automatic handwriting identification. Most writer identification work is still done manually, because much context-dependent information, such as the source of the documents and the nature of the handwriting, is difficult to model mathematically yet easily analyzed by human experts. Still, an automatic handwriting analysis system is useful: it removes subjectivity from the process of handwriting identification and can supply expert advice in court cases. The final aim of this research is to design efficient algorithms for automatic feature extraction and recognition of the writer of a given handwritten document, with as little human intervention as possible.

Specifically, we propose efficient solutions to three different applications of handwriting identification. First, we look at the problem of determining the authorship of an arbitrary piece of online handwritten text. We then analyze the discriminative information in online handwriting to propose an efficient and accurate approach to text-dependent writer verification for practical, low-security applications. Finally, we look at the problem of repudiation in handwritten documents for forensic document examination; after introducing the problem, we propose an algorithm for repudiation detection in handwritten documents. Handwriting identification is quite different from handwriting recognition, the other popular sub-field of automatic handwriting analysis: recognition tries to identify the content of handwritten text while minimizing variations due to writing style, whereas identification seeks out precisely those style variations.
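As a toy illustration of a style-sensitive descriptor for online handwriting — not the boosting-based verification method of the thesis, just a minimal sketch of the kind of feature such systems compare — one can histogram the local writing directions along a pen trajectory:

```python
import numpy as np

def direction_histogram(stroke, bins=8):
    """Histogram of local writing directions along an online stroke.
    `stroke` is an (N, 2) array of pen positions sampled over time."""
    stroke = np.asarray(stroke, dtype=np.float64)
    d = np.diff(stroke, axis=0)                     # per-segment displacement
    angles = np.arctan2(d[:, 1], d[:, 0])           # in [-pi, pi)
    idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(idx, minlength=bins).astype(np.float64)
    return hist / hist.sum()

def style_distance(stroke_a, stroke_b, bins=8):
    """Chi-squared distance between the direction histograms of two strokes:
    small for similar writing styles, large for dissimilar ones."""
    ha = direction_histogram(stroke_a, bins)
    hb = direction_histogram(stroke_b, bins)
    return 0.5 * np.sum((ha - hb) ** 2 / (ha + hb + 1e-9))
```

A real verifier would combine many such features per writer and learn a decision threshold; this sketch only shows why style variation, suppressed in recognition, is the signal here.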

 

Year of completion:  February 2008
 Advisor : Anoop M. Namboodiri

Related Publications

  • Sachin Gupta and Anoop M. Namboodiri - Text-Dependent Writer Verification Using Boosting, Proceedings of the International Conference on Frontiers in Handwriting Recognition, Montreal, Canada, 2008. [PDF]

  • Sachin Gupta and Anoop M. Namboodiri - Repudiation Detection in Handwritten Documents, Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August, 2007. [PDF]

  • Anoop M. Namboodiri and Sachin Gupta - Text Independent Writer Identification from Online Handwriting, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, Centre de Congrès Atlantia, La Baule, France. [PDF]


Downloads

thesis

 ppt

 

Word Hashing for Efficient Search In Document Image Collections


Anand Kumar

A large number of document image collections are now being scanned and made available over the Internet or in digital libraries. Effective access to such information sources is limited by the lack of efficient retrieval schemes. Text search methods require efficient and robust optical character recognizers (OCRs), which are presently unavailable for Indian languages. Word spotting, i.e. word image matching, may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and/or nearest neighbor search, tend to be slow for large collections of books; direct matching of images is inefficient due to the complexity of matching and thus impractical for large databases. In general, indexing and retrieval methods for document images cluster similar words and build indexes from the cluster representatives. The time required to build such a clustering-based index is very high, since the clustering step relies on complex image matching procedures. This problem is solved by directly hashing word image representations.

This thesis presents an efficient mechanism for indexing and retrieval in large document image collections. First, document images are segmented into words. Features are then computed at the word level and indexed. Word retrieval is done very efficiently with content-sensitive hashing (CSH), which uses an approximate nearest neighbor search technique called locality sensitive hashing (LSH). The word images are hashed into hash tables using word-level features; the content-sensitive hash functions are chosen so that similar words fall into the same bucket of a hash table with high probability. The sub-linear time CSH scheme makes the search very fast without degrading accuracy. Experiments on a Telugu collection of books by Kalidasa, the classical Indian poet, demonstrate that word images can be searched in a few milliseconds. Hashing the words thus reduces search time significantly and makes searching document image collections practical.
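The collide-then-rank structure of such an index can be sketched with the random-hyperplane family of LSH — a toy version for illustration; the thesis's actual hash functions and word-image features may differ:

```python
import numpy as np

class ContentSensitiveHash:
    """Toy LSH index for word-image feature vectors. Each of `n_tables`
    hash tables uses `n_bits` random hyperplanes: a vector's key is the
    sign pattern of its projections, so nearby vectors share a bucket
    in some table with high probability."""

    def __init__(self, dim, n_tables=4, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]
        self.vectors = []

    def _key(self, t, v):
        return tuple((self.planes[t] @ v > 0).astype(int))

    def add(self, v):
        i = len(self.vectors)
        self.vectors.append(np.asarray(v, dtype=np.float64))
        for t in range(len(self.tables)):
            self.tables[t].setdefault(self._key(t, v), []).append(i)

    def query(self, v, k=1):
        """Collect colliding candidates from all tables, then rank the
        small candidate set by exact distance (sub-linear overall)."""
        cand = set()
        for t in range(len(self.tables)):
            cand.update(self.tables[t].get(self._key(t, v), []))
        return sorted(cand, key=lambda i: np.linalg.norm(self.vectors[i] - v))[:k]
```

Only the handful of candidates that collide with the query are ever compared exactly, which is what avoids the expensive all-pairs matching of clustering-based indexes.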

 

Year of completion:  2008
 Advisor : C. V. Jawahar & R. Manmatha

Related Publications

  • Anand Kumar, C.V. Jawahar & R. Manmatha - Efficient Search in Document Image Collections, Proceedings of the 8th Asian Conference on Computer Vision (ACCV'07), Part I, LNCS 4843, pp. 586-595, Tokyo, Japan, 18-22 November, 2007. [PDF]

  • C.V. Jawahar and Anand Kumar - Content-level Annotation of Large Collection of Printed Document Images, Proc. of the 9th International Conference on Document Analysis and Recognition, Brazil, 23-26 September, 2007. [PDF]

  • Anand Kumar, A. Balasubramanian, Anoop M. Namboodiri and C.V. Jawahar - Model-Based Annotation of Online Handwritten Datasets, International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), October 23-26, 2006, Centre de Congrès Atlantia, La Baule, France. [PDF]

 


Downloads

thesis

 ppt

Proxy Based Compression of Depth Movies


Pooja Verlani (homepage)

Sensors for 3D data are common today; they include multi-camera systems, laser range scanners, and other devices, some of which are suitable for real-time capture of the shape and appearance of dynamic events. The 2-1/2D model of an aligned depth map and image, called a Depth Image, has been popular in Image Based Modeling and Rendering (IBMR). Capturing the 2-1/2D geometric structure and photometric appearance of dynamic scenes is possible today. Time-varying depth and image sequences, called Depth Movies, can extend IBMR to dynamic events. The captured event contains aligned sequences of depth maps and textures, which are often streamed to a distant location for immersive viewing. Applications of such systems include virtual-space tele-conferencing, remote 3D immersion, and 3D entertainment. We study a client-server model for tele-immersion in which captured or stored depth movies are sent from a server to multiple remote clients on demand. Depth movies consist of dynamic depth maps and texture maps. Multiview image compression and video compression have been studied before, but dynamic depth map compression has not. This thesis contributes a dynamic depth map compression scheme for efficient transmission in a server-client 3D tele-immersive environment. Dynamic depth map data is voluminous and needs efficient compression: immersive applications require time-varying sequences of depth images from multiple cameras to be encoded, transmitted, and rendered back into the 3D scene at the remote site, and depth movies of a generic 3D scene from multiple cameras are far too heavy to send over the network at the available bandwidth.


This thesis presents a scheme to compress depth movies of human actors using a parametric proxy model of the underlying action. We use a generic articulated human model as the proxy for the human in action, parametrized at each time instant by the joint angles of the model. The proxy represents a common prediction of the scene structure. The difference between the captured depth and the depth of the proxy, called the residue, is used to represent the scene, exploiting spatial coherence. A few variations of this algorithm are presented. We experimented with bit-wise compression of the residues and analyzed the quality of the generated 3D scene. Differences in residues across time exploit temporal coherence; intra-frame coded frames and difference-coded frames together provide random access and high compression. We show results on several synthetic and real actions to demonstrate the compression ratio and the resulting quality of a depth-based rendering of the decoded scene; the performance achieved is quite impressive. We present the articulation-fitting tool, the compression module with its different algorithms, and the server-client system with several variants for the user. The thesis first explains the concepts of 3D reconstruction by image-based modeling and rendering, compression of such 3D representations, and teleconferencing; it then proceeds to depth images and depth movies, followed by the main algorithms, examples, experiments, and results.
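The core encode/decode cycle can be sketched as follows — a minimal illustration assuming depth maps as integer arrays and plain deflate coding of the quantized residue; the thesis's bit-wise residue coding and temporal difference coding differ in detail:

```python
import zlib
import numpy as np

def encode_depth(captured, proxy, step=4):
    """Encode a depth map as (proxy-predicted depth + residue). The residue
    is small and spatially coherent when the articulated proxy fits well,
    so it quantizes and compresses far better than the raw depth map.
    `step` is the quantization step in depth units (coarser = smaller)."""
    residue = captured.astype(np.int32) - proxy.astype(np.int32)
    q = np.round(residue / step).astype(np.int16)
    return zlib.compress(q.tobytes()), q.shape

def decode_depth(payload, shape, proxy, step=4):
    """Invert encode_depth: de-quantize the residue and add the proxy back.
    The client already has the proxy model, so only the residue is sent."""
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int16).reshape(shape)
    return proxy.astype(np.int32) + q.astype(np.int32) * step
```

Because both ends share the proxy, only joint angles and the compressed residue cross the network, which is where the bandwidth savings come from.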

 

Year of completion:  2008
 Advisor : P. J. Narayanan

Related Publications

  • Pooja Verlani, P. J. Narayanan - Proxy-Based Compression of 2-1/2D Structure of Dynamic Events for Tele-immersive Systems Proceedings of 3D Data Processing, Visualization and Transmission June 18-20, 2008, Georgia Institute of Technology, Atlanta, GA, USA. [PDF]

  • Pooja Verlani, P. J. Narayanan - Parametric Proxy-Based Compression of Multiple Depth Movies of Humans Proceedings of Data Compression Conference 2008 March 25 to March 27 2008, Salt Lake City, Utah. [PDF]

  • Pooja Verlani, Aditi Goswami, P.J. Narayanan, Shekhar Dwivedi and Sashi Kumar Penta - Depth Images: Representations and Real-time Rendering, Third International Symposium on 3D Data Processing, Visualization and Transmission, Chapel Hill, North Carolina, June 14-16, 2006. [PDF]

 


Downloads

thesis

 ppt

 

Projected Texture for 3D Object Recognition


Avinash Sharma (homepage)

Three-dimensional objects are characterized by their shape, which can be thought of as the variation in depth over the object from a particular viewpoint. These variations can be deterministic, as in rigid objects, or stochastic, as in surfaces containing a 3D texture. The depth variations are lost during imaging, and what remains is the intensity variation induced by shape and lighting, together with focus variations. Algorithms that utilize 3D shape for classification try to recover the lost 3D information from intensity or focus variations, or from additional cues such as multiple images or structured lighting; this process is computationally intensive and error-prone. Once the depth information is estimated, the object must then be characterized with shape descriptors for classification. Image-based classification algorithms instead characterize the intensity variations of the image directly. As noted, intensity variations are affected by the illumination and pose of the object, so such algorithms try to derive descriptors invariant to changes in lighting and pose. Although image-based classification algorithms are more efficient and robust, their classification power is limited because the 3D information is lost during imaging. Our problem is to find an image-based recognition method that utilizes the shape of the object without explicitly recovering it, implicitly avoiding the high computational cost of shape recovery while achieving high accuracy. The method should be robust to view variation and occlusion, invariant to the scale and position of the object, and able to handle partially specular and texture-less object surfaces. We propose the use of structured lighting patterns, which we refer to as projected texture, for the purpose of object recognition.
The depth variations of the object induce deformations in the projected texture, and these deformations encode the shape information. The primary idea is to view the deformation pattern as a characteristic property of the object and use it directly for classification, instead of recovering the shape explicitly. To achieve this we need an appropriate projection pattern and features that sufficiently characterize the deformations; the patterns required can differ considerably depending on the nature of the object shapes and their variation across classes. Specifically, we look at three recognition problems and propose appropriate projection patterns, deformation characterizations, and recognition algorithms for each. The first concerns objects of fixed shape and pose, where minor differences in shape discriminate between classes; 3D hand geometry recognition is taken as the example. The second is category recognition of rigid objects from arbitrary viewpoints, for which we propose a classification algorithm based on the popular bag-of-words paradigm. The third is 3D texture classification, where the depth variation of the surface is stochastic in nature; here we propose a set of simple texture features that capture the deformations of projected lines on 3D textured surfaces. These approaches have been implemented, verified, tested, and compared on various datasets, both collected by us and available on the Internet; the analysis and comparative results demonstrate significant improvements over existing approaches in terms of accuracy and robustness.
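One simple way to summarize stripe deformations without reconstructing depth is a cell-wise histogram of gradient orientations — a hedged sketch of the general idea, not the thesis's exact features:

```python
import numpy as np

def deformation_histogram(img, cells=4, bins=8):
    """Gradient-orientation histogram over a grid of cells. On a surface
    lit by a striped projector pattern, the stripes bend with the local
    surface geometry; binning the image-gradient orientations cell by
    cell summarizes those deformations as a fixed-length feature."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)                 # row- and column-wise derivatives
    ang = np.arctan2(gy, gx) % np.pi          # orientation in [0, pi)
    mag = np.hypot(gx, gy)                    # gradient magnitude as weight
    h, w = img.shape
    feat = []
    for ci in range(cells):
        for cj in range(cells):
            ys = slice(ci * h // cells, (ci + 1) * h // cells)
            xs = slice(cj * w // cells, (cj + 1) * w // cells)
            a, m = ang[ys, xs].ravel(), mag[ys, xs].ravel()
            idx = np.minimum((a / np.pi * bins).astype(int), bins - 1)
            hist = np.bincount(idx, weights=m, minlength=bins)
            feat.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feat)
```

An undeformed horizontal-stripe pattern concentrates all its weight in the vertical-gradient bin; surface relief spreads the mass into neighboring bins, and any standard classifier can then separate the resulting feature vectors.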

 

Year of completion:  2008
 Advisor : Anoop M. Namboodiri

Related Publications

  • Avinash Sharma and Anoop M. Namboodiri - Object Category Recognition with Projected Texture, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 374-381, 16-19 Dec, 2008, Bhubaneswar, India. [PDF]

  • Visesh Chari, Avinash Sharma, Anoop M. Namboodiri and C.V. Jawahar - Frequency Domain Visual Servoing using Planar Contours, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 87-94, 16-19 Dec, 2008, Bhubaneswar, India. [PDF]

  • Avinash Sharma and Anoop M. Namboodiri - Projected Texture for Object Classification, Proceedings of the 10th European Conference on Computer Vision (ECCV 2008), 12-18 Oct, 2008, France. [PDF]

  • Avinash Sharma, Nishant Shobhit and Anoop M. Namboodiri - Projected Texture for Hand Geometry based Authentication, Proceedings of the CVPR Workshop on Biometrics, 28 June, 2008, Anchorage, Alaska, USA. IEEE Computer Society. [PDF]

 


Downloads

thesis

 ppt

 

 
