
Learning Semantic Interaction Among Indoor Objects


Swagatika Panda (homepage)

Robot manipulation in clutter, with objects in physical contact, remains a challenging problem to date. The challenge is posed by the interaction among objects at various levels of complexity. Understanding the positional semantics of the environment plays an important role in such tasks: the interactions with surrounding objects must be considered in order to perform the task without causing objects to fall or get damaged. In our work, we learn these semantics in terms of support relationships among the objects in a cluttered environment by utilizing various photometric and geometric properties of the scene. To manipulate an object of interest, we use the inferred support relationships to derive a sequence in which its surrounding objects should be removed while causing minimal damage to the environment. We believe this work can push the boundary of robotic applications in grasping, object manipulation, and picking-from-bin towards objects of generic shape and size, and towards scenarios with physical contact and overlap.

In the first part of the thesis, we aim at learning semantic interaction among objects of generic shapes and sizes lying in clutter, involving both direct and indirect physical contact. Three types of support relationships are inferred: "Support from below", "Support from side", and "Containment". The learned support relationships are then used to derive the order in which the objects surrounding the object of interest should be removed without causing damage to the environment; the generated sequence is called the Support Order. We propose and analyse two alternative approaches for support inference. In the first approach, "Multiple Object Support Inference", support relations between all possible pairs of objects are inferred. In the second approach, "Hierarchical Support Inference", given an object of interest, its support relationships with other graspable objects are inferred hierarchically.
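The support-order idea can be viewed as a traversal of a directed support graph: everything transitively resting on the target must be removed, topmost objects first. A minimal sketch of that derivation, assuming pairwise (supporter, supported) relations as input (object names and the input format are illustrative, not the thesis's actual representation):

```python
from collections import defaultdict

def support_order(supports, target):
    """supports: list of (supporter, supported) pairs, e.g. ("table", "box").
    Returns the objects resting (transitively) on `target`, topmost first —
    the order in which they should be removed."""
    above = defaultdict(set)  # object -> objects directly resting on it
    for supporter, supported in supports:
        above[supporter].add(supported)

    # Collect everything transitively resting on the target.
    stack, blocking = [target], set()
    while stack:
        for obj in above[stack.pop()] - blocking:
            blocking.add(obj)
            stack.append(obj)

    # Emit blocking objects in reverse topological order (topmost first).
    order = []
    def visit(obj):
        for child in above[obj]:
            if child in blocking and child not in order:
                visit(child)
        if obj in blocking and obj not in order:
            order.append(obj)
    for obj in blocking:
        visit(obj)
    return order
```

For a stack table → box → cup, `support_order(..., "table")` removes the cup before the box, which matches the intuition that disturbing a supporter before its supported object risks damage.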

In the second part of the thesis, we learn the semantic interaction among objects in clutter using multiple views. First, support relationships among the objects in each view are inferred; the per-view relationships are then combined into a global support relationship across views. This global relationship is used to recover missing support relations and to predict the support order, incorporating hidden objects and spatial support relations that individual views miss.
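At its simplest, combining per-view inferences amounts to taking the union of the pairwise relations, which then exposes the relations any single view missed. A toy sketch, assuming consistent object labels across views and a (supporter, supported, relation-type) triple format that is an illustration rather than the thesis's actual data structure:

```python
def merge_views(view_relations):
    """Union of per-view support relations into one global set.
    Each element is a (supporter, supported, relation_type) triple."""
    combined = set()
    for relations in view_relations:
        combined |= relations
    return combined

def recovered_in(view_relations, view_index):
    """Relations invisible in one view but recovered from the others."""
    return merge_views(view_relations) - view_relations[view_index]
```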

We have created two RGBD datasets consisting of various everyday objects present in clutter. In the "Indoor dataset for clutter", 50 cluttered scenes are captured from a frontal view using 35 objects of different shapes and sizes. In the "Indoor multiview dataset", 7 cluttered scenes are captured, each from multiple views, giving a total of 67 images of 9 objects of different shapes and sizes. Both datasets are made publicly available to the research community. We explore many different settings involving different kinds of object-object interaction, and successfully learn support relationships and predict support orders in these settings. This can play a significant role in extending the scope of manipulation to cluttered environments involving both direct and indirect physical contact, and to generic objects.

Keywords: Robotic Vision, Support Relations, Support Order, RGBD, Semantic Interaction, Clutter, Multiple Views


 

Year of completion:  July 2014
 Advisor : Prof. C. V. Jawahar

Related Publications

  • Swagatika Panda, A.H. Abdul Hafez and C. V. Jawahar - Learning Semantic Interaction among Graspable Objects, Proceedings of the 5th International Conference on Pattern Recognition and Machine Intelligence, 10-14 Dec. 2013, Kolkata, India. [PDF]

  • Swagatika Panda, A.H. Abdul Hafez and C V Jawahar - Learning Support Order for Manipulation in Clutter, Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 03-08 Nov. 2013, Tokyo, Japan. [PDF]


Downloads

thesis

ppt

Fingerprint Image Enhancement Using Unsupervised Hierarchical Feature Learning


Mihir Sahasrabudhe

The use of fingerprints is an important method for identifying individuals in today's world; they are also one of the most reliable biometric traits, besides the iris. Fingerprint recognition refers to the various tasks associated with fingerprint identification, verification, feature extraction, indexing, and classification. A large number of systems, across a variety of domains, employ fingerprint recognition, so precision in fingerprint recognition is essential.

Identification using fingerprints proceeds via a feature extraction step followed by matching of the extracted features. The features that are extracted depend on the algorithm being used for identification. In large databases of fingerprints, such as government records, fingerprints may be indexed before they are matched; this significantly reduces the time required to identify an individual, as comparing his or her prints with every entry in the database would take an enormous amount of time. In either case, feature extraction plays an important role. However, feature extraction is directly affected by the quality of the input image: a noisy or unclear fingerprint image can strongly degrade it. To counter noise in input images, an extra enhancement step is introduced before feature extraction and matching are performed. The goal of enhancement is to improve the quality of the ridges and valleys in the fingerprint by making them clearly distinguishable while preserving information: the enhancement algorithm must neither omit existing information from the fingerprint nor introduce spurious features that were not present in the original image.

The considerable research into fingerprint recognition, and into fingerprint enhancement in particular, has produced a large number of enhancement algorithms, including pixel-wise enhancement, contextual filtering, and Fourier analysis, to name a few. Contextual filtering convolves the input image with filters whose parameters are determined by the values of certain features at each pixel, which requires extracting pre-defined features from the fingerprint. For instance, the filter used at a point depends on the ridge orientation at that point; hence, to choose the filter, the ridge orientation must first be extracted. A similar dependence can be observed in other types of algorithms too.
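Contextual filtering of this kind is commonly implemented with oriented Gabor filters. A sketch of constructing one such kernel from an estimated ridge orientation (the frequency, sigma, and kernel-size defaults are illustrative choices, not the thesis's settings):

```python
import numpy as np

def gabor_kernel(theta, freq=0.1, sigma=4.0, size=15):
    """Even-symmetric Gabor kernel tuned to ridge orientation `theta`
    (radians) and ridge frequency `freq` (cycles per pixel)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the cosine oscillates across the ridges.
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * freq * xr))
```

Each image block would then be convolved with the kernel matching its locally estimated orientation, which is exactly why a wrong orientation estimate propagates into a wrong enhancement.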

In this thesis, we propose that unsupervised feature learning be applied to the fingerprint enhancement problem. We use two different scenarios and models to show that unsupervised feature learning indeed helps improve an existing algorithm, and, when applied directly to greyscale images, can compete with robust contextual filtering and Fourier analysis algorithms. Our motivation lies in the fact that a vast amount of fingerprint data is available; with recent advances in deep learning, and unsupervised feature learning in particular, we can use this data to learn structure in fingerprint images.

For the first model, we show that continuous restricted Boltzmann machines can be used to learn local fingerprint orientation field patterns, after which the learned model can correct noisy local ridge orientations. This extra step is introduced between orientation field estimation and contextual filtering, and we show that it improves the performance of matching on the enhanced images. In the second model, we use a 3-layered convolutional deep belief network to learn features directly from greyscale fingerprint images. We show that the depth of the network (3 layers) significantly improves the quantitative and qualitative performance of the enhancement: the deeper layers help reconstruct noisy regions that the first layer alone cannot.
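The classical gradient-based orientation field estimate that such a correction stage would refine can be sketched as follows (the block size and gradient operator are illustrative choices, not the thesis's exact pipeline):

```python
import numpy as np

def orientation_field(img, block=16):
    """Least-squares ridge orientation per block, in radians in [0, pi).
    Uses the standard doubled-angle averaging of image gradients."""
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    thetas = np.zeros((h // block, w // block))
    for i in range(h // block):
        for j in range(w // block):
            sl = np.s_[i*block:(i+1)*block, j*block:(j+1)*block]
            gxx = np.sum(gx[sl]**2 - gy[sl]**2)
            gxy = np.sum(2 * gx[sl] * gy[sl])
            # Halving the doubled angle averages orientations without the
            # pi-ambiguity; +pi/2 turns the gradient normal into the ridge direction.
            thetas[i, j] = (0.5 * np.arctan2(gxy, gxx) + np.pi / 2) % np.pi
    return thetas
```

Noise in the image corrupts individual block estimates, and it is these noisy local orientations that the learned CRBM step is reported to correct.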

In conclusion, we have explored a new direction of attack on the fingerprint enhancement problem. We conjecture that this work can be extended to other fingerprint recognition problems as well; for instance, synthetic fingerprint generation might be accomplished using the convolutional deep belief network trained on fingerprint features. Our experiments suggest several promising directions for future work.

 

Year of completion:  December 2014
 Advisor : Anoop M. Namboodiri

 


Related Publications

  • Mihir Sahasrabudhe, Anoop M. Namboodiri - Fingerprint Enhancement using Unsupervised Hierarchical Feature Learning Proceedings of the Ninth Indian Conference on Computer Vision, Graphics and Image Processing, 14-17 Dec 2014, Bangalore, India. [PDF]

  • Mihir Sahasrabudhe and Anoop M Namboodiri - Learning Fingerprint Orientation Fields Using Continuous Restricted Boltzmann Machines Proceedings of the 2nd Asian Conference Pattern Recognition, 05-08 Nov. 2013, Okinawa, Japan. [PDF]


Downloads

thesis

ppt

Secure Biometric Authentication with Fixed-Length Binary Representations


Rohan Kulkarni (homepage)

Biometrics have been established to be extremely reliable for identifying individuals and are thus at the core of several real-world systems, ranging from employee attendance to access control in the military. With growing computing resources available to individuals, biometric authentication systems are being deployed in an even wider range of commercial applications. The permanent nature of biometrics raises serious security concerns with these deployments: losing one's biometric trait can compromise that individual's identity in every system he or she is enrolled in. Moreover, biometrics are non-rigid in nature and require a fuzzy matching process, which makes it difficult to directly borrow the popular security techniques used elsewhere with passwords and key-cards. Research in this field therefore attempts to develop efficient and reliable biometric authentication systems while addressing the issues of security and privacy.

Binary biometric representations have been shown to provide significant improvements in efficiency without compromising system performance for various modalities, including fingerprints, palmprints, and iris. Hence, this thesis focuses on developing secure and privacy-preserving protocols for fixed-length binary biometric templates that use the Hamming distance as the dissimilarity measure. We propose a novel authentication protocol using a somewhat homomorphic encryption scheme that provides template protection and the ability to use masks while computing the Hamming distance. The protocol operates on encrypted data, providing complete biometric privacy to individuals trying to authenticate and revealing only the final matching score to the server. It allows real-time authentication and retains the matching accuracy of the underlying representation, as demonstrated by our experiments on iris and palmprints.
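The plaintext computation that the protocol protects — a masked Hamming distance over fixed-length binary templates, as used for iris codes — can be sketched as follows; in the actual protocol this comparison happens over encrypted data:

```python
import numpy as np

def masked_hamming(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance between binary templates, counting only
    bits that are valid (unmasked) in both codes."""
    valid = mask_a & mask_b
    n = valid.sum()
    if n == 0:
        return 1.0  # nothing comparable: treat as maximally dissimilar
    return float(((code_a ^ code_b) & valid).sum()) / n
```

Masks matter because occlusions (eyelids, reflections) make some bits of an iris code meaningless; excluding them keeps the score from being diluted by noise.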

We also propose a one-time biometric-token-based authentication protocol for widely used banking transactions. In the current scenario, the user is forced to trust the service provider with his or her sole banking credentials or credit card details to avail the desired services. Commonly used one-time-password systems do provide additional transaction security, yet the organizations using them still cannot distinguish a genuine user from an adversary with stolen credentials; involving biometric security would strengthen the authentication process. The proposed protocol upholds the requirements of secure authentication, template protection, and revocability while providing user anonymity from the service provider. We demonstrate the system's security and performance using iris biometrics to authenticate individuals.
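The "one-time" property can be illustrated with a deliberately simplified nonce-plus-MAC sketch; the function names and the binding shown here are hypothetical, and the thesis's protocol additionally provides template protection, revocability, and anonymity, none of which this toy captures:

```python
import hashlib
import hmac
import secrets

def issue_token(template_digest, server_key):
    """Bind a fresh random nonce to a protected template digest."""
    nonce = secrets.token_bytes(16)
    tag = hmac.new(server_key, nonce + template_digest, hashlib.sha256).digest()
    return nonce, tag

def verify_token(template_digest, server_key, nonce, tag, used_nonces):
    """Accept only a fresh nonce whose MAC verifies, then burn the nonce,
    so a replayed token is rejected."""
    if nonce in used_nonces:
        return False
    expected = hmac.new(server_key, nonce + template_digest, hashlib.sha256).digest()
    if hmac.compare_digest(expected, tag):
        used_nonces.add(nonce)
        return True
    return False
```

Burning the nonce after a successful check is what makes the token single-use: stolen credentials replayed by an adversary fail the freshness test.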

 

Year of completion:  December 2014
 Advisor : Anoop M. Namboodiri

Related Publications

  • Rohan Kulkarni, Anoop M. Namboodiri - One-Time Biometric Token based Authentication Proceedings of the Ninth Indian Conference on Computer Vision, Graphics and Image Processing, 14-17 Dec 2014, Bangalore, India. [PDF]

  • Rohan Kulkarni and Anoop M Namboodiri - Secure Hamming Distance based Biometric Authentication, Proceedings of the 6th IAPR International Conference on Biometrics, 04-07 June 2013, Madrid, Spain. [PDF]


Downloads

thesis

 ppt

Word Recognition of Indic Scripts


Naveen TS (homepage)

Optical Character Recognition (OCR) problems are often formulated, for most Indian scripts, as an isolated character (symbol) classification task followed by a post-classification stage (containing modules such as UNICODE generation and error correction) that generates the textual representation. Such approaches are prone to failure due to (i) the difficulty of designing a reliable word-to-symbol segmentation module that works robustly on degraded (cut/fused) images, and (ii) the difficulty of converting classifier outputs into a valid sequence of UNICODES. In this work, we look at two important aspects of word recognition: word-image-to-text-string conversion, and error detection and correction in words represented as UNICODES. In this thesis, we propose a formulation in which the expectations on the two critical modules of a traditional OCR (segmentation and isolated character recognition) are minimized, and the harder recognition task is modelled as learning an appropriate sequence-to-sequence transcription scheme. We thus formulate recognition as a direct transcription problem: given many examples of feature sequences and their corresponding UNICODE representations, our objective is to learn a mapping that converts a word image directly into a UNICODE sequence. This formulation has multiple practical advantages: (i) it significantly reduces the number of classes for Indian scripts; (ii) it removes the need for word-to-symbol segmentation; (iii) it does not require strong symbol-level annotation to design the classifiers; and (iv) it directly generates a valid sequence of UNICODES. We test our method on more than 5000 pages of printed documents in multiple languages, designing a script-independent, segmentation-free architecture that works well for 7 Indian scripts. Our method is compared against other state-of-the-art OCR systems and evaluated on a large corpus.
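Sequence-to-sequence transcription of this kind (the related publications use BLSTM networks) typically relies on a CTC-style decoding in which frame-level label predictions are collapsed into the output string. The collapsing rule itself is simple — merge repeats, drop blanks:

```python
def collapse_ctc(labels, blank="-"):
    """Map a frame-level label sequence to its transcription by merging
    consecutive repeats and dropping blank symbols (the CTC decoding rule).
    This is a sketch of the decoding step, not the full training objective."""
    out, prev = [], None
    for label in labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)
```

The blank symbol is what lets the network emit genuinely repeated characters ("bb") without them being merged, which is why decoding never needs an explicit word-to-symbol segmentation.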

The second contribution of this thesis is an investigation of error detection and correction in highly inflectional languages, taking Malayalam and Telugu as examples. Error detection in OCR output using dictionaries and statistical language models (SLMs) has long been common practice when designing post-processors, and multiple strategies have been used successfully for English. However, this success has not yet translated into improved error detection for many inflectional languages, especially Indian languages, owing to challenges such as very large unique-word lists, a lack of linguistic resources, and a lack of reliable language models. In this thesis, we investigate the major challenges in developing error detection techniques for highly inflectional Indian languages. We compare and contrast several attributes of English with those of inflectional languages such as Telugu and Malayalam, make observations by analysing statistics computed from popular corpora, and relate these observations to error detection schemes. We propose a method, which learns from error patterns and SLMs, that detects errors in Telugu and Malayalam with an F-score comparable to that of less inflectional languages like Hindi.
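One simple instance of SLM-based detection is to flag words containing character n-grams that a clean corpus (almost) never produced — a useful signal precisely when the unique-word list is too large for a dictionary lookup. A minimal sketch of that idea (the thesis's method additionally learns from error patterns; the padding symbols and threshold here are illustrative):

```python
from collections import Counter

def train_char_ngrams(corpus_words, n=3):
    """Character n-gram counts from clean corpus words, padded with
    word-boundary markers so edge n-grams are modelled too."""
    counts = Counter()
    for word in corpus_words:
        padded = "^" + word + "$"
        counts.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return counts

def is_suspicious(word, counts, n=3, min_count=1):
    """Flag a word as a likely OCR error if any of its character n-grams
    falls below a frequency threshold in the corpus model."""
    padded = "^" + word + "$"
    grams = (padded[i:i + n] for i in range(len(padded) - n + 1))
    return any(counts[g] < min_count for g in grams)
```

Because the model is built over characters rather than whole words, it generalizes to unseen inflected forms — the core difficulty the thesis identifies for Telugu and Malayalam.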

 

Year of completion:  January 2014
 Advisor : Prof. C. V. Jawahar

 


Related Publications

  • Praveen Krishnan, Naveen Sankaran, Ajeet Kumar Singh and C. V. Jawahar - Towards a Robust OCR System for Indic Scripts Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 7-10 April 2014, Tours-Loire Valley, France. [PDF]

  • Naveen Sankaran, Aman Neelappa and C V Jawahar - Devanagari Text Recognition: A Transcription Based Formulation Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF]

  • Naveen Sankaran and C V Jawahar - Error Detection in Highly Inflectional Languages Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA. [PDF]

  • Naveen Sankaran, C V Jawahar - Recognition of Printed Devanagari Text Using BLSTM Neural Network, Proceedings of the 21st International Conference on Pattern Recognition, 11-15 Nov. 2012, pp. 322-325, Vol. 21, ISBN 978-4-9906441-1-6, Japan. [PDF]

 


Downloads

 

thesis

 ppt

Modeling Scene Text and Texture by Decomposing into Component Images


Siddharth Kherada (homepage)

Separation of an image into constituent components, based on the source of the data, its frequency distribution, or its nature, is a widely used technique in image processing and computer vision. Many problems are solved by partitioning images into components and working on each component separately; common examples include splitting an image into red, green, and blue channels for ease of representation, or into luminance and chrominance for better compression. In this thesis, we explore the separation of natural images into appropriate components for the purposes of representation and recognition. We first introduce a framework in which separating images into direct and global components aids the modeling of 3D textures. Such 3D textures are often described by a parametric function per pixel that models the variation in appearance with lighting direction; however, parametric models such as Polynomial Texture Maps (PTMs) tend to over-smooth changes in appearance. We therefore propose a technique to effectively model natural material surfaces and their interactions with changing lighting conditions. We show that the direct and global components of an image have different characteristics and, when modeled separately, lead to a more accurate and compact model of the 3D surface texture. The direct component is mainly affected by the structural properties of the surface and therefore captures phenomena like shadows and specularity, which are sharply varying functions; the global component models the overall luminance and color values, a smoothly varying function. For a given lighting position, both components are computed separately and combined to render a new image. This method models sharp shadows and specularities while preserving the structural relief and surface color, so the rendered images have enhanced photorealism compared to images rendered by existing single-pixel models such as PTMs.
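For intuition, the classic high-frequency-illumination separation (Nayar et al.) recovers the two components from per-pixel extremes over images lit by shifted patterns: with roughly half the pattern lit at any time, the per-pixel maximum is direct + global/2 and the minimum is global/2. This is a textbook sketch, not necessarily the exact acquisition procedure used in the thesis:

```python
import numpy as np

def direct_global(stack):
    """stack: (n, h, w) images of a scene under n shifted high-frequency
    illumination patterns. Returns (direct, global) component images,
    assuming half the pattern is lit in every frame."""
    lmax = stack.max(axis=0).astype(float)  # pixel lit:   direct + global/2
    lmin = stack.min(axis=0).astype(float)  # pixel unlit: global/2
    return lmax - lmin, 2.0 * lmin
```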

We then look at separating an image based on its sources of illumination or albedo variations, for the purpose of scene text segmentation. Extracting text from scene images is challenging due to variations in the color, size, and font of the text, and the results are often affected by complex backgrounds, varying lighting conditions, shadows, and reflections. A robust solution to this problem can significantly improve the accuracy of scene text recognition, enabling applications such as scene understanding, automatic localization and navigation, and image retrieval. We propose a method to extract and binarize text from images that contain complex backgrounds. We use Independent Component Analysis (ICA) to map out the text region, which is inherently uniform in nature, while relegating shadows, specularity, and reflections to the background. The technique identifies the text regions among the components extracted by ICA using a simple global thresholding method to isolate the foreground text. We show the results of our algorithm on some of the most complex word images from the ICDAR 2003 Robust Word Recognition dataset and compare with previously reported methods.
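The "simple global thresholding" applied to the text-bearing ICA component could, for example, be Otsu's method, which picks the threshold maximizing between-class variance of the histogram (the choice of Otsu here is an assumption for illustration, not necessarily the rule used in the thesis):

```python
import numpy as np

def otsu_threshold(gray):
    """Global threshold maximizing between-class variance (Otsu's method)
    over an 8-bit greyscale image or component."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0.0, 0.0
    for t in range(256):
        w0 += hist[t]                      # pixels in class 0 (<= t)
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                     # class means
        m1 = (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Binarizing the component as `gray > otsu_threshold(gray)` then isolates the uniform foreground text layer from the background.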

 

Year of completion:  December 2014
 Advisor : Anoop M. Namboodiri

 


Related Publications

  • Siddharth Kherada and Anoop M Namboodiri - An ICA based Approach for Complex Color Scene Text Binarization Proceedings of the 2nd Asian Conference Pattern Recognition, 05-08 Nov. 2013, Okinawa, Japan. [PDF]

  • Siddharth Kherada, Prateek Pandey, Anoop M. Namboodiri - Improving Realism of 3D Texture using Component Based Modeling Proceedings of IEEE Workshop on Applications of Computer Vision 9-11 Jan. 2012, ISSN 1550-5790 E-ISBN 978-1-4673-0232-6, Print ISBN 978-1-4673-0233-3, pp. 41-47, Breckenridge, CO, USA. [PDF]


Downloads

thesis

ppt

 
