
Multi-modal Semantic Indexing for Image Retrieval.


Pulla Lakshmi Chandrika (homepage)

Many image retrieval schemes rely on a single mode (either low-level visual features or embedded text) for searching multimedia databases. In the text-based approach, annotated text is used for indexing and retrieving images. Although such methods are very powerful in matching the context of the images, the cost of annotation is very high and the whole process suffers from the subjectivity of descriptors. In the content-based approach, indexing and retrieval are based on the visual content of the image, such as color, texture and shape. While these methods are robust and effective, they are still bottlenecked by the semantic gap: there is a significant gap between the high-level concepts that humans perceive and the low-level features used to describe images. Many approaches (such as semantic analysis) have been proposed to bridge this gap between numerical image features and the richness of human semantics.

Semantic analysis techniques were first introduced in text retrieval, where a document collection can be viewed as an unsupervised clustering of the constituent words and documents around hidden or latent concepts. Latent Semantic Indexing (LSI), probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) are the popular techniques in this direction. With the introduction of bag of words (BoW) methods in computer vision, semantic analysis schemes became popular for tasks like scene classification, segmentation and content-based image retrieval, and they have been shown to improve the performance of visual bag of words in image retrieval. Most of these methods rely only on text or image content.

Many popular image collections (e.g. those emerging over the Internet) have associated tags, often for human consumption. A natural extension is to combine information from multiple modes to make retrieval more effective.

The performance gain from semantic indexing techniques depends heavily on the right choice of the number of semantic concepts. Moreover, all of these techniques require complex mathematical computations involving large matrices. This makes them difficult to use for continuously evolving data, where repeated semantic indexing (after the addition of every new image) is prohibitive. In this thesis we introduce and extend a bipartite graph model (BGM) for image retrieval. BGM is a scalable data structure that aids semantic indexing in an efficient manner. It can also be incrementally updated. BGM uses tf-idf values for building a semantic bipartite graph. We also introduce a graph partitioning algorithm that works on the BGM to retrieve semantically relevant images from a database. We demonstrate the properties as well as the performance of our semantic indexing scheme through a series of experiments.
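As a rough illustration of the idea, the sketch below builds a bipartite graph between images and (visual or textual) words with tf-idf edge weights and supports incremental insertion. The class name `BipartiteIndex`, its methods and the two-hop scoring in `query` are hypothetical choices made for this example; in particular, the scoring is a simple stand-in and not the graph partitioning algorithm of the thesis.

```python
import math
from collections import Counter, defaultdict


class BipartiteIndex:
    """Images on one side, words on the other; edges carry term counts.

    Hypothetical sketch: names and scoring are assumptions, not the thesis code.
    """

    def __init__(self):
        self.image_words = {}                # image id -> Counter of word counts
        self.word_images = defaultdict(set)  # word -> set of image ids

    def add_image(self, image_id, words):
        # Incremental update: only the edges of the new image are touched.
        counts = Counter(words)
        self.image_words[image_id] = counts
        for word in counts:
            self.word_images[word].add(image_id)

    def tfidf(self, image_id, word):
        # Edge weight computed on demand, so idf reflects the current collection.
        counts = self.image_words[image_id]
        tf = counts[word] / sum(counts.values())
        idf = math.log(len(self.image_words) / len(self.word_images[word]))
        return tf * idf

    def query(self, words, top_k=5):
        # Two-hop walk: query words -> images, scored by summed tf-idf weights.
        scores = defaultdict(float)
        for word in set(words):
            for image_id in self.word_images.get(word, ()):
                scores[image_id] += self.tfidf(image_id, word)
        return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]


index = BipartiteIndex()
index.add_image("a", ["sky", "sky", "sea"])
index.add_image("b", ["sky", "tree"])
index.add_image("c", ["tree", "tree", "car"])
print(index.query(["tree"]))
```

Because the weights are derived from the stored counts at query time, adding an image never requires recomputing a global matrix, which is the property the paragraph above emphasizes.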

Next, we propose two techniques: Multi-modal Latent Semantic Indexing (MMLSI) and Multi-Modal Probabilistic Latent Semantic Analysis (MMpLSA). These methods are obtained by directly extending their traditional single-mode counterparts. Both methods incorporate visual features and tags by generating simultaneous semantic contexts. The experimental results demonstrate improved accuracy over other single-mode and multi-modal methods.

We also propose a tripartite graph based representation of multi-modal data for image retrieval tasks. Our representation is ideally suited for dynamically changing or evolving datasets, where repeated semantic indexing is practically impossible. We employ a graph partitioning algorithm to retrieve semantically relevant images from a database of images represented using the tripartite graph. Being a "just in time" semantic indexing scheme, our method is computationally light and less resource intensive. Experimental results show that the data structure used is scalable. We also show that the performance of our method is comparable with other multi-modal approaches, with significantly lower computational and resource requirements.

 

Year of completion:  December 2013
 Advisor : C. V. Jawahar

Related Publications

  • Chandrika Pulla and C.V. Jawahar - Tripartite Graph Models for Multi Modal Image Retrieval Proceedings of 21st British Machine Vision Conference (BMVC'10),31 Aug. - 3 Sep. 2010, Aberystwyth, UK. [PDF]

  • Chandrika Pulla, Suman Karthik and C.V. Jawahar - Efficient Semantic Indexing for Image Retrieval Proceedings of 20th International Conference on Pattern Recognition (ICPR'10),23-26 Aug. 2010, Istanbul, Turkey. [PDF]

  • Pulla Chandrika and C.V. Jawahar - Multi Modal Semantic Indexing for Image Retrieval Proceedings of ACM International Conference on Image and Video Retrieval(CIVR'10), pp.342-349, 5-7 July, 2010, Xi'an, China. [PDF]

  • Suman Karthik, Chandrika Pulla and C.V. Jawahar - Incremental Online Semantic Indexing for Image Retrieval in Dynamic Databases Proceedings of International Workshop on Semantic Learning Applications in Multimedia (SLAM: CVPR 2009), 20-25 June 2009, Miami, Florida, USA. [PDF]



Minutiae Local Structures for Fingerprint Indexing and Matching


Akhil Vij (homepage)

Human beings use specific characteristics of people, such as their facial features, voice and gait, to recognize those who are familiar to them in daily life. The fact that many physiological and behavioral characteristics are sufficiently distinctive and can be used for automatic identification of people has led to the emergence of biometric recognition as a prominent research field in recent years. Several biometric technologies have been developed and successfully deployed around the world, such as fingerprints, face, iris, palmprint, hand geometry and signature. Of all these biometric traits, fingerprints are the most popular because of their ease of capture, distinctiveness and persistence over time, as well as the low cost and maturity of sensors and algorithms.

This thesis focuses on improving the efficiency of fingerprint recognition systems using local minutiae-based features. Initially, we tackle the problem of large-scale fingerprint matching, called fingerprint identification. The large size of databases (sometimes containing billions of fingerprints) and significant distortions between different impressions of the same finger are some of the major challenges in identification. A naive solution involves explicit comparison of a probe fingerprint image/template against each of the images/templates stored in the database. A better approach to speed up this process is to index the database, where a light-weight comparison is used to reduce the database to a smaller set of candidates for detailed comparison.

In this thesis, we propose a novel hash-based indexing method to speed up fingerprint identification in large databases. For each minutia point, its local neighborhood information is computed with features defined on the geometric arrangement of its neighboring minutiae points. The proposed features are provably invariant to distortions such as translation, rotation and scaling. These features are used to create an affine-invariant local descriptor called an Arrangement Vector, which completely describes the local neighborhood of a minutia point. To account for missing and spurious minutiae, we consider subsets of the neighboring minutiae, and hashes of these structures are used in the indexing process. Experiments conducted on the FVC 2002 databases show that the approach is quite effective and gives better results than the existing state-of-the-art approach using similar affine features.
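The indexing idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the Arrangement Vector of the thesis: it ignores minutiae orientation and uses two similarity-invariant quantities per pair of neighbors (a distance ratio and the angle subtended at the center minutia) as a hash key. The names `local_keys` and `MinutiaeHashIndex`, the bin counts and the voting rule are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def local_keys(points, n_neighbors=4, ratio_bins=8, angle_bins=12):
    """Yield one quantized (distance ratio, angle) key per centre and neighbour pair."""
    for i, (cx, cy) in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        nearest = sorted(
            others, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2
        )[:n_neighbors]
        # Pairs of neighbours act as small subsets, so a single missing or
        # spurious neighbour only disturbs some of the keys for this centre.
        for p, q in combinations(nearest, 2):
            px, py, qx, qy = p[0] - cx, p[1] - cy, q[0] - cx, q[1] - cy
            sp, sq = px * px + py * py, qx * qx + qy * qy
            if sp == 0 or sq == 0:
                continue  # duplicate point: no direction is defined
            ratio = math.sqrt(min(sp, sq) / max(sp, sq))       # unchanged by scaling
            angle = math.atan2(abs(px * qy - py * qx), px * qx + py * qy)  # in [0, pi]
            yield (min(int(ratio * ratio_bins), ratio_bins - 1),
                   min(int(angle / math.pi * angle_bins), angle_bins - 1))


class MinutiaeHashIndex:
    """Hash table from local keys to the fingerprints that contain them."""

    def __init__(self):
        self.table = defaultdict(set)

    def enroll(self, finger_id, points):
        for key in local_keys(points):
            self.table[key].add(finger_id)

    def candidates(self, points):
        # Each distinct query key votes once for every fingerprint sharing it.
        votes = Counter()
        for key in set(local_keys(points)):
            for finger_id in self.table.get(key, ()):
                votes[finger_id] += 1
        return votes.most_common()


finger_a = [(0, 0), (4, 1), (1, 5), (6, 6), (9, 2)]
finger_b = [(0, 0), (2, 7), (5, 3), (8, 8), (3, 1)]
index = MinutiaeHashIndex()
index.enroll("A", finger_a)
index.enroll("B", finger_b)

# Probe: finger A rotated by 90 degrees, scaled by 2 and translated.
probe = [(-2 * y + 7, 2 * x - 3) for x, y in finger_a]
print(index.candidates(probe))
```

The short candidate list returned by the voting step would then be passed to a detailed matcher, in line with the "light-weight comparison first" strategy described above.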

We then extend our indexing framework to solve the problem of matching two fingerprints. We extend the proposed Arrangement Vector by adding more features to it and making it more robust. We arrive at a novel fixed-length descriptor for a minutia that captures its distinctive local geometry. This distinctive representation of each minutia neighborhood allows us to compare two minutiae points and determine their similarity. Given a fingerprint database, we then use unsupervised K-means clustering to learn prominent neighborhoods from the database. Each fingerprint is represented as a collection of these prominent neighborhoods. This yields a binary fixed-length representation for a fingerprint that is invariant to global distortions and handles small local non-linear distortions. The representation is also robust to missing or spurious minutiae points. Given two fingerprints, we represent each of them as a fixed-length binary vector. The matching problem then reduces to a sequence of bitwise operations, which is very fast and can be easily implemented on smaller architectures such as smart phones and embedded devices. A comparison with the two existing state-of-the-art fixed-length fingerprint representations from the literature demonstrates the superiority of the proposed representation.
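A minimal sketch of the binary matching step is given below. It assumes the cluster centers (the "prominent neighborhoods") have already been learned offline, for instance with K-means, and that each minutia has a numeric local descriptor; the function names `to_bits` and `similarity`, and the choice of a Jaccard score, are assumptions made for illustration rather than details taken from the thesis.

```python
import math


def to_bits(descriptors, centroids):
    """Set bit j when at least one local descriptor is nearest to centroid j."""
    bits = 0
    for d in descriptors:
        j = min(range(len(centroids)), key=lambda k: math.dist(d, centroids[k]))
        bits |= 1 << j
    return bits


def similarity(a, b):
    """Jaccard similarity of two bit vectors using only AND, OR and popcount."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


# Hypothetical 2-D descriptors and four pre-learned centroids.
centroids = [(0, 0), (10, 0), (0, 10), (10, 10)]
print_1 = to_bits([(1, 1), (9, 1)], centroids)               # bits 0 and 1
print_2 = to_bits([(0.5, -1), (11, 0), (9, 9)], centroids)   # bits 0, 1 and 3
print(similarity(print_1, print_2))
```

Since each fingerprint is a single fixed-length bit string, comparing two of them costs a handful of word-level operations, which is what makes this kind of representation attractive on small devices.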

In addition, the proposed representation can be derived using only the minutiae positions and orientations of a fingerprint. This makes it applicable to existing template databases, which often contain only this information. Most other existing methods in the literature use additional information such as orientation flow and core points, which need the original image for computation. The proposed binary representation is also suitable for biometric template protection schemes and is small enough to be stored on smart cards.

Year of completion:  August 2012
 Advisor : C. V. Jawahar

 

Related Publications

  • Akhil Vij, Anoop Namboodiri - Fingerprint Indexing Based on Local Arrangements of Minutiae Neighborhoods IEEE Computer Vision and Pattern Recognition Workshops (CVPRW), June, 2012 [PDF]


A novel approach for segmentation and registration of Echo-cardiographic Images


Vidhyadhari Gondle (homepage)

Echo-cardiographic images provide a wealth of information about the heart (size, shape, blood flow rate, etc.) and are therefore used to assess the functioning of the heart. Automated analysis of echo-cardiographic images is aimed at extracting a displacement field that represents the heart motion. Such a field is critical for extracting the higher-order information needed for the diagnosis of heart diseases. However, these images are very noisy, which poses a huge challenge to image analysis.

Most methods used for the analysis of echo-cardiographic images are designed specifically around the noise present in these images. They fall into two categories: (i) those that de-noise the signal prior to analysis and (ii) those that treat the input as a noisy signal and model the noise using a statistical noise model. In this thesis we propose algorithms for the analysis of echo-cardiographic images which do not require any pre-processing step or explicit handling of the noise present in the images.

We present novel algorithms for segmentation and registration of echo-cardiographic images in this thesis. Both algorithms are built on a noise-robust image representation, obtained by computing a local feature descriptor at every pixel location. The feature descriptor is derived using the Radon transform to effectively characterise the local image context. The advantage of this representation is that, in addition to being robust to noise, it captures the distribution of pixel intensities in the image in good detail. Next, unsupervised clustering is performed in the feature space to segment regions in the image. This feature-space representation is also used to extract hierarchical information for image registration.
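To make the notion of a Radon-based local descriptor concrete, the sketch below computes a discrete approximation on a small square patch: sums of pixel intensities along lines in four directions (rows, columns and the two diagonals). The function `radon_descriptor`, the patch size and the choice of four directions are assumptions for this example; the descriptor actually used in the thesis may differ.

```python
def radon_descriptor(image, row, col, radius=1):
    """Concatenate projection sums of the (2r+1)x(2r+1) patch along four directions.

    The caller must keep (row, col) at least `radius` pixels from the border.
    """
    size = 2 * radius + 1
    patch = [
        [image[row - radius + i][col - radius + j] for j in range(size)]
        for i in range(size)
    ]
    rows = [sum(patch[i]) for i in range(size)]                          # 0 degrees
    cols = [sum(patch[i][j] for i in range(size)) for j in range(size)]  # 90 degrees
    diag = [0] * (2 * size - 1)  # lines parallel to the main diagonal
    anti = [0] * (2 * size - 1)  # lines parallel to the anti-diagonal
    for i in range(size):
        for j in range(size):
            diag[i - j + size - 1] += patch[i][j]
            anti[i + j] += patch[i][j]
    return rows + cols + diag + anti


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(radon_descriptor(image, 1, 1))
```

Each entry is a sum over several pixels, so independent pixel-level noise tends to average out along a line, which gives some intuition for why projection-based features can tolerate noisy input. The per-pixel descriptors would then be fed to a clustering step for segmentation.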

The performance of the proposed methods is tested on both synthetic and real images. A comparison against well-established feature descriptors is carried out to demonstrate the strengths and applicability of the proposed representation. Overall, the results indicate promise in the strategy of segmenting noisy image data directly.

The algorithms in this thesis are designed to work efficiently even in the presence of a high level of speckle noise, without any pre-processing. Moreover, they can be easily adapted to other imaging modalities. The main contributions of this thesis are: (1) a noise-robust representation of an image in feature space; (2) segmentation of an image using the feature space; (3) registration of images using hierarchical information.

 

Year of completion:  December 2013
 Advisor : Jayanthi Sivaswamy

Related Publications

  • Vidhyadhari G. and Jayanthi Sivaswamy - Echo-Cardiographic Segmentation: Via Feature-Space Clustering Proceedings of Seventh National Conference on Communications (NCC 2011),28-30 Jan, 2011, Bangalore, India. [PDF]



Mining Characteristic Patterns From Visual Data


Abhinav Goel

Recent years have seen the emergence of thousands of photo sharing websites on the Internet, where billions of photos are uploaded every day. All this visual content holds a large amount of information about people, objects and events all around the globe. It is a treasure trove of useful information and is readily available at the click of a button. At the same time, significant effort has been made in the field of text mining, giving birth to powerful algorithms that extract meaningful information and scale to large datasets. This thesis leverages the strengths of text mining methods to solve real-world computer vision problems. Leveraging such techniques to interpret images comes with its own set of challenges. The variability in feature representations of images makes it difficult to match images of the same object. Also, there is no prior knowledge about the position or scale of the objects that have to be mined from the image. Hence, there is a practically unbounded number of candidate windows to search.

The work at hand tackles these challenges in three real-world settings. We first present a method to identify the owner of a photo album taken from a social networking site. We consider this as a problem of prominent person mining. We introduce a new notion of prominent persons, where information about location, appearance and social context is incorporated into the mining algorithm so that it can effectively mine the most prominent person. A greedy solution based on an eigenface representation is proposed, and we mine prominent persons in a subset of dimensions of the eigenface space. We present excellent results on multiple datasets: synthetic ones as well as real-world datasets downloaded from the Internet.

We next explore the challenging problem of mining patterns from architectural categories. Our mining method avoids the large number of pair-wise comparisons by recasting the mining in a retrieval setting. Instance retrieval has emerged as a promising research area with buildings as the popular test subject. Given a query image or region, the objective is to find images in the database containing the same object or scene. There has been a recent surge in efforts to find instances of the same building in challenging datasets such as the Oxford 5k dataset, the Oxford 100k dataset and the Paris dataset. We leverage the instance retrieval pipeline to solve multiple problems in computer vision. First, we ascend one level higher from instance retrieval and pose the question: Are Buildings Only Instances? Buildings located in the same geographical region or constructed in a certain period of history often follow a specific method of construction. These architectural styles are characterized by certain features which distinguish them from other styles of architecture. We explore, beyond the idea of buildings as instances, the possibility that buildings can be categorized based on their architectural style. We perform experiments to evaluate how characteristic information obtained from low-level feature configurations can help in classifying buildings into architectural style categories. Encouraged by our observations, we mine characteristic features with semantic utility for different architectural styles from our dataset of European monuments. These mined features are of various scales, and provide an insight into what makes a particular architectural style category distinct. The utility of the mined characteristics is verified against Wikipedia.

Finally, we generalize the mining framework into an efficient mining scheme applicable to a wider variety of object categories. Often the location and spatial extent of an object in an image are unknown. The matches between objects belonging to the same category are also approximate. Mining objects in such a setting is hard. Recent methods model this problem as learning a separate classifier for each category. This is computationally expensive, since a large number of classifiers must be trained and evaluated before one can mine a concise set of meaningful objects. On the other hand, fast and efficient solutions have been proposed for the retrieval of instances (the same object) from large databases. We borrow the strengths of the instance retrieval pipeline and adapt it to speed up category mining. For this, we explore objects which are "near-instances". We mine several near-instance object categories from images obtained from Google Street View. Using an instance retrieval based solution, we are able to mine certain categories of near-instance objects much faster than an Exemplar SVM based solution.

 

Year of completion:  August 2012
 Advisor : C. V. Jawahar

Related Publications

  • Abhinav Goel, Mayank Juneja and C V Jawahar - Are Buildings Only Instances? Exploration in Architectural Style Categories Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • Abhinav Goel and C.V. Jawahar - Whose Album is this? Proceedings of 3rd National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, ISBN 978-0-7695-4599-8, pp.82-85 15-17 Dec. 2011, Hubli, India. [PDF]

  • Abhinav Goel, Mayank Juneja, C.V. Jawahar - Leveraging Instance Retrieval for Efficient Category mining Computer Vision and Pattern Recognition Workshops, 2013 [PDF]


Instance Retrieval and Image Auto-Annotations on Mobile Devices


Jay Guru Panda (homepage)

Image matching is a well-studied problem in the computer vision community. Starting from template matching techniques, the methods have evolved to achieve robust scale-, rotation- and translation-invariant matching between two similar images. To this end, images are represented as a set of descriptors extracted at salient local regions that are detected in a robust, invariant and repeatable manner. For efficient matching, a global descriptor for the image is computed either by quantizing the feature space of local descriptors or by using separate techniques to extract global image features. With this, effective indexing mechanisms are employed to perform efficient retrieval on large image databases.
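The quantize-then-index pipeline described above can be sketched in a few lines. The example assumes a small, pre-computed visual vocabulary (cluster centers in descriptor space) and toy 2-D descriptors; the names `quantize` and `InvertedIndex`, and the histogram-intersection score, are hypothetical choices for illustration and not a description of any specific system.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Map a local descriptor to the id of its nearest visual word."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(descriptor, vocabulary[k]))


class InvertedIndex:
    """Visual word -> {image id: count}, so a query only visits matching images."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.postings = defaultdict(dict)

    def add(self, image_id, descriptors):
        words = Counter(quantize(d, self.vocabulary) for d in descriptors)
        for word, count in words.items():
            self.postings[word][image_id] = count

    def search(self, descriptors):
        query = Counter(quantize(d, self.vocabulary) for d in descriptors)
        scores = Counter()
        for word, q_count in query.items():
            for image_id, count in self.postings.get(word, {}).items():
                scores[image_id] += min(q_count, count)  # histogram intersection
        return scores.most_common()


vocabulary = [(0, 0), (10, 0), (0, 10)]
index = InvertedIndex(vocabulary)
index.add("tower", [(0, 1), (1, 0), (9, 0)])
index.add("gate", [(0, 9), (1, 10), (10, 1)])
print(index.search([(0, 0), (0.5, 0.5), (10, 0)]))
```

In a real system the vocabulary would contain many thousands of words learned from training descriptors, and the top-ranked images would typically be re-checked with a geometric verification step.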

Successful systems have been put in place in desktop and cloud environments to enable image search and retrieval. Retrieval takes a fraction of a second on a powerful desktop or a server. However, such techniques are typically not well suited to less powerful computing devices such as mobile phones or tablets. These devices have small storage capacity and limited memory. Computer vision algorithms run slower on them, even when optimized for the architecture of mobile processors. These handheld devices, or so-called smart devices, are increasingly used for simple tasks that seem too trivial for a desktop or a laptop and can be easily handled on a smaller display. Further, they are widely used for taking pictures (gradually replacing digital cameras) owing to their improved embedded camera sensors. Hence, a user is more likely to submit a query image from a mobile phone than from a desktop. This increases the scope of applications that demand real-time search and retrieval results delivered on a mobile phone.

Many applications (or apps) on mobile smart phones communicate with the cloud to perform tasks that are infeasible on the device. People have attempted to retrieve images in this cloud-based model by sending either the image or its features to the server and receiving relevant information back. We are interested in solving this problem on the device itself, with all the necessary computations happening on the mobile processor. This frees the user from needing a consistent network connection and avoids the communication overheads associated with the search process. We address the range of applications that need simple text annotations to describe the image queried on the mobile. An interesting use case is a tourist, student or historian visiting a heritage site who wants information about the monuments and structures on a mobile phone. Once the app is initialized on the device, the camera is opened, and by just pointing the camera or with a single click, the useful information about the monument is displayed on the screen instantly. The app does not use the internet to communicate with any server and does all computations on the mobile phone itself. Our methods optimize the process of instance retrieval to enable quick and light-weight processing on a mobile phone or a tablet.

 

Year of completion:  December 2013
 Advisor : C. V. Jawahar

Related Publications

  • Jayaguru Panda, Michael S Brown and C V Jawahar - Offline Mobile Instance Retrieval with a Small Memory Footprint Proceedings of International Conference on Computer Vision, 1-8 Dec. 2013, Sydney, Australia. [PDF]

  • Jayaguru Panda, Shashank Sharma, C V Jawahar - Heritage App: Annotating Images on Mobile Phones Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • J Panda, C. V. Jawahar - Heritage App: Annotating Images on Mobile Phones IAPR Second Asian Conference on Pattern Recognition (ACPR2013), Okinawa (Japan), November, 2013 [PDF]

