Efficient Image Retrieval Methods For Large Scale Dynamic Image Databases


Suman Karthik

The commoditization of imaging hardware has led to an exponential growth in image and video data, making it difficult to access relevant data when it is needed. This has spurred a great deal of research into multimedia retrieval, and Content Based Image Retrieval (CBIR) in particular. Yet CBIR has not found widespread acceptance in real-world systems. One of the primary reasons is the inability of traditional CBIR systems to scale effectively to large image databases. The introduction of the Bag of Words model for image retrieval has alleviated some of these issues, yet bottlenecks remain, and its utility is limited for highly dynamic image databases (image databases where the set of images is constantly changing). In this thesis, we focus on developing methods that address the scalability issues of traditional CBIR systems and the adaptability issues of Bag of Words based image retrieval systems.

Traditional CBIR systems find relevant images by finding nearest neighbors in a high-dimensional feature space. This is computationally expensive and does not scale as the number of images in the database grows. We address this problem by posing image retrieval as a text retrieval task, transforming each image into a text document called its Virtual Textual Description (VTD). Once this transformation is done, we further enhance the performance of the system with a novel relevance feedback algorithm called discriminative relevance feedback. We then use the virtual textual descriptions to index and retrieve images efficiently using a data structure called the Elastic Bucket Trie (EBT).
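
As an illustration of the idea (a hypothetical sketch, not the thesis' exact VTD encoding or EBT implementation), quantized region features can be turned into synthetic "words" and indexed in a trie keyed on those words:

```python
def features_to_words(region_features, n_bins=4):
    """Quantize each region's feature vector (values in [0, 1]) into a token."""
    return ["".join(str(min(int(v * n_bins), n_bins - 1)) for v in feat)
            for feat in region_features]

class Trie:
    """Minimal trie mapping visual word -> set of image ids (EBT stand-in)."""
    def __init__(self):
        self.children = {}
        self.images = set()

    def insert(self, word, image_id):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.images.add(image_id)

    def lookup(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                return set()
            node = node.children[ch]
        return node.images

index = Trie()
for image_id, regions in {"img_a": [[0.9, 0.1]], "img_b": [[0.9, 0.2]]}.items():
    for word in features_to_words(regions):
        index.insert(word, image_id)
print(index.lookup("30"))  # both images quantize to the same visual word
```

Once images are text documents of such words, standard inverted-index and trie machinery from text retrieval applies directly.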

Contemporary bag of visual words approaches to image retrieval perform a one-time offline vector quantization to create the visual vocabulary. However, these methods do not adapt well to dynamic image databases, whose nature constantly changes as new data is added. In this thesis, we design, present and experimentally evaluate a novel method for incremental vector quantization (IVQ) for use in image and video retrieval systems with dynamic databases.
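
The flavor of incremental quantization can be sketched as an online k-means-style update (an illustration under assumed update rules, not the thesis' actual IVQ algorithm): each new descriptor either refines its nearest visual word or spawns a new one:

```python
import numpy as np

class IncrementalVocabulary:
    """Visual vocabulary that grows as descriptors arrive (hypothetical sketch)."""
    def __init__(self, spawn_threshold=0.5):
        self.centroids = []          # one entry per visual word
        self.counts = []
        self.spawn_threshold = spawn_threshold

    def add(self, descriptor):
        descriptor = np.asarray(descriptor, dtype=float)
        if self.centroids:
            dists = [np.linalg.norm(descriptor - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] < self.spawn_threshold:
                # Running-mean update: the centroid drifts toward new data.
                self.counts[k] += 1
                self.centroids[k] += (descriptor - self.centroids[k]) / self.counts[k]
                return k
        # No existing word is close enough: create a new visual word.
        self.centroids.append(descriptor.copy())
        self.counts.append(1)
        return len(self.centroids) - 1

vocab = IncrementalVocabulary(spawn_threshold=0.5)
for d in [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]:
    vocab.add(d)
print(len(vocab.centroids))  # two visual words: near the origin and near (1, 1)
```

The point is that the vocabulary never needs a global offline rebuild; it adapts word by word as the database changes.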

Semantic indexing has been invaluable in improving the performance of bag of words based image retrieval systems. However, contemporary approaches to semantic indexing for bag of words image retrieval do not adapt well to dynamic image databases. We introduce and experimentally evaluate a bipartite graph model (BGM), a scalable data structure that aids online semantic indexing, and a cash flow algorithm that operates on the BGM to retrieve semantically relevant images from the database. We also demonstrate how traditional text search engines can be used to build scalable image retrieval systems.
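
A loose sketch of the bipartite idea (the weights and flow rule here are hypothetical, not the thesis' exact BGM or cash flow algorithm): visual words sit on one side, images on the other, and query "cash" flows word → image → word, so images sharing no word with the query can still accumulate cash through intermediate images:

```python
# Toy bipartite graph: edge weights link visual words to images.
word_to_images = {
    "w1": {"imgA": 1.0, "imgB": 0.5},
    "w2": {"imgB": 1.0, "imgC": 1.0},
}
image_to_words = {}
for w, imgs in word_to_images.items():
    for img, wt in imgs.items():
        image_to_words.setdefault(img, {})[w] = wt

def cash_flow(query_words, rounds=2):
    """Spread 'cash' from query words across the bipartite graph."""
    word_cash = {w: 1.0 for w in query_words}
    image_cash = {}
    for _ in range(rounds):
        image_cash = {}
        for w, cash in word_cash.items():          # words pay images
            for img, wt in word_to_images.get(w, {}).items():
                image_cash[img] = image_cash.get(img, 0.0) + cash * wt
        word_cash = {}
        for img, cash in image_cash.items():       # images pay words back
            for w, wt in image_to_words[img].items():
                word_cash[w] = word_cash.get(w, 0.0) + cash * wt
    return sorted(image_cash.items(), key=lambda kv: -kv[1])

print(cash_flow(["w1"]))  # imgC receives cash via imgB's shared word w2
```

Because the graph is just weighted adjacency lists, new images and words can be attached online without rebuilding the index.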

 

Year of completion: July 2009
Advisor: C. V. Jawahar

Related Publications

  • Suman Karthik, Chandrika Pulla and C.V. Jawahar - Incremental Online Semantic Indexing for Image Retrieval in Dynamic Databases, Proceedings of the International Workshop on Semantic Learning Applications in Multimedia (SLAM, CVPR 2009), 20-25 June 2009, Miami, Florida, USA. [PDF]

  • Suman Karthik and C.V. Jawahar - Analysis of Relevance Feedback in Content Based Image Retrieval, Proceedings of the 9th International Conference on Control, Automation, Robotics and Vision (ICARCV), 2006, Singapore. [PDF]

  • Suman Karthik and C.V. Jawahar - Virtual Textual Representation for Efficient Image Retrieval, Proceedings of the 3rd International Conference on Visual Information Engineering (VIE), 26-28 September 2006, Bangalore, India. [PDF]

  • Suman Karthik and C.V. Jawahar - Efficient Region Based Indexing and Retrieval for Images with Elastic Bucket Tries, Proceedings of the International Conference on Pattern Recognition (ICPR), 2006. [PDF]


High Quality Image Reconstruction by Optimal Capture and Accurate Registration


Himanshu Arora (homepage)

Generating high-resolution, high-quality images from degraded images has a variety of applications in space imaging, medical imaging, commercial videography and recognition algorithms. Camera-equipped cell phones are now widely used for day-to-day photography. Limitations of the image capturing hardware and environmental attenuations introduce degradations into the captured image. The quality of the reconstructed image obtained from most algorithms is highly sensitive to the accurate computation of underlying parameters such as blur, noise and geometric deformation. Variations in blur, illumination and noise, together with occlusions, further complicate the computation of these parameters.

One way of generating a high-quality image is to fuse multiple images that are displaced at sub-pixel levels, a method popularly known as multi-frame super-resolution (SR). Most multi-image SR algorithms assume that the exact registration and blur parameters between the constituent frames are known. Traditional approaches to image registration are either sensitive to image degradations such as variations in blur, illumination and noise, or are limited in the class of image transformations they can estimate. These conditions are frequently violated in real-world imaging, where specular surfaces, close light sources, small sensors and lenses create large variations in illumination, noise and blur within the scene. Interestingly, these are exactly the situations in which one would like to employ SR algorithms.



We explore an alternate solution to the problem of robustness in the registration step of an SR algorithm. We present an accurate registration algorithm that uses local phase information, which is robust to the above degradations. The registration step is formulated as optimization of the local phase alignment at various spatial frequencies. We derive the theoretical error rate of the estimates in the presence of non-ideal band-pass behavior of the filter and show that the error converges to zero over iterations. We also show the invariance of local phase to a class of blur kernels. We demonstrate experimental results on images taken under varying conditions.
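
The thesis computes local phase with band-pass (Gabor-like) filters; a simpler relative of the same principle is global phase correlation, sketched here: a translation between two images becomes a linear phase shift in the Fourier domain, and discarding magnitude leaves a sharp peak at the displacement:

```python
import numpy as np

def phase_correlate(a, b):
    """Estimate the integer (dy, dx) such that a equals b shifted by (dy, dx)."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    cross /= np.abs(cross) + 1e-12            # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    # Map wrap-around indices to signed shifts.
    wrap = lambda i, n: int(i) - n if i > n // 2 else int(i)
    return wrap(dy, h), wrap(dx, w)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, (3, -5), axis=(0, 1))  # shift down 3, left 5
print(phase_correlate(shifted, img))          # recovers (3, -5)
```

Since only phase is used, the estimate is insensitive to global magnitude changes; the local, filter-bank version described above extends this robustness to spatially varying blur and illumination.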

Lin and Shum have shown an upper limit on multi-frame SR techniques; for practical purposes this limit is very small. Another class of SR algorithms formulates high-quality image generation as an inference problem: the high-resolution image is inferred from a set of learned training patches. This class of algorithms works well for natural structures, but for many man-made structures it does not produce accurate results. We propose to capture images at the optimal zoom from the perspective of image super-resolution. Images captured at this zoom contain sufficient information to be magnified further by any SR algorithm that promotes step edges and similar features. This has a variety of applications in consumer cameras, large-scale automated image mosaicing, robotics and improving the recognition accuracy of many computer vision algorithms. Existing efforts are limited to imaging a pre-determined object at the right zoom. In the proposed approach, we learn the patch structures at various down-sampling factors. To further enhance the output, we impose the local context around the patch in an MRF framework. Several constraints are introduced to minimize the extent of zoom-in.

Projector-camera systems are used in various applications in computer vision, immersive environments and visual servoing. Due to gaps between neighboring pixels on the projector's image plane and variations in scene depth, the image projected onto a scene shows pixelation and blurring artifacts. Certain vision and graphics applications require capturing a high-quality composition of the scene and the projected image, excluding these artifacts while retaining the scene characteristics. The localization and restoration of projected pixels is difficult due to factors such as spatially varying blur, background texture, noise, the shapes of scene objects, and the color transformations of the projector and camera. We extend the use of local phase, computed using the Gabor filter, to isolate each of the projected pixels distinctively; the local phase is also invariant to a class of blur kernels. To restore the captured images, we reproject the projected pixels so that these artifacts are absent. To improve quality further, we propose a mechanism to virtualize a high-resolution projector.

 

Year of completion: 2009
Advisor: Anoop M. Namboodiri

Related Publications

  • Himanshu Arora and Anoop M. Namboodiri - How much zoom is the right zoom from the perspective of Super-Resolution?, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 142-149, 16-19 December 2008, Bhubaneswar, India. [PDF]

  • Himanshu Arora and Anoop M. Namboodiri - Projected Pixel Localization and Artifact Removal in Captured Images, IEEE TENCON, 18-21 November 2008, Hyderabad, India. [PDF]

  • Himanshu Arora, Anoop M. Namboodiri and C. V. Jawahar - Robust Image Registration with Illumination, Blur and Noise Variations for Super-Resolution, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), 30 March - 4 April 2008, Las Vegas, USA. [PDF]

  • Himanshu Arora, Anoop M. Namboodiri and C.V. Jawahar - Accurate Image Registration from Local Phase Information, Proceedings of the Thirteenth National Conference on Communications (NCC 2007), Kanpur, 2007. [PDF]



Towards Large Scale Image Based Robot Navigation


Supreeth Achar (homepage)

Automatic mapping of environments and autonomous navigation are two of the central problems in mobile robotics. A number of different sensor modalities can be used to solve them. Vision sensors are relatively low in cost and have the potential to provide very rich information about the environment. However, active range-finding sensors tend to be more widely used, because interpreting images has been a very challenging, computationally intensive task. Recent advances in computer vision algorithms and improvements in hardware capabilities have greatly increased the scope for using vision in robotics, even in unstructured, real-world environments. Approaches to applying computer vision to robotics problems fall into two classes: model-based approaches and image-based approaches. Model-based approaches use some sort of metric representation of the environment as a map. This map can be built manually and provided to the robot, or it can be built automatically using visual SLAM methods. Building accurate reconstructions becomes increasingly difficult as the size of the environment increases. Image-based methods for robot navigation avoid the need for metric modeling. Instead of inferring a 3D model of the environment and using it to determine where the robot is and what it should do, image-based methods work directly over images. Images captured from various positions in the environment are stored as nodes in a topological graph structure in which edges encode the connectivity between pairs of poses.

Localization is the process of determining the position of a robot from sensor readings. In an image-based framework, localization entails finding a previously seen image stored in the topological graph that closely matches the current view. If such an image exists, it can be inferred that the robot is close to the pose where the stored image was captured. In this thesis, a robot localization method is developed that builds upon bag of words techniques used in content based image retrieval. Dynamic scene elements such as pedestrians and traffic, which could corrupt localization results, are handled with the help of a motion detection algorithm. Prior information regarding robot pose is used in conjunction with visual word frequency distributions to assign weights that give higher importance to features more informative about the pose of the robot. The method was validated in a localization experiment over a six-kilometer path through city roads.
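
The retrieval-style localization step can be sketched with idf-weighted visual word histograms (a simplified illustration; the pose-prior reweighting described above is omitted):

```python
import math

def idf_weights(database):
    """Inverse document frequency: rare visual words are more informative."""
    n = len(database)
    df = {}
    for hist in database.values():
        for w in hist:
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / df[w]) for w in df}

def localize(query_hist, database):
    """Return the stored view whose idf-weighted histogram best matches."""
    idf = idf_weights(database)
    best, best_score = None, -1.0
    for place, hist in database.items():
        # Dot product of idf-weighted word-frequency vectors.
        score = sum(query_hist.get(w, 0) * tf * idf.get(w, 0.0) ** 2
                    for w, tf in hist.items())
        if score > best_score:
            best, best_score = place, score
    return best

database = {
    "corner_1": {"w1": 4, "w2": 1},
    "corner_2": {"w2": 3, "w3": 2},
    "corner_3": {"w3": 1, "w4": 5},
}
print(localize({"w3": 2, "w4": 4}, database))  # the closest stored view
```

The matched node in the topological graph then gives the robot's approximate pose.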

Image-based methods are ideally suited to learning the rough structure and connectivity of an environment, especially the detection of loop closures. Loop closures can be found using the same image retrieval techniques used for localization. Loop closure detection is a major problem in SLAM because data at the end of a loop has to be associated with data at the beginning, after which the entire map built by the robot needs to be updated. Chapter 3 presents a method for inferring a metric map of an environment from an appearance-based topological graph representation. Building a topological graph as a preliminary step makes information about the connectivity of the environment available to the SLAM algorithm, making the mapping process significantly easier. The offline SLAM process can be visualized as an optimization over a graph. Each robot pose along the trajectory and each feature in the map is represented by a node. Each edge links two nodes connected by a measurement: either a measurement of a feature position with respect to some point on the robot trajectory, or an estimate of the motion between two points along the trajectory. Due to noise and modeling errors, these measurements will be inconsistent with each other. The optimization finds a consistent trajectory and map that maximize the log-likelihood of the measurements under assumptions of Gaussian sensor and motion model noise.
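
A tiny 1-D illustration of this graph optimization (a sketch, not the thesis' implementation): with Gaussian noise, maximizing the log-likelihood of relative pose measurements reduces to linear least squares, and a loop closure redistributes accumulated odometry error:

```python
import numpy as np

def optimize_poses(n_poses, edges):
    """edges: list of (i, j, measured_offset); pose 0 is anchored at 0."""
    rows, rhs = [], []
    anchor = np.zeros(n_poses); anchor[0] = 1.0
    rows.append(anchor); rhs.append(0.0)      # fix the gauge freedom
    for i, j, z in edges:
        r = np.zeros(n_poses)
        r[j], r[i] = 1.0, -1.0                # residual: (x_j - x_i) - z
        rows.append(r); rhs.append(z)
    A, b = np.array(rows), np.array(rhs)
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Odometry says each step is ~1.0, but a loop-closure edge between pose 0
# and pose 3 measures 2.7; least squares spreads the inconsistency evenly.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.7)]
poses = optimize_poses(4, edges)
print(np.round(poses, 3))  # pose 3 settles near 2.775, between 3.0 and 2.7
```

Real pose-graph SLAM uses the same structure with 2-D/3-D poses, rotations, and measurement covariances weighting each residual.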

In natural outdoor environments, the deformable nature of objects can make it difficult to establish correspondences between points in two images of the same scene taken from different views. Most existing closed-loop methods for controlling the position of a manipulator or robot using visual servoing require accurate feature correspondences. Chapter 4 presents a visual servoing algorithm that does away with the need for feature correspondence between the current and target views by modeling the distribution of features in an image as a mixture of Gaussians. The L2 distance between two such Gaussian mixtures can be expressed in closed form. A control law is derived which minimizes this distance, causing the robot to move from its current pose to the target pose. Because the control law is defined in terms of a statistic over the feature distribution rather than the positions of individual features, the control is robust to measurement errors.
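
The closed-form L2 distance rests on the Gaussian product identity ∫ N(x; m1, s1²) N(x; m2, s2²) dx = N(m1; m2, s1² + s2²). A 1-D sketch with mixtures given as (weight, mean, variance) triples:

```python
import math

def gauss(m1, m2, var):
    """Evaluate N(m1; m2, var), the Gaussian density at m1."""
    return math.exp(-(m1 - m2) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def cross_term(p, q):
    """Integral of the product of two Gaussian mixtures, in closed form."""
    return sum(wp * wq * gauss(mp, mq, vp + vq)
               for wp, mp, vp in p for wq, mq, vq in q)

def l2_distance_sq(p, q):
    # ||p - q||^2 = <p, p> - 2 <p, q> + <q, q>, each term closed form.
    return cross_term(p, p) - 2 * cross_term(p, q) + cross_term(q, q)

current = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]   # features in the current view
target  = [(0.5, 0.5, 1.0), (0.5, 2.5, 1.0)]   # features in the target view
print(l2_distance_sq(current, current))  # 0.0: identical mixtures
print(l2_distance_sq(current, target))   # positive: poses still differ
```

The control law drives this scalar to zero; no integral ever needs numerical evaluation, which is what makes the statistic cheap enough for servoing.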

 

Year of completion: August 2009
Advisor: C. V. Jawahar

Related Publications

  • Supreeth Achar and C. V. Jawahar - Adaptation and Learning for Image Based Navigation, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 103-110, 16-19 December 2008, Bhubaneswar, India. [PDF]

  • D. Santosh, Supreeth Achar and C.V. Jawahar - Autonomous Image-based Exploration for Mobile Robot Navigation, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 19-23 May 2008, Pasadena, California. [PDF]

  • A.H. Abdul Hafez, Supreeth Achar and C. V. Jawahar - Visual Servoing based on Gaussian Mixture Models, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 19-23 May 2008, Pasadena, California. [PDF]



Exploring Irregular Memory Access Applications on the GPU


Mohammed Suhail Rehman (homepage)

General Purpose Computation on Graphics Processors (GPGPU) is a recent development in high performance computing. Graphics hardware has evolved over the past decade from fixed-function stream processors to highly programmable computation engines that offer the best price-to-performance ratio among current parallel hardware: the GeForce GTX 280 from NVIDIA delivers a peak performance of 1 TFLOP and costs about US$400. This peak performance, however, is hard to realize for many applications. Applications that generally see good performance gains (compared to implementations on similar parallel hardware or the CPU) are those which are extremely parallel and have data locality that matches the memory subsystem of such hardware. A lot of research has been devoted to finding ways to optimally utilize this massively parallel hardware to accelerate a diverse range of applications.

There exists a range of applications which are parallelizable but do not possess 'good' memory access characteristics; these are generally classified as irregular memory access applications. Developing techniques to implement such applications efficiently has been a challenge on massively parallel hardware. These applications are traditionally not well suited to most modern parallel architectures, and even modest gains in performance are appreciated. The development of suitable primitives for this class of applications is crucial to advancing the acceptance of such massively parallel hardware in the high performance computing community.

We introduce a technique to solve the list ranking problem, a classic irregular memory access problem, efficiently on the GPU for large data sets. The technique is a recursive formulation of a recent algorithm for shared memory systems by Helman and JáJá. This recursive formulation allows us to exploit the massive parallelism of the GPU by minimizing the number of idle threads during computation. The optimal parameters for running this algorithm on the GPU are also discussed, leading to the fastest single-chip implementation of list ranking at the time of writing. Our optimized algorithm can rank a random list of 8 million elements in about 50 milliseconds, which translates to about a 15x speedup over a sequential CPU implementation and a 5-6x speedup over the Cell Broadband Engine.
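
The thesis builds on Helman and JáJá's algorithm; the simpler pointer-jumping formulation below illustrates the list ranking primitive itself, taking O(log n) rounds of "absorb your successor's rank, then jump over it":

```python
import numpy as np

def list_rank(succ):
    """Rank of each node = number of links to the list's tail (succ[tail] = tail)."""
    succ = np.asarray(succ)
    rank = np.where(succ == np.arange(len(succ)), 0, 1)
    while np.any(succ != succ[succ]):
        rank = rank + rank[succ]   # each node absorbs its successor's rank...
        succ = succ[succ]          # ...then jumps over it (pointer jumping)
    return rank

# Linked list stored out of order: 2 -> 0 -> 3 -> 1 (node 1 is the tail).
print(list_rank([3, 1, 0, 1]))  # [2 0 3 1]
```

The irregularity is visible even here: `rank[succ]` and `succ[succ]` are data-dependent gathers with no spatial locality, which is exactly what makes the problem hard to run fast on GPU memory systems.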

We then apply the list ranking primitive to accelerate several Euler-tour-technique-based algorithms, including tree rooting (parent finding), subtree size calculation, tree traversals and vertex level computation. A major portion of the actual runtime of these implementations is the list ranking step, and as a result we obtain highly efficient algorithms for processing trees on the GPU. We see a sustained performance benefit of about 9-10x over sequential BFS and about 2x over BFS implemented in CUDA. Optimizing such applications for the GPU is not a trivial task. We describe some of the characteristics of irregular application development for the GPU, as well as techniques to model and assess GPU performance. Our research has yielded a set of simple equations that can aid in quickly estimating the runtime of a particular implementation on the GPU. We discuss list ranking in this context and provide a few pointers for maximizing the performance of irregular applications.

 

Year of completion: June 2010
Advisors: P. J. Narayanan, Kishore Kothapalli

Related Publications

  • M. Suhail Rehman, Kishore Kothapalli and P.J. Narayanan - Fast and Scalable List Ranking on the GPU, Proceedings of the 23rd International Conference on Supercomputing (ICS 09), 8-12 June 2009, New York, USA. [PDF]

  • Kishore Kothapalli, Rishabh Mukherjee, M. Suhail Rehman, P. J. Narayanan, Kannan Srinathan and Suryakant Patidar - A Performance Prediction Model for the CUDA GPGPU Platform, 16th Annual International Conference on High Performance Computing (HiPC 2009), Kochi, India, December 2009. [PDF]


Learning Non-Linear Kernel Combinations Subject to General Regularization: Theory and Applications


B. Rakesh Babu

Kernel methods are among the important recent developments in machine learning, with applications in computer vision, speech recognition, bioinformatics and other fields. This class of algorithms combines the stability and efficiency of linear algorithms with the descriptive power of nonlinear features. A kernel function implicitly maps the data to a different space, often very high dimensional compared to the input space, so that complex patterns in the data become simpler to detect and learn. Support Vector Machines (SVMs) are a kernel method that has been widely successful for classification tasks. The performance of the algorithm depends on the choice of kernel, and finding the right kernel can be a complicated task. To overcome this, learning the kernel from data has emerged as a new paradigm in recent years: the kernel is parameterized as a weighted linear combination of base kernels, and the weights are optimized jointly with the objective of the task.

Learning both the SVM parameters and the kernel parameters is a Multiple Kernel Learning (MKL) problem. Many formulations of MKL have been presented in the literature; however, all of these methods are restricted to linear combinations of base kernels. In this thesis, we show how existing optimization techniques for MKL can be extended to learn non-linear kernel combinations subject to general regularization on the kernel parameters. Although this leads to a non-convex problem, the proposed method retains all the efficiency of existing large scale optimization algorithms. We name the new formulation Generalized Multiple Kernel Learning (GMKL). We highlight the advantages of GMKL by tackling feature selection and the learning of discriminative parts for object categorization. We show how the proposed formulation leads to better results not only compared to traditional MKL but also compared to state-of-the-art wrapper and filter methods for feature selection. In the problem of learning discriminative parts for object categorization, our objective is to determine minimal sets of pixels and image regions required for the task. We use multiple kernel learning to select the most relevant pixels and regions for classification, and show how the framework can be used to enhance our understanding of the object categorization problem at hand, determine the importance of context and highlight artifacts in the training data. We also tackle the problem of recognizing characters in images of natural scenes, which is traditionally not handled well by OCR techniques, in the MKL framework. We assess the performance of various features (using a bag-of-visual-words representation) with nearest neighbor and SVM classification, and investigate appropriate representation schemes for recognition using MKL.
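
The kind of non-linear combination GMKL can learn may be sketched as a product of per-feature RBF kernels (illustrative only; in GMKL the weights are learned jointly with the SVM rather than fixed as here). Driving a weight to zero removes that feature from the kernel entirely, which is what enables feature selection:

```python
import numpy as np

def product_rbf_kernel(X, Y, d):
    """k(x, y) = prod_k exp(-d_k (x_k - y_k)^2), a non-linear kernel combination."""
    K = np.ones((len(X), len(Y)))
    for k, dk in enumerate(d):
        diff = X[:, k:k + 1] - Y[:, k].reshape(1, -1)
        K *= np.exp(-dk * diff ** 2)   # product, not sum, of base kernels
    return K

X = np.array([[0.0, 5.0], [1.0, -5.0]])
d = np.array([1.0, 0.0])               # weight 0: the second feature is switched off
K = product_rbf_kernel(X, X, d)
print(K)                                # depends only on the first feature
```

A sum of base kernels with these weights could never null a feature this way; the product form is the non-linearity that a linear-combination MKL cannot express.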

In short, the contributions of this thesis are: (i) a new MKL formulation that can learn non-linear kernel combinations subject to general regularization on the kernel parameters; (ii) an exploration of the utility of multiple kernel learning for feature selection and for learning informative parts for object category recognition; and (iii) recognition of character images taken in natural scenes using state-of-the-art object recognition schemes, along with an exploration of appropriate representation schemes for recognition using MKL.

 

Year of completion: 2010
Advisors: C. V. Jawahar, Manik Varma

Related Publications

  • Manik Varma and Bodla Rakesh Babu - More Generality in Efficient Multiple Kernel Learning, Proceedings of the International Conference on Machine Learning (ICML 09), 14-18 June 2009, Montreal, Quebec. [PDF]

  • T. E. de Campos, B. R. Babu and M. Varma - Character Recognition in Natural Images, Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009.

