Thesis Students

Mixed-Resolution Patch-Matching

Harshit Sureka (homepage)

The problem of matching small image regions, or patches, between images is central to many image editing and computer vision tasks. In this thesis, we propose two related solutions to the problem of fast, global patch-matching. First, we present a solution based on a multi-resolution spatial pyramid, which we call Pyramid Patch-Matching (PPM). It uses a simple coarse-to-fine framework driven by exhaustive search at the coarsest resolution and local directed search at finer resolutions. PPM quickly finds accurate patch correspondences between images or image regions. The search step of our algorithm is flexible enough to accommodate other nearest-neighbor algorithms within its framework. The proposed increase-K technique lets PPM store more matches than required at coarse resolutions, which maintains a wide search range. This improves accuracy by up to 2%; the effect is most significant when fewer matches are required. The early-termination technique aborts the distance calculation between two patches as soon as the partial distance exceeds that of the farthest match currently stored. This speeds up the algorithm by up to 1.7x. Following the downsampling rule, the reduce-patch-size technique matches smaller patches than required at coarser levels of the pyramid. Empowered by these techniques, PPM achieves state-of-the-art accuracy, outperforming the previous standards, Coherency Sensitive Hashing (CSH) and PatchMatch, without compromising execution time. PPM finds up to 4% more of the ground-truth matches and achieves lower error values than CSH, the current standard.
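
A minimal Python sketch of the early-termination idea, assuming patches are flattened into equal-length lists of pixel values and compared with the sum of squared differences (SSD). The function names are hypothetical, and the candidate list stands in for whatever the exhaustive (coarsest level) or local directed (finer levels) search produces; the pyramid itself and the propagation of matches between levels are not shown.

```python
import heapq

# Hypothetical helpers illustrating early termination; not the thesis implementation.

def ssd_early_termination(a, b, bound):
    """SSD between two flattened patches, or None once the partial sum exceeds `bound`."""
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
        if total > bound:
            return None  # already worse than the farthest stored match
    return total

def k_nearest_patches(query, candidates, k):
    """Return up to k (distance, index) pairs for the nearest candidates, nearest first."""
    if k <= 0:
        return []
    heap = []  # max-heap via negated distances: heap[0] is the farthest stored match
    for idx, cand in enumerate(candidates):
        bound = -heap[0][0] if len(heap) == k else float("inf")
        dist = ssd_early_termination(query, cand, bound)
        if dist is None:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        else:
            heapq.heapreplace(heap, (-dist, idx))  # evict the farthest match
    return sorted((-neg, idx) for neg, idx in heap)
```

Keeping more matches than finally needed, as the increase-K technique does at coarse levels, corresponds to calling this with a larger k; the farthest stored match is then farther away, so fewer distance calculations terminate early.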

Several parameters are provided to tune the search for good matches to the application. We perform experiments to study the effects of these parameters on matching accuracy and computation time. Besides consistently achieving lower error values for the RGB distance between matched patches, the simplicity of our algorithm also enables fast parallel implementations, an advantage over previous randomized, hashing-based or tree-based iterative approaches. On CPU cores, it achieves a near-linear speed-up with the number of cores, and a GPU implementation provides a further speed-up of up to 70x over a CPU implementation on a quad-core machine (250x over a single core).

We observe that the matching accuracy of PPM suffers from low-confidence matching of patch-vectors at coarse resolutions. To solve this, we propose a novel way to mix information by concatenating corresponding patch-vectors from finer resolution levels with the patch-vectors of coarse levels. We call these Mixed-Resolution Vectors (MR-vectors). Unlike traditional patch-vectors, which contain information from a single resolution level, MR-vectors contain more information relevant to matching. We build on PPM by substituting MR-vectors for traditional patch-vectors, and call the result the Mixed-Resolution Patch-Matching (MRPM) algorithm. MRPM further improves the accuracy of the matches found by PPM with a marginal increase in time. It captures up to 4% more of the ground-truth matches (8% more than CSH) and achieves near-optimal match accuracy. We also show that the MR-approach has computational benefits over an approach that uses an increased search range. We describe the effects of changing the number of resolution levels mixed and show that mixing 2 levels achieves a good balance between accuracy and time. We perform various experiments comparing MRPM with PPM and other previous approaches, in all of which MRPM consistently performs better. Fast parallel multi-core and GPU implementations of MRPM follow directly from the parallel implementations of PPM. We develop image denoising, image summarization and auto-crop applications based on MRPM, and obtain results with better visual quality when MR-vectors are used. The computational bottleneck of these applications is the nearest-neighbor search for patches, and we demonstrate the utility of MRPM as a fast patch-matching engine that powers them. This opens new research directions exploring the scope of MR-vectors, which we believe have applications in several other image editing and computer vision tasks.
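
The construction of an MR-vector can be sketched in a few lines of Python. This assumes a two-level pyramid built by 2x2 averaging, the same patch size p at both levels, and patches addressed by their top-left corner, so the coarse patch at (r, c) pairs with the finer patch at (2r, 2c). These choices and the helper names are illustrative assumptions, not details taken from the thesis.

```python
# Hypothetical MR-vector construction for a 2-level grayscale pyramid (lists of rows).

def downsample2(img):
    """Halve an image by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]

def patch_vector(img, r, c, p):
    """Flatten the p x p patch whose top-left corner is (r, c)."""
    return [img[r + i][c + j] for i in range(p) for j in range(p)]

def mr_vector(fine, coarse, r, c, p):
    """Concatenate the coarse patch at (r, c) with the finer patch at (2r, 2c)."""
    return patch_vector(coarse, r, c, p) + patch_vector(fine, 2 * r, 2 * c, p)
```

Matching then proceeds as with ordinary patch-vectors (for example with an SSD over the concatenated vectors); only the vector passed to the distance function changes.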

Year of completion:	December 2012
Advisor :	P. J. Narayanan

Related Publications

Harshit Sureka and P. J. Narayanan - Mixed-Resolution Patch-Matching, Proceedings of the 12th European Conference on Computer Vision (ECCV 2012), 7-13 Oct. 2012, Part VI, LNCS 7577, pp. 187-198, Firenze, Italy.

Downloads

Private Outlier Detection and Content based Encrypted Search

Nisarg Raval (homepage)

Today, our daily lives are constantly driven and influenced by services such as social networking, personalized advertisements and cloud infrastructure. This makes any form of personal or professional data vulnerable to both security and privacy threats. Users lose control over their data as soon as it is published on the Internet. Recently, the popular photo-sharing service Instagram tried to change its privacy policy, giving itself the right to sell users' information, including their photographs, to advertisers without any notification or compensation. After incidents like the AOL and Netflix debacles, organizations are refraining from releasing customer data even after anonymization. The resulting unavailability of customer data, behavior and patterns hinders future research and development in many ways. Techniques like k-anonymity and l-diversity help preserve user privacy to some extent but fail to provide solid guarantees. Differential privacy, proposed more recently, provides promising results in this area, with theoretical bounds on information leakage and privacy preservation even against auxiliary information. However, differential privacy treats each individual's record as independent of other data and cannot be applied to linked data such as social graphs and time series. Furthermore, all these techniques compromise the utility of the data in order to maintain privacy.

At present, two approaches are widely used to ensure the security of data. The first is based on Secure Multiparty Computation (SMC), where users communicate with each other using an SMC protocol to securely evaluate any function on their data. SMC gives provable security without compromising the utility of the data. However, solutions based on the general SMC protocol require enormous computational and communication overhead, which limits the practical deployment of secure algorithms. The second approach is to encrypt the data at the user end and perform all operations in the encrypted domain to preserve the confidentiality of the data. This makes it difficult to perform operations such as search, retrieval and computation. Homomorphic encryption allows secure computation over encrypted data, but it is impractical due to its high computational cost.
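
To make the SMC idea concrete, here is a textbook secure-sum protocol based on additive secret sharing, sketched in Python. It illustrates the general approach under a semi-honest assumption (parties follow the protocol but may try to learn from what they receive); it is not a protocol from the thesis, and the modulus and function names are hypothetical. A single sum is cheap, but general-purpose SMC must build arbitrary functions out of such exchanges, which is roughly where the heavy overhead mentioned above arises.

```python
import secrets

# Hypothetical public modulus; inputs and the final sum must be smaller than it.
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split `value` into n additive shares that sum to value mod MODULUS.
    Any n-1 of the shares are uniformly random and reveal nothing about value."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    """Simulate n parties computing the sum of their private inputs.
    Party i sends its j-th share to party j; each party publishes only the
    sum of the shares it received, and the published sums add up to the total."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]  # row i: shares made by party i
    published = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(published) % MODULUS
```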

In this thesis, we focus on developing privacy-preserving algorithms that are efficient in both computation and communication, in the domains of Data Mining and Information Retrieval. Specifically, we design efficient privacy-preserving solutions for two widely used tasks: outlier detection, and content-based similarity search over encrypted data. Private computation of outliers among data collected from different sources has many applications in data mining. First, we propose a novel distance-based outlier detection algorithm using Locality Sensitive Hashing (LSH) in a distributed setting where the data is horizontally partitioned among data owners. We show that the proposed algorithm scales to large, high-dimensional datasets. We then extend our distributed outlier algorithm into an algorithm for private outlier detection that breaks the previously known quadratic cost bound. We compare our algorithm with the state-of-the-art technique and show that it is better in terms of both computational and communication complexity.
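
For intuition about how LSH helps distance-based outlier detection, the sketch below flags a point as an outlier when fewer than `min_neighbors` other points lie within `radius` of it, but compares each point only with the points that share one of its LSH buckets, instead of with all n points. It is a single-site, non-private Python illustration using Gaussian (p-stable) LSH; the thesis's distributed and private protocol is not reproduced, and the outlier definition, function names and parameter values here are assumptions.

```python
import math
import random
from collections import defaultdict

def build_lsh_tables(points, n_tables, n_hashes, width, seed=0):
    """Gaussian (p-stable) LSH: a bucket key is the tuple of
    floor((a . x + b) / width) over n_hashes random (a, b) pairs."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(n_tables):
        funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                 for _ in range(n_hashes)]
        buckets = defaultdict(list)
        for idx, x in enumerate(points):
            key = tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
                        for a, b in funcs)
            buckets[key].append(idx)
        tables.append(buckets)
    return tables

def distance_outliers(points, radius, min_neighbors,
                      n_tables=8, n_hashes=2, width=4.0, seed=0):
    """Indices of points with fewer than min_neighbors others within radius.
    Only bucket-mates are checked, so neighbor counts are approximate."""
    tables = build_lsh_tables(points, n_tables, n_hashes, width, seed)
    candidates = [set() for _ in points]
    for buckets in tables:
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)
    outliers = []
    for i, x in enumerate(points):
        count = sum(1 for j in candidates[i]
                    if j != i and math.dist(x, points[j]) <= radius)
        if count < min_neighbors:
            outliers.append(i)
    return outliers
```

Because true neighbors can land in different buckets, the counts can only undercount; using more tables lowers that risk at the cost of more hashing.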

Next, we explore the properties of hierarchical index structures and propose an algorithm for content-based similarity search over encrypted data that does not reveal the user's query to the server. To the best of our knowledge, we are the first to propose a definition and a scheme for the problem of Content Similarity Searchable Symmetric Encryption (CS-SSE). Finally, we extend our CS-SSE scheme to Private Content Based Image Retrieval, which is essential for applications such as searching health records using CT scans and patent search for logos. We also show that our algorithm achieves optimal computational cost with accuracy and privacy similar to those of existing techniques.

Year of completion:	2013
Advisor :	C. V. Jawahar and Dr. Kannan Srinathan

Related Publications

Downloads

Document Image Retrieval using Bag of Visual Words Model.

Ravi Shekhar (homepage)

With the development of computer technology, storage has become more affordable, so traditional libraries are making their collections electronically available on the Internet or on digital media. Searching these collections requires a good indexing structure, which can be built manually or automatically. Manual indexing is time-consuming and expensive, so automation is the only feasible option. The first approach in this direction was to convert document images into text by Optical Character Recognition (OCR) and then perform traditional text search. This method works well for European languages, whose OCRs are robust, but does not work reliably for Indian and other non-European languages such as Hindi, Telugu and Malayalam. Even for European languages, usable OCR is not available for highly degraded document images.

To overcome the limitations of OCR, word spotting emerged as an alternative. In word spotting, each word image is represented by features, and matching is based on the distance between features. The main challenges in word spotting are extracting suitable features and designing the feature comparison mechanism. Profile features are a popular way to represent word images, with Euclidean distance used for comparison. To handle differing word lengths, all images are scaled to the same size, which loses a lot of information. Later, a Dynamic Time Warping (DTW) based approach was proposed for matching, which does not require scaling the images. However, DTW-based matching is very slow and cannot be used in practical, large-scale applications.
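
The DTW matching mentioned above can be sketched as the standard dynamic-programming recurrence, assuming each word image has already been turned into a sequence of per-column feature vectors (for example profile features). The function name and the Euclidean column distance are illustrative choices. The doubly nested loop makes the cost proportional to the product of the two sequence lengths for every pair of words compared, which is why DTW does not scale to large collections.

```python
import math

def dtw_distance(seq_a, seq_b):
    """DTW distance between two sequences of feature vectors (e.g. per-column
    profile features of two word images), with Euclidean column distance."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # column of seq_a repeats
                                 cost[i][j - 1],      # column of seq_b repeats
                                 cost[i - 1][j - 1])  # columns aligned one-to-one
    return cost[n][m]
```

Unlike scaling both images to a fixed width, the warping path absorbs differences in word length, as in the usage below where a stretched sequence still matches with distance 0.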

In the first part of this thesis, we describe a Bag of Visual Words (BoVW) based approach to retrieve similar word images from a large database efficiently and accurately. We show that a text retrieval system can be adapted to build a word image retrieval solution, which helps achieve scalability. We demonstrate the method on more than 1 million word images with sub-second retrieval time. We validate the method on four Indian languages and report a mean average precision of more than 0.75. We represent each word image as a histogram of the visual words present in it. Visual words are quantized representations of local regions; in this work, SIFT descriptors at interest points are used as feature vectors. To address the lack of spatial structure in the BoVW representation, we re-rank the retrieved list, which significantly improves performance.
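
The text-retrieval analogy can be sketched in Python: descriptors are quantized to their nearest codeword, and each word image is then indexed like a text document in an inverted index with tf-idf weighting. The codebook, the unnormalized tf-idf scoring and the class name are simplifying assumptions for illustration; SIFT extraction, codebook learning (e.g. k-means) and the spatial re-ranking step are omitted.

```python
import math
from collections import Counter, defaultdict

def quantize(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codeword (visual word)."""
    return [min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
            for d in descriptors]

class InvertedIndex:
    """Hypothetical text-style index over word images: visual word -> postings."""

    def __init__(self):
        self.postings = defaultdict(dict)  # visual word -> {image_id: term count}
        self.n_images = 0

    def add(self, image_id, words):
        self.n_images += 1
        for w, tf in Counter(words).items():
            self.postings[w][image_id] = tf

    def search(self, words, top=5):
        """Rank images by an unnormalized tf-idf dot product with the query."""
        scores = defaultdict(float)
        for w, q_tf in Counter(words).items():
            docs = self.postings.get(w, {})
            if not docs:
                continue
            idf = math.log(self.n_images / len(docs))
            for image_id, tf in docs.items():
                scores[image_id] += (q_tf * idf) * (tf * idf)
        return sorted(scores, key=scores.get, reverse=True)[:top]
```

At query time only the posting lists of the query's visual words are visited, which is what keeps retrieval fast even when a very large number of word images is indexed.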

We have also enhanced the BoVW approach with query expansion, text query support and a Locality-constrained Linear Coding (LLC) based retrieval system. For query expansion, we use the initial results to modify the initial query, which yields better results. In the BoVW model the query is given by example, but users are generally more interested in querying with text, as in Google or Bing. Text queries are also supported within the same BoVW model. Finally, LLC is used to achieve high recall: the LLC scheme represents each descriptor using its nearest codewords, which improves performance.
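
Query expansion can be illustrated with average query expansion, one common variant: the query's BoVW histogram is replaced by the mean of itself and the histograms of the top few initial results, and the search is run again. This variant and the function name are assumptions chosen for illustration; the abstract does not say exactly how the thesis modifies the query.

```python
def expand_query(query_hist, ranked_hists, top_k=3):
    """Average query expansion (an assumed variant): mean of the query histogram
    and the histograms of the top_k initially retrieved images.
    All histograms are equal-length lists of visual-word counts or weights."""
    pool = [query_hist] + list(ranked_hists[:top_k])
    return [sum(col) / len(pool) for col in zip(*pool)]
```

The expanded histogram is then used as a new query; visual words that the query missed but that occur in the top results gain weight, which tends to help recall.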

Most scalable document search systems use features such as SIFT and SURF, which were originally designed for natural images. Natural images have a lot of variation and carry extra information in their gray levels. Document images are binary, so they need features designed specifically for them. We propose patch-based features built from profile features. Compared with SIFT, our proposed feature gives similar performance but is faster to compute, which makes our pre-processing step very fast.
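
A sketch of patch-level profile features in Python, assuming a binary patch stored as a list of rows with 1 for ink. For each column it computes four classic word-spotting profiles (projection, upper profile, lower profile and ink/background transitions) and concatenates them into one descriptor. The exact features, normalization and patch layout used in the thesis are not given here, so these are assumptions; the point is that each value needs only a single pass over a column, with no gradient computation.

```python
# Hypothetical patch descriptor built from per-column profile features.

def column_profiles(patch):
    """Per-column (projection, upper, lower, transitions) for a binary patch.
    Upper/lower profiles are normalized by patch height; empty columns get 1.0."""
    h, w = len(patch), len(patch[0])
    feats = []
    for c in range(w):
        col = [patch[r][c] for r in range(h)]
        ink = [r for r, v in enumerate(col) if v]
        projection = len(ink) / h
        upper = ink[0] / h if ink else 1.0
        lower = (h - 1 - ink[-1]) / h if ink else 1.0
        transitions = sum(1 for a, b in zip(col, col[1:]) if a != b)
        feats.append((projection, upper, lower, transitions))
    return feats

def patch_descriptor(patch):
    """Concatenate the column profiles into one flat descriptor vector."""
    return [v for f in column_profiles(patch) for v in f]
```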

We have demonstrated that a recognition-free approach is feasible for large-scale word image retrieval. In future work, a similar approach can be used for handwritten, camera-based and natural scene text retrieval. A better, more efficient descriptor designed specifically for document words is also needed.

Year of completion:	June 2013
Advisor :	C. V. Jawahar

Related Publications

Downloads

Learning Representations for Computer Vision Tasks.

Siddhartha Chandra (homepage)

Learning representations for computer vision tasks has long been a holy grail for the vision community. Research in computer vision is dedicated to developing machines that understand image data, which takes a variety of forms such as images, video sequences, views from multiple cameras and high-dimensional data from medical scanners. Good representations of the data aim to discover its hidden structure; better insights into the nature of the data can help choose or create better features, learn better similarity measures between data points, build better predictive and descriptive models, and ultimately drive better decisions from data. Research in this field has shown that good representations are more often than not task specific: there is no single universal set of features that solves all problems in computer vision. Consequently, feature learning for computer vision is not a single problem but a set of problems, a full-fledged field of research in its own right.

In this thesis, we seek to learn good, semantically meaningful representations for some popular computer vision tasks, such as visual classification and action recognition. We study and employ a variety of existing feature learning approaches, and devise novel strategies for learning representations for these tasks. We also compare our methods with traditional approaches, discuss the design choices we make, and examine the effects of varying the parameters of our approaches. We provide empirical evidence that our representations solve the tasks at hand better than the traditional ones.

To solve the task of action recognition, we devise a novel PLS kernel that employs Partial Least Squares (PLS) regression to derive a scalar measure of similarity between two video sequences. We use this similarity kernel to solve the tasks of hand gesture recognition and action classification. We demonstrate that our approach significantly outperforms state-of-the-art approaches on two popular datasets: the Cambridge hand gesture dataset and the UCF sports action dataset.

We use a variety of approaches to tackle the popular task of visual classification. We describe a novel hierarchical feature learning strategy that uses low-level Bag of Words (BoW) visual words to create "higher-level" features by exploiting the spatial context in images. Our model uses a novel Naive Bayes Clustering algorithm to convert a 2-D symbolic image at one level into a 2-D symbolic image at the next level with richer features. On two popular datasets, Pascal VOC 2007 and Caltech 101, we demonstrate the superiority of our representations over the traditional BoW and deep learning representations.

Driven by the hypothesis that most data, such as images, lie on multiple non-linear manifolds, we propose a novel non-linear subspace clustering framework that uses K Restricted Boltzmann Machines (K-RBMs) to learn non-linear manifolds in the raw image space. We solve the coupled problem of finding the right non-linear manifolds in the input space and associating image patches with those manifolds using an iterative Expectation Maximization (EM) like algorithm that minimizes the overall reconstruction error. Our clustering framework is comparable to state-of-the-art clustering approaches on a variety of synthetic and real datasets. We further employ K-RBMs for feature learning from raw images. Extensive empirical results on several popular image classification datasets show that this framework outperforms traditional feature representations such as SIFT-based Bag-of-Words (BoW) and convolutional deep belief networks.

This thesis is an account of our efforts to contribute to this fascinating field. We acknowledge that research in this field will continue for a long time, as solving computer vision is still a distant dream. We hope that one day we will have earned the right to say, in retrospect, that we were on the right track.


Year of completion:	March 2013
Advisor :	C. V. Jawahar

Related Publications

Downloads

A Study of X-Ray Image Perception for Pneumoconiosis Detection

Varun Jampani

Pneumoconiosis is an occupational lung disease caused by the inhalation of industrial dust. Despite increasing safety measures and better workplace environments, pneumoconiosis is deemed the most common occupational disease in developing countries such as India and China. Screening and assessment of this disease is done through radiological observation of chest x-rays. Several studies have shown significant inter- and intra-observer variation in the diagnosis of this disease, showing the complexity of the task and the importance of expertise in diagnosis.

The present study aims to understand the perceptual and cognitive factors affecting the reading of chest x-rays of pneumoconiosis patients. Understanding these factors helps in developing better image acquisition systems, better training regimens for radiologists and better computer aided diagnostic (CAD) systems. We used an eye tracking experiment to study the various factors affecting the assessment of this diffuse lung disease. Specifically, we aimed to understand the role of expertise and of the contralateral symmetric (CS) information present in chest x-rays in the diagnosis and in the eye movements of the observers. We also studied inter- and intra-observer fixation consistency, along with the role of anatomical and bottom-up saliency features in attracting the gaze of observers at different expertise levels, to gain better insight into the effect of bottom-up and top-down visual saliency on observers' eye movements.

The experiment was conducted in a room dedicated to eye tracking experiments. Participants, comprising novices (3), medical students (12), residents (4) and staff radiologists (4), were presented with good-quality PA chest x-rays and asked to give profusion ratings for each of the 6 lung zones. The image set, consisting of 17 normal full chest x-rays and 16 single-lung images, was shown to the participants in random order. The time taken for diagnosis and the eye movements were recorded using a remote, head-free eye tracker.

[Figure: original chest x-ray and its heatmap]

Results indicated that expertise and CS information play important roles in the diagnosis of pneumoconiosis. Novices and medical students were slow and inefficient, whereas residents and staff were quick and efficient. A key finding of our study is that the presence of CS information alone does not improve diagnosis as much as learning how to use that information. This learning appears to be gained through focused training and years of experience. Hence, good training for radiologists and careful observation of each lung zone may improve the quality of diagnostic results. For residents, eye scanning strategies play an important role in using the CS information present in chest radiographs; for staff radiologists, however, peripheral vision or higher-level cognitive processes seem to play a role in using the CS information.

Inter- and intra-observer fixation consistency is reasonably good, suggesting that observers use similar viewing strategies. Experience helps observers develop new visual strategies based on the image content, so that they can assess the disease level quickly and efficiently. The first few fixations seem to play an important role in choosing the visual strategy appropriate for a given image.

Both inter-rib and rib regions are given equal importance by the observers. Although reading chest x-rays is highly task dependent, bottom-up saliency played an important role in attracting the fixations of the observers. This role of bottom-up saliency appears larger in lower-expertise groups than in higher-expertise groups. Both bottom-up and top-down influences on visual fixations seem to change over time. The relative roles of top-down and bottom-up influences on visual attention are still not completely understood and remain part of future work.

Based on our experimental results, we developed an extended saliency model that combines bottom-up saliency with the saliency of the lung regions in a chest x-ray. This new model performed significantly better than bottom-up saliency alone in predicting the gaze of the observers in our experiment. Even though the model is a simple combination of bottom-up saliency maps and segmented lung masks, it demonstrates that even basic models using simple image features can predict observers' fixations with good accuracy.
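
The extended model can be sketched as follows, under clearly stated assumptions: the abstract does not give the exact combination rule, so this Python sketch uses a hypothetical pixel-wise weighted sum of a bottom-up saliency map and a binary lung mask, and scores a map with the ROC AUC (the probability that a fixated pixel scores higher than a non-fixated one), a common fixation-prediction metric that may differ from the one used in the thesis.

```python
# Hypothetical combination rule and evaluation metric; maps are lists of rows.

def combine_saliency(bottom_up, lung_mask, alpha=0.5):
    """Assumed rule: alpha * bottom-up saliency (in [0, 1]) + (1 - alpha) * lung mask (0/1)."""
    return [[alpha * s + (1 - alpha) * m for s, m in zip(srow, mrow)]
            for srow, mrow in zip(bottom_up, lung_mask)]

def fixation_auc(saliency, fixations):
    """ROC AUC of a saliency map as a fixation predictor.
    `fixations` is a set of (row, col) pixels; ties between scores count 0.5."""
    pos, neg = [], []
    for r, row in enumerate(saliency):
        for c, v in enumerate(row):
            (pos if (r, c) in fixations else neg).append(v)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a toy 2x2 map where the fixations fall on the lung pixels but the strongest bottom-up response lies outside the lungs, adding the mask raises the AUC from 0.375 to 1.0.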

Experimental analysis suggested that the factors affecting the reading of chest x-rays for pneumoconiosis are complex and varied. A good understanding of these factors can help develop better radiological screening for pneumoconiosis, both through improved training and through improved CAD tools. The presented work is an attempt to gain insight into what these factors are and how they modify the behavior of observers.

Year of completion:	January 2013
Advisor :	Jayanthi Sivaswamy

Related Publications

Downloads
