Scalable Clustering for Vision using GPUs.


K Wasif Mohiuddin (homepage)

Clustering algorithms have wide applications in Computer Vision, data mining, data visualization, and related areas. Clustering is an important step in indexing and searching documents, images, videos, etc. Clustering large numbers of high-dimensional vectors is very computation intensive: CPUs struggle with such loads and can take days or even weeks to cluster large data. GPUs are now used for general-purpose computing on a wide range of problems that require high computational power, and today's GPUs deliver as much as 1.5 TFLOPs. The GPU has evolved over time not only through increasing numbers of cores but also through major architectural changes such as faster data access, more shared memory, and more threads per block. These changes enable GPU programmers to exploit the architecture fully and achieve high performance. In this thesis, we focus on K-Means, a widely used unsupervised clustering algorithm. We develop a GPU-based K-Means implementation for large datasets and use it to build a video organizing application.

GPU K-Means: We present the design and implementation of the K-Means clustering algorithm on the modern GPU. In our approach, all steps are performed efficiently and entirely on the GPU. We also present a load-balanced multi-node, multi-GPU implementation which can handle up to 6 million 128-dimensional vectors. We use an efficient memory layout for all steps to obtain high performance. GPU accelerators are now present on high-end workstations as well as low-end laptops. Scalability in the number and dimensionality of the vectors, the number of clusters, and the number of cores available for processing is important for usability by different users. Our implementation scales linearly or near-linearly with these problem parameters. We achieve up to a 2x increase in speed over the best prior GPU implementation of K-Means on a single GPU, and a speedup of up to 170 on a single Nvidia Fermi GPU compared to a standard sequential implementation. We are able to execute a single iteration of K-Means in 136 seconds on off-the-shelf GPUs to cluster 6 million vectors of 128 dimensions into 4K clusters, and in 2.5 seconds to cluster 125K vectors of 128 dimensions into 2K clusters.
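For reference, a minimal NumPy sketch of the two steps one K-Means iteration performs (nearest-center assignment followed by center update). The thesis parallelizes both steps on the GPU; this CPU version, with illustrative array sizes and function names, only shows the underlying computation.

import numpy as np

def kmeans_iteration(points, centers):
    # Assignment step: squared Euclidean distance from every point to every center.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Update step: each center becomes the mean of the vectors assigned to it.
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = points[labels == k]
        if len(members) > 0:
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers

# Illustrative run: cluster 10,000 random 128-D vectors into 16 clusters.
rng = np.random.default_rng(0)
X = rng.random((10_000, 128), dtype=np.float32)
C = X[rng.choice(len(X), 16, replace=False)].copy()
for _ in range(10):
    labels, C = kmeans_iteration(X, C)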


Video Organizer: Video data is growing rapidly along with the capacity of storage devices owned by lay users. Users have moderate to large personal collections of videos and would like to keep them organized based on their content. Video organizing tools for personal users lag well behind even primitive image organizing tools. In this thesis, we present a mechanism to help ordinary users organize their personal collections of videos into categories they choose. During the learning phase, we cluster PHOG features extracted from selected key frames using K-Means to form a representation of each user-selected category. During the organization phase, labels from a K-NN classifier over these cluster centers are computed for each key frame and aggregated to assign a label to the video. Video processing is computationally intensive, so we exploit both the CPU and the GPU that is common even on personal systems. Effective use of the parallel hardware on the system is the only way to make the tool scale reasonably to the large collections that will soon be common. Our tool organizes a set of 100 sport videos with a total duration of 1375 minutes in about 9.5 minutes. Learning the categories from 12 annotated videos of 165 minutes total duration took 75 seconds on a GTX 580 card. These results were obtained on a standard desktop with an off-the-shelf GPU. The labeling accuracy is about 96% on all videos.
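A minimal sketch of how the organization phase described above might aggregate per-frame labels into a video label: each key-frame feature is classified by K-NN over the per-category cluster centers, and the frame labels are combined by majority vote. The function and variable names are hypothetical; the thesis's GPU pipeline is not reproduced here.

import numpy as np
from collections import Counter

def label_video(frame_features, centers, center_labels, k=5):
    # centers: (M, D) cluster centers from all categories; center_labels: (M,) category ids.
    frame_votes = []
    for f in frame_features:
        d = np.linalg.norm(centers - f, axis=1)      # distance to every cluster center
        nearest = np.argsort(d)[:k]                  # indices of the k nearest centers
        frame_votes.append(Counter(center_labels[nearest]).most_common(1)[0][0])
    # Majority vote over all key frames gives the video label.
    return Counter(frame_votes).most_common(1)[0][0]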

The ideas and approaches proposed in this thesis have been implemented and validated with experimental results. For large datasets, we developed a scalable and efficient K-Means clustering implementation on the GPU, along with a multi-GPU framework, and used it to build a video organizer application with high accuracy.

 

Year of completion: August 2012
Advisor: P. J. Narayanan

Related Publications

  • K. Wasif Mohiuddin and P. J. Narayanan - Scalable Clustering Using Multiple GPUs, Proceedings of the International Conference on High Performance Computing, 18-21 Dec. 2011, Bangalore, India. E-ISBN 978-1-4577-1949-3, Print ISBN 978-1-4577-1951-6. [PDF]

  • K. Wasif Mohiuddin and P. J. Narayanan - A GPU-Assisted Personal Video Organizer, Proceedings of the 3rd Workshop on GPU in Computer Vision at ICCV (International Conference on Computer Vision), 4-11 Nov. 2011, Barcelona, Spain. [PDF]

 


Downloads

thesis

ppt

Cascaded Filtering of Biometric Identification Using Random Projection


Atif Iqbal (homepage)

Biometrics has a long history, and is possibly as old as the human race itself. It is often used as an advanced security measure to safeguard important artifacts, buildings, information, etc. Biometrics is increasingly being used for secure authentication of individuals and is making its presence felt in our lives. With the rapid increase in computational power, such systems can now be deployed on a very large scale. However, efficiency in large-scale biometric matching is still a concern: deduplication (removal of duplicates) of a biometric database with N entries is O(N²), which can be extremely challenging for large databases. To make this problem tractable, many indexing methods have been proposed to speed up the comparison process. However, the expectation of accuracy in such systems, combined with the nature of biometric data, makes the problem a very challenging one.

Biometric identification often involves explicit comparison of a probe template against each template stored in a database. An effective way to speed up the process is filtering, where a lightweight comparison reduces the database to a smaller set of candidates for explicit comparison. However, most existing filtering schemes use specific features that are hand-crafted for the biometric trait at each stage of the filtering. In this work, we show that a cascade of simple linear projections onto random lines can achieve significant levels of filtering. Each stage of filtering projects the probe onto a specific line and removes database samples whose projections fall outside a window around the probe. The approach provides a way to generate filters automatically and avoids the need to develop specific features for different biometric traits. The method also provides a variety of parameters, such as the projection lines, the number and order of projections, and the window sizes, to customize the filtering process to a specific application. Experiments are performed on fingerprint, palmprint and iris data.
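A minimal sketch of one filtering stage as described above, assuming plain NumPy feature vectors: project the probe and the surviving candidates onto a random line and discard candidates whose projections fall outside a window around the probe's. The data, window size and cascade length here are illustrative, not the thesis's tuned parameters.

import numpy as np

def filter_stage(candidates, probe, line, window):
    # Project candidates and probe onto the (normalized) random line and keep
    # only candidates whose projection lies within `window` of the probe's.
    line = line / np.linalg.norm(line)
    keep = np.abs(candidates @ line - probe @ line) <= window
    return candidates[keep]

# Illustrative cascade on random data; each stage shrinks the candidate set.
rng = np.random.default_rng(1)
db = rng.random((100_000, 64))
probe = rng.random(64)
survivors = db
for _ in range(5):
    survivors = filter_stage(survivors, probe, rng.standard_normal(64), 0.1)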

For both iris and palmprint datasets, the representation we use (before projection) is the popular thresholded filter response computed from pre-defined regions of the image. Experimental results show that using an ensemble of projections reduces the search space by 60% without increasing the false negative identification rate for palmprints. However, for stronger biometrics such as iris, the approach does not yield similar results. We explore this problem further, specifically for the case of fingerprints.

The fundamental approach here is to explore the effectiveness of weak features in a cascade for filtering fingerprint databases. We start with a set of potential indexing features computed from minutiae triplets and minutiae quadruplets. We show that by using a set of random lines and the proposed fitness function, one can achieve better results than optimized projection methods such as PCA or LDA. Experimental results on fingerprint datasets show that using an ensemble of projections we can reduce the penetration rate to 26% at a hit rate of 99%. Since each stage of the cascade is extremely fast and filtering is progressive along the cascade, one can terminate the cascade at any point to achieve the desired performance. One can also combine this method with other indexing methods to improve the overall accuracy and speed. We present detailed experimental results on various aspects of the process on the FVC 2002 dataset.

The proposed approach is scalable to large datasets due to the use of random linear projections and lends itself directly to pipelined processing. The method also allows the use of multiple existing features without affecting the computation time.

 

Year of completion: 2012
Advisor: Anoop M. Namboodiri

Related Publications

  • Atif Iqbal and Anoop M. Namboodiri - Cascaded Filtering for Biometric Identification using Random Projections, Proceedings of the Seventh National Conference on Communications (NCC 2011), 28-30 Jan. 2011, Bangalore, India. [PDF]

  • Atif Iqbal and Anoop M. Namboodiri - Cascaded Filtering for Fingerprint Identification using Random Projection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, June 2012.

Downloads

thesis

ppt

Mixed-Resolution Patch-Matching


Harshit Sureka (homepage)

The problem of matching small image regions or patches between images is central to many image editing and computer vision tasks. In this thesis, we propose two related solutions to the problem of fast and global patch-matching. First, we present a multi-resolution, spatial-pyramid-based solution which we call Pyramid Patch-Matching (PPM). It uses a simple coarse-to-fine framework powered by exhaustive search at the coarsest resolution and local directed search at finer resolutions. PPM quickly finds accurate patch correspondences between images or image regions. The search step in our algorithm is flexible enough to accommodate other nearest-neighbor algorithms within its framework. A proposed increase-K technique stores more matches than required at coarse resolutions, which maintains a wide search range; this improves accuracy by up to 2%, and its effect is most significant when few matches are required. The early-termination technique preempts the distance calculation between patches if the distance exceeds that of the farthest match currently stored, speeding up the algorithm by up to 1.7x. Following the downsampling rule, the reduce-patch-size technique matches smaller patches than required at coarser resolutions of the pyramid. Empowered by these techniques, PPM achieves state-of-the-art accuracy, outperforming the previous standards of Coherency Sensitive Hashing (CSH) and PatchMatch without compromising execution time. PPM finds up to 4% more of the ground-truth matches and achieves lower error values compared to CSH, the current standard.
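The early-termination idea can be sketched as follows: accumulate the patch distance term by term and abandon the computation once it exceeds the distance of the farthest match currently stored. This is only an illustrative scalar-loop version under assumed flat patch vectors, not the thesis's optimized implementation.

def early_terminated_distance(a, b, worst_stored):
    # Accumulate the squared distance term by term; abandon the computation as
    # soon as it exceeds the distance of the farthest match currently stored.
    dist = 0.0
    for x, y in zip(a, b):
        dist += (x - y) ** 2
        if dist > worst_stored:
            return None   # this candidate cannot enter the stored match list
    return dist

A caller would only insert a candidate into its match list when the function returns a finite distance, so most losing candidates are rejected after a few terms.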

Several parameters are provided to tune the search for good matches depending on the application, and we perform experiments to study their effect on matching accuracy and computation time. While consistently achieving lower error values for RGB distance between matched patches, the simplicity of our algorithm also enables fast parallel implementations, providing a further advantage over previous randomized, hashing-based, or tree-based iterative approaches. On CPU cores, it achieves a near-linear speedup w.r.t. the number of cores, and a GPU implementation provides a further speedup of up to 70x compared to a CPU implementation on a quad-core machine (250x w.r.t. a single core).


We observe that the matching accuracy of PPM suffers from low-confidence matching of patch-vectors at coarse resolutions. To solve this, we propose a novel way to mix information by concatenating corresponding patch-vectors from finer resolution levels with patch-vectors at coarse levels. We call these Mixed-Resolution Vectors (MR-vectors). Unlike traditional patch-vectors that contain information from a single resolution level, MR-vectors contain more of the information relevant for matching. We build on our PPM algorithm by substituting MR-vectors for traditional patch-vectors; we call this the Mixed-Resolution Patch-Matching (MRPM) algorithm. MRPM further improves the accuracy of the matches found by PPM with a marginal increase in time: it captures up to 4% more of the ground-truth matches (8% more than CSH) and achieves near-optimal match accuracy. We also show that the MR approach has computational benefits over simply increasing the search range. We study the effect of the number of resolution levels mixed and show that a good balance between accuracy and time is achieved by mixing 2 levels. We perform various experiments comparing MRPM with PPM and other previous approaches, and MRPM consistently performs better. Fast parallel multi-core and GPU implementations of MRPM follow from the parallel implementations of PPM. We develop image denoising, image summarization and auto-crop applications based on our MRPM algorithm; results with better visual quality are obtained when MR-vectors are used. The computational bottleneck of these applications is the search for nearest neighbors of patches, and we demonstrate the utility of MRPM as a fast patch-matching engine that empowers them. This opens up new possibilities for exploring MR-vectors, which we believe have applications in several other image editing and computer vision tasks.
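A minimal sketch of forming an MR-vector by concatenation, assuming grayscale pyramid levels stored as NumPy arrays with a scale factor of 2 between levels; the patch size and indexing conventions are assumptions for illustration.

import numpy as np

def mr_vector(coarse_level, finer_level, y, x, patch=4):
    # Patch at (y, x) in the coarse level, concatenated with the corresponding
    # patch one level finer (coordinates and patch size scale by 2 per level).
    coarse_patch = coarse_level[y:y + patch, x:x + patch].ravel()
    finer_patch = finer_level[2 * y:2 * y + 2 * patch, 2 * x:2 * x + 2 * patch].ravel()
    return np.concatenate([coarse_patch, finer_patch])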

 

Year of completion: December 2012
Advisor: P. J. Narayanan

Related Publications

  • Harshit Sureka and P. J. Narayanan - Mixed-Resolution Patch-Matching, Proceedings of the 12th European Conference on Computer Vision, 7-13 Oct. 2012, ECCV 2012, Part VI, LNCS 7577, pp. 187-198, Firenze, Italy. [PDF]


Downloads

thesis

ppt

Private Outlier Detection and Content based Encrypted Search


Nisarg Raval (homepage)

Today, our daily lives are constantly driven and influenced by services like social networking, personalized advertisements and cloud infrastructure. This makes any form of personal or professional data vulnerable to both security and privacy threats. Users lose control over their data as soon as it is published on the Internet. Recently, the popular photo-sharing service Instagram tried to change its privacy policy, giving itself the right to sell users' information, including their photographs, to advertisers without any notification or compensation. After incidents like the AOL and Netflix debacles, organizations refrain from releasing customer data even after anonymization, which hinders research and development in many ways due to the unavailability of customer data, behavior and patterns. Techniques like k-anonymity and l-diversity help preserve user privacy to some extent but fail to provide solid guarantees. The recently proposed differential privacy provides promising results in this area, with theoretical bounds on information leakage and preservation of privacy even against auxiliary information. However, differential privacy considers an individual's record independent of the other records and cannot be applied to linked data such as social graphs and time series. Furthermore, all these techniques compromise the utility of the data while maintaining privacy.

At present, two approaches are widely used to ensure the security of data. The first is based on Secure Multiparty Computation (SMC), where users communicate with each other using an SMC protocol to securely evaluate a function on their data. SMC gives provable security without compromising the utility of the data; however, solutions based on the general SMC protocol require enormous computational and communication overhead, limiting the practical deployment of secure algorithms. The second approach is to encrypt the data at the user end and perform all operations in the encrypted domain to preserve the confidentiality of the data; this makes operations such as search, retrieval and computation difficult. Homomorphic encryption allows secure computation over encrypted data but is impractical due to its high computational cost.

In this thesis, we focus on developing privacy-preserving, communication- and computation-efficient algorithms in the domains of data mining and information retrieval. Specifically, we design efficient privacy-preserving solutions for two widely used tasks: outlier detection and content-based similarity search over encrypted data. Private computation of outliers among data collected from different sources has many applications in data mining. First, we propose a novel distance-based outlier detection algorithm using Locality Sensitive Hashing (LSH) in a distributed setting where the data is horizontally partitioned among data owners. We show that the proposed algorithm is scalable to large, high-dimensional datasets. We then extend our distributed outlier algorithm to a private outlier detection algorithm that breaks the previously known quadratic cost bound. We compare our algorithm with the state-of-the-art technique and show that it is better in both computational and communication complexity.
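A minimal, single-party and non-private sketch of how LSH can shortlist distance-based outlier candidates: vectors are hashed with p-stable random projections, and points in sparsely populated buckets become candidates. The parameters and the single-table setting are assumptions; the thesis's distributed, privacy-preserving protocol is not shown here.

import numpy as np
from collections import defaultdict

def lsh_outlier_candidates(data, n_hashes=8, bucket_width=1.0, min_neighbors=5, seed=0):
    # Hash every vector with p-stable random projections; points falling into
    # sparsely populated buckets become candidate distance-based outliers.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((data.shape[1], n_hashes))
    offsets = rng.uniform(0, bucket_width, n_hashes)
    keys = np.floor((data @ proj + offsets) / bucket_width).astype(int)
    buckets = defaultdict(list)
    for i, key in enumerate(map(tuple, keys)):
        buckets[key].append(i)
    return [i for members in buckets.values() if len(members) < min_neighbors
            for i in members]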

Next, we explore the properties of hierarchical index structures and propose an algorithm for content-based similarity search over encrypted data that does not reveal the user query to the server. To the best of our knowledge, we are the first to propose a definition and a scheme for the problem of Content Similarity Searchable Symmetric Encryption (CS-SSE). Finally, we extend our CS-SSE scheme to private content-based image retrieval, which is essential for applications such as searching health records using CT scans and patent search for logos. We also show that our algorithm achieves optimal computational cost with similar accuracy and privacy compared to existing techniques.

 

Year of completion: 2013
Advisors: C. V. Jawahar and Kannan Srinathan

Related Publications


Downloads

thesis

ppt

Document Image Retrieval using Bag of Visual Words Model.


Ravi Shekhar (homepage)

With the development of computer technology, storage has become more affordable, and traditional libraries are making their collections electronically available on the Internet or on digital media. Searching these collections requires a good indexing structure, which can be built manually or automatically; manual indexing is time consuming and expensive, so automation is the only feasible option. The first approach in this direction was to convert document images into text using OCR and then perform traditional text search. This works well for European languages, whose optical character recognizers (OCRs) are robust, but not reliably for Indian and other non-European languages such as Hindi, Telugu and Malayalam. Even for European languages, reliable OCR is not available for highly degraded document images.

To overcome the limitations of OCR, word spotting emerged as an alternative. In word spotting, a given word image is represented by features and matching is done based on feature distance. The main challenges are extracting suitable features and choosing a feature comparison mechanism. Profile features are a popular way to represent a word image, with Euclidean distance used for comparison; to handle varying word lengths, all images are scaled to the same size, which loses a lot of information. Later, a DTW-based matching approach was proposed which does not require scaling the images, but it is very slow and cannot be used for practical, large-scale applications.
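For reference, a minimal DTW sketch between two profile-feature sequences (one feature vector per image column); its O(nm) cost per word-pair comparison is what makes this approach too slow at scale. The function name and the use of Euclidean local cost are illustrative.

import numpy as np

def dtw_distance(seq1, seq2):
    # Dynamic Time Warping distance between two column-wise feature sequences
    # of possibly different lengths, using Euclidean distance as the local cost.
    n, m = len(seq1), len(seq2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq1[i - 1] - seq2[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]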

In the first part of this thesis, we describe a Bag of Visual Words (BoVW) based approach to retrieve similar word images from a large database efficiently and accurately. We show that a text retrieval system can be adapted to build a word image retrieval solution, which helps achieve scalability. We demonstrate the method on more than 1 million word images with sub-second retrieval time. We validate the method on four Indian languages and report a mean average precision of more than 0.75. The word image is represented as a histogram of the visual words present in it. Visual words are quantized representations of local regions; in this work, SIFT descriptors computed at interest points are used as feature vectors. To address the lack of spatial structure in the BoVW representation, we re-rank the retrieved list, which significantly improves performance.
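A minimal sketch of the BoVW representation used here: assign each SIFT descriptor to its nearest visual word and build a normalized histogram. Vocabulary construction (e.g., by K-Means over training descriptors) and interest-point detection are omitted, and the function name is illustrative.

import numpy as np

def bovw_histogram(descriptors, vocabulary):
    # descriptors: (N, 128) SIFT vectors; vocabulary: (V, 128) visual words.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() + 1e-12)              # normalized word-frequency histogram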

We then enhance the BoVW approach with query expansion, text query support and Locality-constrained Linear Coding (LLC) based retrieval. In query expansion, the initial results are used to modify the query, yielding better results. In the basic BoVW model the query is given by example, but users generally prefer to query with text such as "Google" or "Bing"; text queries are supported within the same BoVW model. Finally, LLC is used to achieve high recall: the LLC scheme learns a data representation using the nearest codewords and improves retrieval performance.
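A minimal sketch of a simple average query expansion step of the kind described above: the BoVW histograms of the top initial results are averaged with the original query and the expanded query is re-issued. The exact expansion rule used in the thesis may differ.

import numpy as np

def expand_query(query_hist, result_hists, top_k=5):
    # Average the original query histogram with those of the top-k initial
    # results to form the expanded query, then re-normalize.
    expanded = np.vstack([query_hist] + list(result_hists[:top_k])).mean(axis=0)
    return expanded / (expanded.sum() + 1e-12)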

Most scalable document search systems use features such as SIFT and SURF, which were originally designed for natural images. Natural images have a lot of variation and carry extra information as gray-level values, whereas document images are binary, so features specifically designed for them are needed. We propose a patch-based feature built from profile features. We compare the proposed feature with SIFT and obtain similar performance, with the advantage that it is faster to compute, which makes our pre-processing step very fast.
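A minimal sketch of one plausible profile-based patch descriptor: per-row and per-column ink counts of a binary patch, concatenated and normalized. This is an assumption for illustration only; the thesis does not spell out the exact feature computation here.

import numpy as np

def profile_patch_feature(patch):
    # Assumed illustrative descriptor: row and column projection profiles
    # (ink counts) of a binary patch, concatenated and L2-normalized.
    patch = (patch > 0).astype(float)
    row_profile = patch.sum(axis=1)
    col_profile = patch.sum(axis=0)
    feat = np.concatenate([row_profile, col_profile])
    return feat / (np.linalg.norm(feat) + 1e-12)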

We demonstrate that a recognition-free approach is feasible for large-scale word image retrieval. In future work, a similar approach can be applied to handwritten documents, camera-based retrieval and natural scene text retrieval. A better and more efficient descriptor designed specifically for document words is also needed.

 

Year of completion: June 2013
Advisor: C. V. Jawahar

Related Publications


Downloads

thesis

ppt
