An Approximate Nearest Neighbor Retrieval Scheme for Computationally Intensive Distance Measures
Pratyush Bhatt (homepage)
Nearest neighbor retrieval can be defined as the task of finding the objects that are most similar to a query from a given a database of objects. It find its application in areas ranging from medical domain, financial sector, computer vision, computational sciences, computational geometry, information retrieval, etc. With the expansion of internet, the amount of digitized data is increasing by leaps and bounds. Retrieval of nearest nearest neighbors accurately and efficiently becomes challenging in such a scenario as the database contain a large number of objects. The problem gets worsen when the underlying distance measure used to compute [dis]similarity is computationally expensive. In such a scenario, sequential scan of data would take a lot of time which is the biggest problem for any online retrieval system. For example in biometric authentication systems, a particular person biometric template is compared against all the registered samples in a database to identify the person. This process can be extremely time consuming in large databases even if the matching algorithm is extremely fast. For example, to do background check of a person who is crossing the border using the complete IAFIS,(a biometric person identification system at the U.S. border crossings), requires around 55 million comparisons. Even with the state of the art matching algorithms and computing facilities, this would take close to 10 minutes, which is not practical considering the millions of people who cross the border every month. Even for criminal investigations, it is desirable to get a quick and approximate search done immediately rather than the typical turn-around time of a few days for a search. This thesis proposes a novel method for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed technique is domain-independent, and can be applied in arbitrary spaces, including non- Euclidean and non-metric spaces. The main contributions of our work are :
-
A representation scheme for objects in a dataset that allows for fast retrieval of approximate nearest neighbors in non-euclidean space. The approach named Hierarchical LocalMaps (HLM),make use of manifold learning techniques to compute linear approximation of local neighborhoods.
-
Search mechanism combined with filter and refine approach is proposed that minimizes the number of exact distance computations for computationally expensive distance measure.
-
Study performance of our scheme on biometric data and study the parameters affecting its performance.
Results of k-nearest neighbor retrieval as well as classification results on UNIPEN dataset shows the advantages of using HLM over state-of-the-art approximate nearest neighbor retrieval algorithms. Classification result on CASIA iris dataset by using average gabor response for a block as the feature vector along with Euclidean distance as the soft biometric measure in conjugation with Daugman feature vector and hamming distance as the hard biometric shows the advantage of using a softer metric over a hard metric for indexing.
Year of completion: | 2010 |
Advisor : | Anoop M. Namboodiri |
Related Publications
Pratyush Bhatt and Anoop Namboodiri - Hierarchical Local Maps for Robust Approximate Nearest Neighbour Computation Proceedings of the 7th International Conference on Advances in Pattern Recognition (ICAPR 2009), Feb. 4-6, 2009, Kolkotta, India. [PDF]