Techniques for Organization and Visualization of Community Photo Collections
Kumar Srijan (homepage)
Due to the digital and information revolution we are presently witnessing, there is a huge and continuously increasing number of images on the Internet. For example, a query for ``Eiffel Tower'' on Google Images returns more than two million images. The easy accessibility of this data provides us with unique opportunities to mine the contents of these images, not only for automatic organization, but also for providing interactive interfaces to browse, explore and query them. This task is challenging given the massive size and continuous growth of these collections. Moreover, the images in these collections are taken under varying imaging conditions, with different cameras, at different resolutions, from different perspectives, and with different degrees of occlusion. Hence, even the simplest of tasks on such image collections, such as finding matching images, turns out to be hard.
The Computer Vision community has been actively designing and redesigning algorithms to overcome these challenges. One of the most widespread and noticeable ideas employed is that of extracting robust, invariant and repeatable local features in the images, followed by quantization of the feature space into visual words. The similarity of images is gauged by the correspondence and similarity of their local features, and verification of the matches is performed to eliminate spurious ones. Building a data structure such as an inverted index over these visual words can accelerate the discovery of matching features. This mining of similar images by matching features forms the basis of all higher-level algorithms such as clustering, skeletonization and summarization, which help in the organization, exploration and querying of these image collections. This thesis presents two novel algorithms that help achieve this goal.
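To make the role of the inverted index concrete, the following is a minimal Python sketch of indexing images by their visual words and retrieving candidate matches by counting shared words. The function names and toy data are illustrative only and are not taken from the thesis.

```python
from collections import defaultdict

# Minimal sketch of an inverted index over visual words, assuming each
# image has already had its local features quantized into visual-word ids.
# Names (build_index, query) are illustrative placeholders.

def build_index(database):
    """database: dict mapping image_id -> iterable of visual-word ids."""
    index = defaultdict(set)          # visual word -> set of image ids
    for image_id, words in database.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, query_words):
    """Return candidate images ranked by the number of shared visual words."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# Example usage with toy data:
db = {"img1": [3, 7, 7, 42], "img2": [7, 42, 99], "img3": [1, 2]}
idx = build_index(db)
print(query(idx, [7, 42]))   # img1 and img2 score above img3
```

Candidates retrieved this way would still need verification, as noted above, before being accepted as true matches.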
First, we introduce a novel indexing scheme that makes it possible to do exhaustive pairwise matching in large image collections. The quantization of image features and their indexing provide only a limited amount of leverage for speeding up the image matching process, which depends upon the sparsity of the posting lists. This sparsity is controlled by the number of visual words used, which after a point cannot be increased arbitrarily without affecting recall. Our scheme generates higher order features by pairing up nearby features and encoding their affine geometry. This provides a much larger feature space to index, which can subsequently be reprojected to any desired size by defining appropriate hash functions. We implement our indexing scheme by providing an analogy with Bloom filters. The higher order features extracted in the images are inserted into their respective equally sized Bloom filters using a single hash function. This uniformity across Bloom filters allows a single inverted index to index the hash buckets of all the Bloom filters, thus providing a simplified interface to implicitly query all of them. We choose the size of these Bloom filters to be in proportion to the size of the database. This enables us to do querying in constant time, since the average size of the posting lists becomes constant. Also, the use of such large implicit Bloom filters sufficiently mitigates the negative effects of using a single hash function. As a result, we are able to do exhaustive pairwise matching over large databases of up to 100K images with linear time complexity.
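The following sketch illustrates the spirit of this scheme: pairs of nearby features become higher order features, a single hash function maps each one into a large fixed-size bucket space (the implicit Bloom filter), and one shared inverted index records which images set which bucket. The pairing rule and hash function here are placeholder choices, and the affine geometry encoding is omitted, so this is only an approximation of the construction described above.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Sketch of indexing higher order features through a single shared inverted
# index, in the spirit of the implicit Bloom filters described above.
NUM_BUCKETS = 1 << 20   # Bloom filter size; in the thesis it scales with the database size

def higher_order_features(words_with_positions, radius=50.0):
    """Pair up nearby features; each pair of visual words is one higher order feature."""
    feats = []
    for (w1, (x1, y1)), (w2, (x2, y2)) in combinations(words_with_positions, 2):
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
            feats.append(tuple(sorted((w1, w2))))
    return feats

def bucket(feature):
    """Single hash function mapping a higher order feature to a Bloom filter bucket."""
    digest = hashlib.sha1(repr(feature).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def build_index(database):
    """database: image_id -> list of (visual_word, (x, y)) tuples."""
    index = defaultdict(set)          # bucket -> images whose Bloom filter sets this bit
    for image_id, feats in database.items():
        for f in higher_order_features(feats):
            index[bucket(f)].add(image_id)
    return index

def query(index, query_feats):
    """Vote for database images that share hashed higher order features with the query."""
    votes = defaultdict(int)
    for f in higher_order_features(query_feats):
        for image_id in index.get(bucket(f), ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

Because the bucket space is large and shared, posting lists stay short on average, which is what keeps per-query cost roughly constant in this setup.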
Second, we present a fast and easy to implement framework for browsing large image collections of landmarks and monumental sites. The existing framework, ``Phototourism'', requires a reconstruction of the whole scene using the Structure from Motion package Bundler. This involves pairwise matching of all images to generate tracks of matching features across images. Next, an incremental approach is applied, starting with a seed reconstruction and adding more matching images to the reconstruction. This, however, requires continuous refinement of the whole reconstruction using a computationally expensive procedure called bundle adjustment. The pairwise matching and bundle adjustment become the limiting factors in scaling this technique to large image collections.
To overcome the issues faced with ``Phototourism'', our framework employs independent partial reconstructions of the scene. We use the standard Bag of Words model and indexing techniques to determine the closest neighbours of each image in the collection, and do a local reconstruction corresponding to each image using only its neighbouring images. This requires us to solve multiple simple reconstruction problems instead of one large reconstruction problem, making it computationally more tractable. Our browsing interface hops from one reconstruction to another to give the user the illusion of browsing a global reconstruction. Our approach also adapts easily to growing image collections, as adding an image only incurs the cost of creating a new independent reconstruction. We validate our approach on a Golkonda Fort image dataset consisting of 6K images.
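A minimal sketch of the neighbour-grouping step is shown below, assuming each image is represented by an L2-normalised Bag of Words histogram; each resulting group would then be reconstructed independently with a standard Structure from Motion tool. The function name and parameters are illustrative, not the thesis's exact implementation.

```python
import numpy as np

# Form one local reconstruction group per image from Bag of Words similarity.
def neighbour_groups(histograms, k=10):
    """histograms: (num_images, vocab_size) array of L2-normalised BoW vectors.
    Returns, for every image, its own index plus those of its k most similar images."""
    sims = histograms @ histograms.T            # cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)             # exclude the image itself
    groups = {}
    for i in range(histograms.shape[0]):
        nn = np.argsort(-sims[i])[:k]           # k nearest neighbours
        groups[i] = [i, *nn.tolist()]           # local reconstruction = image + neighbours
    return groups

# Example usage with random toy data:
rng = np.random.default_rng(0)
H = rng.random((100, 1000))
H /= np.linalg.norm(H, axis=1, keepdims=True)
print(neighbour_groups(H, k=5)[0])
```

Since each group is small and independent of the others, the reconstructions can be computed in parallel, and adding a new image only requires computing its neighbours and one new local reconstruction.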
In summary, the techniques presented in this thesis for organizing large image collections address the problem of doing exhaustive pairwise matching in image collections in a scalable manner, for which a novel indexing scheme is proposed. We also present a novel technique for overcoming the problems faced while doing ``Structure from Motion'' on large image collections. We hope that these techniques will find application in browsing and mining matching images in large image collections, and in creating virtual experiences of several monuments and sites across the globe.
Year of completion: July 2013
Advisor: C. V. Jawahar