Instance Retrieval and Image Auto-Annotations on Mobile Devices


Jay Guru Panda (homepage)

Image matching is a well-studied problem in the computer vision community. Starting from template matching techniques, the methods have evolved to achieve robust scale-, rotation- and translation-invariant matching between two similar images. To this end, images are represented as a set of descriptors extracted at salient local regions that are detected in a robust, invariant and repeatable manner. For efficient matching, a global descriptor for the image is computed either by quantizing the feature space of local descriptors or by using separate techniques to extract global image features. With this, effective indexing mechanisms are employed to perform efficient retrieval on large image databases.
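The quantize-then-index idea described above can be sketched in a few lines. This is only a minimal illustration, not the thesis's implementation: the 2-D "descriptors", the four-word vocabulary, the image names and every function name below are hypothetical, and a real system would use high-dimensional descriptors and a much larger learned vocabulary.

```python
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared L2)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: dist2(descriptor, vocabulary[i]))


def quantize(descriptors, vocabulary):
    """Global image representation: histogram of visual-word ids."""
    return Counter(nearest_word(d, vocabulary) for d in descriptors)


def build_inverted_index(database, vocabulary):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, descriptors in database.items():
        for word in quantize(descriptors, vocabulary):
            index[word].add(image_id)
    return index


def query(descriptors, index, vocabulary):
    """Rank database images by the number of visual words shared with the query."""
    votes = Counter()
    for word in quantize(descriptors, vocabulary):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return [image_id for image_id, _ in votes.most_common()]


# Hypothetical toy data: four visual words and two database images.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
database = {
    "gate": [(0.1, 0.0), (0.9, 0.1)],
    "tomb": [(0.0, 0.9), (1.0, 1.1)],
}
index = build_inverted_index(database, vocabulary)
ranked = query([(0.05, 0.1), (1.0, 0.05)], index, vocabulary)
```

Only images that share at least one visual word with the query are ever touched, which is what makes an inverted index attractive for large databases.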

Successful systems have been put in place in desktop and cloud environments to enable image search and retrieval. The retrieval takes a fraction of a second on a powerful desktop or a server. However, such techniques are typically not well suited for less powerful computing devices such as mobile phones or tablets. These devices have small storage capacity and limited memory. Computer vision algorithms run slower on them, even when optimized for the architecture of mobile processors. These handheld devices, or so-called smart devices, are increasingly used for simple tasks that seem too trivial for a desktop or a laptop and can be easily accessed on a smaller display. Further, owing to improved embedded camera sensors, they are popularly used for taking pictures and are gradually replacing digital cameras. Hence, a user is more likely to issue a query image from the mobile phone than from the desktop. This increases the scope of applications that demand real-time search and retrieval results delivered on a mobile phone.

Many applications (or apps) on mobile smart phones communicate with the cloud to perform tasks that are infeasible on the device. People have attempted to retrieve images in this cloud-based model by sending either the image or its features to the server and receiving back relevant information. We are interested in solving this problem on the device itself, with all the necessary computations happening on the mobile processor. This frees the user from depending on a consistent network connection and from the communication overheads associated with the search process. We address the range of applications that need simple text annotations to describe the image queried on the mobile. An interesting use case is a tourist, student or historian visiting a heritage site who can get all information about the monuments and structures on a mobile phone. Once the app is initialized on the device, the camera is opened, and by just pointing the camera or with a single click, all the useful information about the monument is displayed on the screen instantly. The app doesn’t use the internet to communicate with any server and does all computations on the mobile phone itself. Our methods optimize the process of instance retrieval to enable quick and light-weight processing on a mobile phone or a tablet. (more...)

 

Year of completion:  December 2013
 Advisor : C. V. Jawahar

Related Publications

  • Jayaguru Panda, Michael S Brown and C V Jawahar - Offline Mobile Instance Retrieval with a Small Memory Footprint Proceedings of International Conference on Computer Vision, 1-8 Dec. 2013, Sydney, Australia. [PDF]

  • Jayaguru Panda, Shashank Sharma, C V Jawahar - Heritage App: Annotating Images on Mobile Phones Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India. [PDF]

  • J Panda, C. V. Jawahar - Heritage App: Annotating Images on Mobile Phones IAPR Second Asian Conference on Pattern Recognition (ACPR2013), Okinawa (Japan), November, 2013 [PDF]

Downloads

thesis

ppt

A Framework for Community Detection from Social Media.


Chandrashekar V (homepage)

The past decade has witnessed the emergence of the participatory Web and social media, bringing together people in many creative ways. Millions of users are playing, tagging, working, and socializing online, demonstrating new forms of collaboration, communication, and intelligence that were hardly imaginable just a short time ago. Social media refers to interaction among people in which they create, share and exchange information and ideas in virtual communities and networks. Social media also helps reshape business models, sways opinions and emotions, and opens up numerous possibilities to study human interaction and collective behavior at an unparalleled scale.

In the study of complex networks, a network is said to have community structure if the nodes can be easily grouped into sets of nodes (even overlapping) such that each set of nodes is densely connected internally. Community structures are quite common in real networks. Social networks often include community groups based on common location, interests, occupation etc. Metabolic networks have communities based on functional groupings. Citation networks form communities by research topic. Being able to identify these sub-structures within a network can provide insight into how network function and topology affect each other.

In this thesis, we design an end-to-end framework for identifying communities from raw, noisy social media data. The framework is composed of two important phases. First, we introduce a new method of converting the raw, noisy social media data into a weighted entity-entity co-occurrence based consistency network. This includes a simple iterative noise removal procedure for cleaning the entity consistency network by removing noisy entity pairs. Secondly, we propose an approach for identifying coherent communities from the weighted entity network, by introducing novel notions of community-ness and community, based on eigenvector centrality.
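As a rough illustration of the two ingredients named above, the sketch below builds a weighted entity-entity co-occurrence network from tagged records and scores the entities by eigenvector centrality. It is a minimal sketch under stated assumptions: raw co-occurrence counts stand in for the thesis's consistency weights, the noise-removal step and the community-ness measure are not reproduced, and the records and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(records):
    """Count how often each unordered entity pair appears in the same record."""
    weights = Counter()
    for entities in records:
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return weights


def eigenvector_centrality(weights, iterations=100):
    """Power iteration on the symmetric weighted adjacency matrix A.

    The update uses (A + I) instead of A: it has the same eigenvectors
    but avoids oscillation on bipartite-like graphs.
    """
    nodes = sorted({node for pair in weights for node in pair})
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        nxt = dict(score)  # the "+ I" term
        for (a, b), w in weights.items():
            nxt[a] += w * score[b]
            nxt[b] += w * score[a]
        norm = max(nxt.values()) or 1.0
        score = {node: value / norm for node, value in nxt.items()}
    return score


# Hypothetical tag sets, e.g. the tags attached to four photos.
records = [
    ["beach", "sea", "sand"],
    ["beach", "sea"],
    ["sea", "sand"],
    ["beach", "sea", "car"],
]
weights = cooccurrence_weights(records)
centrality = eigenvector_centrality(weights)
most_central = max(centrality, key=centrality.get)
```

In this toy network the tag that co-occurs most strongly with the other well-connected tags receives the highest score, while the rarely co-occurring tag receives the lowest.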

We use this framework to solve three different problems from two distinct domains. The first problem involves detecting communities from raw social media data and showing the application of the communities discovered in a recommendation engine setting. We use the framework for converting the raw data into a clean network and propose a highly parallelizable seed-based greedy algorithm to detect as many communities as possible from the weighted entity consistency network. Our framework for community detection is unsupervised, domain agnostic, noise robust, computationally efficient and can be used in different Web mining applications like recommendation systems, topic detection, user profiling etc. We also design a recommendation system to compare our framework with existing state-of-the-art frameworks on a variety of large real-world social media data: Flickr, IMDB, Wikipedia, Bibsonomy and Medline. Our results outperform other frameworks by a huge margin.

The second problem is the following: given a set of communities discovered by traditional community detection methods, we need to identify loose communities among them and partition them into compact ones. Here, we use the second phase of our framework to identify such loose communities using our notion of community-ness and propose an algorithm for partitioning them into compact ones. We illustrate the results of our algorithm on Amazon Product and Flickr Tag data and show its advantage over the traditional community detection methods in a recommendation engine setting.

The third problem shows the application of the framework in an image annotation scenario in the presence of noisy labels. Image annotation is defined as follows: given an unknown image, predict the labels which best describe the semantics of the image. This problem is best solved in a supervised nearest neighbor setting, and we show how our framework can be used to address it when the labels associated with training images are noisy and redundant. (more...)

 

Year of completion:  August 2013
 Advisor : C. V. Jawahar & Shailesh Kumar

 

Related Publications

  • Chandra Shekar V, Shailesh Kumar and C. V. Jawahar - Image Annotation in Presence of Noisy Labels Proceedings of 5th International Conference on Pattern Recognition and Machine Intelligence, 10-14 Dec. 2013, Kolkata, India. [PDF]

  • Chandra Shekar V, Shailesh Kumar and C V Jawahar - Compacting Large and Loose Communities Proceedings of the 2nd Asian Conference on Pattern Recognition, 05-08 Nov. 2013, Okinawa, Japan. [PDF]

  • Shailesh Kumar, Chandrashekar V, C. V. Jawahar - Logical Itemset Mining Proceedings of IEEE International Conference on Data Mining Workshop, 10-13 Dec. 2012, ISBN 978-1-4673-5164-5,Brussels, Belgium. [PDF]


Downloads

thesis

ppt

Efficient Texture Mapping by Homogeneous Patch Discovery


R. Vikram Pratap Singh (homepage)

All visible objects have shape and texture. The main aim of computer graphics is to represent and render real world objects efficiently and realistically. To make the objects look realistic, we have to make sure that both the shape and the texture of the object are accurate. In practice, shape is either hand crafted using 3D modeling tools such as Blender, or is acquired from real world objects using 3D reconstruction techniques. Texture is the second aspect of appearance that must be ensured to make the rendered objects look real. The texture has to be pasted on the surface in such a manner that it perceptually corresponds to the correct part of the mesh. This process of pasting the texture on the surface of a mesh model is called texture mapping. Texture mapping can be done in two ways. The first is to texture a surface by synthesizing the texture directly on the surface. The second is to wrap a synthesized texture around the surface and cut and merge the seams so that it fits correctly on the surface. To visualize this problem we can think of texture as a cloth. The first method can be thought of as weaving around the body to fit it exactly, like a sweater, while the second method is like cutting and stitching an already woven cloth according to the shape of the mesh model. In this thesis we propose a new method that follows the second approach. The primary goal of our method is to map a texture onto large mesh models at interactive rates, while maintaining the perceived quality.

The primary technique for mapping a flat texture (image) onto an arbitrarily shaped mesh model is to parameterize the shape, which defines a mapping from points on the mesh surface onto a 2D plane. When parameterizing these mesh models, we try to keep the geometric correspondence between the mesh vertices intact to reduce the distortion of the texture. Typically, parameterizing a mesh model involves solving a set of linear equations representing the geometric correspondence of the triangles. The approach involves defining an energy function for the mapping and searching for a global optimum which minimizes the distortions during the mapping. Such methods are capable of achieving texture mappings that have high perceptual quality. However, typical energy minimization procedures are computationally expensive and cannot be applied in real-time applications or to large mesh models.

To complement the proposed texture mapping algorithm, we introduce a method to make a texture self tileable. This allows us to store only the texel structure, if the required texture is repetitive. We present qualitative and quantitative results in comparison with several other texture mapping algorithms. The proposed algorithm is robust in terms of the output quality and can find applications in different scenarios such as rapid prototyping, where you require interactive texture mapping rates and the ability to deal with dynamic mesh topology. It can also be used for applications such as large monument visualizations, where we need to deal with large and noisy mesh models that are generated using techniques such as multi-view stereo. (more...)


Some Results (figure panels): Rabbit, Ursula, Dragon, Horse, Pegasus, Buddha

Year of completion:  June 2014
 Advisor : Anoop M. Namboodiri


Related Publications

  • R. Vikram Pratap Singh, Anoop M Namboodiri - Efficient texture mapping by homogeneous patch discovery, ICVGIP 2012 (Oral).

Downloads

thesis

ppt

Image Mosaicing of Neonatal Retinal Images.


Akhilesh Bontala (homepage)

Image mosaicing is a data fusion technique used for increasing the field of view of an image. Deriving the mosaiced image entails integrating information from multiple images. Image mosaicing overcomes the limitations of a camera lens and helps create a wide field of view image of a 3D scene, and hence has a wide range of applications in various domains including medical imaging. This thesis concerns the task of mosaicing specific to neonatal retinal images for aiding doctors in the diagnosis of Retinopathy of Prematurity (ROP). ROP is a vascular disease that affects low-birth-weight, premature infants. The prognosis of ROP relies on information on the presence of abnormal vessel growth and fibrosis in the periphery. Diagnosis is based on a series of images obtained from a camera (such as RetCam) to capture the complete retina. Typically, as many as 20 to 30 images are captured and examined for diagnosis. In this thesis, we present a solution for mosaicing the RetCam images so that a comprehensive and complete view of the entire retina can be obtained in a single image for ROP diagnosis. The task is challenging given that the quality of the images obtained is variable. Furthermore, the presence of large spatial shifts across consecutive frames makes them virtually unordered.

We propose a novel, hierarchical system for efficiently mosaicing an unordered set of RetCam images. It is a two-stage approach in which the input images are first partitioned into subsets, and the images in each subset are spatially aligned and combined to create intermediate results. These intermediate results are then again spatially aligned and combined to create a final mosaic. Given n images, the number of registrations required to generate a mosaic by conventional approaches to mosaicing is O(n^2), whereas it is O(n) for the proposed system. An alignment technique for low quality retinal images and a blending method for combining images based on vessel quality are also designed as part of this framework. Individual components of the system are evaluated and compared with other approaches. The overall system was also evaluated on a locally-sourced dataset consisting of neonatal retinal images of 10 infants with ROP. Quantitative results show that there is a substantial increase in the field of view and that the vessel extent is also improved in the generated mosaics. The generated mosaics have been validated by experts to provide sufficient information for the diagnosis of ROP. (more...)
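The quadratic versus linear registration counts can be made concrete with a small calculation. The abstract only states the orders of growth, so the counting scheme below is an assumption made for illustration: the conventional approach is taken to register every image pair, and the two-stage approach is taken to register each image in a subset against one anchor image and then each intermediate mosaic against one anchor mosaic. The function names and the subset size are hypothetical.

```python
def pairwise_registrations(n):
    """All unordered image pairs: n * (n - 1) / 2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def two_stage_registrations(n, subset_size):
    """Assumed two-stage scheme: anchor-based alignment within each subset,
    then anchor-based alignment of the intermediate mosaics."""
    if n < 1:
        return 0
    full, rest = divmod(n, subset_size)
    sizes = [subset_size] * full + ([rest] if rest else [])
    within = sum(size - 1 for size in sizes)  # stage 1: inside each subset
    across = len(sizes) - 1                   # stage 2: between intermediate mosaics
    return within + across                    # always n - 1, i.e. O(n)


conventional = pairwise_registrations(30)   # 435
two_stage = two_stage_registrations(30, 5)  # 29
```

For a typical session of 30 images, this hypothetical scheme needs 29 registrations instead of 435.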

 

Year of completion:  July 2014
 Advisor : Jayanthi Sivaswamy

Related Publications

  • Akhilesh Bontala, Jayanthi Sivaswamy and Rajeev R Pappura - Image Mosaicing of Low Quality Neonatal Retinal Images Proceedings of IEEE International Symposium on Biomedical Imaging, 2-5 May 2012, ISBN 978-1-4577-1858-8, pp. 720-723, Barcelona, Spain. [PDF]


Downloads

thesis

ppt

Combining Data Parallelism and Task Parallelism for Efficient Performance on Hybrid CPU and GPU Systems


Aditya Deshpande (homepage)

In earlier times, computer systems had only a single core or processor. In these computers, the number of transistors on-chip (i.e. on the processor) doubled every two years and all applications enjoyed free speedup. Subsequently, with more and more transistors being packed on-chip, power consumption became an issue, frequency scaling reached its limits and industry leaders eventually adopted the paradigm of multi-core processors. Computing platforms of today have multiple cores and are parallel. CPUs have multiple identical cores. A GPU with dozens to hundreds of simpler cores is present on many systems. In future, other multiple core accelerators may also be used.

With the advent of multiple core processors, the responsibility of extracting high performance from these parallel platforms shifted from computer architects to application developers and parallel algorithmists. Tuned parallel implementations of several mathematical operations and of algorithms on graphs or matrices were developed for multi-core CPUs, for many-core accelerators like the GPU and CellBE, and for their combinations. Parallel algorithms developed for multi-core CPUs primarily focussed on decomposing the problem into a few independent chunks and using the cache efficiently. As an alternative to CPUs, Graphics Processing Units (GPUs) were the other most cost-effective, massively parallel platforms that were widely available. Frequently used algorithmic primitives such as sort, scan, sparse matrix vector multiplication, graph traversals and image processing operations were efficiently implemented on the GPU using CUDA. These parallel algorithms on the GPU decomposed the problem into a sequence of many independent steps operating on different data elements and used shared memory effectively.

But the above operations -- statistical, or on graphs, matrices, lists etc. -- constitute only portions of an end-to-end application, and in most cases these operations also provide some inherent parallelism (task or data parallelism). The problems which lack such task or data parallelism are still difficult to map to any parallel platform, either CPU or GPU. In this thesis, we consider a few such difficult problems -- like Floyd-Steinberg Dithering (FSD) and String Sorting -- that do not have trivial data parallelism and exhibit strong sequential dependence or irregularity. We show that with appropriate design principles we can find data parallelism or fine-grained parallelism even for these tough problems. Our techniques to break sequentiality and address irregularity can be extended to solve other difficult data parallel problems in the future. On the problem of FSD, our data parallel approach achieves a speedup of 10X on high-end GPUs and a speedup of about 3-4X on low-end GPUs, whereas previous work by Zhang et al. dismisses the same algorithm as lacking enough parallelism for GPUs. On string sorting, we achieve a speedup of around 10-19X as compared to state-of-the-art GPU merge sort based methods, and our code will be available as part of a standard GPU library (CUDPP).
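To see why FSD looks inherently sequential, and where some parallelism nevertheless hides, here is a small reference sketch using the classic Floyd-Steinberg error weights (7/16, 3/16, 5/16, 1/16). It is not the thesis's GPU algorithm: the wavefront grouping shown is a general property of this error kernel, the function names are hypothetical, and exact `Fraction` arithmetic is used only so that the two processing orders can be compared bit for bit.

```python
from fractions import Fraction

# (dx, dy, weight) of the classic Floyd-Steinberg error-diffusion kernel.
KERNEL = ((1, 0, Fraction(7, 16)), (-1, 1, Fraction(3, 16)),
          (0, 1, Fraction(5, 16)), (1, 1, Fraction(1, 16)))


def _dither(image, order):
    """Threshold each pixel in the given order and push its error to neighbours."""
    height, width = len(image), len(image[0])
    work = [[Fraction(v) for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for x, y in order(width, height):
        new = 255 if work[y][x] >= 128 else 0
        err = work[y][x] - new
        out[y][x] = new
        for dx, dy, weight in KERNEL:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and ny < height:
                work[ny][nx] += err * weight
    return out


def raster_order(width, height):
    """The usual sequential scan: left to right, top to bottom."""
    return [(x, y) for y in range(height) for x in range(width)]


def wavefront_order(width, height):
    """Group pixels by t = x + 2*y. Every pixel only receives error from
    pixels with a smaller t, so pixels sharing t are mutually independent
    and could be processed in parallel."""
    cells = [(x, y) for y in range(height) for x in range(width)]
    return sorted(cells, key=lambda c: (c[0] + 2 * c[1], -c[1]))


def floyd_steinberg(image):
    return _dither(image, raster_order)


def floyd_steinberg_wavefront(image):
    return _dither(image, wavefront_order)


# Hypothetical test pattern with values in 0..255.
gradient = [[(x * 37 + y * 59) % 256 for x in range(7)] for y in range(5)]
same = floyd_steinberg(gradient) == floyd_steinberg_wavefront(gradient)
# same is True: both processing orders produce the identical dithered image.
```

The amount of work available per wavefront is small (at most about half the image width), which hints at why the problem is hard to map efficiently onto a GPU.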

It is not enough to have a truly fine-grained parallel algorithm for only a few operations. Any end-to-end application consists of many operations, some of which are difficult to execute on a fine-grained parallel platform like the GPU. At the same time, computing platforms consist of a CPU and a GPU, which have complementary attributes. CPUs are suitable for heavy processing by only a few threads, i.e. they prefer task parallelism. GPUs are better suited for applications where a large number of data parallel operations are performed. Applications can achieve optimal performance by combining data parallelism on the GPU with task parallelism on the CPU. In this thesis, we examine two methods of combining data parallelism and task parallelism on a hybrid CPU and GPU computer system: (i) pipelining and (ii) work sharing. For pipelining, we study the Burrows Wheeler Compression (BWC) implementation in Bzip2 and show that the best performance can be achieved by pipelining its different stages effectively. In contrast, a previous GPU implementation of BWC by Patel et al. performed all the tasks (BWT, MTF and Huffman encoding) on the GPU and was 2.78X slower than the CPU. Our hybrid BWC pipeline performs about 2.9X better than CPU BWC and is thus about 8X faster than Patel et al. For work sharing, we use FSD as an example and split the data parallel step between CPU and GPU. The Handover and Hybrid FSD algorithms, which use work sharing to exploit computation resources on both CPU and GPU, are faster than the CPU-alone and GPU-alone parallel algorithms.
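For readers unfamiliar with the BWC stages named above, the following sketch shows textbook versions of the first two, the Burrows-Wheeler Transform (BWT) and Move-To-Front (MTF) coding, and checks the quoted speedup arithmetic. These are naive reference versions for illustration only, not the Bzip2 or thesis implementations: the sentinel character, the per-input alphabet and the function names are assumptions, and Huffman coding is omitted.

```python
def bwt(text, sentinel="\0"):
    """Naive BWT: sort all rotations of text + sentinel, keep the last column.
    Assumes the sentinel does not occur in text and sorts before every symbol."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def move_to_front(data):
    """Replace each symbol by its position in a recency list, then move it
    to the front. Runs of equal symbols (common after BWT) become zeros."""
    alphabet = sorted(set(data))
    codes = []
    for symbol in data:
        position = alphabet.index(symbol)
        codes.append(position)
        alphabet.insert(0, alphabet.pop(position))
    return codes


transformed = bwt("banana")         # "annb\0aa"
codes = move_to_front(transformed)  # [1, 3, 0, 3, 3, 3, 0]

# Speedup arithmetic from the text: a GPU-only version 2.78X slower than the
# CPU, and a hybrid version 2.9X faster than the CPU, differ by their product.
hybrid_vs_gpu_only = 2.78 * 2.9     # about 8
```

Each stage consumes the output of the previous one block by block, which is what makes the chain a natural candidate for pipelining across different processors.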

In conclusion, we develop data parallel algorithms on the GPU for the difficult problems of Floyd-Steinberg Dithering, String Sorting and Burrows Wheeler Transform. In earlier literature, simpler problems which provided some degree of data parallelism were adapted to GPUs. The problems we solve on the GPU involve challenging sequential dependency and/or irregularity. We show that in addition to developing fast data parallel algorithms on the GPU, application developers should also use the CPU to execute tasks in parallel with the GPU. This allows an application to fully utilize all resources of an end-user's system and provides them with maximum performance. With computing platforms poised to be predominantly heterogeneous, the use of our design principles will prove critical in obtaining good application level performance on these platforms. (more...)

Year of completion:  July 2014
 Advisor : Prof. P. J. Narayanan

Related Publications

  • Aditya Deshpande and P. J. Narayanan - Can GPUs Sort Strings Efficiently ? Proceedings of the IEEE Conference on High Performance Computing, 18-21 Dec. 2013, Bangalore, India. [PDF]

  • Aditya Deshpande, Ishan Misra and P J Narayanan - Hybrid Implementation of Error Diffusion Dithering Proceedings of 18th International Conference on High Performance Computing 18-21 Dec. 2011, E-ISBN 978-1-4577-1949-3, Print ISBN 978-1-4577-1951-6, pp. 1-10, Bangalore, India. [PDF]

  • Aditya Deshpande and P. J. Narayanan - Fast Burrows Wheeler Compression Using CPU and GPU (Under Review, ACM TOPC).

Downloads

thesis

ppt
