
High Quality Image Reconstruction by Optimal Capture and Accurate Registration


Himanshu Arora (homepage)

Generating high-resolution, high-quality images from degraded images has a variety of applications in space imaging, medical imaging, commercial videography, and recognition algorithms. Camera-equipped cell phones are now widely used for day-to-day photography. Limitations of the image-capturing hardware and environmental attenuations introduce degradations into the captured image. The quality of the image reconstructed by most algorithms is highly sensitive to the accurate computation of underlying parameters such as blur, noise, and geometric deformation. Variations in blur, illumination, and noise, together with occlusions, make the computation of these parameters even more difficult.

One way of generating a high-quality image is by fusing multiple images that are displaced at sub-pixel levels. This method is popularly known as multi-frame super-resolution (SR). Most multi-image SR algorithms assume that the exact registration and blur parameters between the constituent frames are known. Traditional approaches to image registration are either sensitive to image degradations such as variations in blur, illumination, and noise, or are limited in the class of image transformations they can estimate. These conditions are frequently violated in real-world imaging, where specular surfaces, close light sources, and small sensors and lenses create large variations in illumination, noise, and blur within the scene. Interestingly, these are exactly the situations where one would like to employ SR algorithms.

We explore an alternate solution to the problem of robustness in the registration step of an SR algorithm. We present an accurate registration algorithm that uses local phase information, which is robust to the above degradations. The registration step is formulated as an optimization of the local phase alignment at various spatial frequencies. We derive the theoretical error rate of the estimates in the presence of non-ideal band-pass behavior of the filter and show that the error converges to zero over iterations. We also show the invariance of local phase to a class of blur kernels. We demonstrate experimental results on images taken under varying conditions.
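To make the idea concrete, here is a minimal sketch of phase-based shift estimation: the local phase is computed with a complex Gabor filter, and for a pure translation the phase difference between two images is proportional to the shift. This is an illustration of the principle, not the thesis implementation; the filter frequency and envelope width are illustrative choices.

```python
import numpy as np
from scipy.signal import fftconvolve

def local_phase(img, freq=0.15, sigma=3.0):
    """Local phase from a 1-D complex (quadrature) Gabor filter applied
    along image rows. `freq` (cycles/pixel) and `sigma` (envelope width)
    are illustrative parameters."""
    n = int(3 * sigma)
    x = np.arange(-n, n + 1)
    gabor = np.exp(-x**2 / (2 * sigma**2)) * np.exp(2j * np.pi * freq * x)
    resp = fftconvolve(img.astype(float), gabor[np.newaxis, :], mode="same")
    return np.angle(resp)

def estimate_shift(img_a, img_b, freq=0.15):
    """Horizontal sub-pixel shift of img_b relative to img_a: if img_b is
    img_a shifted right by d pixels, then phase_b - phase_a ~ -2*pi*freq*d.
    The wrapped phase difference is pooled with a median for robustness."""
    dphi = np.angle(np.exp(1j * (local_phase(img_b, freq) -
                                 local_phase(img_a, freq))))
    return -np.median(dphi) / (2 * np.pi * freq)
```

Because phase is insensitive to smooth intensity changes and to a class of symmetric blur kernels, an estimate pooled over many pixels and several filter frequencies stays stable where intensity-based matching fails.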

Recently, Lin and Shum have shown an upper limit on multi-frame SR techniques; for practical purposes this limit is very small. Another class of SR algorithms formulates high-quality image generation as an inference problem, where the high-resolution image is inferred from a set of learned training patches. Such algorithms work well for natural structures, but for many man-made structures they do not produce accurate results. We propose to capture images at the optimal zoom from the perspective of image super-resolution. Images captured at this zoom carry sufficient information to be magnified further by any SR algorithm that promotes step edges and certain other features. This has a variety of applications in consumer cameras, large-scale automated image mosaicing, robotics, and improving the recognition accuracy of many computer vision algorithms. Existing efforts are limited to imaging a pre-determined object at the right zoom. In the proposed approach, we learn the patch structures at various down-sampling factors. To further enhance the output, we impose the local context around the patch in an MRF framework. Several constraints are introduced to minimize the extent of zoom-in.

Projector-camera systems are used for various applications in computer vision, immersive environments, visual servoing, etc. Due to gaps between neighboring pixels on the projector's image plane and variations in scene depth, the image projected onto a scene shows pixelation and blurring artifacts. Certain vision and graphics applications require a high-quality composite of the scene and the projected image, excluding these artifacts while retaining the scene characteristics. The localization and restoration of projected pixels is difficult due to factors such as spatially varying blur, background texture, noise, the shapes of scene objects, and the color transformations of the projector and camera. We extend the use of local phase, computed using the Gabor filter, to isolate each of the projected pixels distinctively; the local phase is also invariant to a class of blur kernels. To restore the captured images, we reproject the projected pixels such that these artifacts are absent. To improve the quality further, we propose a mechanism to virtualize a high-resolution projector.

 

Year of completion:  2009
 Advisor : Anoop M. Namboodiri

Related Publications

  • Himanshu Arora and Anoop M. Namboodiri - How much zoom is the right zoom from the perspective of Super-Resolution? IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 142-149, 16-19 December 2008, Bhubaneswar, India. [PDF]

  • Himanshu Arora and Anoop M. Namboodiri - Projected Pixel Localization and Artifact Removal in Captured Images, IEEE TENCON, 18-21 November 2008, Hyderabad, India. [PDF]

  • Himanshu Arora, Anoop M. Namboodiri and C. V. Jawahar - Robust Image Registration with Illumination, Blur and Noise Variations for Super-Resolution, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), 30 March - 4 April 2008, Las Vegas, USA. [PDF]

  • Himanshu Arora, Anoop M. Namboodiri and C. V. Jawahar - Accurate Image Registration from Local Phase Information, Proceedings of the Thirteenth National Conference on Communications (NCC 2007), Kanpur, 2007. [PDF]



Towards Large Scale Image Based Robot Navigation


Supreeth Achar (homepage)

Automatic mapping of environments and autonomous navigation are two of the central problems in mobile robotics. A number of different sensor modalities can be used to solve them. Vision sensors are relatively low in cost and can provide very rich information about the environment; however, active range-finding sensors tend to be more widely used because interpreting images has been a very challenging, computationally intensive task. Recent advances in computer vision algorithms and improvements in hardware capabilities have greatly increased the scope for using vision in robotics, even in unstructured, real-world environments. Approaches to applying computer vision to robotics problems fall into two classes: model-based and image-based. Model-based approaches use some sort of metric representation of the environment as a map. This map can be built manually and provided to the robot, or it can be built automatically using visual SLAM methods; building accurate reconstructions becomes increasingly difficult as the size of the environment grows. Image-based methods for robot navigation avoid the need for metric modeling. Instead of inferring a 3D model of the environment and using it to determine where the robot is and what it should do, image-based methods work directly over images. Images captured from various positions in the environment are stored as nodes in a topological graph whose edges encode the connectivity between pairs of poses.

Localization is the process of determining the position of a robot from sensor readings. In an image-based framework, localization entails finding a previously seen image stored in the topological graph that closely matches the current view. If such an image exists, it can be inferred that the robot is close to the pose where the stored image was captured. In this thesis, a robot localization method is developed that builds upon the bag-of-words techniques used in content-based image retrieval. Dynamic scene elements such as pedestrians and traffic, which could corrupt localization results, are handled with the help of a motion detection algorithm. Prior information regarding robot pose is used in conjunction with visual word frequency distributions to assign weights that give higher importance to words that are more informative about the robot's pose. The method was validated in a localization experiment over a six-kilometer path through city roads.
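The retrieval step can be sketched as follows: each stored image is a histogram of visual-word counts, words are reweighted by inverse document frequency so that common words count less, and the best match is found by cosine similarity, optionally modulated by a pose prior. This is a generic bag-of-words retrieval sketch, not the thesis's exact weighting scheme; `pose_prior` is a hypothetical per-node belief.

```python
import numpy as np

def idf_weights(db_hists):
    """Inverse-document-frequency weights from a database of visual-word
    histograms (one row per stored image in the topological graph)."""
    df = np.count_nonzero(db_hists > 0, axis=0)   # document frequency
    return np.log(len(db_hists) / np.maximum(df, 1))

def localize(query_hist, db_hists, pose_prior=None):
    """Index of the stored image most similar to the current view.
    `pose_prior` (hypothetical) is a per-node weight in [0, 1] encoding
    prior belief about where the robot is."""
    idf = idf_weights(db_hists)
    q, db = query_hist * idf, db_hists * idf
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-9)
    if pose_prior is not None:
        sims *= pose_prior
    return int(np.argmax(sims))
```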

Image-based methods are ideally suited to learning the rough structure and connectivity of an environment, especially the detection of loop closures. Loop closures can be found using the same image-retrieval techniques used for localization. Loop closure detection is a major problem in SLAM because data at the end of a loop must be associated with data from the beginning, after which the entire map built by the robot needs to be updated. Chapter 3 presents a method for inferring a metric map of an environment from an appearance-based topological graph representation. Building a topological graph as a preliminary step makes information about the connectivity of the environment available to the SLAM algorithm, which makes the mapping process significantly easier. The offline SLAM process can be visualized as an optimization over a graph. Each robot pose along the trajectory, and each feature in the map, is represented by a node. Each edge links two nodes connected by a measurement: either a measurement of a feature position with respect to some point on the robot trajectory, or an estimate of the motion between two points along the trajectory. Due to noise and modeling errors, these measurements will be inconsistent with each other. The optimization finds a consistent trajectory and map that maximizes the log-likelihood of the measurements under assumptions of Gaussian sensor and motion-model noise.
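Under the Gaussian-noise assumption, maximizing the log-likelihood reduces to weighted least squares over the graph. The 1-D toy below (illustrative numbers, not thesis data) shows the essential structure: odometry edges plus one loop-closure edge, a single linear solve, and a gauge constraint fixing the first pose.

```python
import numpy as np

# Toy 1-D pose graph: 4 poses along a corridor. Each edge is
# (i, j, measured x_j - x_i, information weight). The last edge is a
# loop closure relating pose 3 directly back to pose 0.
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.1, 1.0),
         (2, 3, 0.9, 1.0), (0, 3, 2.7, 2.0)]
n = 4

# Gaussian noise => maximum likelihood == weighted linear least squares.
A, b = [], []
for i, j, z, w in edges:
    row = np.zeros(n)
    row[i], row[j] = -1.0, 1.0
    A.append(np.sqrt(w) * row)
    b.append(np.sqrt(w) * z)
A, b = np.array(A), np.array(b)
A[:, 0] = 0.0                       # fix pose 0 at the origin (gauge)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x, 3))               # poses consistent with all edges
```

In 2-D or 3-D the residuals become non-linear in the pose parameters and the same structure is solved iteratively (Gauss-Newton style), but each iteration is exactly this kind of sparse linear problem.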

In natural outdoor environments, the deformable nature of objects can make it difficult to establish correspondences between points in two images of the same scene taken from different views. Most existing closed-loop methods for controlling the position of a manipulator or robot using visual servoing require accurate feature correspondences. Chapter 4 presents a visual servoing algorithm that does away with the need for feature correspondence between the current and target views by modeling the distribution of features in an image as a mixture of Gaussians. The L2 distance between two such Gaussian mixtures can be expressed in closed form. A control law is derived which minimizes this distance function between the two Gaussian mixtures, causing the robot to move from its current pose to the target pose. Because the control law is defined in terms of a statistic over the feature distribution rather than the positions of individual features, the control is robust to measurement errors.
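The closed form comes from the Gaussian product identity: the integral of N(x; m1, S1) * N(x; m2, S2) over x equals N(m1; m2, S1 + S2), so the squared L2 distance between two mixtures expands into three double sums of Gaussian evaluations. A minimal sketch of just the distance (the control law mapping it to camera velocities is not shown):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_l2_sq(w1, mu1, cov1, w2, mu2, cov2):
    """Squared L2 distance between two Gaussian mixtures, in closed form.
    Each mixture is (weights, list of means, list of covariances)."""
    def cross(wa, ma, ca, wb, mb, cb):
        # sum over all component pairs of the Gaussian overlap integral
        return sum(wi * wj * multivariate_normal.pdf(mi, mean=mj, cov=ci + cj)
                   for wi, mi, ci in zip(wa, ma, ca)
                   for wj, mj, cj in zip(wb, mb, cb))
    return (cross(w1, mu1, cov1, w1, mu1, cov1)
            - 2 * cross(w1, mu1, cov1, w2, mu2, cov2)
            + cross(w2, mu2, cov2, w2, mu2, cov2))

# Identical mixtures have distance 0 (up to floating-point error):
w, mu, cov = [0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2)] * 2
print(gmm_l2_sq(w, mu, cov, w, mu, cov))
```

No correspondence between components is needed: the double sums compare every component against every other, which is what lets the servoing loop ignore which feature is which.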

 

Year of completion:  August 2009
 Advisor : C. V. Jawahar

Related Publications

  • Supreeth Achar and C. V. Jawahar - Adaptation and Learning for Image Based Navigation, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 103-110, 16-19 December 2008, Bhubaneswar, India. [PDF]

  • D. Santosh, Supreeth Achar and C. V. Jawahar - Autonomous Image-based Exploration for Mobile Robot Navigation, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 19-23 May 2008, Pasadena, California. [PDF]

  • A. H. Abdul Hafez, Supreeth Achar and C. V. Jawahar - Visual Servoing based on Gaussian Mixture Models, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 19-23 May 2008, Pasadena, California. [PDF]



Exploring Irregular Memory Access Applications on the GPU


Mohammed Suhail Rehman (homepage)

General-purpose computation on graphics processors (GPGPU) is a recent development in high-performance computing. Graphics hardware has evolved over the past decade from fixed-function stream processors to highly programmable computation engines that offer the best price-to-performance ratio among current parallel hardware. The GeForce GTX 280 from NVIDIA delivers a peak performance of 1 TFLOP and costs about US$400. This peak performance, however, is hard to realize for many applications. Applications that generally see good performance gains (compared to implementations on similar parallel hardware or the CPU) are those that are extremely parallel and have data locality that matches the memory subsystem of such hardware well. A lot of research has been devoted to finding ways to optimally utilize this massively parallel hardware to accelerate a diverse range of applications.

There exists a range of applications which are parallelizable but do not possess 'good' memory access characteristics. These are generally classified as irregular memory access applications, and developing techniques to implement them on massively parallel hardware has been a challenge. Such applications are traditionally not well suited to most modern parallel architectures, so even modest performance gains are appreciated. Developing suitable primitives for this class of applications is crucial to advancing the acceptance of massively parallel hardware in the high-performance computing community.

We introduce a technique to solve one such problem, list ranking, efficiently on the GPU for large data sets. The technique is a recursive formulation of a recent algorithm for shared-memory systems by Helman and JáJá. The recursive formulation allowed us to exploit the massive parallelism of the GPU by minimizing the number of idle threads during computation. We also discuss the optimal parameters for running this algorithm on the GPU, leading to the fastest single-chip implementation of list ranking at the time of writing. Our optimized algorithm can rank a random list of 8 million elements in about 50 milliseconds, which translates to about a 15x speedup over a sequential CPU implementation and a 5-6x speedup over the Cell Broadband Engine.
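List ranking computes, for every node of a linked list, its distance from the end; it is "irregular" because each step chases pointers scattered across memory. The sketch below uses Wyllie-style pointer jumping, the simplest data-parallel formulation, rather than the work-efficient Helman-JáJá scheme the thesis builds on (numpy stands in for the GPU's data-parallel step).

```python
import numpy as np

def list_rank(succ):
    """Pointer-jumping list ranking (Wyllie's algorithm).
    `succ[i]` is i's successor, with succ[tail] == tail.
    Returns rank[i] = number of hops from i to the tail."""
    n = len(succ)
    rank = (succ != np.arange(n)).astype(np.int64)   # tail starts at 0
    nxt = succ.copy()
    for _ in range(int(np.ceil(np.log2(max(n, 2))))):
        rank = rank + rank[nxt]     # add the rank of the node we skip to
        nxt = nxt[nxt]              # pointer jumping: double the hop length
    return rank

# A 4-node list 0 -> 1 -> 2 -> 3 (3 is the tail): ranks are [3, 2, 1, 0].
print(list_rank(np.array([1, 2, 3, 3])))
```

Pointer jumping does O(n log n) total work; the Helman-JáJá approach splits the list into sublists to get closer to O(n) work, which is why it wins for large inputs.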

We then apply the list ranking primitive to accelerate several Euler-tour-technique-based algorithms, including tree rooting (parent finding), subtree size calculation, tree traversals, and vertex level computation. A major chunk of the runtime of these implementations is the list ranking step, and as a result we obtain highly efficient algorithms for processing trees on the GPU. We see a sustained performance benefit of about 9-10x over a sequential BFS implementation and about 2x over BFS implemented in CUDA. Optimizing such applications for the GPU is not a trivial task. We explain some of the characteristics of irregular application development for the GPU, as well as techniques to model and assess GPU performance. Our research has yielded a set of simple equations that can aid in quickly estimating the runtime of a particular implementation on the GPU. We discuss list ranking in this context and provide a few pointers on maximizing performance for irregular applications.
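As an illustration of the Euler-tour technique, the sketch below roots a tree using the `list_rank` function from the previous sketch: every undirected edge becomes two directed arcs, the arcs are chained into a single tour by walking around each vertex's adjacency list, and after ranking the tour, the arc of each edge that appears earlier points away from the root. This is a serial Python rendering of the idea under those assumptions, not the GPU implementation.

```python
import numpy as np

def root_tree(adj, root=0):
    """Find parent[] of every vertex via an Euler tour + list ranking.
    `adj[v]` is the list of v's neighbours in an undirected tree."""
    n = len(adj)
    first = np.cumsum([0] + [len(a) for a in adj])  # arc id base per vertex
    pos = {(v, u): k for v in range(n) for k, u in enumerate(adj[v])}
    succ = np.empty(first[-1], dtype=np.int64)
    for u in range(n):
        for k, v in enumerate(adj[u]):
            # successor of arc u->v is the next arc around v after v->u
            succ[first[u] + k] = first[v] + (pos[(v, u)] + 1) % len(adj[v])
    v_last = adj[root][-1]                          # arc closing the tour
    tail = first[v_last] + pos[(v_last, root)]
    succ[tail] = tail                               # break the cycle
    rank = list_rank(succ)                          # from the sketch above
    parent = [-1] * n
    for u in range(n):
        for k, v in enumerate(adj[u]):
            if rank[first[u] + k] > rank[first[v] + pos[(v, u)]]:
                parent[v] = u                       # u->v traversed first
    return parent

# Path 0 - 1 - 2 rooted at 0: parents are [-1, 0, 1].
print(root_tree([[1], [0, 2], [1]], root=0))
```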

 

Year of completion:  June 2010
 Advisor : P. J. Narayanan, Kishore Kothapalli

Related Publications

  • M. Suhail Rehman, Kishore Kothapalli and P. J. Narayanan - Fast and Scalable List Ranking on the GPU, Proceedings of the 23rd International Conference on Supercomputing (ICS 2009), 8-12 June 2009, New York, USA. [PDF]

  • Kishore Kothapalli, Rishabh Mukherjee, M. Suhail Rehman, P. J. Narayanan, Kannan Srinathan and Suryakant Patidar - A Performance Prediction Model for the CUDA GPGPU Platform, 16th Annual International Conference on High Performance Computing (HiPC 2009), Kochi, India, December 2009. [PDF]


Learning Non-Linear Kernel Combinations Subject to General Regularization: Theory and Applications


B. Rakesh Babu

Kernel methods are among the important recent developments in machine learning, with applications in computer vision, speech recognition, bio-informatics, etc. This class of algorithms combines the stability and efficiency of linear algorithms with the descriptive power of nonlinear features. A kernel function maps the data (implicitly) to a different space, often very high-dimensional compared to the input space, so that complex patterns in the data become simpler to detect and learn. Support Vector Machines (SVMs) are a kernel method that has been widely successful for classification tasks. The performance of such an algorithm depends on the choice of kernel, and finding the right kernel can be a complicated task. To overcome this, learning the kernel itself has emerged as a new paradigm in recent years: the kernel is parameterized as a weighted linear combination of base kernels, and the weights are optimized jointly with the objective of the task.

Learning both the SVM parameters and the kernel parameters is known as Multiple Kernel Learning (MKL). Many MKL formulations have been presented in the literature; however, all of them restrict themselves to linear combinations of base kernels. In this thesis, we show how the existing optimization techniques for MKL can be extended to learn non-linear kernel combinations subject to general regularization on the kernel parameters. Although this leads to a non-convex problem, the proposed method retains the efficiency of existing large-scale optimization algorithms. We name the new formulation Generalized Multiple Kernel Learning (GMKL). We highlight the advantages of GMKL by tackling problems such as feature selection and learning discriminative parts for object categorization. We show how the proposed formulation can lead to better results not only compared to traditional MKL, but also compared to state-of-the-art wrapper and filter methods for feature selection. In the problem of learning discriminative parts for object categorization, our objective is to determine minimal sets of pixels and image regions required for the task. We use multiple kernel learning to select the most relevant pixels and regions for classification, and then show how the framework can be used to enhance our understanding of the categorization problem at hand, determine the importance of context, and highlight artifacts in the training data. We also tackle the problem of recognizing characters in images of natural scenes, which is not handled well by traditional OCR techniques, in the MKL framework. We assess the performance of various features (using a bag-of-visual-words representation) with nearest-neighbor and SVM classification, and investigate appropriate representation schemes for recognition using MKL.
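One concrete non-linear combination is a product of per-feature RBF kernels, whose learned weights double as feature selectors. The sketch below alternates SVM training with a projected gradient step on the kernel parameters; the l1 regularizer, learning rate, and finite-difference gradient are illustrative simplifications of GMKL-style optimization, not the thesis's exact procedure.

```python
import numpy as np
from sklearn.svm import SVC

def product_rbf(X, Y, d):
    """Non-linear kernel combination: k(x, y) = prod_k exp(-d_k (x_k - y_k)^2).
    A feature whose weight d_k shrinks to 0 drops out of the kernel,
    which is what makes this form useful for feature selection."""
    sq = (X[:, None, :] - Y[None, :, :]) ** 2     # pairwise per-feature
    return np.exp(-(sq * d).sum(axis=-1))

def gmkl_step(X, y, d, lr=0.1, lam=1.0, eps=1e-4):
    """One outer iteration: train the SVM with the current kernel, then
    take a projected gradient step on d against the SVM dual objective
    plus an l1 regularizer lam * sum(d)."""
    svm = SVC(C=10.0, kernel=lambda A, B: product_rbf(A, B, d)).fit(X, y)
    sv, coef = svm.support_, svm.dual_coef_[0]    # coef_i = y_i * alpha_i
    K0 = product_rbf(X[sv], X[sv], d)
    grad = np.empty_like(d)
    for k in range(len(d)):                       # finite-difference dW/dd_k
        dp = d.copy(); dp[k] += eps
        Kp = product_rbf(X[sv], X[sv], dp)
        grad[k] = -0.5 * coef @ ((Kp - K0) / eps) @ coef
    return np.maximum(d - lr * (grad + lam), 0.0) # keep weights >= 0
```

Iterating `gmkl_step` drives uninformative feature weights toward zero while the SVM is retrained on the shrinking kernel, which is the wrapper-style feature selection the text compares against.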

In short, the contributions of this thesis are: (i) a new MKL formulation that can learn non-linear kernel combinations subject to general regularization on the kernel parameters; (ii) an exploration of the utility of multiple kernel learning for feature selection and for learning informative parts for object category recognition; and (iii) recognition of character images taken in natural scenes using state-of-the-art object recognition schemes, along with an exploration of appropriate representation schemes for recognition using MKL.

 

Year of completion:  2010
 Advisor : C. V. Jawahar, Dr. Manik Varma

Related Publications

  • Manik Varma and Bodla Rakesh Babu - More Generality in Efficient Multiple Kernel Learning, Proceedings of the International Conference on Machine Learning (ICML 2009), 14-18 June 2009, Montreal, Quebec. [PDF]

  • T. E. de Campos, B. R. Babu and M. Varma - Character Recognition in Natural Images, Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009.


De-identification for Privacy Protection in Surveillance Videos.


Prachi Agrawal (homepage)


Advances in cameras and web technology have made it easy to capture large amounts of video and share it with many people. An increasing number of video cameras also observe public spaces such as airports, train stations, shopping malls, and streets. Video surveillance is an important tool to prevent, detect, and reduce crime and disorder. It is also widely used to monitor people's conduct, detect their presence in an area, and possibly study their actions. However, many civil rights and privacy groups have expressed concern that by allowing continual increases in the surveillance of citizens, we will end up in a mass surveillance society with extremely limited or non-existent political and personal freedoms. With the widespread use of surveillance today, we are becoming used to ''being watched'', and the more accustomed we become, the less likely we are to fight to maintain our right to privacy. Critics believe that in addition to its obvious function of identifying and capturing individuals committing undesirable acts, surveillance also creates in everyone a feeling of always being watched, so that they become self-policing. At the same time, hardware prices have fallen and the capabilities of systems have grown dramatically; the day is not far off when even low-cost video surveillance units can be converted into ''intelligent'' human recognition systems. This raises concerns about the unintentional and unwarranted invasion of the privacy of individuals caught in the videos.


The data collected during video surveillance consists mainly of images and sounds that allow the people captured to be identified, directly or indirectly, in addition to monitoring their conduct. While there may be a security need to identify the individuals in these videos, identifying the action suffices in most cases; the actor needs to be identified only rarely, and only to authorized personnel. The privacy concerns related to the processing and distribution of collected videos are genuine and will grow with the wider adoption of video technology. To address these concerns, automated methods to de-identify individuals in these videos are necessary. De-identification does not aim at destroying all information involving the individuals: its ideal goal is to obscure the identity of the actor without obscuring the action. There is a natural trade-off between protecting privacy and providing sufficient detail. The goal is to protect the privacy of individuals while providing a sufficient feel for the human activities in the space being imaged.

Policies on privacy issues are still unclear and fast evolving. The need of the hour is to be cautious, as digital data can be replicated with little control, especially when placed on the cloud. This thesis outlines the scenarios in which de-identification is required and the issues they bring out. We also present an approach to de-identify individuals in videos. Our approach involves tracking and segmenting individuals in a conservative voxel space involving x, y, and time. A de-identification transformation is applied per frame using these voxels to obscure identity. A robust de-identification scheme with a randomization module was designed to make reconstruction and comparison-based identification attacks impossible. Ideally, face, silhouette, gait, and other characteristics all need to be obscured. We show results of our scheme on a number of videos and for several variations of the transformations. We present the results of applying algorithmic identification to the transformed videos, as well as a user study evaluating how well humans can identify individuals from them. The results showed that as the parameter controlling the amount of de-identification increased, the actors became less identifiable (more privacy), while the video started losing context and detail. The studies also suggest that an action can be recognized with more accuracy than the actor in a de-identified video, which is the guiding principle of de-identification. We also created a suitable dataset for the purpose of user studies, as the standard datasets available online are not challenging enough for such an evaluation.
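The per-frame transformation can be pictured with the following sketch, which pixelates the segmented person region and injects per-block random noise so the result cannot easily be inverted or matched against a re-blurred gallery. The block size and noise strength are illustrative stand-ins for the thesis's privacy parameter, and the exact transform here is hypothetical.

```python
import numpy as np

def deidentify_frame(frame, mask, block=12, strength=0.5, rng=None):
    """Obscure the masked region of one video frame.
    `frame` is HxWx3 uint8, `mask` is the HxW boolean segmentation of the
    person for this frame (one slice of the x-y-time voxel volume).
    Larger `block`/`strength` means more privacy and less visible detail."""
    rng = rng or np.random.default_rng()
    out = frame.astype(float).copy()
    h, w = mask.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            m = mask[y:y + block, x:x + block]
            if m.any():
                patch = out[y:y + block, x:x + block]
                mean = patch[m].mean(axis=0)          # flatten the block
                noise = strength * rng.normal(0.0, 16.0, size=mean.shape)
                patch[m] = mean + noise               # randomize it
    return np.clip(out, 0, 255).astype(np.uint8)
```

Randomizing each block independently per frame is what blocks comparison attacks: an adversary cannot regenerate the same transformed video from a gallery image because the noise is not reproducible.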

 

Year of completion:  2010
 Advisor : P. J. Narayanan

Related Publications

  • Prachi Agrawal and P. J. Narayanan - Person De-identification in Videos, Proceedings of the Ninth Asian Conference on Computer Vision (ACCV 2009), 23-27 September 2009, Xi'an, China. [PDF]

  • Prachi Agrawal and P. J. Narayanan - Person De-identification in Videos, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2010.

Demonstrations

  • The entire system was presented for the first time, and a user study was conducted with the people who attended ExOR in November 2009.
  • The system was made more user-friendly; it was showcased for demonstration, and another detailed user study was conducted with people who attended the R&D Showcase in January 2010.

