
Towards Large Scale Image Based Robot Navigation


Supreeth Achar (homepage)

Automatic mapping of environments and autonomous navigation are two of the central problems in mobile robotics. A number of different sensor modalities can be used to solve these problems. Vision sensors are relatively low in cost and have the potential to provide very rich information about the environment. However, active range finding sensors tend to be more widely used because interpretation of images has been a very challenging, computationally intensive task. Recent advances in computer vision algorithms and improvements in hardware capabilities have greatly increased the scope for using vision in robotics, even in unstructured, real world environments. Approaches to applying computer vision to robotics problems fall into two classes: model based approaches and image based approaches. Model based approaches use some sort of metric representation of the environment as a map. This map can be built manually and provided to the robot, or it can be built automatically using visual SLAM methods. Building accurate reconstructions becomes increasingly difficult as the size of the environment increases. Image based methods for robot navigation avoid the need for metric modeling. Instead of inferring a 3D model of the environment and using this model to determine where the robot is and what it should do, image based methods work directly over images. Images captured from various positions in the environment are stored as nodes in a topological graph structure in which edges encode the connectivity between pairs of poses.

Localization is the process of determining the position of a robot from sensor readings. In an image based framework, localization entails finding a previously seen image stored in the topological graph that closely matches the current view. If such an image exists, then it can be inferred that the robot is close to the pose where the stored image was captured. In this thesis, a robot localization method is developed that builds upon bag of words techniques used in content based image retrieval. Dynamic scene elements such as pedestrians and traffic, which could corrupt localization results, are handled with the help of a motion detection algorithm. Prior information regarding robot pose is used in conjunction with visual word frequency distributions to assign weights to words, giving higher importance to features that are more informative regarding the pose of the robot. The method was validated in a localization experiment over a six kilometer path through city roads.
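The retrieval step behind such localization can be sketched with a tf-idf-style weighting over visual-word histograms. This is a generic illustration of the bag-of-words approach, not the thesis code; the pose-prior and motion-detection refinements described above are omitted.

```python
import numpy as np

def tfidf_weights(word_hists):
    """Inverse-document-frequency weights over a database of visual-word
    histograms (rows = stored images, columns = vocabulary entries).
    Rare words get high weight, ubiquitous words get low weight."""
    n_images = word_hists.shape[0]
    doc_freq = np.count_nonzero(word_hists > 0, axis=0)
    return np.log(n_images / np.maximum(doc_freq, 1))

def localize(query_hist, word_hists, idf):
    """Return the index of the stored image whose idf-weighted histogram
    is most similar (cosine similarity) to the current view."""
    db = word_hists * idf
    q = query_hist * idf
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    return int(np.argmax(db @ q))
```

If the best match scores above a threshold, the robot is inferred to be near the pose where that stored image was captured.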

Image based methods are ideally suited to learning the rough structure and connectivity of an environment, especially detection of loop closures. Loop closures can be found using the same image retrieval based techniques that are used for localization. Loop closure detection is a major problem in SLAM because data at the end of a loop has to be associated with that at the beginning and then the entire map built by the robot needs to be updated. Chapter 3 presents a method for inferring a metric map of an environment from an appearance based topological graph representation. Building a topological graph as a preliminary step makes information regarding the connectivity of the environment available to the SLAM algorithm, making the mapping process significantly easier. The offline SLAM process can be visualized as an optimization over a graph. Each robot pose along the trajectory and each feature in the map is represented by a node. Each edge links two nodes connected by a measurement, either a measurement of a feature position with respect to some point on the robot trajectory or an estimate of the motion between two points along the trajectory. Due to noise and modeling errors, these measurements will be inconsistent with each other. The optimization finds a consistent trajectory and map that maximizes the log-likelihood of the measurements under assumptions of Gaussian sensor and motion model noise.
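Under the Gaussian noise assumption, maximizing the measurement log-likelihood is equivalent to a nonlinear least-squares problem over the graph. In standard graph-SLAM notation (not necessarily the thesis's own symbols):

```latex
x^{*} \;=\; \operatorname*{arg\,min}_{x} \sum_{(i,j)\,\in\,\mathcal{E}} e_{ij}(x_i, x_j)^{\top}\, \Omega_{ij}\, e_{ij}(x_i, x_j)
```

where x collects the pose and feature nodes, e_ij(x_i, x_j) is the residual between the predicted and observed measurement on edge (i, j), and Omega_ij is the inverse covariance of that measurement.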

In natural outdoor environments, the deformable nature of objects can make it difficult to establish correspondences between points in two images of the same scene taken from different views. Most existing closed loop methods for controlling the position of a manipulator or robot using visual servoing require accurate feature correspondences. Chapter 4 presents a visual servoing algorithm that does away with the need for feature correspondence between the current and target views by modeling the distribution of features in an image as a mixture of Gaussians. The L2 distance between two such Gaussian mixtures can be expressed in closed form. A control law is derived which minimizes the distance function between the two Gaussian mixtures, causing the robot to move from its current pose to the target pose. Because the control law is defined in terms of a statistic over the feature distribution and not over the positions of the individual features, the control is robust to errors in measurement.
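The closed form follows from the Gaussian product identity: the integral of a product of two Gaussians is itself a Gaussian evaluated at one mean, with summed covariances. A sketch for isotropic components (an illustration of the closed-form distance, not the thesis implementation):

```python
import numpy as np

def gauss_overlap(mu1, mu2, var1, var2):
    """Closed-form integral of the product of two isotropic Gaussians:
    ∫ N(x; mu1, var1·I) N(x; mu2, var2·I) dx = N(mu1; mu2, (var1+var2)·I)."""
    d = mu1.shape[0]
    s = var1 + var2
    diff = mu1 - mu2
    return np.exp(-0.5 * diff @ diff / s) / ((2 * np.pi * s) ** (d / 2))

def gmm_l2_sq(mus_f, mus_g, var):
    """Squared L2 distance between two equally weighted isotropic GMMs:
    ||f - g||^2 = ∫ f^2 + ∫ g^2 - 2 ∫ f g, each term a sum of overlaps."""
    nf, ng = len(mus_f), len(mus_g)
    wf, wg = 1.0 / nf, 1.0 / ng
    ff = sum(wf * wf * gauss_overlap(mus_f[i], mus_f[j], var, var)
             for i in range(nf) for j in range(nf))
    gg = sum(wg * wg * gauss_overlap(mus_g[i], mus_g[j], var, var)
             for i in range(ng) for j in range(ng))
    fg = sum(wf * wg * gauss_overlap(mus_f[i], mus_g[j], var, var)
             for i in range(nf) for j in range(ng))
    return ff + gg - 2 * fg
```

Because every term is differentiable in the component means, a gradient-based control law can drive the current feature distribution toward the target one without ever matching individual features.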

 

Year of completion:  August 2009
 Advisor : C. V. Jawahar

Related Publications

  • Supreeth Achar and C. V. Jawahar - Adaptation and Learning for Image Based Navigation, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 103-110, 16-19 Dec 2008, Bhubaneswar, India. [PDF]

  • D. Santosh, Supreeth Achar and C. V. Jawahar - Autonomous Image-based Exploration for Mobile Robot Navigation, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 19-23 May 2008, Pasadena, California. [PDF]

  • A. H. Abdul Hafez, Supreeth Achar and C. V. Jawahar - Visual Servoing based on Gaussian Mixture Models, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 19-23 May 2008, Pasadena, California. [PDF]


Downloads

thesis

ppt

Exploring Irregular Memory Access Applications on the GPU


Mohammed Suhail Rehman (homepage)

General Purpose Computation on Graphics Processors (GPGPU) is a recent development in high performance computing. Graphics hardware has evolved over the past decade or so from fixed function stream processors to highly programmable computation engines that offer the best price-to-performance ratio among current parallel hardware. The GeForce GTX 280 from NVIDIA comes with a peak performance of 1 TFLOP and costs about 400 US$. This peak performance, however, is hard to realize for many applications on the GPU. Applications that generally see good performance gains (compared to implementations on similar parallel hardware or the CPU) are those which are extremely parallel and have data locality that matches well with the memory subsystem of such hardware. A lot of research has been devoted to finding ways to optimally utilize this massively parallel hardware to accelerate a diverse range of applications.

There exists a range of applications which are parallelizable but do not possess 'good' memory access characteristics. These applications are generally classified as irregular memory access applications. Developing techniques to implement such applications has been a challenge on massively parallel hardware. These applications are traditionally not well suited for most modern parallel architectures, and even modest gains in performance are appreciated. Development of suitable primitives for this class of applications is crucial in advancing the acceptance of such massively parallel hardware in the high performance computing community.

We introduce a technique to solve one such problem, list ranking, efficiently on the GPU for large data sets. This technique is a recursive formulation of a recent algorithm for shared memory systems by Helman and JáJá. This recursive formulation allowed us to exploit the massive parallelism of the GPU, as we were able to minimize the number of idle threads during computation. The optimal parameters for this algorithm to run on the GPU are also discussed, leading to the fastest single-chip implementation of the list ranking problem at the time of writing. Our optimized algorithm can rank a random list of 8 million elements in about 50 milliseconds, which translates to about a 15x speedup over a sequential CPU implementation and a 5-6x speedup over the Cell Broadband Engine.
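The structure of the Helman-JáJá approach can be sketched sequentially: split the list at a set of splitter nodes, rank each sublist independently (on the GPU, one thread per sublist), then combine with a short sequential pass over the sublists. This is an illustrative sketch of that structure, not the thesis's recursive GPU formulation.

```python
def list_rank_hj(succ, head, splitters):
    """Helman-JáJá style list ranking (sequential simulation).
    succ[i] gives the successor of node i (-1 marks the tail);
    head must be among the splitters."""
    n = len(succ)
    local = [0] * n                    # rank within the owning sublist
    owner = [0] * n                    # which splitter's sublist a node is in
    sub_len = [0] * len(splitters)
    sub_next = [None] * len(splitters) # splitter reached at sublist's end
    spl_idx = {s: i for i, s in enumerate(splitters)}

    # Phase 1: traverse each sublist independently (parallel on a GPU).
    for i, s in enumerate(splitters):
        node, r = s, 0
        while True:
            local[node] = r
            owner[node] = i
            r += 1
            node = succ[node]
            if node == -1 or node in spl_idx:
                break
        sub_len[i] = r
        sub_next[i] = None if node == -1 else spl_idx[node]

    # Phase 2: rank the (short) list of sublists sequentially.
    base = [0] * len(splitters)
    i, acc = spl_idx[head], 0
    while i is not None:
        base[i] = acc
        acc += sub_len[i]
        i = sub_next[i]

    # Phase 3: global rank = sublist base + local rank.
    return [base[owner[v]] + local[v] for v in range(n)]
```

Choosing enough splitters keeps every thread busy in phase 1, which is the load-balancing idea the recursive formulation pushes further.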

We then present an implementation of the list ranking primitive to accelerate a few Euler-tour technique based algorithms - these include tree rooting (parent finding), subtree size calculation, tree traversals and vertex level computation. A major chunk of the actual runtime for these implementations is the list ranking step, and we obtain highly efficient algorithms for processing trees on the GPU as a result. We see a sustained performance benefit of about 9-10x over sequential BFS on the CPU and about 2x over BFS implemented in CUDA. Optimizing such applications for the GPU is not a trivial task. We explain some of the characteristics of irregular application development for the GPU as well as techniques to model and assess performance for the GPU. Our research has yielded a set of simple equations that can aid in quickly estimating the runtime of a particular implementation on the GPU. We discuss list ranking in this context and provide a few pointers on maximizing performance for irregular applications.
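Tree rooting, for instance, falls out of the Euler-tour edge ranks directly: in a tour started at the root, the copy of an undirected edge {u, v} directed away from the root is always traversed first. A hypothetical sketch of that final step, assuming the edge ranks have already been computed by list ranking:

```python
def parents_from_euler(edges, rank, root):
    """Tree rooting via the Euler-tour technique.
    `edges` maps each directed edge (u, v) to its index into `rank`,
    where rank[e] is the edge's position in the Euler tour from `root`.
    parent(v) = u iff edge (u, v) is ranked before its twin (v, u)."""
    parent = {root: None}
    done = set()
    for (u, v) in edges:
        key = frozenset((u, v))
        if key in done:
            continue
        done.add(key)
        if rank[edges[(u, v)]] < rank[edges[(v, u)]]:
            parent[v] = u
        else:
            parent[u] = v
    return parent
```

Subtree sizes and vertex levels follow the same pattern: a prefix computation over the ranked tour replaces the usual pointer-chasing traversal.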

 

Year of completion:  June 2010
 Advisor : P. J. Narayanan, Kishore Kothapalli

Related Publications

  • M. Suhail Rehman, Kishore Kothapalli and P. J. Narayanan - Fast and Scalable List Ranking on the GPU, Proceedings of the 23rd International Conference on Supercomputing (ICS 2009), 8-12 June 2009, New York, USA. [PDF]

  • Kishore Kothapalli, Rishabh Mukherjee, M. Suhail Rehman, P. J. Narayanan, Kannan Srinathan and Suryakant Patidar - A Performance Prediction Model for the CUDA GPGPU Platform, 16th Annual International Conference on High Performance Computing (HiPC 2009), Kochi, India, December 2009. [PDF]

Downloads

thesis

ppt

Learning Non-Linear Kernel Combinations Subject to General Regularization: Theory and Applications


B. Rakesh Babu

Kernel methods are among the important recent developments in the field of machine learning, with applications in computer vision, speech recognition, bio-informatics, etc. This class of algorithms combines the stability and efficiency of linear algorithms with the descriptive power of nonlinear features. A kernel function maps data implicitly to a different space, often very high dimensional compared to the input space, so that complex patterns in the data become simpler to detect and learn. Support Vector Machines (SVMs) are a kernel method that has been widely successful for classification tasks. The performance of the algorithm depends on the choice of the kernel, and finding the right kernel can be a complicated task. To overcome this, learning the kernel from data has emerged as a new paradigm in recent years. For this, the kernel is parameterized as a weighted linear combination of base kernels, and the kernel weights are optimized jointly with the objective of the task.

Learning both the SVM parameters and the kernel parameters is a Multiple Kernel Learning (MKL) problem. Many formulations of MKL have been presented in the literature. However, all these methods restrict themselves to linear combinations of base kernels. In this thesis, we show how the existing optimization techniques of MKL formulations can be extended to learn non-linear kernel combinations subject to general regularization on the kernel parameters. Although this leads to a non-convex problem, the proposed method retains all the efficiency of existing large scale optimization algorithms. We call the new MKL formulation Generalized Multiple Kernel Learning (GMKL). We highlight the advantages of GMKL by tackling problems such as feature selection and learning discriminative parts for object categorization. We show how the proposed formulation can lead to better results not only as compared to traditional MKL but also as compared to state-of-the-art wrapper and filter methods for feature selection. In the problem of learning discriminative parts for object categorization, our objective is to determine minimal sets of pixels and image regions required for the task. We use multiple kernel learning to select the most relevant pixels and regions for classification. We then show how the framework can be used to enhance our understanding of the object categorization problem at hand, determine the importance of context and highlight artifacts in the training data. We also tackle the problem of recognizing characters in images of natural scenes in the MKL framework, a task that is not handled well by traditional OCR techniques. We assess the performance of various features (using a bag-of-visual-words representation) based on nearest neighbor and SVM classification. Besides this, we investigate appropriate representation schemes for recognition using MKL.

In short, the contributions of this thesis are: (i) a new MKL formulation that can learn non-linear kernel combinations subject to general regularization on the kernel parameters; (ii) an exploration of the utility of multiple kernel learning formulations for feature selection and for learning informative parts for object category recognition; and (iii) recognition of character images taken in natural scenes using state-of-the-art object recognition schemes, along with an exploration of appropriate representation schemes for recognition using MKL.

 

Year of completion:  2010
 Advisor : C. V. Jawahar, Dr. Manik Varma

Related Publications

  • Manik Varma and Bodla Rakesh Babu - More Generality in Efficient Multiple Kernel Learning, Proceedings of the International Conference on Machine Learning (ICML 2009), 14-18 June 2009, Montreal, Quebec. [PDF]

  • T. E. de Campos, B. R. Babu and M. Varma - Character Recognition in Natural Images, Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009.

Downloads

thesis

ppt

De-identification for Privacy Protection in Surveillance Videos


Prachi Agrawal (homepage)


Advances in cameras and web technology have made it easy to capture and share large amounts of video data with a large number of people. An increasing number of video cameras also observe public spaces like airports, train stations, shopping malls and streets today. Video surveillance is an important tool to prevent, detect and reduce crime and disorder. It is also widely used to monitor people's conduct, detect their presence in an area and possibly study their actions. However, many civil rights and privacy groups have expressed concern that by allowing continual increases in the surveillance of citizens, we will end up in a mass surveillance society with extremely limited or non-existent political and personal freedoms. With the widespread use of surveillance today, we are becoming used to ''being watched''. The more accustomed we become to video surveillance, the less likely we are to fight to maintain our right to privacy. Critics believe that in addition to its obvious function of identifying and capturing individuals who are committing undesirable acts, surveillance also functions to create in everyone a feeling of always being watched, so that they become self-policing. At the same time, prices of hardware have fallen and the capabilities of systems have grown dramatically. The day is not far when it will be easy to convert even low-cost video surveillance units into ''intelligent'' human recognition systems. This raises concerns about the unintentional and unwarranted invasion of the privacy of individuals caught in the videos.

 

 

The data collected during video surveillance consists mainly of images and sounds which allow identification of the people captured, whether directly or indirectly, in addition to monitoring their conduct. While there may be a genuine security need to identify the individuals in these videos, identifying the action suffices in most cases; the actor needs to be identified only rarely and only to authorized personnel. The privacy concerns related to the processing and distribution of the collected videos are genuine and will grow with wider adoption of video technology. To address these concerns, automated methods to de-identify individuals in these videos are necessary. De-identification does not aim at destroying all information involving the individuals. Its ideal goal is to obscure the identity of the actor without obscuring the action. There is a natural trade-off between protecting privacy and providing sufficient detail. The goal is to protect the privacy of the individuals while providing a sufficient feel for the human activities in the space being imaged.

The policies on privacy issues are still unclear and are fast evolving. The need of the hour is to be cautious, as digital data has the potential to be replicated with little control, especially when placed on the cloud. This thesis outlines the scenarios in which de-identification is required and the issues they bring out. We also present an approach to de-identify individuals in videos. Our approach involves tracking and segmenting individuals in a conservative voxel space involving x, y and time. A de-identification transformation is applied per frame using these voxels to obscure the identity. A robust de-identification scheme with a randomization module was designed in order to make reconstruction and comparison based identification attacks impossible. Face, silhouette, gait and other characteristics need to be obscured, ideally. We show results of our scheme on a number of videos and for several variations of the transformations. We present the results of applying algorithmic identification to the transformed videos. We also present the results of a user study to evaluate how well humans can identify individuals from the transformed videos. The results showed that as the parameter controlling the amount of de-identification increased, the actors became less identifiable (more privacy), while the video started losing context and detail. The studies also suggest that an action can be recognized with more accuracy than the actor in a de-identified video, which is the guiding principle of de-identification. We also created a suitable dataset for the purpose of user studies, as the standard datasets available online are not challenging enough for such an evaluation.
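A simple instance of such a per-frame transformation is blockwise pixelation of the tracked foreground region. This is an illustrative stand-in for the family of transformations the thesis evaluates; the randomization module and voxel-based segmentation are omitted.

```python
import numpy as np

def pixelate(frame, mask, block=8):
    """Replace each block-sized tile that overlaps the foreground mask
    with the mean of its masked pixels, obscuring identity while keeping
    the coarse silhouette of the action.  Works on grayscale (H, W) or
    color (H, W, 3) frames; `mask` is a boolean (H, W) array from the
    tracker/segmenter."""
    out = frame.copy()
    h, w = frame.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile_mask = mask[y:y + block, x:x + block]
            if tile_mask.any():
                tile = out[y:y + block, x:x + block]
                tile[tile_mask] = frame[y:y + block, x:x + block][tile_mask].mean(axis=0)
    return out
```

The block size plays the role of the de-identification parameter: larger blocks give more privacy but lose more of the surrounding context and detail.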

 

Year of completion:  2010
 Advisor : P. J. Narayanan

Related Publications

  • Prachi Agrawal and P. J. Narayanan - Person De-identification in Videos Proceedings of the Ninth Asian Conference on Computer Vision(ACCV 09), 23-27 September, 2009, Xi'an, China. [PDF]

  • Prachi Agrawal and P. J. Narayanan, Person De-identification in Videos, in The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2010.

Demonstrations

  • The entire system was presented for the first time and a user study was conducted with the people who attended ExOR in November 2009.
  • The system was made more user friendly. It was showcased for demonstration and another detailed user study was conducted with people who attended the R & D Showcase in January 2010.

Downloads

thesis

ppt 

High Quality Rendering of Large Point-based Surfaces


Naveen Kumar Reddy Bolla (homepage)

Point-based models have gained popularity in recent years. The most attractive feature of point-based primitives is their simplicity. No information about connectivity or topology of the surface is stored explicitly. Adaptive modification and dynamic sampling of the surface do not suffer from the complexity and robustness problems of triangle meshes. Level of detail is simple, fast and almost continuous. However, point based methods suffer from surface normal discontinuities in the representation, and shading is not independent of the sampling rate. Point based models can be visualized by rendering the points directly. Each sample in the representation has certain attributes (normal, color, material properties) for rendering and shading them as a continuous surface. Typically, a point sample also has a radius to define an area around it. Such samples are called surfels and approximate the shape of the surface linearly in their neighborhood. Since linear splats can only be flat shaded, such representations require a large number of samples for good shading quality. This slows the rendering due to the increase in rendering computation and related memory bus activity.

Point-based rendering suffers from the limited resolution of the fixed number of samples representing the model. At some viewing distances, the screen space resolution is high relative to the point samples, which causes under-sampling. A better way of rendering a model is to re-sample the surface during rendering at the desired resolution in object space, guaranteeing a sampling density sufficient for the image resolution. Output sensitive sampling samples objects at a resolution that matches the expected resolution of the output image. This is crucial for hole-free point-based rendering. Many technical issues related to point-based graphics boil down to reconstruction and re-sampling. A point based representation should be as small as possible while conveying the shape well.

In the first part of this thesis, we present a compact representation for point based models using non-linear surface elements. In the second part, we present a method for fast ray casting of a dynamic surface defined by a large number of attributed points called metaballs.

Algebraic Splats: Higher-order approximations of the local neighborhood have the potential to represent the shape using fewer primitives, simultaneously achieving higher rendering speeds and quality. In this thesis, we present algebraic splats, which are low-order polynomials that approximate the local neighborhood of a point set. We adapt the Moving Least Squares (MLS) approximation to generate algebraic splat representations for a given point set. Quadratic and cubic splats as basic rendering primitives are able to approximate point-based models using 10% or fewer points than linear splats, for equivalent visual quality. Though rendering a non-linear patch is slower compared to a linear splat, the overall speed of rendering an object can be faster. The approximation error can also be lower using a small number of higher order primitives. We represent a given point set with a user-specified small number of algebraic splats with optimal rendering quality. This is done by decimating the point set and approximating the neighborhood of each retained point with a local algebraic surface based on the Moving Least Squares (MLS) approximation.
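The fitting step for a single quadratic splat can be sketched as a Gaussian-weighted least-squares fit in a local frame, assumed here to have the surface normal along z. This is hypothetical illustration code, not the thesis implementation.

```python
import numpy as np

def fit_algebraic_splat(pts, center, h):
    """Fit the quadratic height field z = a x^2 + b x y + c y^2 + d x + e y + f
    to the neighborhood `pts` (N x 3) of `center`, with Gaussian MLS weights
    of support radius h.  Points must already be expressed in a local frame
    whose z axis approximates the surface normal.  Returns (a, b, c, d, e, f)."""
    local = pts - center
    x, y, z = local[:, 0], local[:, 1], local[:, 2]
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # sqrt of the Gaussian weight, so lstsq minimizes sum_i w_i * r_i^2
    sw = np.sqrt(np.exp(-(x * x + y * y) / (h * h)))
    coeffs, *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
    return coeffs
```

Decimation then keeps only as many centers as the user requests, each carrying its fitted polynomial instead of a flat disk.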


Our rendering provides smooth surfaces with normals everywhere. We can render the polynomials directly on today's GPUs using ray-tracing because of the semi-implicit nature of the splats in the local reference domain. The major contributions of this work are: (i) a compact and lightweight representation for point geometry using nonlinear surface elements, called algebraic splats; (ii) a multi-resolution representation which provides continuous level of detail; and (iii) a ray-tracing approach to render the algebraic splats on the GPU. The method includes an adaptive anti-aliasing step to obtain smooth silhouettes. The David 2mm model can be represented using about 30K (or 0.8% of the original) algebraic splats with little or no reduction in visual quality. We can raycast the David model with adaptive antialiasing (3X grid) at 130-150 fps, and at 80-90 fps with shadowing. Our method is faster by a factor of 10 on most models and requires about 90% less disk space in comparison to linear splats.

Metaballs: Since the available computational power is steadily growing, more and more areas of science rely on simulations of ever-growing problem sizes, producing huge amounts of data to be analyzed. Simulation and experimental measurement in life sciences, physics, chemistry, materials science and thermodynamics yield large and often time-dependent datasets. Interactive visualization is the key service that facilitates the analysis of such datasets and thus enables researchers in those fields to quickly assess and compare the results of a simulation or measurement, verify and improve their models, and in so doing come ever closer to understanding how dynamic processes work.

Metaballs (also known as blobs) are popular particle-based implicit surface representations. A set of them can represent smooth, deformable, free-form shapes defined as the iso-surface of a scalar field. Current methods can handle only a moderate number of metaballs at interactive rates on commodity GPUs. In this thesis, we present a method to handle a million or more dynamic metaballs at interactive rates. We use a perspective grid as the acceleration structure, which is built in each frame on the GPU using fast primitives. Perspective grids provide perfect coherence to primary rays. We ray trace one grid cell at a time using an adaptation of the marching points method to metaballs. This method extracts high performance from the SIMD GPUs of today. We show interactive handling of up to a million balls. Our method is also able to handle secondary rays, though at much reduced speeds due to the lack of coherence, and is suitable for live visualization of particle-based simulations. The average build time for 1 million primitives is about 38 milliseconds, and we can render a million metaballs in about 94 milliseconds. Secondary rays are slower due to the lack of coherence, with a rendering time of 150 ms for 1K primitives.
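The surface being ray cast is an iso-contour of a summed scalar field. A minimal field evaluation, using a simple inverse-square falloff chosen purely for illustration (actual metaball kernels vary and are usually compactly supported):

```python
import numpy as np

def metaball_field(p, centers, radii):
    """Scalar field of a set of metaballs at point p (3,).
    centers is (k, 3), radii is (k,); the rendered surface is the
    iso-contour field(p) = 1.  Falloff f_i(p) = (r_i / |p - c_i|)^2."""
    d2 = np.sum((centers - p) ** 2, axis=1)
    return float(np.sum(radii ** 2 / np.maximum(d2, 1e-12)))
```

During marching-points traversal, each ray steps through the perspective-grid cells and evaluates the field only against the metaballs binned into the current cell, looking for the crossing field(p) = 1.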

 

Year of completion:  July 2010
 Advisor : P. J. Narayanan

Related Publications

  • Naveen Kumar Bolla and P. J. Narayanan - Algebraic Splats Representation for Point Based Models, IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP 2008), pp. 71-78, 16-19 Dec 2008, Bhubaneswar, India. [PDF]

  • Naveen Kumar Bolla and P. J. Narayanan - Algebraic Splats for High-Quality Rendering of Points (submitted to Graphical Models, under review).
  • Naveen Kumar Bolla and P. J. Narayanan - Real-time Ray-tracing of Million Metaballs (submitted to EGSR).

Downloads

thesis

ppt
