
Object-centric Video Navigation and Manipulation


Rajvi Shah (homepage)

The past decade has seen tremendous advancements in consumer electronics and web technology. The proliferation of digital cameras and the popularity of content-sharing sites have caused rapid growth in the creation and distribution of images and videos by home users. Enhancement and manipulation of captured images is fairly popular among common users due to the availability of numerous easy-to-use photo editing utilities like Instagram, Picasa, Photo Gallery, etc. In comparison, video manipulation is still less popular among common users due to the lack of easy-to-use yet powerful video editing platforms. Basic video editing platforms for home users are simple and intuitive, but these tools provide limited functionality, such as splitting and merging videos or adding captions and audio. Professional video editing platforms are rich in functionality, but these tools demand high technical expertise. A novice user usually gets discouraged by complex interactions and cumbersome processing. Moreover, traditional video editing interfaces model and represent videos as a collection of frames against a timeline. In a user's perception, a video has more meaningful semantics such as objects, actions, events, interactions, etc. The gap between perception and representation makes object-centric manipulation of videos an unnatural and laborious task. In this thesis, we attempt to bridge the gap between the power and usability of video manipulation interfaces by using computer vision techniques. We propose a representation based on three high-level video semantics (scene mosaic, object motion, and camera motion) to enable simple and meaningful interaction for object-centric navigation and manipulation of long shot videos. We build an extended field-of-view mosaic of the video scene and represent object motion in this scene mosaic using 3D space-time trajectories. We define novel object and camera manipulation operations using object trajectories as basic interaction elements.
The use of object trajectories as basic video semantics replaces complex interface elements with interactive curve manipulation operations. The object operations allow users to perform various temporal manipulations on the video objects by interactively manipulating the object trajectories. For example, users can delay or advance the video objects by dragging the trajectories along the timeline, or replicate objects by creating multiple copies of the object trajectories. The camera operations model the camera as a movable and scalable aperture and allow users to simulate camera pan, tilt, and zoom by creating new aperture trajectories. Object and camera operations, in combination, allow users to perform a number of high-level video manipulations in a simple 'click and drag' fashion.
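
The temporal object operations described above can be illustrated with a minimal sketch. It assumes a trajectory is stored as a list of (t, x, y) samples in scene-mosaic coordinates; the names `shift_in_time` and `replicate` are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Assumed representation: one (frame index, x, y) sample per frame,
# with (x, y) expressed in scene-mosaic coordinates.
Trajectory = List[Tuple[int, float, float]]


def shift_in_time(traj: Trajectory, offset: int) -> Trajectory:
    """Delay (offset > 0) or advance (offset < 0) an object by sliding
    its space-time trajectory along the time axis."""
    return [(t + offset, x, y) for (t, x, y) in traj]


def replicate(traj: Trajectory, offsets: List[int]) -> List[Trajectory]:
    """Create one time-shifted copy of the trajectory per offset, so the
    same object appears several times in the output video."""
    return [shift_in_time(traj, offset) for offset in offsets]


# Usage: an object moving right across the mosaic, delayed by 30 frames,
# then replicated three times at 15-frame intervals.
walker = [(0, 10.0, 5.0), (1, 12.0, 5.0), (2, 14.0, 5.0)]
delayed = shift_in_time(walker, 30)
copies = replicate(walker, [0, 15, 30])
```

Only the time coordinate changes in this sketch; the spatial path in the mosaic is left untouched, which is what makes the operation a purely temporal manipulation.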

 

Year of completion:  2012
 Advisor : P. J. Narayanan

Related Publications

  • Rajvi Shah and P. J. Narayanan - Trajectory based Video Object Manipulation, Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2011), 11-15 July 2011, Barcelona, Spain. [PDF]

  • Rajvi Shah and P. J. Narayanan - Scene Background and Object Trajectories based Interactive Video Manipulation, in IEEE Transactions on Circuits and Systems for Video Technology, under review.

Downloads

thesis

ppt

Improving image quality of synthetically zoomed tomographic images.


Neha Dixit (homepage)

3-D medical images such as Computed Tomography (CT), Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI) are commonly used for diagnosis. Tomographic reconstruction is the underlying technique for all these images. It is often of interest to increase the resolution of these images. Some of the possible solutions for increasing the resolution are: increasing the number of detectors, changing the detector material or geometry, and increasing the excitation energy level or scanning time. These solutions are limited in terms of practical implementation. Hence, finding solutions at the processing level (post data acquisition) is of interest. This thesis focuses on the problem of preserving the details and quality of tomographic images after upsampling. We have explored two different approaches for generating higher resolution images. These are based on re-examining the lattice geometry for reconstruction. We offer two novel solutions: one which uses a hexagonal lattice instead of the conventional square lattice, and a second one based on a super resolution strategy which combines samples drawn from two lattices.

In the first solution, we reconstruct the image onto a hexagonal lattice. These samples are then interpolated to find the samples on the desired higher resolution square lattice. In the super resolution (SR) based solution, we reconstruct the images onto two low resolution lattices. The lattices are rotated versions of each other, which provide a 'different view' of an object/scene. For generating low resolution images, both square and hexagonal sampling have been considered.
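
As a rough illustration of the lattice geometry involved, the sketch below generates sample positions on a hexagonal lattice and on a rotated copy of it. The function names, the unit spacing, and the 30-degree rotation are assumptions chosen for the example; the abstract does not state the lattice parameters used in the thesis, and the reconstruction and interpolation steps are not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def hexagonal_lattice(rows: int, cols: int, spacing: float = 1.0) -> List[Point]:
    """Sample positions on a hexagonal lattice: rows are sqrt(3)/2 * spacing
    apart and every other row is shifted by half the spacing."""
    row_height = spacing * math.sqrt(3) / 2
    points = []
    for r in range(rows):
        x_offset = spacing / 2 if r % 2 else 0.0
        for c in range(cols):
            points.append((c * spacing + x_offset, r * row_height))
    return points


def rotate(points: List[Point], angle_rad: float) -> List[Point]:
    """Rotate sample positions about the origin to obtain a second lattice
    that samples the same scene from a 'different view'."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in points]


# Usage: a small hexagonal lattice and a copy rotated by 30 degrees
# (the angle is illustrative only).
lattice_a = hexagonal_lattice(rows=4, cols=4)
lattice_b = rotate(lattice_a, math.radians(30))
```

On a hexagonal lattice every sample is equidistant from its six nearest neighbours, which is the property that distinguishes it from the square lattice.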

The two solutions have been implemented and tested using a variety of test objects. These objects, which are called phantoms, were of two types: analytical and real. The three analytically generated phantoms were concentric rings, dots and lines. The two real PET phantoms were the NEMA and Hoffman Brain phantoms. The results were benchmarked against upsampled images generated by direct upsampling and by an existing super resolution technique which uses Union of Shifted Lattice (USL) samples. The evaluation results show a qualitative and quantitative improvement over results obtained by direct upsampling. The SR-based results are comparable in quality to those obtained with USL. However, the main strength of the proposed SR-based solution is the computational savings and a reduced number of required images for a given upsampling factor.

 

Year of completion:  March 2012
 Advisor : Jayanthi Sivaswamy

Related Publications

  • N. Dixit and J. Sivaswamy - A Novel Approach to Generate Up-sampled Tomographic Images using Combination of Rotated Hexagonal Lattices, Proceedings of Sixteenth National Conference on Communications (NCC'10), 29-31 Jan. 2010, Chennai, India. [PDF]

  • Neha Dixit, N.V. Kartheek and Jayanthi Sivaswamy - Synthetic Zooming of Tomographic Images by Combination of Lattices, Proceedings of IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 25-31 October, 2009. [PDF]


Downloads

thesis

ppt

Improving the Efficiency of SfM and its Applications.


Siddharth Choudhary (homepage)

Large scale reconstructions of camera matrices and point clouds have been created using structure from motion (SfM) from community photo collections. Such a dataset is rich in information; we can interpret it as a sampling of the geometry and appearance of the underlying space. In this dissertation, we encode the visibility information between and among points and cameras as visibility probabilities. The conditional visibility probability of a set of points on a point (or a set of cameras on a camera) can be used to select points (or cameras) based on their dependence or independence. We use it to efficiently solve the problems of image localization and feature triangulation. We show how the conditional probability can be combined with other measures to prioritize a set of points (or cameras) for matching and use it for fast guided search of points for the image localization problem. We define the problem of feature triangulation as the estimation of the 3D coordinates of a given 2D feature using the SfM data. Our approach can guide the search to quickly identify a subset of cameras in which the feature is visible.
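
One simple way to estimate a conditional visibility probability from SfM data is by counting shared cameras, sketched below. This frequency-based estimate, the `visible_in` mapping, and the function name are assumptions made for illustration; the abstract does not give the exact definition used in the dissertation.

```python
from typing import Dict, Set


def conditional_visibility(visible_in: Dict[int, Set[int]], i: int, j: int) -> float:
    """Estimate P(point j is visible | point i is visible) as the fraction
    of cameras that see point i and also see point j."""
    cameras_i = visible_in[i]
    if not cameras_i:
        return 0.0
    return len(cameras_i & visible_in[j]) / len(cameras_i)


# Usage: point id -> set of camera ids in which the point was reconstructed.
visible_in = {
    0: {0, 1, 2, 3},
    1: {2, 3, 4},
    2: {7},
}
dependent = conditional_visibility(visible_in, 0, 1)    # shares cameras 2 and 3
independent = conditional_visibility(visible_in, 0, 2)  # shares no camera
```

A high value suggests that point j is likely to be found in an image where point i has already been matched, while a value near zero marks the two points as independent. The same counting applies to cameras by swapping the roles of points and cameras.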

Besides image localization and feature triangulation, bundle adjustment is a key component of the reconstruction pipeline, and often its slowest and most computationally intensive one. It has not been parallelized effectively so far. We also present a hybrid implementation of sparse bundle adjustment on the GPU using CUDA, with the CPU working in parallel. The algorithm is decomposed into smaller steps, each of which is scheduled on the GPU or the CPU. We develop efficient kernels for the steps and make use of existing libraries for several steps. Our implementation achieves a speedup of 30-40 times over the standard CPU implementation for datasets with up to 500 images on an Nvidia Tesla C2050 GPU.

 

Year of completion:  2012
 Advisor : P. J. Narayanan

Related Publications

  • Siddharth Choudhary and P. J. Narayanan - Visibility Probability Structure from SfM Datasets and Applications, Proceedings of 12th European Conference on Computer Vision, 7-13 Oct. 2012, Vol. ECCV 2012, Part-VI, LNCS 7577, Firenze, Italy. [PDF]

  • Siddharth Choudhary, Shubham Gupta and P. J. Narayanan - Practical Time Bundle Adjustment for 3D Reconstruction on GPU, Proceedings of ECCV Workshop on Computer Vision on GPU (CVGPU'10), 5-11 Sep. 2010, Crete, Greece. [PDF]


Downloads

thesis

ppt

Scalable Clustering for Vision using GPUs.


K Wasif Mohiuddin (homepage)

Clustering algorithms have wide applications in computer vision, data mining, data visualization, etc. Clustering is an important step for indexing and searching documents, images, video, etc. Clustering large numbers of high-dimensional vectors is very computationally intensive. CPUs struggle to handle such a load and can take days or even weeks to cluster large datasets. GPUs are being used for general purpose computing on a wide range of problems which require high computational power. Today's GPUs deliver as much as 1.5 TFLOPs. The GPU has evolved over time not only through an increasing number of cores but also through major architectural changes such as faster data access, more shared memory, and more threads per block. Such changes have enabled GPU programmers to exploit its architectural features to the fullest and achieve high performance. In this thesis, we focus on the K-Means algorithm, which is a widely used unsupervised clustering algorithm. We develop a GPU-based K-Means implementation for large datasets, and use this implementation to develop a video organizing application.

GPU K-Means: We present the design and implementation of the K-Means clustering algorithm on the modern GPU. In our approach, all steps are performed efficiently and entirely on the GPU. We also present a load-balanced multi-node, multi-GPU implementation which can handle up to 6 million 128-dimensional vectors. We use an efficient memory layout for all steps to get high performance. GPU accelerators are now present on both high-end workstations and low-end laptops. Scalability in the number and dimensionality of the vectors, the number of clusters, and the number of cores available for processing is important for usability across different users. Our implementation scales linearly or near-linearly with different problem parameters. We achieve up to a 2 times increase in speed compared to the best GPU implementation of K-Means on a single GPU. We obtain a speedup of up to 170 times on a single Nvidia Fermi GPU compared to a standard sequential implementation. We are able to execute a single iteration of K-Means in 136 seconds on off-the-shelf GPUs to cluster 6 million vectors of 128 dimensions into 4K clusters, and in 2.5 seconds to cluster 125K vectors of 128 dimensions into 2K clusters.
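
For reference, a single K-Means iteration consists of an assignment step and a center update step. The sketch below is a plain sequential version in Python, intended only to show what one iteration computes; it is not the GPU implementation described in the thesis, and the function name `kmeans_iteration` is an assumption.

```python
from typing import List, Tuple

Vector = List[float]


def kmeans_iteration(vectors: List[Vector], centers: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Run one K-Means iteration: label each vector with its nearest center
    (squared Euclidean distance), then move each center to the mean of the
    vectors assigned to it."""
    k, dim = len(centers), len(centers[0])

    # Assignment step: nearest center for every vector.
    labels = []
    for v in vectors:
        nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
        labels.append(nearest)

    # Update step: accumulate per-cluster sums and counts, then average.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for v, label in zip(vectors, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += v[d]

    # An empty cluster keeps its previous center.
    new_centers = [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centers[c])
        for c in range(k)
    ]
    return labels, new_centers


# Usage: four 2-D vectors and two initial centers.
data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels, centers = kmeans_iteration(data, [[0.0, 1.0], [9.0, 9.0]])
```

The assignment step costs on the order of n × k × d distance terms for n vectors, k clusters, and d dimensions, and each vector is processed independently, which is why this step is a natural target for parallel hardware.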

Video Organizer: Video data is increasing rapidly along with the capacity of storage devices owned by lay users. Users have moderate to large personal collections of videos and would like to keep them organized based on their content. Video organizing tools for personal users are far behind even the primitive image organizing tools. We present a mechanism in this thesis to help ordinary users organize their personal collection of videos based on categories they choose. During the learning phase, we cluster the PHOG features extracted from selected key frames using K-Means to form a representation for each user-selected category. During the organization phase, labels from a K-NN classifier on these cluster centers are computed for each key frame and aggregated to give a label to the video. Video processing is computationally intensive. To perform the computationally intensive steps involved, we exploit the CPU as well as the GPU that is common even on personal systems. Effective use of the parallel hardware on the system is the only way to make the tool scale reasonably to the large collections that will be available soon. Our tool is able to organize a set of 100 sports videos with a total duration of 1375 minutes in about 9.5 minutes. The process of learning the categories from 12 annotated videos with a duration of 165 minutes took 75 seconds on a GTX 580 card. These results were obtained on a standard desktop with an off-the-shelf GPU. The labeling accuracy is about 96% on all videos.

The ideas and approaches proposed in this thesis have been implemented and validated with experimental results. For large datasets, we developed a scalable, efficient K-Means clustering on the GPU along with a multi-GPU framework, and used it to develop a video organizer application providing high accuracy.
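
The organization phase can be sketched as follows, under two simplifying assumptions that are not stated in the abstract: the K-NN classifier is reduced to 1-NN, and key-frame labels are aggregated by majority vote. The function names and the toy 2-D features are hypothetical; real PHOG descriptors have many more dimensions.

```python
from collections import Counter
from typing import List, Tuple

Vector = List[float]


def nearest_label(feature: Vector, centers: List[Tuple[str, Vector]]) -> str:
    """Return the category of the cluster center closest to the key-frame
    feature (1-NN with squared Euclidean distance)."""
    category, _ = min(
        centers,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(feature, item[1])),
    )
    return category


def label_video(keyframe_features: List[Vector], centers: List[Tuple[str, Vector]]) -> str:
    """Label a video by majority vote over its key-frame labels.
    Expects at least one key frame."""
    votes = Counter(nearest_label(f, centers) for f in keyframe_features)
    return votes.most_common(1)[0][0]


# Usage: cluster centers learned per category, then one video with three key frames.
centers = [("cricket", [0.0, 0.0]), ("tennis", [10.0, 10.0])]
video_label = label_video([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], centers)
```

Voting over several key frames makes the video label tolerant of a few misclassified frames, as with the third key frame in the usage example.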

 

Year of completion:  August 2012
 Advisor : P. J. Narayanan

Related Publications

  • K. Wasif Mohiuddin and P. J. Narayanan - Scalable Clustering Using Multiple GPUs, Proceedings of International Conference on High Performance Computing (HIPC), 18-21 Dec. 2011, E-ISBN 978-1-4577-1949-3, Print ISBN 978-1-4577-1951-6, Bangalore, India. [PDF]

  • K. Wasif Mohiuddin and P. J. Narayanan - A GPU-Assisted Personal Video Organizer, Proceedings of 3rd Workshop on GPU on Computer Vision at ICCV (International Conference on Computer Vision), 04-11 Nov. 2011, Barcelona, Spain. [PDF]

 


Downloads

thesis

ppt

Cascaded Filtering of Biometric Identification Using Random Projection


Atif Iqbal (homepage)

Biometrics has a long history, and is possibly as old as the human race itself. It is often used as an advanced security measure to safeguard important artifacts, buildings, information, etc. Biometrics is increasingly being used for secure authentication of individuals and is making its presence felt in our lives. With the fast increase in computational power, such systems can now be deployed on a very large scale. However, efficiency in large scale biometric matching is still a concern, as the problem of deduplication (removal of duplicates) of a biometric database with N entries is O(N²), which can be extremely challenging for large databases. To make this problem tractable, many indexing methods have been proposed to speed up the comparison process. However, the expectation of accuracy in such systems, combined with the nature of biometric data, makes the problem a very challenging one.
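
To make the quadratic cost concrete, the sketch below counts the comparisons needed for exhaustive deduplication, assuming every unordered pair of entries is compared exactly once. The function name is hypothetical.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n database entries: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Usage: growing the database tenfold multiplies the work by roughly a hundred.
small = pairwise_comparisons(100_000)
large = pairwise_comparisons(1_000_000)
```

For one million entries this is about 5 × 10^11 explicit template comparisons, which is the workload that filtering and indexing aim to cut down.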

Biometric identification often involves explicit comparison of a probe template against each template stored in a database. An effective approach to speed up the process is filtering, where a lightweight comparison is used to reduce the database to a smaller set of candidates for explicit comparison. However, most existing filtering schemes use specific features that are hand-crafted for the biometric trait at each stage of the filtering. In this work, we show that a cascade of simple linear projections on random lines can achieve significant levels of filtering. Each stage of filtering consists of projecting the probe onto a specific line and removing database samples outside a window around the probe. The approach provides a way to generate filters automatically and avoids the need to develop specific features for different biometric traits. The method also provides a variety of parameters, such as the projection lines, the number and order of projections, and the window sizes, to customize the filtering process to a specific application. The experiments are performed on fingerprints, palmprints and iris.
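
The cascade described above can be sketched in a few lines. This is an illustrative version only: the feature vectors, window sizes, and function names are assumptions, and the fitness function used in the thesis to choose among random lines is not modelled here.

```python
import random
from typing import List

Vector = List[float]


def random_unit_line(dim: int, rng: random.Random) -> Vector:
    """Draw a random direction and normalise it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def project(vec: Vector, line: Vector) -> float:
    """Scalar projection of a feature vector onto a line through the origin."""
    return sum(a * b for a, b in zip(vec, line))


def cascade_filter(probe: Vector, database: List[Vector],
                   lines: List[Vector], windows: List[float]) -> List[int]:
    """Return indices of database entries that survive every stage.
    Each stage keeps only entries whose projection lies within the
    stage's window around the probe's projection."""
    candidates = list(range(len(database)))
    for line, window in zip(lines, windows):
        probe_value = project(probe, line)
        candidates = [
            i for i in candidates
            if abs(project(database[i], line) - probe_value) <= window
        ]
    return candidates


# Usage: two axis-aligned lines stand in for random ones to keep the
# example easy to follow; random_unit_line would supply them in practice.
database = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.2, 0.1]]
survivors = cascade_filter([0.1, 0.0], database, [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

Because each stage only shrinks the candidate list, later projections are computed on fewer entries, and the cascade can be stopped after any stage.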

For both iris and palmprint datasets, the representation that we use (before projection) is the popularly used thresholded filter response from pre-defined regions of the image. Experimental results show that using an ensemble of projections reduces the search space by 60% without increasing the false negative identification rate for palmprints. However, for stronger biometrics such as iris, the approach does not yield similar results. We further explore this problem to find a solution, specifically for the case of fingerprints.

The fundamental approach here is to explore the effectiveness of weak features in a cascade for filtering fingerprint databases. We start with a set of potential indexing features computed from minutiae triplets and minutiae quadruplets. We show that by using a set of random lines and the proposed fitness function, one can achieve better results than optimized projection methods such as PCA or LDA. Experimental results on fingerprint datasets show that using an ensemble of projections we can reduce the penetration to 26% at a hit rate of 99%. As each stage of the cascade is extremely fast, and filtering is progressive along the cascade, one can terminate the cascade at any point to achieve the desired performance. One can also combine this method with other indexing methods to improve the overall accuracy and speed. We present detailed experimental results on various aspects of the process on the FVC 2002 dataset.

The proposed approach is scalable to large datasets due to the use of random linear projections and directly lends itself to pipelined processing. The method also allows the use of multiple existing features without affecting the computation time.

 

Year of completion:  2012
 Advisor : Anoop M. Namboodiri

Related Publications

  • Atif Iqbal and Anoop M. Namboodiri - Cascaded Filtering for Biometric Identification using Random Projections, Proceedings of Seventh National Conference on Communications (NCC 2011), 28-30 Jan. 2011, Bangalore, India. [PDF]

  • Atif Iqbal and Anoop M. Namboodiri - Cascaded Filtering for Fingerprint Identification using Random Projection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, June 2012.

Downloads

thesis

ppt
