3D Shape Analysis: Reconstruction and Classification


Sai Sagar Jinka

Abstract

The reconstruction and analysis of 3D objects by computational systems has been an intensive and long-standing research problem in the graphics and computer vision communities. Traditional acquisition systems are largely restricted to studio setups that require multiple synchronized and calibrated cameras. The advent of active depth sensors, such as time-of-flight and structured-light sensors, made 3D acquisition feasible outside such setups. This advancement has paved the way for many research problems, such as 3D object localization, recognition, classification and reconstruction, which demand sophisticated yet elegant solutions to match their ever-growing applications. 3D human body reconstruction, in particular, has wide applications such as virtual mirrors and gait analysis. Lately, with the advent of deep learning, 3D reconstruction from monocular images has garnered significant interest in the research community, as it can be applied to in-the-wild settings. We began by exploring the classification of 3D rigid objects, owing to the availability of the ShapeNet datasets. In this thesis, we propose an efficient characterization of 3D rigid objects which takes local geometry features into consideration while constructing global features in a deep learning setup. We introduce learnable B-Spline surfaces in order to sense complex geometrical structures (large curvature variations). The locations of these surfaces are initialized over the voxel space and are learned during the training phase, leading to efficient classification performance. Later on, we primarily focus on the more challenging problem of non-rigid 3D human body reconstruction from monocular images. In this context, this thesis presents three principal approaches to the 3D reconstruction problem. Firstly, we propose a disentangled solution in which the shape and the texture of the 3D human body are recovered using two different networks. 
We recover the volumetric shape of the non-rigid human body from a single-view RGB image, followed by orthographic texture view synthesis using the respective depth projection of the reconstructed (volumetric) shape and the input RGB image. Secondly, we propose PeeledHuman - a novel shape representation of the human body that is robust to self-occlusions. PeeledHuman encodes the human body as a set of Peeled Depth and RGB maps in 2D, obtained by performing ray-tracing on the 3D body model and extending each ray beyond its first intersection. We learn these Peeled maps in an end-to-end generative adversarial fashion using our novel framework - PeelGAN. PeelGAN enables us to predict the shape and color of the 3D human in an end-to-end fashion with significantly low inference time. Finally, we further improve PeelGAN by introducing a shape prior while reconstructing from monocular images. We propose a sparse and efficient fusion strategy to combine a parametric body prior with the non-parametric PeeledHuman representation. The parametric body prior enforces geometrical consistency on the body shape and pose, while the non-parametric representation models loose clothing and handles self-occlusions. We also leverage the sparseness of the non-parametric representation for faster training of our network while using losses on 2D maps. We evaluate our proposed methods extensively on a number of datasets. In this thesis, we also introduce the 3DHumans dataset, a life-like 3D dataset of human body scans with rich geometrical and textural details. We cover a wide variety of clothing styles, ranging from loose robed clothing like the saree to relatively tight-fitting shirts and trousers. The dataset consists of around 150 unique male and 50 unique female subjects. There are about 180 male scans and around 70 female scans in total. In terms of regional diversity, for the first time, we capture body shape, appearance and clothing styles for the South-Asian population. 
This dataset will be released for research purposes.
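
The peeled-depth idea described in the abstract (following a ray past its first intersection and recording every surface crossing) can be illustrated with a toy sketch. This is not the thesis's ray tracer or the PeelGAN pipeline: it assumes a single ray sampled as a list of occupancy values, and the function name `peel_depths` and the `max_layers` parameter are hypothetical.

```python
def peel_depths(occupancy_along_ray, max_layers=4):
    """Return the sample indices where a ray crosses a surface.

    occupancy_along_ray: sequence of 0/1 values, ordered from the camera
    outwards (an assumed, simplified stand-in for ray-mesh intersection).
    Each change between empty and solid counts as one peeled layer.
    """
    depths = []
    inside = False  # the ray starts in empty space
    for z, filled in enumerate(occupancy_along_ray):
        if bool(filled) != inside:
            depths.append(z)          # record this surface crossing
            inside = bool(filled)
            if len(depths) == max_layers:
                break
    return depths


# A ray that passes through an arm (samples 1-2) and then the torso (sample 5).
ray = [0, 1, 1, 0, 0, 1, 0]
layers = peel_depths(ray)
```

A single depth map would keep only the first entry of `layers`; keeping the later crossings is what lets the representation describe surfaces hidden behind the first hit.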

Year of completion:  May 2023
 Advisor : Avinash Sharma


    Surrogate Approximations for Similarity Measures


    Nagender G

    Abstract

    This thesis targets the problem of surrogate approximations for similarity measures, with the goal of improving their performance in various applications. We present surrogate approximations for the popular dynamic time warping (DTW) distance, canonical correlation analysis (CCA), Intersection-over-Union (IoU), PCP, and PCKh measures. For DTW and CCA, our surrogate approximations are based on their corresponding definitions. For the IoU, PCP, and PCKh measures, we present a surrogate approximation using neural networks. First, we propose a linear approximation for the naïve DTW distance. We speed up the DTW distance computation by learning the optimal alignment from the training data. In our next contribution, we propose a surrogate kernel approximation over CCA. It enables us to use CCA in the kernel framework, further improving its performance. In our final contribution, we propose a surrogate approximation technique that uses neural networks to learn a surrogate loss function over the IoU, PCP, and PCKh measures. For the IoU loss, we validate our method on semantic segmentation models. For the PCP and PCKh losses, we validate on human pose estimation models.
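
For readers unfamiliar with two of the measures named above, the following is a minimal sketch of the exact (non-surrogate) quantities: the naïve quadratic-time DTW distance and the IoU of two binary masks. It does not implement the thesis's approximations. The function names, the absolute-difference local cost, and the convention that two empty masks have IoU 1.0 are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Naive O(len(a) * len(b)) dynamic time warping between two 1D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def mask_iou(pred, target):
    """Intersection-over-Union of two flat binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # assumed convention for empty masks


d = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # the repeated 2 is absorbed by warping
iou = mask_iou([1, 1, 0, 0], [1, 0, 1, 0])  # one shared pixel out of three
```

The quadratic table in `dtw_distance` is the cost that motivates a faster approximation, and the counting in `mask_iou` is not differentiable, which is one reason a learned surrogate loss is attractive.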

    Year of completion:  March 2023
     Advisor : C V Jawahar


      Epsilon Focus Photography: A Study of Focus, Defocus and Depth-of-field


      Parikshit Sakurikar

      Abstract

      Focus, defocus and depth-of-field are integral aspects of a photograph captured using a wide-aperture camera. Focus and defocus blur provide critical cues for the estimation of scene depth and structure, which helps in scene understanding and post-capture image manipulation. Focus and defocus blur are also used creatively by photographers to produce remarkable compositional effects, such as emphasis on the foreground subject with aesthetic bokeh in the background. Epsilon Focus Photography is a branch of computational photography that deals with the capture and processing of multi-focus imagery - where multiple wide-aperture images are captured with a small change in focus position. In this thesis, we provide a comprehensive study of various problems in epsilon focus photography along with a detailed analysis of the related work in the area. We provide useful constructs for the understanding and manipulation of focus, defocus blur and the depth-of-field of an image. The work in this thesis can be divided into four broad categories: measurement, representation, manipulation and applications of focus. Measuring focus is a long-studied and challenging problem in computer vision. We study various methods to measure focus and propose a composite measure of focus that combines the strengths of well-known focus measures. We then study the task of post-capture focus manipulation at each pixel in an image and formulate a novel representation of focus that can find much use in image editing toolkits. Our representation can faithfully encode the fine characteristics of a wide-aperture image even at complex interaction locations such as depth-edges and over-saturated background regions, while optimizing the memory footprint of multi-focus imagery. Apart from precise geometric constructs for scene refocusing, we also propose a data-driven approach for post-capture scene refocusing using deep adversarial learning. 
We show how the tasks of deblurring an image, magnification of the defocused content and overall comprehensive focus manipulation can be efficiently modeled using conditional adversarial networks. We study several applications of focus in computer vision such as view interpolation and depth-from-focus. We provide a tool that can interpolate different views of a scene based on focus texture segmentation and propose a novel solution for depth-from-focus using the proposed composite focus measure. In summary, this thesis consists of a comprehensive study of epsilon focus photography and its applications in the context of computer vision and computational photography.
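
To make the notion of a focus measure concrete, here is a minimal sketch of one well-known measure, the mean squared response of a 4-neighbour Laplacian, together with the basic depth-from-focus step of picking the sharpest slice in a focal stack. It is not the composite measure proposed in the thesis. Images are assumed to be small grayscale grids stored as lists of lists, and both function names are hypothetical.

```python
def laplacian_focus(img):
    """Mean squared 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] +
                   img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
            total += lap * lap
    return total / max(1, (h - 2) * (w - 2))


def sharpest_slice(stack):
    """Index of the focal-stack image with the highest focus measure."""
    scores = [laplacian_focus(img) for img in stack]
    return scores.index(max(scores))


# Toy focal stack: a textured (in-focus) image between two flat (defocused) ones.
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
flat = [[0.5] * 4 for _ in range(4)]
best = sharpest_slice([flat, checker, flat])
```

In a real depth-from-focus pipeline the measure would be evaluated per pixel or per patch rather than per image, so that each scene point is assigned the focus position at which it appears sharpest.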

      Year of completion:  August 2021
       Advisor : P J Narayanan


        Optimization for and by Machine Learning


        Pritish Mohapatra

        Abstract

        In machine learning, tasks like making predictions using a model and learning model parameters can often be formulated as optimization problems. The feasibility of using a machine learning model depends on the efficiency with which the corresponding optimization problems can be solved. As such, the area of machine learning throws up many challenges and interesting problems for research in the field of optimization. While in some cases it is possible to directly apply off-the-shelf optimization methods to problems in machine learning, in many other cases it becomes necessary to develop optimization algorithms that are tailor-made for specific problems. On the other hand, developing optimization algorithms for specific problem domains can itself be helped by machine learning techniques. Learning optimization algorithms from data can help relieve the tedious effort required to develop optimization methods for new problem domains. The challenge here is to appropriately parameterize the space of algorithms for different optimization problems. In this context, we explore the interplay between the areas of optimization and machine learning and make contributions to specific problems of interest that lie in the overlap of these fields.

         

        Year of completion:  December 2021
         Advisor : C. V. Jawahar

        Related Publications

        • Pritish Mohapatra, C. V. Jawahar and M. Pawan Kumar - Learning to Round for Discrete Labeling Problems, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 09-11 April 2018, Playa Blanca, Lanzarote.

        • Pritish Mohapatra, Michal Rolínek, C. V. Jawahar, Vladimir Kolmogorov and M. Pawan Kumar - Efficient Optimization for Rank-based Loss Functions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 18-22 June 2018, Salt Lake City, Utah.

        • Pritish Mohapatra, Puneet Kumar Dokania, C. V. Jawahar and M. Pawan Kumar - Partial Linearization based Optimization for Multi-class SVM, Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016.

        • Aseem Behl, Pritish Mohapatra, C. V. Jawahar and M. Pawan Kumar - Optimizing Average Precision using Weakly Supervised Data, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015.

        • Mohak Sukhwani, Suriya Singh, Anirudh Goyal, Aseem Behl, Pritish Mohapatra, Brijendra Kumar Bharti and C. V. Jawahar - Monocular Vision based Road Marking Recognition for Driver Assistance and Safety, Proceedings of the IEEE Conference on Vehicular Electronics and Safety, 16-17 Dec 2014, Hyderabad, India.

        • Pritish Mohapatra, C. V. Jawahar and M. Pawan Kumar - Efficient Optimization for Average Precision SVM, Proceedings of the Neural Information Processing Systems Foundation, 08-13 Dec 2014, Quebec, Canada.



        Learning Representations for Word Images


        Praveen Krishnan

        Abstract

        Reading and writing documents is one of the primary skills with which we gather and communicate information. With the emergence of artificial intelligence (AI), researchers are in constant pursuit of intelligent algorithms that can bring our physical and digital worlds closer to each other. One such important domain is document image analysis, where we delve into the problem of understanding content from scanned document image collections. Considering “words” as the basic unit in understanding a document, in this thesis we address the problem of finding the best possible representation for word images. Representation learning is a key area of investigation in AI. The primary goal of this thesis is to learn efficient representations for word images that encode their content. An ideal representation should be invariant to multiple fonts and handwriting styles, and less sensitive to noise and distortions. In the past, representations have been handcrafted, specific to modalities (printed, handwritten), and sensitive to the complexities of handwriting in multi-writer scenarios. In this work, we choose the paradigm of learning from data using deep neural networks. We take our inspiration from the fact that, given large amounts of annotated data, modern deep neural networks can inherently learn better representations. In this thesis, we also relax the need for large annotated datasets by heavily capitalizing on synthetically generated images. We also introduce a novel problem of learning a semantic representation for word images, which encodes the semantics of the word and reduces the vocabulary gap that exists between the query and the retrieved results. The first contribution of this thesis is a simple technique to generate large amounts of synthetic data, useful for pre-training deep neural networks. This led to the creation of the IIIT-HWS dataset, which is now widely used in the document community. 
The other major contributions of this thesis are: (a) the design of a deep convolutional architecture (named HWNet) for learning an efficient holistic representation for word images, (b) a joint embedding scheme to project word images and textual strings onto a common subspace, and (c) a novel form of word image representation which respects the word form along with its semantic meaning. The learned representations are evaluated on the tasks of word spotting and word recognition. We report state-of-the-art performance on popular datasets covering both modern/historical and handwritten/printed document images, while keeping the representation compact. Finally, in order to validate the proposed representations, we present some interesting use cases such as (i) finding the similarity between a pair of handwritten document images, (ii) searching for keywords in online lecture videos, and (iii) building a word retrieval system for Indic scripts.
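
Once word images and text strings live in a common subspace, word spotting reduces to nearest-neighbour search in that space. The sketch below shows only that retrieval step, using cosine similarity. It is not HWNet or the thesis's embedding scheme: the embedding vectors are made-up toy values, and the names `cosine` and `rank_word_images` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_word_images(query_vec, gallery):
    """Return gallery keys sorted from most to least similar to the query."""
    return sorted(gallery,
                  key=lambda name: cosine(query_vec, gallery[name]),
                  reverse=True)


# Toy 2D embeddings (assumed values); a real system would use learned features.
gallery = {"img_a": [0.9, 0.1], "img_b": [0.1, 0.9], "img_c": [0.7, 0.7]}
query = [1.0, 0.0]  # embedding of the query string or query word image
ranking = rank_word_images(query, gallery)
```

Because the query can be embedded from either a text string or a word image, the same ranking routine would serve both query-by-string and query-by-example retrieval.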

         

        Year of completion:  November 2020
         Advisor : C V Jawahar

        Related Publications

        • Siddhant Bansal, Praveen Krishnan and C. V. Jawahar - Improving Word Recognition using Multiple Hypotheses and Deep Embeddings, The 25th International Conference on Pattern Recognition (ICPR 2021), Milano.

        • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Improving CNN-RNN Hybrid Networks for Handwriting Recognition, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA.

        • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Spotting and Recognition of Handwritten Words in Indic Scripts, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA.

        • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Localizing and Recognizing Text in Lecture Videos, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA.

        • Vijay Rowtula, Praveen Krishnan and C. V. Jawahar - POS Tagging and Named Entity Recognition on Handwritten Documents, ICON, 2018.

        • Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Word Spotting and Recognition using Deep Embedding, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria.

        • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria.

        • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Accurate Handwritten Word Recognition for Hindi and Bangla, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017.

        • Praveen Krishnan and C. V. Jawahar - Matching Handwritten Document Images, The 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016.

        • Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016.

        • Anshuman Majumdar, Praveen Krishnan and C. V. Jawahar - Visual Aesthetic Analysis for Handwritten Document Images, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016.

        • Praveen Krishnan, Naveen Sankaran, Ajeet Kumar Singh and C. V. Jawahar - Towards a Robust OCR System for Indic Scripts, Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 7-10 April 2014, Tours-Loire Valley, France.

        • Praveen Krishnan and C. V. Jawahar - Bringing Semantics in Word Image Retrieval, Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA.

        • Praveen Krishnan, Ravi Sekhar and C. V. Jawahar - Content Level Access to Digital Library of India Pages, Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India.

