
Modelling Structural Variations in Brain Aging


Alphin J Thottupattu

Abstract

The aging of the brain is a complex process shaped by a combination of genetic and environmental influences, and it varies from one population to another. This thesis investigates normative, population-specific structural changes in the brain and explores variations in aging-related changes across populations. The study gathers data from diverse groups, constructs individual models, and compares them through a carefully designed framework. This thesis proposes a comprehensive pipeline covering data collection, modeling, and the creation of an analysis framework. Finally, it offers an illustrative cross-population analysis, shedding light on the comparative aspects of brain aging. In our study, the Indian population is considered the reference, and an effort is made to address gaps within this population through the creation of a population-specific database, an atlas, and an aging model to facilitate the study. Due to the challenges in data collection, we adopted a cross-sectional approach. A cross-sectional brain image database is meticulously curated for the Indian population. A sub-cortical structural atlas is created for the young population, enabling us to establish a reference structural segmentation map for the Indian population. Age-specific, gender-balanced, high-resolution scans were collected to create the first Indian brain aging model. Choosing cross-sectional data collection made sense because data from other populations were also mostly collected cross-sectionally. Using the in-house database for the Indian population and publicly available datasets for other populations, our inter-population analysis compares aging trends across Indian, Caucasian, Chinese, and Japanese populations. Developing an aging model from cross-sectional data presents challenges in distinguishing between cross-sectional variations and normative trends. In response, we propose a method specifically tailored for cross-sectional data.
We present a unique metric within our comprehensive aging comparison framework to differentiate between temporal and global anatomical variations across populations. This thesis has detailed a comprehensive process to compare the aspects of healthy aging across these diverse groups, ultimately concluding with a pilot study across four different populations. This framework can be readily adapted to study various research problems, exploring changes associated with different populations while considering factors beyond ethnicity, such as lifestyle, education, socio-economic factors, etc. Similar analysis frameworks and studies with multiple modalities and larger sample sizes will contribute to deriving more conclusive results.
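The core modeling idea above can be illustrated with a minimal sketch (not the thesis's actual method; the data below are synthetic): a normative aging trend is estimated from cross-sectional data by regressing a structure's volume against subject age across many individuals.

```python
import numpy as np

# Synthetic cross-sectional cohort: one scan per subject, ages 20-80.
rng = np.random.default_rng(0)
ages = rng.uniform(20, 80, size=200)                 # years
true_vol = 4.2 - 0.012 * (ages - 20)                 # ml, simulated linear decline
volumes = true_vol + rng.normal(0, 0.1, size=200)    # measurement noise

# Fit a quadratic normative trajectory: volume ~ f(age).
coeffs = np.polyfit(ages, volumes, deg=2)
trend = np.poly1d(coeffs)

# The fitted trend recovers the simulated decline from age 20 to 80 (~0.72 ml).
decline = float(trend(20) - trend(80))
print(round(decline, 2))
```

The hard part the thesis addresses, which this sketch ignores, is that such a fit conflates between-subject variation with genuine temporal change; the proposed method is designed to separate the two.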

Year of completion: May 2024
Advisor: Jayanthi Sivaswamy

Related Publications


    Downloads

    thesis

    Towards Machine-understanding of Document Images


    Minesh Mathew

    Abstract

    Imparting machines the capability to understand documents like humans do is an AI-complete problem, since it involves multiple sub-tasks such as reading unstructured and structured text, understanding graphics and natural images, interpreting visual elements such as tables and plots, and parsing the layout and logical structure of the whole document. Except for a small percentage of documents in structured electronic formats, the majority of documents used today, such as documents in physical media, born-digital documents in image formats, and electronic documents like PDFs, are not readily machine-readable. A paper-based document can easily be converted into a bitmap image using a flatbed scanner or a digital camera. Consequently, machine understanding of documents in practice requires algorithms and systems that can process document images—a digital image of a document. The successful application of deep learning-based methods and the use of large-scale datasets have significantly improved the performance of various sub-tasks that constitute the larger problem of machine understanding of document images. Deep learning-based techniques have been applied successfully to the detection and recognition of text and of various document sub-structures such as forms and tables. However, owing to the diversity of documents in terms of language, modality of text present (typewritten, printed, handwritten or born-digital), images and graphics (photographs, computer graphics, tables, visualizations, and pictograms), layout and other visual cues, building generic solutions to the problem of machine understanding of document images is a challenging task. In this thesis, we address some of the challenges in this space, such as text recognition in low-resource languages, information extraction from historic/handwritten collections and multimodal modeling of complex document images.
Additionally, we introduce new tasks that call for a top-down perspective on document image understanding, one that treats a document image as a whole rather than in parts, different from the mainstream trend where the focus has been on solving various bottom-up tasks. Most of the existing tasks in Document Image Analysis (DIA) deal with independent bottom-up tasks that aim to get a machine-readable description of certain pre-defined document elements at various abstractions, such as text tokens or tables. This thesis motivates a purpose-driven DIA wherein a document image is analyzed dynamically, subject to a specific requirement set by a human user or an intelligent agent. We first consider the problem of making document images printed in low-resource languages machine-readable using an OCR, thereby making these documents AI-ready. To this end, we propose an end-to-end neural network model that can directly transcribe a word or line image from a document into the corresponding Unicode transcription. We analyze how the proposed setup overcomes many challenges in text recognition for Indic languages. Results of our synthetic-to-real transfer learning experiments for text recognition demonstrate that models pre-trained on synthetic data and further fine-tuned on a portion of the real data perform as well as models trained purely on real data. For more than ten languages for which there have been no public datasets for printed text recognition, we introduce a new dataset with more than one million word images in total. We further conduct an empirical study comparing different end-to-end neural network architectures for word and line recognition of printed text. Another significant contribution of this thesis is the introduction of new tasks that require a holistic understanding of document images.
Different from existing tasks in Document Image Analysis (DIA) that attempt to solve independent bottom-up tasks, we motivate a top-down perspective of DIA that requires a holistic understanding of the image and purpose-driven information extraction. To this end, we propose two tasks, DocVQA and InfographicVQA, fashioned along the lines of Visual Question Answering (VQA) in computer vision. For DocVQA, we show results using multiple strong baselines adapted from existing models for VQA and QA problems. For InfographicVQA, we propose a transformer-based, BERT-like model that jointly models multimodal input: vision, language, and layout. We conduct open challenges for both tasks, attracting hundreds of submissions so far. Next, we work on the problem of information extraction from a document image collection. Recognizing text from historical and/or handwritten manuscripts is a major challenge for information extraction from such collections. Similar to open-domain QA in NLP, we propose a new task in the context of document images that seeks to answer natural language questions asked over collections of manuscripts. We propose a two-stage retrieval-based approach to the problem that uses deep features of word images and textual words. Our approach is recognition-free and returns image snippets as answers to the questions. Although our approach is recognition-free, and consequently oblivious to the semantics of the text in the documents, it can look for documents or document snippets that are lexically similar to the question. We show that our approach is a reasonable alternative when using text-based QA models is infeasible due to the difficulty of recognizing text in the document images.

    Year of completion: January 2024
    Advisor: C V Jawahar

    Related Publications


      Downloads

      thesis

      3D Shape Analysis: Reconstruction and Classification


      Sai Sagar Jinka

      Abstract

      The reconstruction and analysis of 3D objects by computational systems has been an intensive and long-standing research problem in the graphics and computer vision communities. Traditional acquisition systems are largely restricted to studio setups, which require multiple synchronized and calibrated cameras. The advent of active depth sensors, such as time-of-flight and structured-light sensors, made 3D acquisition feasible. This advancement has paved the way for many research problems, such as 3D object localization, recognition, classification and reconstruction, which demand sophisticated and elegant solutions to match their ever-growing applications. 3D human body reconstruction, in particular, has wide applications such as virtual mirrors, gait analysis, etc. Lately, with the advent of deep learning, 3D reconstruction from monocular images has garnered significant interest in the research community, as it can be applied in in-the-wild settings. We initially explored the classification of 3D rigid objects, owing to the availability of the ShapeNet datasets. In this thesis, we propose an efficient characterization of 3D rigid objects that takes local geometry features into consideration while constructing global features in a deep learning setup. We introduce learnable B-spline surfaces in order to sense complex geometrical structures (large curvature variations). The locations of these surfaces are initialized over the voxel space and are learned during the training phase, leading to efficient classification performance. Later, we primarily focus on the more challenging problem of non-rigid 3D human body reconstruction from monocular images. In this context, this thesis presents three principal approaches to the 3D reconstruction problem. First, we propose a disentangled solution in which we recover the shape and texture of the predicted 3D shape using two different networks.
We recover the volumetric shape of a non-rigid human body given a single-view RGB image, followed by orthographic texture-view synthesis using the depth projection of the reconstructed (volumetric) shape and the input RGB image. Second, we propose PeeledHuman, a novel shape representation of the human body that is robust to self-occlusions. PeeledHuman encodes the human body as a set of peeled depth and RGB maps in 2D, obtained by performing ray tracing on the 3D body model and extending each ray beyond its first intersection. We learn these peeled maps in an end-to-end generative adversarial fashion using our novel framework, PeelGAN. PeelGAN enables us to predict the shape and color of the 3D human in an end-to-end fashion with low inference times. Finally, we further improve PeelGAN by introducing a shape prior when reconstructing from monocular images. We propose a sparse and efficient fusion strategy to combine a parametric body prior with the non-parametric PeeledHuman representation. The parametric body prior enforces geometrical consistency on the body shape and pose, while the non-parametric representation models loose clothing and handles self-occlusions as well. We also leverage the sparseness of the non-parametric representation for faster training of our network while using losses on 2D maps. We evaluate our proposed methods extensively on a number of datasets. In this thesis, we also introduce the 3DHumans dataset, a life-like dataset of 3D human body scans with rich geometric and textural details. We cover a wide variety of clothing styles, ranging from loose, robed clothing like the saree to relatively tight-fitting shirts and trousers. The dataset covers around 150 male and 50 unique female subjects, with about 180 male scans and around 70 female scans in total. In terms of regional diversity, for the first time, we capture body shape, appearance and clothing styles for the South-Asian population.
This dataset will be released for research purposes.
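The peeled-representation idea can be illustrated on an analytic shape (a toy sketch, not the thesis code; the sphere and orthographic camera are hypothetical): for a sphere, each ray has up to two intersections, which give the first two peeled depth layers directly.

```python
import numpy as np

# Peeled depth maps for a unit sphere centered on the z-axis at z = 3,
# with orthographic rays cast along +z. The near hit of each ray forms
# the first peeled depth layer; the far hit (the "peel") forms the second.
H = W = 65
ys, xs = np.meshgrid(np.linspace(-1.5, 1.5, H),
                     np.linspace(-1.5, 1.5, W), indexing="ij")
center_z, radius = 3.0, 1.0

# Ray: origin (x, y, 0), direction (0, 0, 1). Solve |p - c|^2 = r^2 for z.
d2 = xs**2 + ys**2                            # squared distance from sphere axis
hit = d2 <= radius**2                         # rays that intersect the sphere
half = np.sqrt(np.where(hit, radius**2 - d2, 0.0))

depth1 = np.where(hit, center_z - half, 0.0)  # first intersection per ray
depth2 = np.where(hit, center_z + half, 0.0)  # second intersection (the peel)
print(depth1[32, 32], depth2[32, 32])         # central ray hits at z = 2 and z = 4
```

A real body model needs a mesh ray tracer and more than two layers to capture self-occluded limbs, but the layer-per-intersection structure is the same.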

      Year of completion: May 2023
      Advisor: Avinash Sharma

      Related Publications


        Downloads

        thesis

        Surrogate Approximations for Similarity Measures


        Nagender G

        Abstract

        This thesis targets the problem of surrogate approximations for similarity measures in order to improve their performance in various applications. We present surrogate approximations for the popular dynamic time warping (DTW) distance, canonical correlation analysis (CCA), and the Intersection-over-Union (IoU), PCP and PCKh measures. For DTW and CCA, our surrogate approximations are based on their corresponding definitions; for the IoU, PCP and PCKh measures, we present a surrogate approximation using neural networks. First, we propose a linear approximation of the naïve DTW distance, speeding up DTW computation by learning the optimal alignment from training data. Next, we propose a surrogate kernel approximation of CCA, which enables CCA to be used in the kernel framework and further improves its performance. In our final contribution, we propose a surrogate approximation technique that uses neural networks to learn a surrogate loss function over the IoU, PCP and PCKh measures. We validated the IoU loss on semantic segmentation models, and the PCP and PCKh losses on human pose estimation models.
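To illustrate why surrogates are needed at all (a generic soft-IoU sketch, not the thesis's learned neural approximation): IoU over thresholded masks is non-differentiable, but replacing set operations with sums over predicted probabilities yields a smooth surrogate usable as a training loss.

```python
import numpy as np

def soft_iou(pred_probs, target):
    """Differentiable IoU surrogate: intersection and union are computed
    from raw predicted probabilities instead of thresholded binary masks."""
    inter = np.sum(pred_probs * target)
    union = np.sum(pred_probs) + np.sum(target) - inter
    return inter / union

target = np.array([[1, 1], [0, 0]], dtype=float)
perfect = soft_iou(target, target)            # probabilities equal to the mask
uncertain = soft_iou(np.full((2, 2), 0.5), target)
print(perfect, uncertain)                     # 1.0 and 1/3
```

Because `soft_iou` is smooth in `pred_probs`, `1 - soft_iou` can be minimized by gradient descent, which is what makes it a surrogate for the evaluation metric.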

        Year of completion: March 2023
        Advisor: C V Jawahar

        Related Publications


          Downloads

          thesis

          Epsilon Focus Photography: A Study of Focus, Defocus and Depth-of-field


          Parikshit Sakurikar

          Abstract

          Focus, defocus and depth-of-field are integral aspects of a photograph captured using a wide-aperture camera. Focus and defocus blur provide critical cues for the estimation of scene depth and structure, which helps in scene understanding and post-capture image manipulation. Focus and defocus blur are also used creatively by photographers to produce remarkable compositional effects, such as emphasis on the foreground subject with aesthetic bokeh in the background. Epsilon focus photography is a branch of computational photography that deals with the capture and processing of multi-focus imagery, where multiple wide-aperture images are captured with a small change in focus position. In this thesis, we provide a comprehensive study of various problems in epsilon focus photography, along with a detailed analysis of the related work in the area. We provide useful constructs for the understanding and manipulation of focus, defocus blur and the depth-of-field of an image. The work in this thesis can be divided into four broad categories: measurement, representation, manipulation and applications of focus. Measuring focus is a long-studied and challenging problem in computer vision. We study various methods to measure focus and propose a composite measure of focus that combines the strengths of well-known focus measures. We then study the task of post-capture focus manipulation at each pixel in an image and formulate a novel representation of focus that can find much use in image editing toolkits. Our representation can faithfully encode the fine characteristics of a wide-aperture image even at complex interaction locations, such as depth edges and over-saturated background regions, while optimizing the memory footprint of multi-focus imagery. Apart from precise geometric constructs for scene refocusing, we also propose a data-driven approach for post-capture scene refocusing using deep adversarial learning.
We show how the tasks of deblurring an image, magnification of the defocused content and overall comprehensive focus manipulation can be efficiently modeled using conditional adversarial networks. We study several applications of focus in computer vision such as view interpolation and depth-from-focus. We provide a tool that can interpolate different views of a scene based on focus texture segmentation and propose a novel solution for depth-from-focus using the proposed composite focus measure. In summary, this thesis consists of a comprehensive study of epsilon focus photography and its applications in the context of computer vision and computational photography.
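One classic building block behind such focus measures, shown here as a generic illustration rather than the thesis's composite measure: the variance of the image Laplacian, which is large for sharp, high-frequency content and suppressed under defocus blur.

```python
import numpy as np

def laplacian_focus(img):
    """Variance-of-Laplacian focus measure: sharp images have strong
    second derivatives; defocus blur attenuates them."""
    lap = (-4 * img
           + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
    return float(np.var(lap))

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                       # high-frequency texture
# Simple 3x3 box blur as a stand-in for defocus blur.
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

print(laplacian_focus(sharp) > laplacian_focus(blurred))  # True
```

A composite measure in the spirit of the thesis would combine several such per-pixel responses (gradient-, Laplacian- and frequency-based) rather than rely on any single one.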

          Year of completion: August 2021
          Advisor: P J Narayanan

          Related Publications


            Downloads

            thesis
