
Exploring Data Driven Graphics for Interactivity


Aakash KT

Abstract

The goal of rendering in computer graphics is the synthesis of photorealistic images from abstract virtual scene descriptions. Among the many rendering methods, path tracing, along with physically accurate material modeling, has emerged as the industry standard for photorealistic image synthesis. Path tracing is, however, a computationally expensive operation. Moreover, physically accurate material modeling requires iterative parameter tuning and visualization using path tracing. This quickly becomes a bottleneck when tuning the large number of parameters found in most physically accurate material models, and artists end up spending a lot of time waiting for path tracing to converge. This dissertation proposes to leverage the learning power of neural networks to plausibly approximate path tracing for: (a) quick material visualization while editing, and (b) out-of-the-box material recovery using inverse rendering. The traditional workflow for tuning material parameters and visualization is typically followed by modifications to lighting to better understand the behaviour of the material. We thus incorporate the ability to modify the environment lighting in our neural material visualization framework. This significantly improves understanding of the material's behaviour and thereby eases parameter tuning, which we demonstrate with a user study. Most real-world materials are spatially varying, meaning that their behaviour varies across the surface of the geometry. While previous works in this area dealt with uniform materials, we propose improvements for handling spatially varying materials. We design our neural network architecture to be lightweight, with 10× fewer trainable parameters than the state of the art, while also producing better visualizations. We also build a tool to demonstrate real-time parameter tuning and visualization of materials. Finally, we extend the above work to handle general 3D scenes in specific lighting scenarios. This is done by training a small fully connected neural network on a simple scene in the local tangent frame. Training in the local tangent frame decouples geometric complexity from training, which implies that the network generalizes to any geometry. We embed this network in a standard path tracer to evaluate direct illumination at each path vertex. We thus obtain not only noise-free direct illumination but also out-of-the-box gradients of the rendering process, which can be used for inverse rendering. We show that these gradients can be used to plausibly recover the material of a target scene. We conclude by discussing various practical aspects of our method.
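The tangent-frame idea above can be illustrated with a short sketch (the function names and frame construction are ours, not from the thesis): once directions are expressed in the local frame at each shading point, the network's inputs no longer depend on the surrounding geometry.

```python
import numpy as np

def build_tangent_frame(normal):
    """Build an orthonormal frame (tangent, bitangent, normal) at a shading point."""
    n = normal / np.linalg.norm(normal)
    # Pick a helper axis that is not parallel to the normal.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(helper, n)
    t /= np.linalg.norm(t)
    b = np.cross(n, t)
    return t, b, n

def to_local(direction, normal):
    """Express a world-space direction in the local tangent frame.

    A network fed only these local coordinates is decoupled from scene geometry."""
    t, b, n = build_tangent_frame(normal)
    d = direction / np.linalg.norm(direction)
    return np.array([np.dot(d, t), np.dot(d, b), np.dot(d, n)])
```

A path tracer would apply such a transform at every path vertex before querying the network for direct illumination.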

Year of completion: April 2022
Advisor: P J Narayanan

Related Publications

Downloads

thesis

A study of Automatic Segmentation of 3-D Brain MRI and its application to Deformable Registration


Aabhas Majumdar

Abstract

Automated segmentation of cortical and non-cortical human brain structures has hitherto been approached using nonrigid registration followed by label fusion. We propose an alternative approach using a convolutional neural network (CNN) which classifies a voxel into one of many structures. Four different kinds of two-dimensional and three-dimensional intensity patches are extracted for each voxel, providing local and global (context) information to the CNN. The proposed approach is evaluated on five different publicly available datasets which differ in the number of labels per volume. The obtained mean Dice coefficient varied with the number of labels; for example, it is 0.844 ± 0.031 and 0.743 ± 0.019 for the datasets with the fewest (32) and the most (134) labels, respectively. These figures are marginally better than or on par with those obtained by the current state-of-the-art methods on nearly all datasets, at a reduced computational time. The consistently good performance of the proposed method across datasets and the absence of any registration requirement make it attractive for the many applications where reduced computational time is necessary. Feature-based registration has been popular with a variety of features ranging from voxel intensity to Self-Similarity Context (SSC). In the second part of the thesis, we examine how features learnt using various Deep Learning (DL) frameworks for segmentation can be used for deformable registration, and whether this feature learning is necessary at all. We investigate the use of features learned by different DL methods in the current state-of-the-art discrete registration framework and analyze its performance on two publicly available datasets. We draw insights about the type of DL framework useful for feature learning. We consider the impact, if any, of the complexity of different DL models and brain parcellation methods on the performance of discrete registration. Our results indicate that registration performance with DL features and SSC is comparable and stable across datasets, whereas this does not hold for low-level features.
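The per-voxel patch extraction can be sketched as follows (the patch sizes and the restriction to two axial patches are our simplifications; the thesis uses four kinds of 2-D and 3-D patches):

```python
import numpy as np

def extract_patches(volume, voxel, local_size=9, context_size=25):
    """Extract a small local patch and a larger context patch around one voxel.

    Assumes the voxel is far enough from the volume border; a real pipeline
    would pad the volume first."""
    z, y, x = voxel
    def axial_patch(size):
        h = size // 2
        return volume[z, y - h:y + h + 1, x - h:x + h + 1]
    return axial_patch(local_size), axial_patch(context_size)
```

The CNN then classifies the centre voxel of each set of patches into one of the structure labels, removing the need for nonrigid registration at segmentation time.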

Year of completion: April 2022
Advisor: Jayanthi Sivaswamy

Related Publications

Downloads

thesis

Saliency Estimation in Videos and Images


Samyak Jain

Abstract

With the growing volume of image and video data, it becomes imperative to derive solutions that filter out the important (salient) information in such data. Learning computational models for visual attention (saliency estimation) is an effort to inch machines closer to human visual cognitive abilities. Such solutions can reduce human effort and be used in various applications like automatic cropping, segmentation, etc. We approach this problem by first investigating saliency estimation in images, i.e., predicting the response of the human visual system when exposed to an image. We start by identifying four key components of saliency models: input features, multi-level integration, readout architecture, and loss functions. Data-driven efforts have dominated the landscape since the introduction of deep neural network architectures. In deep learning research, architecture design choices are often empirical and frequently lead to more complex models than necessary; the complexity, in turn, hinders application requirements. We review state-of-the-art models along these four components and propose two novel and simpler end-to-end architectures, SimpleNet and MDNSal. They are straightforward, neater, minimal, and more interpretable than other architectures, while achieving state-of-the-art performance on public saliency benchmarks. SimpleNet is an optimized encoder-decoder architecture. MDNSal is a parametric model that directly predicts the parameters of a GMM distribution and aims to bring more interpretability to the prediction maps. We suggest that the way forward is not necessarily to design ever more complex architectures, but to analyze and optimize each component modularly and possibly explore novel (and simpler) alternatives. After exploring these components, we shift our focus to saliency estimation in videos, where user attention is captured in a dynamic scenario. We propose the ViNet architecture for the task of video saliency prediction. ViNet is a fully convolutional encoder-decoder architecture. The encoder uses visual features from a network trained for action recognition, and the decoder infers a saliency map via trilinear interpolation and 3D convolutions, combining features from multiple hierarchies. The overall architecture of ViNet is conceptually simple; it is causal and runs in real time. ViNet does not use audio as input and still outperforms the state-of-the-art audio-visual saliency prediction models on nine different datasets (three visual-only and six audio-visual). We also explore a variation of the ViNet architecture that augments the decoder with audio features. To our surprise, upon sufficient training, the network becomes agnostic to the input audio and provides the same output irrespective of the input. Interestingly, we observe similar behavior in previous state-of-the-art models for audio-visual saliency prediction. Our findings contrast with earlier works on deep learning-based audio-visual saliency prediction, suggesting a clear avenue for future explorations that incorporate audio more effectively.
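MDNSal's parametric readout can be illustrated with a minimal sketch: given predicted mixture parameters, the saliency map is rendered analytically. We assume isotropic Gaussians here for simplicity; the actual parameterization in the thesis may differ.

```python
import numpy as np

def gmm_saliency_map(weights, means, sigmas, h=32, w=32):
    """Render a saliency map from the parameters of a 2-D Gaussian mixture."""
    ys, xs = np.mgrid[0:h, 0:w]
    sal = np.zeros((h, w))
    for wk, (my, mx), s in zip(weights, means, sigmas):
        sal += wk * np.exp(-((ys - my) ** 2 + (xs - mx) ** 2) / (2 * s ** 2))
    return sal / sal.max()  # normalize to [0, 1]
```

Because the map is a closed-form function of a handful of parameters rather than a dense per-pixel output, the prediction is much easier to interpret.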

Year of completion: February 2022
Advisor: Vineet Gandhi

Related Publications

Downloads

thesis

Fast and Accurate Image Recognition


Sri Aurobindo Munagala

Abstract

DNNs (Deep Neural Networks) have recently found use in a variety of applications and have grown much larger and more resource-hungry over the years. CNNs (Convolutional Neural Networks), widely used in computer vision tasks, apply the convolution operation in successive layers, making them computationally expensive to run. Due to the large size and computation required by modern models, it is difficult to use them in resource-constrained scenarios such as mobile and embedded devices. Moreover, the amount of data created and collected is growing at a rapid rate, and annotating large amounts of data for training DNNs is expensive. Approaches such as Active Learning (AL) seek to intelligently query samples to train upon in an iterative fashion, but AL setups are themselves costly to run because large models are fully trained many times in the process. In this thesis, we explore methods to achieve very high speedups in both CNN inference and the training time of AL setups. Various paradigms for fast CNN inference have been explored in the past, two major ones being binarization and pruning. Binarization quantizes the weights and/or inputs of the network from 32-bit full-precision floats into a {-1, +1} space, aiming for both compression (single bits occupy 32× less space than 32-bit floats) and speedups (bit-bit operations can be done faster). Network pruning, on the other hand, identifies and removes redundant parts of the network in an unstructured (individual weights) or structured (channels/layers) manner to create sparse and efficient networks. While both paradigms have demonstrated great efficacy in achieving speedups and compression in CNNs, little work has attempted to combine them. We argue that these paradigms are complementary and can be combined to offer high levels of compression and speedup without significant accuracy loss. Intuitively, weights/activations closer to zero have higher binarization error, making them good candidates for pruning. We propose a novel Structured Ternary-Quantized Network that incorporates the speedups of binary convolution algorithms through structured pruning, enabling pruned parts of the network to be removed entirely post-training. Our approach beats previous works attempting the same by a significant margin. Overall, our method brings up to 89× layer-wise compression over the corresponding full-precision networks, with only 0.33% accuracy loss on CIFAR-10 with ResNet-18 at a 40% PFR (Prune Factor Ratio for filters), and 0.3% on ImageNet with ResNet-18 at a 19% PFR. We also explore Active Learning, used in scenarios where unlabelled data is abundant but annotation costs make it infeasible to label all of it for supervised training. AL methods initially train a model (the selection model) on a small pool of annotated data and use its predictions on the remaining unlabelled data to form a query for annotating more samples. This train-query process repeats iteratively until a sufficient amount of data is labelled. In practice, however, sample selection in AL setups takes a long time because the selection model is fully retrained on the labelled pool in every iteration. We offer two improvements to the standard AL setup that significantly reduce the overall training time required for sample selection: first, the introduction of a “memory” that lets us train the selection model on fewer samples every round instead of the entire labelled dataset so far, and second, the use of fast-convergence techniques to reduce the number of epochs the selection model trains for. Our proposed improvements work in tandem with previous techniques such as the use of proxy models for selection, and the combined improvements bring more than 75× speedups in overall sample selection time to standard AL setups, making them feasible and easy to run in real-life scenarios where procuring large amounts of data is easy but labelling it is difficult.
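The intuition that near-zero weights have the highest binarization error, and are therefore the best pruning candidates, can be sketched element-wise. The thesis prunes structurally at the filter level; this unstructured version only illustrates the idea, and the quantile-based threshold is our assumption.

```python
import numpy as np

def ternarize(weights, prune_ratio=0.4):
    """Map weights to {-1, 0, +1}: the smallest-magnitude fraction is pruned
    to zero, and the remaining weights are binarized to their sign."""
    threshold = np.quantile(np.abs(weights), prune_ratio)
    q = np.sign(weights)
    q[np.abs(weights) < threshold] = 0.0
    return q
```

A structured variant would compute the magnitude per filter and drop whole filters below the threshold, which is what allows the pruned parts to be removed from the network entirely after training.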

Year of completion: January 2022
Advisor: Anoop M Namboodiri

Related Publications

Downloads

thesis

Towards Understanding and Improving the Generalization Performance of Neural Networks


Sarath Sivaprasad

Abstract

The widespread popularity of over-parameterized deep neural networks (NNs) is backed by their ‘unreasonable’ performance on unseen data that is independent and identically distributed (IID) with respect to the train data. The generalization of NNs cannot be explained by the traditional machine learning wisdom that an increase in the number of parameters leads to overfitting on train samples and subsequently reduced generalization. Various generalization measures have been proposed in recent times to explain the generalization of deep networks. Despite some promising investigations, there is little consensus on how to explain the generalization of NNs. Furthermore, the ability of neural networks to fit any train data under any random configuration of labels makes their generalization performance even harder to explain. Despite this ability to completely fit any given data, neural networks nevertheless seem able to ‘cleverly’ learn a generalizing solution. We hypothesize that this ‘simple’ solution lies in a constrained subspace of the hypothesis space, and we propose a constrained formulation of neural networks to close the generalization gap. We show that, through a principled constraint, we can achieve comparable train and test performance for neural networks. We constrain each output of the neural network to be a convex function of its inputs, which ensures certain desirable geometry of the decision boundaries. This document covers two major aspects: first, the improved generalization of neural networks under convex constraints, and second, going beyond the IID setting to investigate the generalization of neural networks on out-of-distribution (OOD) test sets.

In the first section of the document, we investigate the constrained formulation of neural networks in which the output is a convex function of the input. We show that the convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures. The constraints consist of restricting the weights (of all but the first layer) to be non-negative and using a non-decreasing convex activation function. Albeit simple, these constraints have profound implications for the generalization abilities of the network. We draw three valuable insights: (a) Input Output Convex Neural Networks (IOC-NNs) self-regularize and significantly reduce the problem of overfitting; (b) although heavily constrained, they outperform the base multi-layer perceptrons and achieve performance similar to base convolutional architectures; and (c) IOC-NNs are robust to noise in the train labels. We demonstrate the efficacy of the proposed idea through thorough experiments and ablation studies on six commonly used image classification datasets with three different neural network architectures.

In the second section, we revisit the ability of networks to completely fit any given data and yet ‘cleverly’ learn the generalizing hypothesis from the many variations that can explain the train data. In accordance with concurrent findings, our explorations show that neural networks learn the most ‘low-lying’ variance in the data: they learn the features that correlate easily with the label and do no further exploration after finding such a solution. With this insight, we revisit the need to understand and improve the generalization of neural networks. We go beyond traditional IID and OOD evaluation benchmarks to further our understanding of learning in deep networks. Through our explorations, we give a possible explanation of why neural networks do well on certain benchmarks and why other inventive methods fail to give any consistent improvement over a simple neural network.
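The IOC constraint from the first section (non-negative weights in all but the first layer, together with a convex, non-decreasing activation such as ReLU) can be sketched as a minimal forward pass. Enforcing non-negativity via an absolute value is our choice here; any reparameterization that keeps the weights non-negative would do.

```python
import numpy as np

def ioc_forward(x, layers):
    """Forward pass of an input-output convex network (IOC-NN) sketch.

    The first layer is unconstrained; every later layer uses non-negative
    weights, and ReLU is a convex, non-decreasing activation, so each
    output is a convex function of the input x."""
    h = np.maximum(layers[0] @ x, 0.0)          # first layer: unconstrained
    for W in layers[1:]:
        h = np.maximum(np.abs(W) @ h, 0.0)      # |W| enforces non-negativity
    return h
```

Convexity can be checked numerically: for any inputs x1 and x2, the output at their midpoint never exceeds the average of the two outputs.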
Domain Generalization (DG) requires a model to learn, from multiple distributions, a hypothesis that generalizes to an unseen distribution. DG has been perceived as the front face of OOD generalization. We present empirical evidence that the primary reason for generalization in DG is the presence of multiple domains during training. Furthermore, we show that methods for generalization in the IID setting are equally important for generalization in DG, and that tailored methods fail to add performance gains under the Traditional DG (TDG) evaluation. Our experiments prompt the question of whether TDG has outlived its usefulness in evaluating OOD generalization. To strengthen our investigation further, we propose a novel evaluation strategy, ClassWise DG (CWDG), where for each class we randomly select one of the domains and keep it aside for testing. We argue that this benchmarking is closer to human learning and relevant in real-world scenarios. Counter-intuitively, despite the model being exposed to all domains during training, CWDG is more challenging than the TDG evaluation. While explaining these observations, our work makes a case for more fundamental analysis of the DG problem before exploring new ideas to tackle it.

Keywords: generalization of deep networks, constrained formulation, input-output-convex neural network, robust generalization bounds, explainable decision boundaries, mixture of experts
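The CWDG protocol can be sketched in a few lines (the function name is ours): for each class, one randomly chosen domain is held out for testing, so the model may see every domain during training, yet each class is tested on a domain whose examples of that class were never seen.

```python
import random

def classwise_dg_split(classes, domains, seed=0):
    """ClassWise DG split: hold out one randomly chosen domain per class.

    Returns (train, test) lists of (class, domain) pairs."""
    rng = random.Random(seed)
    held_out = {c: rng.choice(domains) for c in classes}
    train = [(c, d) for c in classes for d in domains if d != held_out[c]]
    test = [(c, held_out[c]) for c in classes]
    return train, test
```

Contrast this with the traditional DG split, which holds out one entire domain across all classes.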

Year of completion: May 2022
Advisor: Vineet Gandhi

Related Publications

Downloads

thesis
