CVIT Home

Computer-Aided diagnosis of closely related diseases


Abhinav Dhere

Abstract

It is often observed that certain human diseases exhibit similarities in some form while having different prognoses and requiring different treatment strategies. These similarities may take the form of shared risk factors, overlapping symptoms, visual similarity in imaging studies, or, in some cases, similar molecular associations. Computer-Aided Diagnosis (CAD) of closely related diseases is challenging and requires tailored approaches to discriminate accurately between them. This thesis studies two sets of closely related diseases affecting two different organs and imaged with two different modalities, and develops novel approaches for explainable and accurate CAD of each. The two problems are discriminating between healthy, mild cognitive impairment (MCI), and Alzheimer's Disease (AD) cases from surface meshes derived from brain MRI, and classifying healthy, non-COVID pneumonia, and COVID pneumonia cases from chest X-ray images. In the first part of this thesis, we present a novel 2D image representation of the brain surface mesh, called a height map. We then explore the use of height maps for the hierarchical classification of healthy, MCI, and AD cases, and compare different strategies for extracting features and regions of interest from height maps in terms of their healthy vs. MCI vs. AD classification performance. We demonstrate that the proposed method achieves fast classification of AD and MCI with only a minor loss of accuracy compared to the state of the art. In the second part of this thesis, we present a novel deep learning architecture called Multi-scale Attention Residual Learning (MARL) and a new conicity loss for training it. We use MARL and the conicity loss to perform hierarchical classification of normal, non-COVID pneumonia, and COVID pneumonia cases from chest X-ray images.
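Both problems above are framed as hierarchical three-class classification. The abstract does not state the order of the stages or the models used, so the sketch below assumes one plausible two-stage hierarchy: first separate the first class (healthy) from the rest, then separate the two remaining classes. The scorer functions `toy_first_stage` and `toy_second_stage` are hypothetical stand-ins on a single made-up feature, not trained models.

```python
from typing import Callable, Sequence, Tuple

# A stage is any scorer mapping a feature vector to a probability in [0, 1].
Stage = Callable[[Sequence[float]], float]


def hierarchical_predict(
    features: Sequence[float],
    first_stage: Stage,
    second_stage: Stage,
    labels: Tuple[str, str, str] = ("healthy", "MCI", "AD"),
    threshold: float = 0.5,
) -> str:
    """Assumed two-stage hierarchy: labels[0] vs. rest, then labels[1] vs. labels[2]."""
    if first_stage(features) < threshold:
        return labels[0]
    # Only cases flagged as non-healthy reach the second, finer-grained stage.
    if second_stage(features) >= threshold:
        return labels[2]
    return labels[1]


# Hypothetical toy scorers on one made-up feature (not trained models).
def toy_first_stage(x: Sequence[float]) -> float:
    return 1.0 if x[0] > 0.3 else 0.0


def toy_second_stage(x: Sequence[float]) -> float:
    return 1.0 if x[0] > 0.7 else 0.0


if __name__ == "__main__":
    for value in (0.1, 0.5, 0.9):
        print(value, hierarchical_predict([value], toy_first_stage, toy_second_stage))
```

Passing `labels=("normal", "non-COVID pneumonia", "COVID")` reuses the same control flow for the chest X-ray problem; only the stage models would change.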
We present classification results on three public datasets and demonstrate that the proposed method performs comparably to, or marginally better than, the state of the art in all cases. Further, through extensive experiments, we demonstrate that the proposed framework produces clinically consistent explanations. Qualitatively, this is shown by comparing the GradCAM heatmaps of the proposed method with those of the state-of-the-art method: the heatmaps of the proposed method overlap better with the expert-marked pneumonia bounding boxes. Quantitatively, we show that the GradCAM heatmaps of the proposed method generally lie within the inner regions of the lung for non-COVID pneumonia, whereas they lie in the outer regions for COVID pneumonia. Thus, we establish the clinical consistency of the explanations provided by the proposed framework.
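One simple way to quantify how well a heatmap agrees with an expert annotation is the share of heatmap mass that falls inside a region. The sketch below is an illustrative measure, not necessarily the one used in the thesis; it assumes the heatmap is non-negative (GradCAM applies a ReLU) and already resized to the image grid. The `box_mask` helper turns a bounding box into a mask, and any other binary mask, such as a hypothetical inner-lung mask, can be passed to `fraction_in_mask` in the same way.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (row_start, col_start, row_end, col_end), end-exclusive


def box_mask(height: int, width: int, box: Box) -> List[List[bool]]:
    """Binary mask that is True inside the box."""
    r0, c0, r1, c1 = box
    return [[r0 <= r < r1 and c0 <= c < c1 for c in range(width)] for r in range(height)]


def fraction_in_mask(heatmap: Sequence[Sequence[float]], mask: Sequence[Sequence[bool]]) -> float:
    """Share of total (non-negative) heatmap mass lying where the mask is True."""
    total = sum(v for row in heatmap for v in row)
    if total <= 0:
        return 0.0  # an all-zero heatmap carries no localisation information
    inside = sum(
        v for h_row, m_row in zip(heatmap, mask) for v, m in zip(h_row, m_row) if m
    )
    return inside / total


if __name__ == "__main__":
    heatmap = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 4],
    ]
    print(fraction_in_mask(heatmap, box_mask(4, 4, (1, 1, 3, 3))))  # 0.5
```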

Year of completion: June 2022
 Advisor: Jayanthi Sivaswamy


    Deep Learning for Assisting Annotation and Studying Relationship between Cancers using Histopathology Whole Slide Images


    Ashish Menon

    Abstract

    Cancer is a leading cause of death across the globe, and the scientific community as a whole has pushed constantly towards assisting its diagnosis. In particular, the last decade has seen widespread use of computer vision and AI for cancer diagnosis using both radiological (non-invasive, e.g., X-ray, CT) and pathological (invasive, e.g., histopathology) modalities. A histopathology Whole Slide Image (WSI) is a digitized image of a tissue sample, characterized by a large size of up to 10⁹ pixels at maximum resolution, and is considered the gold standard for cancer diagnosis. Routine diagnosis requires experts, called pathologists, to analyse a slide containing the tissue samples under a microscope. The process is subject to cognitive load, and the diagnosis is prone to inter- and intra-pathologist errors. With tissue samples digitized as WSIs, computer-assisted diagnosis, especially with the advent of deep learning, could help address these issues. This requires models trained on large amounts of annotated data, as well as an understanding of how cancer manifests across organs. In this thesis, we use deep learning to address two major issues: (1) assisting whole slide image annotation with an expert in the loop, and (2) understanding the relationship between cancers and bringing to light commonalities in cancer patterns between certain pairs of organs. Diagnosing a slide under a microscope typically involves exhaustively scanning across the slide in search of anomalous or tumorous regions. Owing to the large dimensions of histopathology WSIs, visually searching for clinically significant regions (patches) is a tedious task for a medical expert, and sequentially analysing several such images further increases the workload, which can result in poor diagnosis.
A major impediment to automating this task with a deep learning model is the need for large amounts of annotated WSI patches, which are laborious to obtain because annotation involves an exhaustive search for anomalous regions. To tackle this issue, the first part of the thesis proposes a novel CNN-based, expert-feedback-driven interactive learning technique. The proposed method acquires labels for the most informative patches in small increments over multiple feedback rounds to maximize throughput. The expert queries a patch of interest from a slide and provides feedback on a set of unlabelled patches chosen from a ranked list using the proposed sampling strategy. The technique is applied in a setting that assumes a large cohort of unannotated slides, almost eliminating the need for annotated data upfront; instead, it learns through expert involvement. We discuss several strategies for sampling the right set of patches for the expert to label, so as to minimise expert feedback and maximise throughput. The proposed technique can also annotate multiple slides in parallel using a single slide under review (used to query anomalous patches), which further reduces the annotation effort. The Cancer Genome Atlas (TCGA) contains a large repository of histopathology whole slide images spanning several organs and subtypes. However, little work has analysed all of these organs and subtypes and their similarities. Our work attempts to bridge this gap by training deep learning models to classify cancer vs. normal patches for 11 subtypes spanning 7 organs (9,792 tissue slides), achieving near-perfect classification performance. We then evaluated these models on the test sets of other organs (cross-organ inference) and found that every model achieved good cross-organ inference accuracy when tested on breast, colorectal, and liver cancers.
Further, high accuracy is observed between models trained on cancer subtypes originating from the same organ (kidney and lung). We also validated these results by showing the separability of cancer and normal samples in a high-dimensional feature space. We further hypothesized that the high cross-organ inference accuracy is due to tumor morphologies shared among organs, and validated this hypothesis by showing overlap in the Gradient-weighted Class Activation Mapping (GradCAM) visualizations and similarities in the distributions of nuclei geometric features within the high-attention regions.
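The interactive annotation loop in the first part of this abstract ranks unlabelled patches against an expert's query patch and sends a small batch back for feedback each round. The sketch below illustrates only that ranking-and-selection step under stated assumptions: each patch is represented by a feature vector (e.g., a CNN embedding), similarity is cosine similarity, and `select_for_feedback` uses a hypothetical strategy (the `n_top` most similar patches plus evenly spaced picks from the rest of the ranked list). It is not the sampling strategy proposed in the thesis.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def rank_patches(query: Sequence[float], patches: Dict[str, Sequence[float]]) -> List[str]:
    """Patch ids sorted from most to least similar to the query embedding."""
    # The patch id breaks ties so that the ranking is deterministic.
    return sorted(patches, key=lambda pid: (-cosine(query, patches[pid]), pid))


def select_for_feedback(ranked: List[str], k: int, n_top: int) -> List[str]:
    """Hypothetical strategy: n_top best matches plus evenly spaced picks from the rest."""
    n_top = min(n_top, k, len(ranked))
    chosen = list(ranked[:n_top])
    rest = ranked[n_top:]
    m = min(k - n_top, len(rest))
    if m > 0:
        step = len(rest) / m  # step >= 1, so the picked indices are distinct
        chosen.extend(rest[int(i * step)] for i in range(m))
    return chosen


if __name__ == "__main__":
    patches = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
               "d": [-1.0, 0.0], "e": [0.5, 0.5]}
    ranked = rank_patches([1.0, 0.0], patches)
    print(ranked)
    print(select_for_feedback(ranked, k=4, n_top=1))
```

Mixing top-ranked patches with samples from further down the list is one way to trade off confirming likely positives against exploring uncertain regions; the expert's labels would then be used to update the model before the next round.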

    Year of completion: June 2022
     Advisors: C V Jawahar, Vinod P K


      Bounds on Generalization Error of Variational Lower Bound Estimates of Mutual Information and Applications of Mutual Information


      P Aditya Sreekar

      Abstract

      Machine learning techniques have found widespread use in critical systems in the recent past. This rise of ML was mainly fueled by Neural Networks (NNs), which enabled end-to-end systems, thanks to their feature learning and universal approximation properties. Recent research has extended the application of NNs to the estimation of mutual information (MI) from samples: these methods construct a critic using an NN and maximize a variational lower bound on MI. They have alleviated many problems faced by earlier MI estimation methods, such as the inability to scale and incompatibility with mini-batch-based optimization. While these estimators are reliable when the true MI is low, they tend to produce high-variance estimates when the true MI is high. We argue that this high variance is due to the uncontrolled complexity of the critic's hypothesis space. In support of this argument, we use the data-driven Rademacher complexity of the hypothesis space associated with the critic's architecture to derive upper bounds on the generalization error of variational lower bound based estimates of MI. Using the derived bounds, we show that the Rademacher complexity of the critic's hypothesis space is a major contributor to the generalization error. Inspired by this, we propose to counteract the high variance of these estimators by constraining the critic's hypothesis space to a Reproducing Kernel Hilbert Space (RKHS), which corresponds to a kernel learned using Automated Spectral Kernel Learning (ASKL). Further, we propose two regularization terms by examining the upper bound on the Rademacher complexity of the RKHS learned by ASKL. The weights of these regularization terms can be varied for a bias-variance tradeoff.
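Constraining the critic to an RKHS makes its Rademacher complexity computable. For the ball {f : ‖f‖_H ≤ B}, the supremum of (1/n) Σ σᵢ f(xᵢ) equals (B/n) √(σᵀKσ), where K is the Gram matrix, and Jensen's inequality gives the upper bound (B/n) √(tr K). The sketch below checks this on a handful of 1D points by enumerating every sign vector. It assumes a fixed RBF kernel as a stand-in for the ASKL-learned kernel, and it does not reproduce the thesis's two regularization terms; it only shows why the RKHS radius and the kernel diagonal are natural quantities to regularize.

```python
import itertools
import math
from typing import List, Sequence


def rbf_gram(xs: Sequence[float], gamma: float) -> List[List[float]]:
    """Gram matrix of an RBF kernel (stand-in for a learned spectral kernel)."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in xs] for a in xs]


def rkhs_ball_rademacher(xs: Sequence[float], radius: float, gamma: float = 0.5) -> float:
    """Exact empirical Rademacher complexity of {f : ||f||_H <= radius}.

    Averages (radius / n) * sqrt(sigma^T K sigma) over all 2^n sign vectors,
    so it is only practical for small n.
    """
    n = len(xs)
    K = rbf_gram(xs, gamma)
    total = 0.0
    for sigma in itertools.product((-1.0, 1.0), repeat=n):
        quad = sum(sigma[i] * K[i][j] * sigma[j] for i in range(n) for j in range(n))
        total += math.sqrt(max(quad, 0.0))  # guard against tiny negative round-off
    return radius / n * total / 2 ** n


def trace_upper_bound(xs: Sequence[float], radius: float, gamma: float = 0.5) -> float:
    """Jensen bound: (radius / n) * sqrt(trace(K))."""
    K = rbf_gram(xs, gamma)
    return radius / len(xs) * math.sqrt(sum(K[i][i] for i in range(len(xs))))


if __name__ == "__main__":
    points = [0.5 * i for i in range(8)]
    print(rkhs_ball_rademacher(points, 1.0), trace_upper_bound(points, 1.0))
```

Both quantities scale linearly with the radius B, so shrinking the RKHS ball (or the kernel's diagonal) tightens the bound at the cost of a less expressive critic, which is the bias-variance knob described above.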
We empirically demonstrate the efficacy of this regularization in enforcing a proper bias-variance tradeoff on four different variational lower bounds of MI, namely NWJ, MINE, JS, and SMILE. We also propose a novel application of MI estimation using variational lower bounds and NNs: learning disentangled representations of videos. The learned representations contain two sets of features, content and pose representations, which separately represent the content and the movement in the video. This is achieved by minimizing the mutual information between the pose representations of different frames from the same video, which in effect forces the content representation to capture the information common to different frames. We qualitatively and quantitatively compare our method against DRNET [24] on two synthetic datasets and one real-world dataset.
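To make the variational lower bounds concrete, the sketch below evaluates two of them, NWJ, E_p[T] − E_q[e^(T−1)], and the Donsker-Varadhan bound used by MINE, E_p[T] − log E_q[e^T], from critic scores on joint samples p(x, y) and on shuffled pairs that approximate p(x)p(y). Instead of a trained NN or RKHS critic, it assumes a correlated bivariate Gaussian and plugs in the known optimal critic T = 1 + log p(x, y)/(p(x)p(y)), whose true MI is −½ log(1 − ρ²). JS and SMILE are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_density_ratio(x: float, y: float, rho: float) -> float:
    """log p(x, y) - log p(x) - log p(y) for a standard bivariate Gaussian."""
    s = 1.0 - rho * rho
    return (-0.5 * math.log(s)
            - (x * x - 2.0 * rho * x * y + y * y) / (2.0 * s)
            + (x * x + y * y) / 2.0)


def nwj_bound(joint: Sequence[float], marginal: Sequence[float]) -> float:
    """NWJ lower bound: E_p[T] - E_q[exp(T - 1)]."""
    return (sum(joint) / len(joint)
            - sum(math.exp(t - 1.0) for t in marginal) / len(marginal))


def dv_bound(joint: Sequence[float], marginal: Sequence[float]) -> float:
    """Donsker-Varadhan bound (as in MINE): E_p[T] - log E_q[exp(T)]."""
    m = max(marginal)  # log-mean-exp with max subtraction for numerical stability
    log_mean_exp = m + math.log(sum(math.exp(t - m) for t in marginal) / len(marginal))
    return sum(joint) / len(joint) - log_mean_exp


def sample_pairs(n: int, rho: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    rho = 0.5
    xs, ys = sample_pairs(20000, rho, rng)
    ys_shuffled = ys[:]
    rng.shuffle(ys_shuffled)  # breaks the pairing: approximate samples from p(x)p(y)
    joint = [1.0 + log_density_ratio(x, y, rho) for x, y in zip(xs, ys)]
    marginal = [1.0 + log_density_ratio(x, y, rho) for x, y in zip(xs, ys_shuffled)]
    print("true MI:", -0.5 * math.log(1.0 - rho * rho))
    print("NWJ:", nwj_bound(joint, marginal), "DV:", dv_bound(joint, marginal))
```

With the optimal critic both estimates land close to the true value. The variance problem discussed above appears when ρ approaches 1, where the density ratio becomes heavy-tailed and the E_q[e^T] term dominates the error.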

      Year of completion: June 2022
       Advisor: Anoop M Namboodiri


        Parallelizing Modules to Optimize and Enhance Fingerprint Recognition Systems


        Saraansh Tandon

        Abstract

        Today, human identification is practised frequently in contexts ranging from forensic labs to high-security organisations to almost every household. One of the oldest and most accurate ways to establish an individual's identity is through fingerprints, because, biologically, no two people (not even identical twins) have the same fingerprints. It is therefore no surprise that fingerprint biometrics has long been a very active research domain. Over the years, especially since the emergence of deep learning, the computational modules of fingerprint recognition systems have grown larger and larger. At the same time, the hardware on which fingerprint recognition systems are deployed has become more and more user-friendly. These trends pull in opposite directions when one tries to fit the highly accurate but extremely heavy modules of a fingerprint recognition system into the palm of one's hand. In this thesis, we present two studies that explore ways to optimize fingerprint recognition systems by parallelizing modules and eliminating semantically redundant computations. Along the way, we also obtain better performance than the academic state of the art. Typical fingerprint recognition systems comprise a spoof detection module followed by a matching module, run one after the other. In the first study, we posit that spoof detection and fingerprint matching are correlated tasks. Therefore, rather than performing them separately, we propose a joint model that performs spoof detection and matching simultaneously without compromising the accuracy of either task. In practice, this reduces the time and memory requirements of the fingerprint recognition system by 50% and 40%, respectively. The fingerprint matching task itself, on the other hand, can be solved in multiple ways.
Fingerprint feature extraction is a core step of fingerprint matching and can be performed with either a global or a local approach. State-of-the-art global approaches use huge deep learning models to process the full fingerprint image at once, which makes them memory-intensive. Local approaches, in contrast, involve minutiae-based patch extraction and multiple feature extraction steps, which makes them time-intensive. Yet both kinds of approaches provide useful, and sometimes exclusive, insights into the problem. Hence, using both together to extract fingerprint representations is semantically useful but quite inefficient. In the second study, we therefore use a convolutional transformer based approach with a built-in minutiae extractor. It provides an efficient feature extraction solution that simultaneously produces a global and a local representation of a fingerprint. This not only replaces a heavy combination of two independent local and global approaches, but also provides a further 54.41% speedup over the local approach and a 57.93% memory saving compared to the global approach. Both studies work towards the common goal of optimizing fingerprint recognition systems, and hence offer a significant advantage for such systems deployed on resource-constrained devices like smartphones.
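The cost of the local approach described above comes from its structure: one patch is cropped around every minutia, and each patch then goes through its own feature-extraction step. The sketch below shows only the cropping step on a grayscale image stored as a list of rows, assuming minutiae are already given as (row, col) pixel coordinates and a hypothetical odd patch size; it is not the minutiae extractor or patch pipeline used in the thesis.

```python
from typing import List, Sequence, Tuple

Patch = List[List[float]]


def extract_minutia_patches(
    image: Sequence[Sequence[float]],
    minutiae: Sequence[Tuple[int, int]],
    size: int = 3,
) -> List[Patch]:
    """Crop a size x size patch centred on each (row, col) minutia.

    Pixels that fall outside the image are zero-padded.
    """
    if size <= 0 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    height = len(image)
    width = len(image[0]) if height else 0
    patches: List[Patch] = []
    for row, col in minutiae:
        patch = [
            [
                float(image[r][c]) if 0 <= r < height and 0 <= c < width else 0.0
                for c in range(col - half, col + half + 1)
            ]
            for r in range(row - half, row + half + 1)
        ]
        patches.append(patch)
    return patches


if __name__ == "__main__":
    image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
    for patch in extract_minutia_patches(image, [(2, 2), (0, 0)], size=3):
        print(patch)
```

Because the number of patches equals the number of detected minutiae, the number of downstream feature-extraction passes grows with each fingerprint's minutiae count, which is what a single model producing local and global representations together avoids.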

        Year of completion: June 2022
         Advisor: Anoop M Namboodiri


          Scene Text Recognition for Indian Languages


          Sanjana Gunna

          Abstract

          Text recognition was an active field in computer vision even before the deep learning era. Because recognition models have varied applications, the research area has been divided into categories based on the domain of the data used. Optical character recognition (OCR) focuses on scanned documents, whereas images of natural scenes with more complex backgrounds fall under scene text recognition (STR). Scene text recognition has become an exciting research area because of difficulties such as complex backgrounds, improper illumination, noisy and distorted images, inconsistent fonts and font sizes, and text that is often not horizontally aligned. Such cases make scene text recognition more complicated and challenging. With the rise of deep learning in recent years, there has been steady growth in recognition algorithms and in the datasets available for training and testing. This surge has raised the performance of text recognition in natural scenes above that of earlier baseline models trained on hand-crafted features. Most of these works focused on Latin text and did not investigate scene text recognition for non-Latin languages in depth. Upon scrutiny, we observe that the best current recognition models exceed 90% on scene text benchmark datasets. However, these models do not perform as well on non-Latin languages as they do on Latin (or English) datasets. This striking gap in performance across languages is a growing concern among researchers working on low-resource languages, and it is the motivation behind our work. Scene text recognition in low-resource non-Latin languages is challenging because of their inherently complex scripts, multiple writing systems, and various fonts and orientations.
Despite these challenges, Latin (English)-like performance can also be achieved for low-resource non-Latin languages. In this thesis, we examine the parameters involved in text recognition and determine their importance through thorough experiments. We use synthetic data for controlled experiments in which each of these parameters is tested in isolation, to identify the main drivers of text recognition performance, and we use the same experiments to analyse the complexity of the scripts. We present results for two baseline models, CRNN and STAR-Net, on the available datasets to ensure generalisability. In addition, we propose an error correction module that corrects predicted labels using the training data of the real datasets. To further improve results on real test datasets, we try transfer learning from English to exploit the abundant data available for it. We show that transfer from English does not help: it actually lowers the performance of the individual language models. Given the failure of these English transfer experiments, we shift our focus to Indian languages alone and examine the characteristics of each language through character n-gram plots, visual features such as vowel signs and conjunct characters, and other word statistics. The languages also resemble one another in certain other respects. We therefore apply transfer learning across languages to enhance the performance of the language models. We show that, on real datasets, transfer among visually similar Indian languages yields performance close to, and sometimes better than, that of the individual models. Transfer among Indian languages proves much more profitable than transfer from English. Through synthetic data experiments on English test datasets, we also study the significance of the variety and number of fonts used during data generation.
Synthetic data incorporates various fonts to ensure the data diversity needed for robust recognition systems. To strengthen data diversity, we incorporate over 500 Hindi fonts (including Unicode and non-Unicode fonts) into the synthetic data for improved performance on Hindi real test datasets. We also describe the process for incorporating non-Unicode fonts of Indian languages into training without errors. In addition to these fonts, we add an augmentation pipeline that further increases data diversity, using more than nine augmentation techniques to boost the performance of Hindi STR systems. Our evaluations in natural settings show significant improvements over previous works. Through our experiments, we set new benchmark accuracies for STR on Hindi, Telugu, and Malayalam from the IIIT-ILST dataset, with gains of 6%, 5%, and 2% in Word Recognition Rate (WRR) over previous works. Similarly, we achieve a 23% improvement in WRR for Bangla from the MLT-17 dataset, and we further improve this result by incorporating the error correction module mentioned above into the training pipeline. We also release two STR datasets, for Gujarati and Tamil, containing 440 scene images with 500 Gujarati and 2535 Tamil cropped word images. We report 5% and 3% gains in WRR over our baseline models for Gujarati and Tamil, respectively. We also establish benchmark results for the MLT-19 and Bangla datasets with 8% and 4% improvements in WRR over the baselines. Further enriching the synthetic dataset with non-Unicode fonts and multiple augmentations yields a Word Recognition Rate gain of over 33% on the IIIT-ILST Hindi dataset. Additionally, we implement a lexicon-based transcription approach that uses a dynamic lexicon for each image at test time, and present its results for the languages mentioned above.
Keywords – Scene text recognition · transfer learning · photo OCR · multilingual OCR · Indian Languages · Indic OCR · Synthetic Data · Data Diversity
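The results above are reported as Word Recognition Rate (WRR), the percentage of cropped words whose prediction matches the ground truth exactly, and the abstract also mentions lexicon-based transcription. The sketch below computes WRR and snaps a prediction to the closest entry of a per-image lexicon by Levenshtein distance. Two assumptions are labelled here: strings are compared after Unicode NFC normalisation (so precomposed and combining-mark spellings count as equal, which matters for Indic scripts), and `snap_to_lexicon` is a hypothetical nearest-word rule, not necessarily the thesis's transcription method. Edit distance here counts code points, not grapheme clusters.

```python
import unicodedata
from typing import Sequence


def normalize(word: str) -> str:
    # NFC makes canonically equivalent sequences (e.g. precomposed vs. combining marks) equal.
    return unicodedata.normalize("NFC", word.strip())


def word_recognition_rate(predictions: Sequence[str], ground_truths: Sequence[str]) -> float:
    """Percentage of words recognised exactly (after normalisation)."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must be aligned")
    if not ground_truths:
        raise ValueError("need at least one word")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over code points, using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def snap_to_lexicon(prediction: str, lexicon: Sequence[str]) -> str:
    """Hypothetical rule: closest lexicon word, ties broken alphabetically."""
    target = normalize(prediction)
    return min(lexicon, key=lambda w: (edit_distance(target, normalize(w)), w))


if __name__ == "__main__":
    preds = ["STOP", "H0TEL", "CAFE", "EXIT"]
    truth = ["STOP", "HOTEL", "CAFE", "EXIT"]
    print(word_recognition_rate(preds, truth))  # 75.0
    snapped = [snap_to_lexicon(p, ["HOTEL", "MOTEL", "STOP", "CAFE", "EXIT"]) for p in preds]
    print(word_recognition_rate(snapped, truth))  # 100.0
```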

          Year of completion: June 2022
           Advisor: C V Jawahar


            More Articles …

            1. Summarizing Day Long Egocentric Videos
            2. Counting in the 2020s: Binned Representations and Inclusive Performance Measures for Deep Crowd Counting Approaches
            3. Deep Learning Methods for 3D Garment Digitization
            4. Skeleton-based Action Recognition in Non-contextual, In-the-wild and Dense Joint Scenarios