
Improving the Efficiency of Fingerprint Recognition Systems


Additya Popli

Abstract

Humans have used distinguishing characteristics to identify each other since early times. This practice of identification based on person-specific features, called biometric traits, has developed over time to use more sophisticated techniques and characteristics like fingerprints, irises and gait in order to improve identification performance. Fingerprints, due to their distinctiveness, persistence over time and ease of capture, have become one of the most widely used biometric traits for identification. However, with this ever-increasing dependence on fingerprint biometrics, it is very important to ensure the security of these recognition systems against potential attackers. One of the most common and successful ways to circumvent these systems is through the use of fake or artificial fingers, synthesized from commonly available materials like silicone and clay to match the real fingerprint of a particular person. Most fingerprint recognition systems employ a spoof detection module to filter out these fake fingerprints. While spoof detectors seem to work well in general, it is well established that they fail to identify spoof fingerprints synthesized from "unseen" or "novel" spoof materials, i.e., materials that were not available during the training phase of the detector. While it is possible to synthesize a few fingers using the various available materials, present-day spoof detectors require a large number of samples for training, which is not practically feasible due to the high cost and complexity of fabricating spoof fingers. In this thesis, we propose a method for creating artificial fingerprint images using only a very limited number of artificial fingers created from a specific material.
We train a style-transfer network on available spoof fingerprint images; the network learns to extract material properties from an image and then, for each material, uses the limited set of spoof fingerprint images to generate a large dataset of artificial fingerprint images without actually fabricating spoof fingers. These artificial fingerprint images can then be used to train the spoof detector. Our experiments show that training on these artificially generated spoof images can improve the performance of existing spoof detectors on unseen spoof materials. Another major limitation of present-day recognition systems is their high resource requirement. Most fingerprint recognition systems use a spoof detector as a separate module, either in series or in parallel with a fingerprint matcher, leading to very high memory and time requirements during inference. To overcome this limitation, we explore the relationship between these two tasks in order to develop a common module capable of performing both spoof detection and matching. Our experiments show a high level of correlation between the features extracted for spoof detection and matching. We propose a new joint model that matches the fingerprint spoof detection and matching performance of current state-of-the-art methods on various datasets while using 50% less time and 40% less memory, thus providing a significant advantage for recognition systems deployed on resource-constrained devices like mobile phones.
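The style-transfer idea can be illustrated with a minimal AdaIN-style feature-statistic transfer (a hypothetical simplification, not the thesis's actual learned network): material appearance is treated as channel-wise feature statistics and imposed on the features of another fingerprint image.

```python
import numpy as np

def material_style_transfer(content_feat, style_feat, eps=1e-5):
    """AdaIN-style statistic matching: re-style content features
    (one fingerprint image) with channel-wise "material" statistics
    taken from a spoof fingerprint image of the target material.
    Both inputs are (channels, height, width) feature arrays."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # normalize away the content statistics, then impose the
    # material ("style") statistics channel by channel
    normalized = (content_feat - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean
```

Applying this to many live fingerprints with a handful of spoof "style" images is what lets a small set of fabricated fingers seed a large synthetic training set.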

Year of completion: December 2021
Advisor: Anoop M Namboodiri


Interactive Layout Parsing of Highly Unstructured Document Images

Abhishek Trivedi

Abstract

Ancient historical handwritten documents are among the earliest forms of written media and form part of the most valuable cultural and natural heritage of many countries. They hold early written knowledge on subjects like science, medicine, Buddhist doctrine and astrology. India has the most extensive collection of manuscripts, and studies have been conducted on their digitization to pass their wealth of wisdom on to future generations. Targeted annotation systems for automatic multi-region instance segmentation of their document images exist, but with relatively inferior layout prediction compared to their human-annotated counterparts. Precise boundary annotations of image regions in historical document images are crucial for downstream applications like OCR, which rely on region-class semantics. Some document collections contain densely laid out, highly irregular, and overlapping multi-class region instances with a large range of aspect ratios. Addressing this, this thesis proposes a web-based layout annotation and analytics system. The system, called HInDoLA, features an intuitive annotation GUI and a graphical analytics dashboard, and interfaces with machine-learning-based intelligent modules on the backend. HInDoLA has helped us create Indiscapes, the first-ever large-scale dataset for layout parsing of Indic palm-leaf manuscripts. Keeping the non-technical background of domain experts in mind, the tool offers an interactive and relatively fast annotation process through two modes, a Fully Automatic mode and a Semi-Supervised Intelligent mode. We then discuss the superiority of the semi-supervised approach over fully automatic approaches for historical document annotation: fully automatic boundary estimation approaches tend to be data-intensive, cannot handle variable-sized images, and produce sub-optimal results for the images described above.
BoundaryNet, a novel resizing-free approach for high-precision semi-automatic layout annotation, is another main contribution of this thesis. An attention-guided skip network first processes the variable-sized user-selected region of interest. The network's optimization is guided by Fast Marching distance maps to obtain a good-quality initial boundary estimate and an associated feature representation. These outputs are processed by a Residual Graph Convolution Network optimized with a Hausdorff loss to obtain the final region boundary. Results on a challenging manuscript image dataset demonstrate that BoundaryNet outperforms strong baselines and produces high-quality semantic region boundaries. Qualitatively, our approach generalizes across multiple document image datasets containing different script systems and layouts, all without additional fine-tuning.
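The Hausdorff criterion behind the boundary refinement can be sketched as a symmetric worst-case distance between predicted and ground-truth boundary point sets (a minimal numpy version; BoundaryNet itself uses a differentiable variant inside training):

```python
import numpy as np

def hausdorff_distance(pred_pts, gt_pts):
    """Symmetric Hausdorff distance between two boundary polylines,
    each given as an (N, 2) array of (x, y) points."""
    # pairwise Euclidean distances, shape (N_pred, N_gt)
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # worst-case nearest-neighbour distance in each direction
    forward = d.min(axis=1).max()   # predicted -> ground truth
    backward = d.min(axis=0).max()  # ground truth -> predicted
    return max(forward, backward)
```

Unlike an averaged loss, this penalizes the single worst boundary point, which is why it suits high-precision region annotation.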

Year of completion: May 2022
Advisor: Ravi Kiran Sarvadevabhatla


Pose-Based Action Recognition: Novel Frontiers for Fully Supervised Learning and Language-Aided Generalised Zero-Shot Learning

Pranay Gupta

Abstract

Action recognition is indispensable not only to the umbrella field of computer vision but also to a multitude of allied fields such as video surveillance, human-computer interaction, robotics and human-robot interaction. Typically, action recognition is performed on RGB videos; in recent years, however, skeleton-based action recognition has also gained a lot of traction, much of it owed to the development of frugal motion-capture systems, which enabled the curation of large-scale skeleton action datasets. In this thesis, we focus on skeleton-based human action recognition. We begin our exploration with skeleton action recognition in the wild by introducing Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. We extend our study to out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and of interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. Benchmarking the top performers of NTU-120 on the newly introduced datasets reveals the challenges and the domain gap induced by actions in the wild. Overall, our work characterizes the strengths and limitations of existing approaches and datasets, and, via the introduced datasets, enables new frontiers for human action recognition. When moving from traditional supervised recognition to the more challenging zero-shot recognition, the language component of the action name becomes important. Focusing on this, we introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL).
Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual and language). The inter-modal constraints are defined between the action-sequence embedding and the embeddings of Parts-of-Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for skeleton-based action sequence recognition. Our design choices enable SynSE to generalize compositionally, i.e., to recognize sequences whose action descriptions contain words not encountered during training. We also extend our approach to the more challenging Generalized Zero-Shot Learning (GZSL) problem via a confidence-based gating mechanism. We are the first to present zero-shot skeleton action recognition results on the large-scale NTU-60 and NTU-120 skeleton action datasets with multiple splits. Our results demonstrate SynSE's state-of-the-art performance in both the ZSL and GZSL settings compared to strong baselines on the NTU-60 and NTU-120 datasets. 3-D virtual avatars are extensively used in gaming, educational animation and physical-exercise coaching applications. The development of these visually interactive systems relies heavily on how humans perceive the actions performed by the 3-D avatars. To this end, we perform a short user study to gain insights into the recognizability of human actions performed virtually. Our results reveal that actions performed by 3-D avatars are significantly easier to recognize than those performed by 3-D skeletons. Concrete actions, i.e., actions which can only be performed using a fixed set of movements, are recognized more quickly and accurately than abstract actions. Overall, in this thesis we study various new frontiers in skeleton action recognition by means of novel datasets and tasks.
We drift from unimodal approaches to a multimodal setup by incorporating language in a unique syntactically aware fashion, with the hope of applying similar ideas to more challenging problems like skeleton action generation.
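The confidence-based gate that extends ZSL to the generalized setting can be sketched as follows (a hypothetical simplification with a hand-set threshold; the actual gate and both classifiers are learned):

```python
import numpy as np

def gated_gzsl_predict(seen_probs, zsl_probs, seen_labels, unseen_labels,
                       threshold=0.5):
    """Route a test sample to the seen-class classifier when its
    confidence (max softmax probability) clears the threshold,
    otherwise fall back to the zero-shot branch over unseen classes."""
    if seen_probs.max() > threshold:
        return seen_labels[int(np.argmax(seen_probs))]
    return unseen_labels[int(np.argmax(zsl_probs))]
```

A low-confidence seen-class prediction is taken as a signal that the sample likely belongs to a class never observed during training, so only then is the zero-shot branch trusted.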

Year of completion: May 2022
Advisor: Ravi Kiran Sarvadevabhatla


Exploring Data-Driven Graphics for Interactivity

Aakash KT

Abstract

The goal of rendering in computer graphics is the synthesis of photorealistic images from abstract virtual scene descriptions. Among the many rendering methods, path tracing, along with physically accurate material modeling, has emerged as the industry standard for photorealistic image synthesis. Path tracing is, however, a computationally expensive operation. Moreover, physically accurate material modeling requires iterative parameter tuning with visualization via path tracing. This quickly becomes a bottleneck when tuning the large number of parameters found in most physically accurate material models, and artists end up spending a lot of time waiting for path tracing to converge. This dissertation proposes to leverage the learning power of neural networks to plausibly approximate path tracing for: (a) quick material visualization while editing, and (b) out-of-the-box material recovery using inverse rendering. The traditional workflow of tuning material parameters and visualizing them is typically followed by modifications to the lighting to better understand the behaviour of the material. We therefore incorporate the ability to modify the environment lighting into our neural material visualization framework. This significantly aids in understanding the behaviour of the material, and hence the parameter-tuning process, which we demonstrate with a user study. Most real-world materials are spatially varying, that is, their behaviour varies across the surface of the geometry. While previous works in this area dealt with uniform materials, we propose improvements for handling spatially varying materials. We design our neural network architecture to be lightweight, with 10× fewer trainable parameters than the state of the art, while producing better visualizations. We also build a tool demonstrating real-time parameter tuning and visualization of materials. Finally, we extend the above work to handle general 3-D scenes under specific lighting scenarios.
This is done by training a small fully connected neural network on a simple scene in the local tangent frame. Training in the local tangent frame decouples geometric complexity from training, which means the network generalizes to any geometry. We embed this network in a standard path tracer to evaluate direct illumination at each path vertex. We thereby obtain not only noise-free direct illumination but also out-of-the-box gradients of the rendering process, which can be used for inverse rendering. We show that these gradients can be used to plausibly recover the material of a target scene. We conclude by discussing various practical aspects of our method.
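Working in the local tangent frame is what decouples the network from scene geometry: every shading query is expressed relative to the surface normal before it reaches the network. A minimal sketch of that frame construction (hypothetical helper names; the branchless orthonormal-basis trick follows the well-known Duff et al. construction):

```python
import numpy as np

def tangent_frame(normal):
    """Build an orthonormal basis (tangent, bitangent, normal)
    from a surface normal, without branching on its sign."""
    n = normal / np.linalg.norm(normal)
    s = np.copysign(1.0, n[2])
    a = -1.0 / (s + n[2])
    b = n[0] * n[1] * a
    t = np.array([1.0 + s * n[0] * n[0] * a, s * b, -s * n[0]])
    bt = np.array([b, s + n[1] * n[1] * a, -n[1]])
    return t, bt, n

def to_local(v, frame):
    """Express a world-space direction in the local tangent frame,
    so a network trained this way never sees absolute geometry."""
    t, bt, n = frame
    return np.array([np.dot(v, t), np.dot(v, bt), np.dot(v, n)])
```

In this convention the surface normal always maps to (0, 0, 1), so light and view directions from any geometry land in the same canonical input space.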

Year of completion: April 2022
Advisor: P J Narayanan


A Study of Automatic Segmentation of 3-D Brain MRI and its Application to Deformable Registration

Aabhas Majumdar

Abstract

Automated segmentation of cortical and non-cortical human brain structures has hitherto been approached using non-rigid registration followed by label fusion. We propose an alternative approach using a convolutional neural network (CNN) that classifies each voxel into one of many structures. Four different kinds of two-dimensional and three-dimensional intensity patches are extracted for each voxel, providing local and global (context) information to the CNN. The proposed approach is evaluated on five publicly available datasets which differ in the number of labels per volume. The obtained mean Dice coefficient varies with the number of labels: for example, it is 0.844 ± 0.031 and 0.743 ± 0.019 for the datasets with the fewest (32) and the most (134) labels, respectively. These figures are marginally better than or on par with those obtained by current state-of-the-art methods on nearly all datasets, at reduced computational time. The consistently good performance of the proposed method across datasets, and the fact that it requires no registration, make it attractive for the many applications where reduced computational time is necessary. Feature-based registration has been popular, with features ranging from voxel intensity to the Self-Similarity Context (SSC). In the second part of the thesis, we examine how features learnt by various Deep Learning (DL) frameworks for segmentation can be used for deformable registration, and whether this feature learning is necessary at all. We investigate the use of features learned by different DL methods in the current state-of-the-art discrete registration framework and analyze its performance on two publicly available datasets. We draw insights about the type of DL framework useful for feature learning, and consider the impact, if any, of the complexity of different DL models and brain parcellation methods on the performance of discrete registration.
Our results indicate that registration performance with DL features and with SSC is comparable and stable across datasets, whereas this does not hold for low-level features.
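The Dice coefficient used to report segmentation quality measures volumetric overlap between predicted and reference label maps; a minimal version for a single structure's binary mask:

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-8):
    """Dice overlap between two binary segmentation masks:
    2|A ∩ B| / (|A| + |B|), ranging from 0 (disjoint) to 1 (identical)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    # eps guards against division by zero when both masks are empty
    return 2.0 * intersection / (pred.sum() + gt.sum() + eps)
```

A multi-label volume is scored by computing this per structure and averaging, which is how per-dataset means such as 0.844 arise.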

Year of completion: April 2022
Advisor: Jayanthi Sivaswamy

