
Towards developing a multiple modality fusion technique for automatic detection of Glaucoma


Divya Jyothi Gaddipati

Abstract

Glaucoma is a major eye disease which, when left untreated, can gradually lead to irreversible loss of vision. The underlying causes are a loss of retinal nerve fibres (resulting in a thinning of the nerve fibre layer and enlargement of the optic cup) and peripapillary atrophy. Since these changes occur without any symptoms in the initial stages, the disease is difficult to diagnose early. Hence, the development of computer-aided diagnosis (CAD) systems for early detection and treatment of the disease has attracted the attention of many medical experts and researchers alike. Optical Coherence Tomography (OCT) and fundus photography are two widely used retinal imaging techniques for obtaining the structural information of the eye that helps to analyze and detect such diseases. Existing automated systems rely largely on fundus images for the assessment of glaucoma owing to their fast acquisition and low cost. OCT images provide vital and unambiguous information about the changes occurring in the retina, specifically in the retinal nerve fibre layer and the optic nerve head, which are essential for disease assessment. However, the high cost of OCT is a deterrent to its deployment in large-scale screening. Hence, the focus of this thesis is to investigate the potential of integrating the two retinal imaging techniques, which provide complementary information about the eye, for developing an automatic glaucoma screening system. First, we propose a deep learning approach operating directly on 3D OCT volumes for glaucoma assessment; it shows promising results, demonstrating the value of the highly discriminative features learnt from OCT for automated glaucoma detection. Next, we present a novel CAD solution wherein both OCT and fundus modality images are leveraged to learn a model that maps fundus images to the OCT feature space. We show how this model can subsequently be used to detect glaucoma given an image from only one modality (fundus), thus enabling the automated screening operation to be executed at a large scale. The results show that fundus-to-OCT feature space mapping is an attractive option for glaucoma detection.
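The fundus-to-OCT feature mapping idea can be made concrete with a small sketch. The following is a minimal, illustrative PyTorch example, not the thesis code: a pre-trained, frozen 3D CNN encodes the OCT volume into a feature vector, a 2D fundus encoder is trained to reproduce that feature vector while a classifier on the shared feature space predicts glaucoma. All layer sizes, the loss weighting and the class/function names are assumptions made for this sketch.

```python
# Minimal sketch of fundus-to-OCT feature mapping (illustrative, not the thesis code).
import torch
import torch.nn as nn

class OCTEncoder3D(nn.Module):
    """Pre-trained, frozen 3D CNN mapping an OCT volume to a 128-D feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, vol):                       # vol: (B, 1, D, H, W)
        return self.fc(self.features(vol).flatten(1))

class FundusToOCTMapper(nn.Module):
    """2D CNN that maps a fundus image into the OCT feature space, plus a classifier."""
    def __init__(self, feat_dim=128, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, img):                       # img: (B, 3, H, W)
        feat = self.fc(self.features(img).flatten(1))
        return feat, self.classifier(feat)

oct_enc, mapper = OCTEncoder3D(), FundusToOCTMapper()
for p in oct_enc.parameters():
    p.requires_grad = False                       # OCT encoder stays frozen

# Training step with paired fundus/OCT data (shapes are assumed):
fundus = torch.randn(2, 3, 224, 224)
oct_vol = torch.randn(2, 1, 64, 128, 128)
labels = torch.tensor([0, 1])
with torch.no_grad():
    oct_feat = oct_enc(oct_vol)                   # target feature space
fundus_feat, logits = mapper(fundus)
# Classification loss + feature-matching loss (weight 1.0 is an assumption).
loss = nn.functional.cross_entropy(logits, labels) \
       + 1.0 * nn.functional.mse_loss(fundus_feat, oct_feat)
loss.backward()
```

At screening time only the fundus branch is evaluated, which is what allows the approach to scale to settings where OCT is unavailable.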

Year of completion: February 2020
Advisor: Jayanthi Sivaswamy

Deep-Learning Features, Graphs and Scene Understanding


Abhijeet Kumar

Abstract

Scene understanding has been a major aspiration of computer vision from its early days. Its root lies in enabling a computer/robot/machine to understand, interpret and manipulate visual data, much as an average human does when viewing a natural or man-made scene. This capability has widespread impact on applications ranging from surveillance, aerial imaging and autonomous navigation to smart cities, and scene understanding has thus remained an active area of research over the last decade. During this period, the scope of problems in the scene understanding community has broadened from image annotation, image captioning and image segmentation to object detection, dense image captioning, instance segmentation, etc. Advanced problems like autonomous navigation, panoptic segmentation, video summarization and multi-person tracking in crowded scenes have also surfaced in this arena and are being vigorously attempted. Deep learning has played a major role in this advancement. Performance metrics on some of these tasks have more than tripled in the last decade alone, yet these tasks remain far from solved. The success originating from deep learning can be attributed to the learned features: in simple words, features learned by a convolutional neural network trained for annotation are in general far better suited for captioning than a non-deep-learning method trained directly for captioning. Taking a cue from this trend, we dive into the domain of scene understanding with a focus on utilizing pre-learned features from other, similar domains. We focus on two tasks in particular: automatic (multi-label) image annotation and (road) intersection recognition. Automatic image annotation is one of the earliest problems in scene understanding and refers to the task of assigning (multiple) labels to an image based on its content. Intersection recognition, in contrast, belongs to the new era of problems in scene understanding and refers to the task of identifying an intersection from varied viewpoints under varied weather and lighting conditions. We chose two significantly different tasks to broaden the scope and generalizing capability of our results. Both image annotation and intersection recognition pose common challenges such as occlusion, perspective variations and distortions. For the image annotation task we further narrowed our domain by focusing on graph-based methods, again choosing two different paradigms to bring out the contrast: a multiple-kernel-learning-based non-deep-learning approach and a deep-learning-based approach. Through quantitative and qualitative results we show slightly boosted performance from both paradigms. The intersection recognition task is relatively new in the field; most existing work focuses on place recognition, which uses only single images. We instead focus on temporal information, i.e. the traversal of the intersection/place as seen from a camera mounted on a vehicle, and through experiments we show a performance boost in intersection recognition from the inclusion of temporal information.
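The claim that pre-learned features transfer well can be illustrated with a small sketch. Below is a minimal, hedged PyTorch/torchvision example, not the thesis implementation: an ImageNet-pretrained ResNet-18 backbone is frozen and only a small multi-label head with per-label sigmoids is trained for automatic image annotation. The label count, learning rate and threshold are illustrative assumptions.

```python
# Reusing pre-learned deep features for multi-label image annotation (illustrative sketch).
import torch
import torch.nn as nn
import torchvision

NUM_LABELS = 20                              # assumed number of annotation keywords

backbone = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()                  # keep the 512-D pooled features
for p in backbone.parameters():
    p.requires_grad = False                  # reuse, do not retrain, the features

head = nn.Linear(512, NUM_LABELS)            # one logit per label
criterion = nn.BCEWithLogitsLoss()           # independent sigmoid per label
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def train_step(images, label_vectors):
    """images: (B, 3, 224, 224); label_vectors: (B, NUM_LABELS) multi-hot floats."""
    with torch.no_grad():
        feats = backbone(images)             # pre-learned features
    logits = head(feats)
    loss = criterion(logits, label_vectors)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def annotate(image, threshold=0.5):
    """Return the indices of labels whose sigmoid probability exceeds the threshold."""
    with torch.no_grad():
        probs = torch.sigmoid(head(backbone(image.unsqueeze(0))))[0]
    return (probs > threshold).nonzero(as_tuple=True)[0].tolist()
```

For intersection recognition, the same kind of frozen per-frame features could be aggregated over the frames of a traversal before classification, which is one simple way to realize the temporal-information idea referred to above.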

Year of completion: January 2020
Advisor: Avinash Sharma

Computational Imaging Techniques to Recover Omni3D Structure & Surface Properties


Rajat Aggarwal

Abstract

The process of imaging converts a 3D world into a 2D image. This process is inherently lossy, making the problem of understanding the world ill-posed. During the imaging process, a light ray emanating from a source bounces off different surfaces, interacting with them according to the surface properties (albedo, color, specularity) before reaching the camera. The structure that we see in an image is primarily determined by the final surface from which the light was reflected, except when that surface is highly specular (mirror-like). In this work, we explore imaging techniques that recover both the structure and the surface properties of a scene. For structure recovery, we extend the idea of stereo imaging and present a practical solution to capture a complete 360° stereo panorama using a single camera. Current approaches either use a moving camera to capture multiple images of a scene, which are then stitched together to form the final panorama, or use multiple synchronized cameras. A moving camera limits the solution to static scenes, while multi-camera solutions require dedicated calibrated setups. Our approach improves upon existing solutions in two significant ways: it solves the problem using a single camera, thus minimizing the calibration problem and providing the ability to convert any digital camera into a stereo panoramic capture device; and it captures all the light rays required for stereo panoramas in a single frame using a compact, custom-designed mirror, thus making the design practical to manufacture and easy to use. We analyze the optimality of the design and present panoramic stereo and depth estimation results. Methods for structure recovery, including stereo, are often fooled when the surface is highly specular. To alleviate this, we propose an active-illumination-based method to detect and segment mirror-like surfaces in a scene. In computer vision, many active illumination techniques employ projector-camera systems to extract useful information from scenes: known illumination patterns are projected onto the scene and their deformations in the captured images are then analyzed. We observe that the local frequencies in the captured pattern over mirror-like surfaces differ from those of the projected pattern. This property allows us to design a custom projector-camera system that segments mirror-like surfaces by analyzing the local frequencies in the captured images. The system projects a sinusoidal pattern and captures images from the projector's point of view. We present segmentation results for scenes including multiple reflections and inter-reflections from mirror-like surfaces. The method can further be used to separate the direct and global illumination components for mirror-like surfaces by illuminating the non-mirror-like objects separately. We also show how our method is useful for accurately estimating the shape of non-mirror-like regions in the presence of mirror-like regions in a scene.
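The local-frequency cue used for mirror segmentation can be sketched as follows. This is an illustrative implementation, not the thesis method's exact algorithm: a horizontal sinusoid of known spatial frequency f0 is projected, the captured image is demodulated with a quadrature filter tuned to f0, and pixels where the relative energy at f0 is low (i.e., the local frequency has shifted) are flagged as mirror-like. The frequency, window size and threshold are assumed values.

```python
# Illustrative mirror-like surface segmentation via local frequency analysis.
import numpy as np
from scipy.ndimage import uniform_filter, convolve1d

def mirror_like_mask(captured, f0=0.15, win=15, thresh=0.35):
    """captured: 2-D float image of the scene under a horizontal sinusoid of
    spatial frequency f0 (cycles/pixel along the x-axis)."""
    img = captured.astype(np.float64)
    ac = img - uniform_filter(img, win)              # remove local mean (DC term)

    # Quadrature (complex-exponential) filter tuned to the projected frequency f0.
    n = np.arange(win) - win // 2
    kernel = np.exp(2j * np.pi * f0 * n)
    resp = (convolve1d(ac, kernel.real, axis=1) +
            1j * convolve1d(ac, kernel.imag, axis=1))

    amp_f0 = np.abs(resp) / win                      # local amplitude near f0
    rms = np.sqrt(uniform_filter(ac ** 2, win)) + 1e-9   # local RMS contrast
    score = uniform_filter(amp_f0 / rms, win)        # high where f0 survives intact

    # Mirror-like surfaces distort the local frequency, so the score drops there.
    return score < thresh
```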

Year of completion: November 2019
Advisor: Anoop M Namboodiri

Driver Attention Monitoring using Facial Features


Isha Dua

Abstract

How can we assess the quality of human driving using AI? Driver inattention is one of the leading causes of vehicle crashes and incidents worldwide. Driver inattention includes driver fatigue, leading to drowsiness, and driver distraction, say due to the use of a cellphone or rubbernecking, all of which lead to a lack of situational awareness. Hitherto, techniques presented for monitoring driver attention evaluated factors such as fatigue and distraction independently. However, to develop a robust driver attention monitoring system, all the factors affecting a driver's attention need to be analyzed holistically. In this thesis, we present two novel approaches for driver attention analysis on the road: one using driver video alone, and one fusing driver and road video.
In the first approach, we propose a driver attention rating system that leverages the front camera of a windshield-mounted smartphone to monitor driver attention by combining several features. We derive a driver attention rating by fusing spatio-temporal features based on the driver's state and behavior, such as head pose, eye gaze, eye closure, yawns and use of cellphones. We present several architectures for feature aggregation, such as AutoRate and Attention-based AutoRate. We perform an extensive evaluation of the feature aggregation networks on real-world driving data and on data from controlled, static-vehicle settings with 30 drivers in a large city. We compare the proposed method's automatically generated rating with the scores given by 5 human annotators. We use the kappa coefficient, an evaluation metric that computes the inter-rater agreement between the generated rating and the ratings provided by the human annotators. We observe that Attention-based AutoRate outperforms the other proposed feature aggregation designs by 10%. Further, we use the learned temporal and spatial attention to visualize the key frame and the key action, which justifies the model's predicted rating. Finally, to provide driver-specific results, we fine-tune the Attention-based AutoRate model using data from the specific driver to give a personalized driving experience.
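The inter-rater agreement computation mentioned above can be written in a few lines. The sketch below computes the standard (unweighted) Cohen's kappa between the automatically generated ratings and one annotator's ratings; the 5-point rating scale in the example is an illustrative assumption, not taken from the thesis.

```python
# Cohen's kappa between model-generated ratings and a human annotator's ratings.
import numpy as np

def cohen_kappa(rater_a, rater_b, n_classes):
    """Unweighted Cohen's kappa between two integer rating sequences (values 0..n_classes-1)."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    # Joint confusion matrix of the two raters, normalized to probabilities.
    conf = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        conf[i, j] += 1
    conf /= conf.sum()
    p_o = np.trace(conf)                          # observed agreement
    p_e = conf.sum(axis=1) @ conf.sum(axis=0)     # chance agreement from the marginals
    return (p_o - p_e) / (1.0 - p_e)

# Example: model ratings vs. one human annotator on an assumed 5-point scale (0..4).
model_ratings = [4, 3, 3, 2, 4, 1, 0, 3]
human_ratings = [4, 3, 2, 2, 4, 1, 1, 3]
print(cohen_kappa(model_ratings, human_ratings, n_classes=5))  # kappa in [-1, 1]; 1 = perfect agreement
```

scikit-learn's sklearn.metrics.cohen_kappa_score computes the same quantity, with an optional weighted variant suited to ordinal ratings.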

Year of completion: June 2020
Advisor: Prof. C.V. Jawahar

An investigation of the annotated data sparsity problem in the medical domain


Pujitha Appan Kandala

Abstract

Diabetic retinopathy (DR) is the most common eye disease in people with diabetes. It affects them over a significant number of years and can lead to permanent blindness if left untreated. Early detection and treatment of DR are of utmost importance for the prevention of blindness; hence, automatic disease detection and classification have been attracting much interest. High performance is critical for the adoption of such systems, which generally rely on training with a wide variety of annotated data, yet such varied annotated data is very scarce in medical imaging. The main focus of this thesis is to deal with this sparsity of annotated data and develop computer-aided diagnostic (CAD) systems that require less annotated data while still achieving high accuracy. We propose three different solutions to address this problem. First, we propose a semi-supervised framework which paves the way for including unlabelled data in training. A co-training framework is used in which features are extracted from a limited training set and independent models are learnt on each of the feature views; the models are then used to predict labels for new data. The highly confident labelled images from the unlabelled set are added back to the training set and the process is repeated, thus expanding the number of known labels. This framework is showcased on retinal neovascularization (NV), a critical stage of proliferative DR. Analysis of the results for NV detection showed an AUC of 0.985 with a sensitivity of 96.2% at a specificity of 92.6%, which is superior to existing models. Secondly, we propose crowdsourcing as a solution, where we obtain annotations from a crowd and use them for training after refinement. We employ a strategy to refine/overcome the noisy nature of crowdsourced annotations by i) assigning a reliability factor to each member of the crowd based on their performance (at global and local levels) and experience, and ii) requiring region-of-interest (ROI) markings rather than pixel-level markings from the crowd. We also show that these annotations are reliable by training a deep neural network (DNN) for the detection of hard exudates, which occur in mild non-proliferative DR. Experimental results for hard exudate detection showed that training with refined crowdsourced data is effective, as detection performance improves by 25% over training with just expert markings. Lastly, we explore synthetic data generation as a solution to this problem. We propose a novel method, based on generative adversarial networks (GANs), to generate images with lesions such that the overall severity level can be controlled. We showcase this approach for hard exudate and haemorrhage detection in retinal images with 4 levels of severity, varying from mild to severe non-proliferative DR. The synthetic data were also shown to be reliable for developing a CAD system for DR detection: hard exudate/haemorrhage detection was found to improve with the inclusion of synthetic data in the training set, with an improvement in sensitivity of about 25% over training with just expert-marked data.
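The co-training loop of the first solution can be sketched in a few lines. The following is a minimal, illustrative scikit-learn example, not the thesis implementation: two classifiers are trained on two independent feature views of the small labelled set, they jointly predict labels for the unlabelled pool, and only the highly confident predictions are added back to the training set before the process is repeated. The classifier choice, confidence threshold and number of rounds are assumptions.

```python
# Illustrative co-training loop for expanding a small labelled set with confident pseudo-labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xa_lab, Xb_lab, y_lab, Xa_unlab, Xb_unlab, rounds=5, conf=0.95):
    """Xa_*, Xb_*: two independent feature views of the labelled/unlabelled images."""
    Xa, Xb, y = Xa_lab.copy(), Xb_lab.copy(), y_lab.copy()
    pool = np.arange(len(Xa_unlab))                  # indices still unlabelled
    model_a = LogisticRegression(max_iter=1000)
    model_b = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        # Independent models, one per feature view.
        model_a.fit(Xa, y)
        model_b.fit(Xb, y)
        if len(pool) == 0:
            break
        # Predict labels for the remaining unlabelled pool with both models.
        proba = (model_a.predict_proba(Xa_unlab[pool]) +
                 model_b.predict_proba(Xb_unlab[pool])) / 2.0
        keep = proba.max(axis=1) >= conf             # highly confident samples only
        if not keep.any():
            break
        new_idx = pool[keep]
        new_y = model_a.classes_[proba[keep].argmax(axis=1)]
        # Add the confidently labelled images back to the training set.
        Xa = np.vstack([Xa, Xa_unlab[new_idx]])
        Xb = np.vstack([Xb, Xb_unlab[new_idx]])
        y = np.concatenate([y, new_y])
        pool = pool[~keep]
    return model_a, model_b
```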

           

Year of completion: November 2018
Advisor: Jayanthi Sivaswamy

More Articles …

1. Document Image Quality Assessment
2. Computational Video Editing and Re-editing
3. Geometric + Kinematic Priors and Part-based Graph Convolutional Network for Skeleton-based Human Action Recognition
4. Towards Data-Driven Cinematography and Video Retargeting using Gaze