
Visual Perception Based Assistance for Fundus Image Readers


Rangrej Bharat

Abstract

Diabetic Retinopathy (DR) is a condition in which individuals with diabetes develop a disease of the retina, the light-sensitive inner lining of the eye. DR is a major cause of visual impairment, and early detection can prevent vision loss. The use of automatic systems for DR diagnosis is limited by their low accuracy. As an alternative, reading centers are becoming popular in real-world scenarios. A reading center is a facility where retinal images from various sources are stored and analyzed by trained personnel (who may not be experts). In this thesis we look at techniques to increase the efficiency of DR image readers working in reading centers. The first half of the thesis aims at identifying an efficient image-reading technique which is both fast and accurate. Towards this end, we conducted an eye-tracking study with medical experts while they were reading images for DR diagnosis. The analysis shows that experts employ mainly two types of reading strategies: dwelling and tracing. The dwelling strategy appears to be more accurate and faster than the tracing strategy. The eye movements of all the experts are combined in a novel way to extract an optimal image-scanning strategy, which can be recommended to image readers for efficient diagnosis. To increase efficiency further, we propose a technique in which the saliency of lesions is boosted for better visibility. This is named the Assistive Lesion Emphasis System (ALES) and is demonstrated in the second half of the thesis. ALES is developed as a two-stage system: saliency detection followed by lesion emphasis. Two biologically inspired saliency models, which mimic the human visual system, are designed using unsupervised and supervised techniques. The unsupervised saliency model is inspired by the human visual system and achieved 10% higher recall than existing saliency models when compared against the average gaze map of 15 retinal experts.
The supervised saliency model is a deep learning based implementation of the biologically inspired saliency model proposed by Itti and Koch (Itti, L., Koch, C. and Niebur, E., 1998) and achieves 10% to 20% higher AUC than existing saliency models when compared against manual markings. Saliency maps generated by these models are used to boost the prominence of lesions locally, using two types of selective enhancement techniques: one fuses the saliency map with the original image at multiple scales, and the other applies spatially varying gamma correction; both increase the contrast-to-noise ratio (CNR) of lesions by 30%. One saliency model and one selective-enhancement technique are combined to illustrate two complete ALESs. DR diagnosis performed by analyzing the ALES output with the optimal strategy should presumably be both faster and more accurate.
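The spatially varying gamma correction can be sketched as follows. This is a minimal illustration of the idea, not the thesis implementation; the gamma endpoints `gamma_bg` and `gamma_lesion` are assumed values chosen for demonstration:

```python
import numpy as np

def saliency_gamma_enhance(image, saliency, gamma_bg=1.5, gamma_lesion=0.6):
    """Spatially varying gamma correction driven by a saliency map.

    High-saliency pixels (likely lesions) receive gamma < 1, which
    brightens them; low-saliency background receives gamma > 1.
    The endpoints gamma_bg/gamma_lesion are illustrative, not thesis values.
    """
    rng = saliency.max() - saliency.min()
    s = (saliency - saliency.min()) / (rng + 1e-8)      # normalise to [0, 1]
    gamma = gamma_bg + (gamma_lesion - gamma_bg) * s    # per-pixel gamma map
    return np.power(np.clip(image, 0.0, 1.0), gamma)
```

Applied to a mid-grey lesion pixel with maximal saliency, the per-pixel gamma drops to `gamma_lesion` and the pixel is brightened, while background pixels are suppressed, which is one way the CNR of lesions can be raised locally.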

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Karthik G, Rangrej S and Jayanthi Sivaswamy - A Deep Learning Framework for Segmentation of Retinal Layers from OCT Images, ACPR, Nanjing, 2017. [PDF]

  • Rangrej Bharat and Jayanthi Sivaswamy - ALES: an assistive system for fundus image readers, Journal of Medical Imaging, 4(2), April 2017. [PDF]

  • Samrudhdhi B. Rangrej and Jayanthi Sivaswamy - A Biologically Inspired Saliency Model for Color Fundus Images, Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing. ACM, 2016. [PDF]


Downloads

thesis

A design for an automated Optical Coherence Tomography analysis system


Karthik Gopinath

Abstract

Age-related macular degeneration (AMD), Cystoid Macular Edema (CME) and glaucoma are retinal diseases that affect vision and often lead to irreversible blindness. Many imaging modalities have been used for screening of retinal diseases. Optical coherence tomography (OCT) is one such modality, providing structural information about the retina. Optical coherence tomography angiography (OCTA) provides vascular information in addition to structural information. With advances in OCT and OCTA technology, the number of patients scanned with these modalities is growing exponentially. Manual analysis of such vast data fatigues human experts and delays treatment, creating a demand for a fast and accurate automated OCT image analysis system. The system proposed in this thesis aims at analysing retinal anatomy and its diseases. We approach the problem of automatically analysing retinal anatomy by segmenting its layers using a deep learning framework. Algorithms in the literature usually require preprocessing steps, such as denoising, image flattening and edge detection, all of which involve separate parameter tuning. We propose a deep learning technique that automates all these steps and handles the presence or absence of pathologies. The model combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network: the CNN extracts the layers of interest and their edges, while the LSTM traces the layer boundaries. The model is trained on a mixture of normal and AMD cases using minimal data. Validation on three public datasets shows that the pixel-wise mean absolute error obtained with our system is 1.30 ± 0.48, lower than the inter-marker error of 1.79 ± 0.76. The performance of the proposed module is also on par with existing methods.
We next propose three modules for disease analysis: the first diagnoses disease at the volume level, the second at the slice level, and the third at the pixel level by segmenting the abnormality. Firstly, we propose a novel method for glaucoma assessment at the volume level using data from the OCTA modality. The goal of this module is to gain insights into glaucoma indicators from both structural information (OCT volume) and vascular information (angioflow images). The proposed method achieves a mean sensitivity of 94%, specificity of 91% and accuracy of 92% on 49 normal and 18 glaucomatous volumes. Secondly, for slice-level disease (AMD and CME) classification from OCT volumes, we learn a decision boundary using a CNN in a new extremal representation space of an image. Evaluated on four publicly available datasets with a training set of 3500 OCT slices and a test set of 1995 OCT slices, this module achieves a mean sensitivity of 96%, specificity of 97% and accuracy of 96%. Finally, we propose an automated cyst segmentation algorithm for OCT volumes: a biologically inspired method based on selective enhancement of the cysts, achieved by inducing motion to a given OCT slice. A CNN learns a mapping function that combines the results of multiple such motions to produce a probability map of cyst locations in a given slice. The final segmentation of cysts is obtained via a simple clustering of the detected cyst locations. The proposed method is trained on the OPTIMA Cyst Segmentation Challenge (OCSC) train set and achieves a mean Dice Coefficient (DC) of 0.71, 0.69 and 0.79 on the OCSC test set, the DME dataset and the AEI dataset, respectively. Thus, the complete system can be employed in screening scenarios to aid retinal experts in accurate and faster analysis of retinal anatomy and disease.
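The Dice Coefficient used above to score the cyst segmentation is a standard overlap measure between a predicted mask and a ground-truth mask; a minimal sketch (not code from the thesis):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks.

    Returns 1.0 for identical non-empty masks and 0.0 for disjoint ones;
    eps guards against division by zero when both masks are empty.
    """
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)
```

For example, two masks that each cover two pixels but agree on only one score 2·1 / (2 + 2) = 0.5, which is the scale on which the reported 0.71/0.69/0.79 figures sit.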

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Karthik G and Jayanthi Sivaswamy - Segmentation of retinal cysts from Optical Coherence Tomography volumes via selective enhancement, IEEE Journal of Biomedical and Health Informatics, 2018. [PDF]

  • Karthik G, Rangrej S and Jayanthi Sivaswamy - A Deep Learning Framework for Segmentation of Retinal Layers from OCT Images, ACPR, Nanjing, 2017. [PDF]

  • Karthik Gopinath, Jayanthi Sivaswamy and Tarannum Mansoori - Automatic Glaucoma Assessment from Angio-OCT Images, Proc. of IEEE International Symposium on Bio-Medical Imaging (ISBI), 13-16 April 2016, Prague. [PDF]


Downloads

thesis

Active Learning & its Applications


Priyam Bakliwal 

Abstract

Active learning, also known as query learning, is a sub-field of machine learning. It relies on the assumption that if the learning algorithm is allowed to choose the data from which it learns, it will perform better with less training. Active learning is predominantly used in areas where obtaining a large amount of annotated training data is infeasible or extremely expensive. Active learning models aim to overcome the annotation bottleneck by posing queries, in the form of unlabelled instances, to be labelled by a human. In this way, the framework aims to achieve high accuracy using very few labelled instances, minimizing annotation cost. In the first part of our work, we propose an active learning based image annotation model. Automatic image annotation is the computer vision task of assigning a set of appropriate textual tags to a novel image, with the eventual aim of bridging the semantic gap between visual and textual representations. The advantages of the proposed model include: (a) it outputs a variable number of tags per image, which improves accuracy; (b) it effectively chooses the difficult samples that need manual annotation, thereby reducing human annotation effort. Studies on the Corel and IAPR TC-12 datasets validate the effectiveness of this model. In the second part of the thesis, we propose an active learning based solution for efficient, scalable and accurate annotation of objects in video sequences. We focus on reducing human annotation effort while simultaneously increasing tracking accuracy, to obtain precise, tight bounding boxes around an object of interest. We use a novel combination of two different tracking algorithms to track an object through the whole video sequence, and propose a sampling strategy that selects the most informative frame for human annotation. This newly annotated frame is used to update the previous annotations.
Thus, through the collaborative effort of both human and system, we obtain accurate annotations with minimal effort. We have quantitatively and qualitatively validated the results on eight different datasets. Active learning is effective for natural language documents as well. Multilingual processing tasks like statistical machine translation and cross-language information retrieval rely mainly on the availability of accurate parallel corpora. In the third section we propose a simple yet efficient method to generate a huge amount of reasonably accurate parallel corpus using OCR with minimal user effort. We show the performance of our proposed method on a manually aligned dataset of 300 Hindi-English sentences and 100 English-Malayalam sentences. In the last section we utilise active learning for model updating in community Question Answering (cQA) systems. cQA platforms like Yahoo! Answers, Baidu Zhidao, Quora and StackOverflow allow experts to give precise and targeted answers to any question posted by a user. These sites form huge repositories of information in the form of questions and answers. Retrieval of semantically relevant questions and answers from cQA forums has been an important research area for the past few years. Given the ever-growing nature of the data in cQA forums, these models cannot be kept stagnant; they need to be continuously updated so that they can adapt to the changing patterns of questions and answers over time. Such update procedures are expensive and time consuming. We propose a novel topic-model based active sampler named Picky. It intelligently selects a smaller subset of the newly added question-answer pairs to be fed to the existing model for updating it. Evaluations on real-life cQA datasets show that our approach converges faster, giving performance comparable to baseline sampling strategies updated with ten times as much data.
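The core active learning loop — query the instance the current model is least sure about — can be sketched with the generic least-confidence strategy below. This is a textbook illustration of the querying principle; the thesis's actual samplers for video frames and cQA pairs are more elaborate:

```python
import numpy as np

def least_confident_query(class_probs):
    """Least-confidence query strategy for pool-based active learning.

    class_probs: (n_instances, n_classes) array of predicted class
    probabilities for the unlabelled pool. Returns the index of the
    instance whose top class probability is lowest, i.e. the sample
    the model is least certain about and whose label is most informative.
    """
    confidence = class_probs.max(axis=1)   # model's confidence per instance
    return int(confidence.argmin())        # query the least confident one
```

In a full loop, the returned index is sent to the human annotator, the new label is added to the training set, and the model is retrained before the next query.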

 

Year of completion: July 2017
Advisor: C V Jawahar

Related Publications

  • Priyam Bakliwal, Guruprasad M. Hegde and C.V. Jawahar - Collaborative Contributions for Better Annotations VISIGRAPP (6: VISAPP). 2017. [PDF]

  • Priyam Bakliwal, Devadath V V and C.V. Jawahar - Align Me: A framework to generate Parallel Corpus Using OCRs & Bilingual Dictionaries WSSANLP 2016 (2016): 173. [PDF]

  • Priyam Bakliwal, C.V. Jawahar - Active Learning Based Image Annotation, Proceedings of the Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2015), 16-19 Dec 2015, Patna, India. [PDF]


Downloads

thesis

Population specific template construction and brain structure segmentation using deep learning methods


Raghav Mehta 

Abstract

A brain template, such as MNI152, is a digital (magnetic resonance imaging, or MRI) representation of the brain in a reference coordinate system for neuroscience research. Structural atlases, such as AAL and DKA, delineate the brain into cortical and subcortical structures, which are used in Voxel Based Morphometry (VBM) and fMRI analysis. Many population-specific templates, e.g. Chinese and Korean, have been constructed recently, and morphological differences have been observed between the average brains of eastern and western populations. In this thesis, we report on the development of a population-specific brain template for the young Indian population, derived from a multi-centric MRI dataset of 100 Indian adults (21-30 years old). Measurements made with this template indicate that the Indian brain, on average, is smaller in height and width than the Caucasian and Chinese brains. A second problem this thesis examines is automated segmentation of cortical and non-cortical human brain structures using multiple structural atlases. This has hitherto been approached using computationally expensive non-rigid registration followed by label fusion. We propose an alternative approach using a Convolutional Neural Network (CNN) that classifies each voxel into one of many structures. Evaluation of the proposed method on various datasets showed that the mean Dice coefficient varied from 0.844 ± 0.031 to 0.743 ± 0.019 for the datasets with the fewest (32) and the most (134) labels, respectively. These figures are marginally better than or on par with those obtained by current state-of-the-art methods on nearly all datasets, at reduced computational time. We also propose an end-to-end trainable Fully Convolutional Neural Network (FCNN) architecture, called the M-net, for segmenting deep (human) brain structures. A novel scheme is used to learn to combine and represent the 3D context of a given slice in 2D.
Consequently, the M-net uses only 2D convolutions even though it operates on 3D data. Experimental results show that the M-net outperforms other state-of-the-art model-based segmentation methods in terms of Dice coefficient and is at least 3 times faster than them.
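The idea of representing a slice's 3D context in 2D can be illustrated as a weighted combination of a slice with its neighbours along the z-axis; the fixed weights below are purely illustrative stand-ins for the combination the M-net learns:

```python
import numpy as np

def collapse_3d_context(volume, z, weights=(0.25, 0.5, 0.25)):
    """Fold slice z of a (Z, H, W) volume and its z-neighbours into one
    2D map via a weighted sum, so that downstream processing needs only
    2D convolutions. The fixed weights here are illustrative; the M-net
    learns this combination end-to-end.
    """
    half = len(weights) // 2
    out = np.zeros_like(volume[z], dtype=float)
    for w, dz in zip(weights, range(-half, half + 1)):
        zi = min(max(z + dz, 0), volume.shape[0] - 1)  # clamp at volume edges
        out += w * volume[zi]
    return out
```

Because the 3D neighbourhood is collapsed into a single 2D map before convolution, the memory and compute cost scales like a 2D network, which is one way such a model can run several times faster than fully 3D alternatives.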

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Jayanthi Sivaswamy, Thottupattu AJ, Mehta R, Sheelakumari R and Kesavadas C - Construction of Indian Human Brain Atlas, Neurology India (to appear). [PDF]

  • Majumdar A, Mehta R and Jayanthi Sivaswamy - To Learn Or Not To Learn Features For Deformable Registration? Deep Learning Fails, MICCAI 2018 [PDF]

  • Raghav Mehta and Jayanthi Sivaswamy - M-net: A Convolutional Neural Network for Deep Brain Structure Segmentation, Proc. of IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), IEEE, 2017. [PDF]

  • R Mehta, Aabhas Majumdar and Jayanthi Sivaswamy - BrainSegNet: a convolutional neural network architecture for automated segmentation of human brain structures Journal of Medical Imaging 4.2 (2017): 024003-024003. [PDF]

  • Raghav Mehta and Jayanthi Sivaswamy - A Hybrid Approach to Tissue-based Intensity Standardization of Brain MRI Images, Proc. of IEEE International Symposium on Bio-Medical Imaging (ISBI), 13-16 April 2016, Prague. [PDF]


Downloads

thesis

Error Detection and Correction in Indic OCRs


Vinitha VS 

Abstract

Indian languages have a rich literature that is not available in digitized form. Attempts have been made to preserve this repository of art and information by maintaining digital libraries of scanned books. However, this does not fully serve the purpose, as indexing and searching documents stored as images is difficult. An OCR system can be used to convert the scanned documents to editable form, but OCR systems are error prone. These errors are largely unavoidable and arise from issues like poor-quality images, complex fonts and unknown glyphs. A post-processing system can help improve accuracy by using information about the patterns and constraints of word and sentence formation to identify and correct errors. OCR is considered a problem attempted with marked success in Latin scripts, especially English. This is not the case for Indic scripts, where the error rates of the available OCR systems are comparatively high. The OCR pipeline includes three main stages: segmentation, text recognition and post-processing. We observe that Indic scripts are complex, and glyph segmentation itself is a challenge. The existence of visually similar glyphs also makes recognition difficult. The challenges in the post-processing stage are largely due to the properties of Indian languages: the inflectional nature of languages like Telugu and Malayalam, and the agglutination of words, lead to an enormous and growing vocabulary. Unlike the alphabetic system of English, Indic scripts follow an alphasyllabary writing system, so the choice of Unicode characters as the basic unit of a word is questionable. Aksharas, which are a more meaningful unit, are considered a better alternative. In this thesis, we analyze the challenges in building an efficient post-processor for Indic-language OCRs and propose two novel error detection techniques.
The post-processing module deals with the detection of errors in the recognized text and the correction of the detected errors. To understand the issues in post-processing for Indian languages, we first perform a statistical analysis of textual data. The unavailability of a large corpus prompted us to crawl various newspaper sites and a Wikipedia dump to obtain the required text. We compare the unique-word distribution and word cover of popular Indian languages with English, and observe that languages like Telugu, Tamil, Kannada and Malayalam tend to have a huge number of unique words compared to English. We also observe how many words get converted to other valid words in the language, using the Hamming distance between words as a measure. We empirically analyze the effectiveness of statistical language models for error detection and correction. First we use an n-gram model to detect errors in the OCR output: we split words into aksharas to create bigram and trigram language models that give the probability of a word, and a word is declared erroneous if its probability is lower than a pre-computed threshold. For error correction, we replace the lowest-probability n-gram with a higher-probability one from the n-gram list. We observe that akshara-level n-grams perform better than Unicode-level n-gram models in both error detection and correction. We also discuss why the dictionary-based method, popular for English, is not a reliable solution for error detection and correction in Indic OCRs. We use a simple binary dictionary method for error detection, wherein a word present in the dictionary is tagged as correct, and as an error otherwise. The major bottleneck in using a lexicon is the enormous number of words in languages like Telugu and Malayalam. For error correction, we use Levenshtein and Gestalt scores to select candidate words from the dictionary to replace an error word.
Inflection causes issues in selecting the correct word, as the candidate list contains many words that are close to the error word. We propose two novel methods for detecting errors in the OCR output. Both methods are language independent and require no knowledge of the language's grammar. The first method uses a recurrent neural network to learn the patterns of errors and correct words in the OCR output. The second uses a Gaussian mixture model based clustering technique. Both methods build their features from language models over Unicode characters as well as akshara-split words. We argue that aksharas, each formed by the combination of one or more Unicode characters, are a better choice than Unicode characters as the basic unit of a word. We tested our methods on four popular Indian languages and report an average error detection performance above 80% on a dataset of 5K pages recognized using two state-of-the-art OCR systems.
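A toy version of the n-gram error detector — an add-alpha-smoothed character-bigram model with a probability threshold — is sketched below. Character bigrams over Latin text stand in for the akshara-level n-grams of the thesis, and the `vocab_size` default is an assumed value for illustration:

```python
import math
from collections import Counter

def train_bigram_model(words):
    """Count character bigrams (with ^/$ word-boundary markers) over a corpus."""
    bigrams, unigrams = Counter(), Counter()
    for w in words:
        w = "^" + w + "$"
        for a, b in zip(w, w[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def avg_log_prob(word, bigrams, unigrams, alpha=1.0, vocab_size=28):
    """Length-normalised add-alpha log probability of a word.

    A word scoring below a pre-computed threshold would be flagged as a
    likely OCR error. vocab_size (26 letters + 2 markers) is an assumed
    smoothing denominator for this toy Latin-alphabet setup.
    """
    w = "^" + word + "$"
    lp = 0.0
    for a, b in zip(w, w[1:]):
        lp += math.log((bigrams[(a, b)] + alpha) /
                       (unigrams[a] + alpha * vocab_size))
    return lp / (len(w) - 1)
```

Words made of frequent bigrams score high, while implausible character sequences score low; choosing the threshold between the two score distributions is what the thesis tunes on held-out data.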

 

Year of completion: December 2017
Advisor: C V Jawahar

Related Publications

  • V S Vinitha, Minesh Mathew and C. V. Jawahar - An Empirical Study of Effectiveness of Post-processing in Indic Scripts 6th International Workshop on Multilingual OCR, Kyoto, Japan, 2017. [PDF]

  • V S Vinitha, C V Jawahar - Error Detection in Indic OCRs - Proceedings of 12th IAPR International Workshop on Document Analysis Systems (DAS'16), 11-14 April, 2016, Santorini, Greece. [PDF]


Downloads

thesis
