
Exemplar based approaches on Face Fiducial Detection and Frontalization


Mallikarjun BR

Abstract

Computer vision solutions such as face detection and recognition, facial reenactment, facial expression analysis and gender detection have found fruitful applications in domains such as security, surveillance, social media and animation. Many of these solutions share common pre-processing steps such as fiducial detection, appearance modeling and face structure modeling. These steps can be considered fundamental problems to be solved in building any computer vision solution concerning face images. In this thesis, we propose exemplar based approaches to two such fundamental problems: face fiducial detection and face frontalization. Exemplar based approaches have proven effective in various computer vision problems, such as object detection, image inpainting, object removal, action recognition and gesture recognition. Instead of building a single model that represents all the examples, this approach directly utilizes the information residing in the examples themselves to achieve the objective.

Face fiducial detection involves detecting key points on faces such as the eye corners, nose tip and mouth corners. It is one of the main pre-processing steps in face recognition, facial animation, gender detection, gaze identification and expression recognition systems. A number of approaches, such as active shape models, regression based methods, cascaded neural networks, tree based methods and exemplar based approaches, have been proposed in the recent past. Many of these algorithms address only part of the problems in this area. We propose an exemplar based approach which takes advantage of the complementarity of different approaches and obtains consistently superior performance over state-of-the-art methods. We provide extensive experiments over three popular datasets.

Face frontalization is the process of synthesizing a frontal view of the face given a non-frontal view. The proposed frontalization method can be used in intelligent photo editing tools and also helps improve the accuracy of face recognition systems. Previously proposed methods involve either estimating a 3D model of the face or assuming a generic one. Estimating an accurate 3D model of the face is not a completely solved problem, and assuming a generic 3D model results in the loss of crucial shape cues. We propose an exemplar based approach which does not require a 3D model of the face. We show that our method is efficient and performs consistently better than other approaches.
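To illustrate the consensus idea behind exemplar-based fiducial detection, the sketch below fuses landmark predictions from several hypothetical detectors by keeping, for each landmark, the candidate closest to the point-wise median. The function, detector outputs and landmark names are invented for illustration; this is a toy stand-in for the exemplar-consensus voting, not the thesis's actual algorithm.

```python
from statistics import median

def consensus_fiducials(candidate_sets):
    """For every landmark, keep the candidate closest to the point-wise
    median of all detector outputs -- a toy consensus that rejects
    outlier predictions from any single detector."""
    n_points = len(candidate_sets[0])
    fused = []
    for i in range(n_points):
        xs = [c[i][0] for c in candidate_sets]
        ys = [c[i][1] for c in candidate_sets]
        mx, my = median(xs), median(ys)
        # pick the detector output nearest to the robust median estimate
        fused.append(min((c[i] for c in candidate_sets),
                         key=lambda p: (p[0] - mx) ** 2 + (p[1] - my) ** 2))
    return fused

# three hypothetical detectors, two landmarks (eye corner, nose tip);
# the third detector's first point is an outlier the consensus rejects
preds = [[(10.0, 10.0), (20.0, 30.0)],
         [(11.0, 10.0), (21.0, 29.0)],
         [(30.0, 40.0), (20.0, 31.0)]]
fused = consensus_fiducials(preds)
```

The median acts as a robust central estimate, so a single badly wrong detector cannot drag the fused landmark away.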

 

Year of completion: May 2017
Advisor: C V Jawahar

Related Publications

  • Mallikarjun B R, Visesh Chari, C. V. Jawahar, Akshay Asthana - Face Fiducial Detection by Consensus of Exemplars, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2016. [PDF]

  • Mallikarjun B.R., C.V. Jawahar - Efficient Face Frontalization in Unconstrained Images, Proceedings of the Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2015), 16-19 Dec 2015, Patna, India.


Downloads

thesis

Visual Perception Based Assistance for Fundus Image Readers


Rangrej Bharat

Abstract

Diabetic Retinopathy (DR) is a condition in which individuals with diabetes develop a disease of the retina, the inner wall of the eye. DR is a major cause of visual impairment, and early detection can prevent vision loss. The use of automatic systems for DR diagnosis is limited by their lower accuracy. As an alternative, reading centers are becoming popular in real-world scenarios. A reading center is a facility where retinal images from various sources are stored and analyzed by trained personnel (who might not be experts). In this thesis we look at techniques to increase the efficiency of DR image readers working in reading centers.

The first half of this thesis aims at identifying an image-reading technique which is both fast and accurate. Towards this end, we conducted an eye-tracking study with medical experts while they were reading images for DR diagnosis. The analysis shows that experts employ mainly two types of reading strategies: dwelling and tracing. The dwelling strategy appears to be both more accurate and faster than the tracing strategy. The eye movements of all the experts are combined in a novel way to extract an optimal image scanning strategy, which can be recommended to image readers for efficient diagnosis.

To increase efficiency further, we propose a technique in which the saliency of lesions is boosted for better visibility. This Assistive Lesion Emphasis System (ALES) is demonstrated in the second half of the thesis. ALES is developed as a two-stage system: saliency detection followed by lesion emphasis. Two biologically inspired saliency models, which mimic the human visual system, are designed using unsupervised and supervised techniques. The unsupervised saliency model achieves 10% higher recall than existing saliency models when compared against the average gaze map of 15 retinal experts. The supervised saliency model is a deep-learning implementation of the biologically inspired saliency model of Itti and Koch (Itti, L., Koch, C. and Niebur, E., 1998) and achieves 10% to 20% higher AUC than existing saliency models when compared with manual markings. The saliency maps generated by these models are used to boost the prominence of lesions locally, via two types of selective enhancement: one fuses the saliency map with the original image at multiple scales, the other applies spatially varying gamma correction; both increase the contrast-to-noise ratio (CNR) of lesions by 30%. One saliency model and one selective-enhancement technique are combined to illustrate two complete ALES pipelines. DR diagnosis performed by analyzing ALES output with the optimal strategy should therefore be faster and more accurate.
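The spatially varying gamma correction mentioned above can be sketched as follows: pixels with high saliency get a stronger (lower) gamma, brightening candidate lesions, while low-saliency background is left nearly untouched. The gamma range and the toy image are assumptions for illustration, not values from the thesis.

```python
def saliency_gamma(image, saliency, gamma_lo=1.0, gamma_hi=0.5):
    """Spatially varying gamma correction on an intensity image in [0, 1].
    The per-pixel exponent is interpolated between gamma_lo (saliency 0,
    identity) and gamma_hi (saliency 1, strong brightening); the parameter
    values here are illustrative assumptions."""
    out = []
    for row_img, row_sal in zip(image, saliency):
        out.append([p ** (gamma_lo + s * (gamma_hi - gamma_lo))
                    for p, s in zip(row_img, row_sal)])
    return out

# toy 2x2 image whose right column is marked salient
img = [[0.2, 0.2], [0.2, 0.2]]
sal = [[0.0, 1.0], [0.0, 1.0]]
enhanced = saliency_gamma(img, sal)
```

Because gamma < 1 lifts dark intensities, the salient pixels (0.2 raised to the 0.5 power, about 0.45) come out brighter than the non-salient ones, which keep their original value.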

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Karthik G, Rangrej S and Jayanthi Sivaswamy - A deep learning framework for segmentation of retinal layers from OCT images, ACPR, Nanjing. [PDF]

  • Rangrej Bharat and Jayanthi Sivaswamy - ALES: an assistive system for fundus image readers, Journal of Medical Imaging, 4(2), April 2017. [PDF]

  • Samrudhdhi B. Rangrej and Jayanthi Sivaswamy - A biologically inspired saliency model for color fundus images, Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, ACM, 2016. [PDF]


Downloads

thesis

A design for an automated Optical Coherence Tomography analysis system


Karthik Gopinath

Abstract

Age-related macular degeneration (AMD), Cystoid Macular Edema (CME) and glaucoma are retinal diseases that affect vision and often lead to irreversible blindness. Many imaging modalities have been used for the screening of retinal diseases. Optical coherence tomography (OCT) is one such modality, providing structural information about the retina; optical coherence tomography angiography (OCTA) additionally provides vascular information. With advances in OCT and OCTA technology, the number of patients scanned with these modalities is increasing exponentially. Manual analysis of such vast data fatigues human experts, extends the time to treatment, and creates a demand for a fast and accurate automated OCT image analysis system. The system proposed in this thesis aims at analyzing retinal anatomy and its diseases.

We approach the problem of automatically analyzing retinal anatomy by segmenting its layers with a deep learning framework. In the literature, such algorithms usually require pre-processing steps such as denoising, image flattening and edge detection, each with its own parameter tuning. We propose a deep learning technique that automates all of these steps and handles the presence or absence of pathologies. The model combines a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM): the CNN extracts the layers of interest and their edges, while the LSTM traces the layer boundaries. The model is trained on a mixture of normal and AMD cases using minimal data. Validation results on three public datasets show that the pixel-wise mean absolute error obtained with our system is 1.30 ± 0.48, lower than the inter-marker error of 1.79 ± 0.76. The performance of the proposed module is also on par with existing methods.

We next propose three modules for disease analysis: diagnosis at the volume level, at the slice level, and finally at the pixel level by segmenting the abnormality. First, we propose a novel method for glaucoma assessment at the volume level using data from the OCTA modality. The goal of this module is to gain insights about glaucoma indicators from both structural information (OCT volume) and vascular information (angioflow images). The proposed method achieves a mean sensitivity of 94%, specificity of 91% and accuracy of 92% on 49 normal and 18 glaucomatous volumes. Second, for slice-level disease (AMD and CME) classification from OCT volumes, we learn a decision boundary with a CNN in a new extremal representation space of an image. Evaluated on four publicly available datasets, with a training set of 3500 OCT slices and a test set of 1995 OCT slices, this module achieves a mean sensitivity of 96%, specificity of 97% and accuracy of 96%. Finally, we propose an automated cyst segmentation algorithm for OCT volumes: a biologically inspired method based on selective enhancement of the cysts, achieved by inducing motion to a given OCT slice. A CNN learns a mapping function that combines the results of multiple such motions into a probability map of cyst locations in a given slice. The final segmentation is obtained via a simple clustering of the detected cyst locations. The method is trained on the OPTIMA Cyst Segmentation Challenge (OCSC) train set and achieves a mean Dice Coefficient (DC) of 0.71, 0.69 and 0.79 on the OCSC test set, the DME dataset and the AEI dataset respectively. Thus, the complete system can be employed in screening scenarios to aid retinal experts in accurate and faster analysis of retinal anatomy and disease.
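The Dice Coefficient used above to score the cyst segmentation is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted and a ground-truth binary mask. A minimal sketch, with toy masks of our own invention:

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks given as flattened
    sequences of 0/1: 2 * |intersection| / (|pred| + |truth|).
    Returns 1.0 when both masks are empty (perfect agreement)."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0

# toy 5-pixel masks: they agree on pixels 0 and 4, disagree on 1 and 3
pred  = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
dc = dice_coefficient(pred, truth)
```

Here the intersection has 2 pixels and each mask has 3, so the score is 4/6 ≈ 0.67; a DC of 0.71-0.79 as reported above therefore indicates substantially better overlap than this toy case.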

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Karthik G and Jayanthi Sivaswamy - Segmentation of retinal cysts from Optical Coherence Tomography volumes via selective enhancement, IEEE Journal of Biomedical and Health Informatics, 2018. [PDF]

  • Karthik G, Rangrej S and Jayanthi Sivaswamy - A deep learning framework for segmentation of retinal layers from OCT images, ACPR, Nanjing. [PDF]

  • Karthik Gopinath, Jayanthi Sivaswamy and Tarannum Mansoori - Automatic Glaucoma Assessment from Angio-OCT Images, Proc. of IEEE International Symposium on Bio-Medical Imaging (ISBI), 13-16 April 2016, Prague. [PDF]


Downloads

thesis

Active Learning & its Applications


Priyam Bakliwal 

Abstract

Active learning, also known as query learning, is a sub-field of machine learning. It relies on the assumption that if the learning algorithm is allowed to choose the data from which it learns, it will perform better with less training. Active learning is predominantly used where obtaining a large amount of annotated training data is infeasible or extremely expensive. Active learning models aim to overcome the annotation bottleneck by posing queries, in the form of unlabelled instances, to be labelled by a human. In this way, the framework aims to achieve high accuracy from very few labelled instances, minimizing annotation cost.

In the first part of our work, we propose an active learning based image annotation model. Automatic image annotation is the computer vision task of assigning a set of appropriate textual tags to a novel image, with the aim of eventually bridging the semantic gap between visual and textual representations. The advantages of the proposed model are: (a) it outputs a variable number of tags per image, which improves accuracy; and (b) it effectively chooses the difficult samples that need manual annotation, thereby reducing human annotation effort. Studies on the Corel and IAPR TC-12 datasets validate the effectiveness of this model.

In the second part of the thesis, we propose an active learning based solution for efficient, scalable and accurate annotation of objects in video sequences. We focus on reducing human annotation effort while simultaneously increasing tracking accuracy, to obtain precise, tight bounding boxes around an object of interest. We use a novel combination of two different tracking algorithms to track an object through the whole video sequence, and we propose a sampling strategy that selects the most informative frame for human annotation. The newly annotated frame is used to update the previous annotations. Thus, through the collaborative efforts of human and system, we obtain accurate annotations with minimal effort. We have quantitatively and qualitatively validated the results on eight different datasets.

Active learning is effective for natural language documents as well. Multilingual processing tasks like statistical machine translation and cross-language information retrieval rely mainly on the availability of accurate parallel corpora. In the third section we propose a simple yet efficient method to generate a huge amount of reasonably accurate parallel corpus using OCR with minimal user effort. We show the performance of the proposed method on a manually aligned dataset of 300 Hindi-English sentences and 100 English-Malayalam sentences.

In the last section we use active learning for model updating in community Question Answering (cQA) systems. cQA platforms like Yahoo! Answers, Baidu Zhidao, Quora and StackOverflow let experts give precise and targeted answers to any question posted by a user, and these sites form huge repositories of information in the form of questions and answers. Retrieval of semantically relevant questions and answers from cQA forums has been an important research area for the past few years. Given the ever-growing nature of the data in cQA forums, these models cannot be kept stagnant: they need to be continuously updated to adapt to the changing patterns of questions and answers over time. Such update procedures are expensive and time consuming. We propose a novel topic-model based active sampler named Picky, which intelligently selects a small subset of the newly added question-answer pairs to feed to the existing model for updating. Evaluations on real-life cQA datasets show that our approach converges at a faster rate, giving performance comparable to baseline sampling strategies updated with ten times as much data.
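Common to all these active learning loops is a selection step that ranks unlabelled items by how informative a human label would be. The sketch below shows one standard textbook strategy, least-confident (uncertainty) sampling for a binary classifier; it is a generic illustration, not the thesis's specific sampler, and the probabilities are invented.

```python
def pick_most_informative(probabilities, k=1):
    """Uncertainty sampling: return the indices of the k items whose
    predicted positive-class probability is closest to 0.5, i.e. the
    items the current model is least sure about."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]

# hypothetical model confidences over five unlabelled samples
probs = [0.95, 0.52, 0.10, 0.48, 0.80]
query = pick_most_informative(probs, k=2)   # indices sent for human labelling
```

In a full loop, the returned indices would be labelled by the annotator, added to the training set, and the model retrained before the next query round.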

 

Year of completion: July 2017
Advisor: C V Jawahar

Related Publications

  • Priyam Bakliwal, Guruprasad M. Hegde and C.V. Jawahar - Collaborative Contributions for Better Annotations, VISIGRAPP (6: VISAPP), 2017. [PDF]

  • Priyam Bakliwal, Devadath V V and C.V. Jawahar - Align Me: A framework to generate Parallel Corpus Using OCRs & Bilingual Dictionaries, WSSANLP 2016 (2016): 173. [PDF]

  • Priyam Bakliwal, C.V. Jawahar - Active Learning Based Image Annotation, Proceedings of the Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2015), 16-19 Dec 2015, Patna, India. [PDF]


Downloads

thesis

Population specific template construction and brain structure segmentation using deep learning methods


Raghav Mehta 

Abstract

A brain template, such as MNI152, is a digital (magnetic resonance image, or MRI) representation of the brain in a reference coordinate system for neuroscience research. Structural atlases, such as AAL and DKA, delineate the brain into cortical and sub-cortical structures which are used in Voxel Based Morphometry (VBM) and fMRI analysis. Many population-specific templates, e.g. Chinese and Korean, have been constructed recently, and morphological differences have been observed between the average brains of eastern and western populations. In this thesis, we report on the development of a population-specific brain template for the young Indian population, derived from a multi-centric MRI dataset of 100 Indian adults (21-30 years old). Measurements made with this template indicate that the Indian brain, on average, is smaller in height and width than the Caucasian and Chinese brains.

A second problem this thesis examines is the automated segmentation of cortical and non-cortical human brain structures using multiple structural atlases. This has hitherto been approached with computationally expensive non-rigid registration followed by label fusion. We propose an alternative approach using a Convolutional Neural Network (CNN) which classifies each voxel into one of many structures. Evaluation of the proposed method on various datasets showed that the mean Dice coefficient varied from 0.844 ± 0.031 to 0.743 ± 0.019 for the datasets with the fewest (32) and the most (134) labels, respectively. These figures are marginally better than or on par with those obtained by current state-of-the-art methods on nearly all datasets, at reduced computational time. We also propose an end-to-end trainable Fully Convolutional Neural Network (FCNN) architecture called the M-net for segmenting deep (human) brain structures. A novel scheme is used to learn to combine and represent the 3D context of a given slice in 2D. Consequently, the M-net uses only 2D convolutions even though it operates on 3D data. Experimental results show that the M-net outperforms other state-of-the-art model-based segmentation methods in terms of Dice coefficient and is at least 3 times faster.
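One simple way to give a 2D network access to 3D context, in the spirit of (though much simpler than) the learned combination scheme in the M-net, is to stack a slice with its neighbours as input channels. Everything in this sketch, including the clamping at the volume border, is an illustrative assumption rather than the thesis's exact scheme.

```python
def slices_as_channels(volume, z, context=1):
    """Build a multi-channel 2D input for slice z of a 3D volume by
    stacking the slice with its +-context neighbours; indices are
    clamped at the volume border so edge slices still get a full stack."""
    depth = len(volume)
    idx = [min(max(z + d, 0), depth - 1)
           for d in range(-context, context + 1)]
    return [volume[i] for i in idx]   # (2*context + 1) channels

# toy volume: 4 slices of 2x2 "pixels", each filled with its slice index
vol = [[[s] * 2 for _ in range(2)] for s in range(4)]
chans = slices_as_channels(vol, z=0, context=1)   # border slice: clamped
```

A 2D CNN fed such stacks sees through-plane context at the cost of only 2D convolutions, which is what makes this family of designs fast compared to full 3D networks.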

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Jayanthi Sivaswamy, Thottupattu AJ, Mehta R, Sheelakumari R and Kesavadas C - Construction of Indian Human Brain Atlas, Neurology India (to appear). [PDF]

  • Majumdar A, Mehta R and Jayanthi Sivaswamy - To Learn Or Not To Learn Features For Deformable Registration? Deep Learning Fails, MICCAI 2018. [PDF]

  • Raghav Mehta and Jayanthi Sivaswamy - M-net: A Convolutional Neural Network for deep brain structure segmentation, IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), 2017. [PDF]

  • R Mehta, Aabhas Majumdar and Jayanthi Sivaswamy - BrainSegNet: a convolutional neural network architecture for automated segmentation of human brain structures, Journal of Medical Imaging, 4(2), 2017: 024003. [PDF]

  • Raghav Mehta and Jayanthi Sivaswamy - A Hybrid Approach to Tissue-based Intensity Standardization of Brain MRI Images, Proc. of IEEE International Symposium on Bio-Medical Imaging (ISBI), 13-16 April 2016, Prague. [PDF]


Downloads

thesis
