
Cognitive Vision: Examining Attention, Engagement and Cognitive Load via Gaze and EEG


Viral Parekh

Abstract

Gaze and visual attention are closely related. Analysis of gaze and attention can be used for behavior analysis, anticipation, or predicting a person's level of engagement. Collectively, these problems fall into the space of cognitive vision systems. As the name suggests, this is the intersection of two areas, computer vision and cognition: the goal of a cognitive vision system is to understand the principles of human vision and use them as inspiration to improve machine vision systems. In this thesis, we focus on eye gaze and electroencephalogram (EEG) data to understand and analyze attention and cognitive workload, and we demonstrate applications such as engagement analysis and image annotation.

With the presence of ubiquitous devices in our daily lives, effectively capturing and managing user attention becomes a critical device requirement. Gaze-lock detection, i.e., sensing eye contact with a device, is a useful technique for tracking a user's interaction with the device. We propose an eye-contact detection method based on a convolutional neural network (CNN) architecture, which achieves superior eye-contact detection performance compared to state-of-the-art methods with minimal data pre-processing; the algorithm is furthermore validated on multiple datasets. Gaze-lock detection is improved by combining head pose and eye-gaze information, consistent with the social attention literature.

Further, we extend our work to analyze the engagement level of persons with dementia via visual attention. Engagement in dementia is typically measured using behavior observational scales (BOS) that are tedious, involve intensive manual labor to annotate, and are therefore not easily scalable. We propose AVEID, a low-cost and easy-to-use video-based engagement measurement tool to determine the level of engagement of a person with dementia (PwD) when interacting with a target object. We show that the objective behavioral measures computed via AVEID correlate well with subjective expert impressions for the popular MPES and OME BOS, confirming its viability and effectiveness. Moreover, AVEID measures can be obtained for a variety of engagement designs, thereby facilitating large-scale studies with PwD populations.

Analysis of the cognitive load imposed by a given user interface is an important measure of its effectiveness or usability. We examine whether EEG-based cognitive load estimation generalizes across character, spatial-pattern, bar-graph and pie-chart based visualizations for the n-back task. Cognitive load is estimated via two recent approaches: (1) a deep convolutional neural network and (2) proximal support vector machines. Experiments reveal that cognitive load estimation suffers across visualizations, suggesting that (a) the visualizations may inherently induce varied cognitive processes in users, and (b) effective adaptation techniques are needed to benchmark visual interfaces for usability on pre-defined tasks.

Finally, the success of deep learning in computer vision has greatly increased the need for annotated image datasets. We propose an EEG-based image annotation system. While humans can recognize objects in 20-200 milliseconds, the need to manually label images results in low annotation throughput. Our system employs brain signals captured via a consumer EEG device to achieve an annotation rate of up to 10 images per second. We exploit the P300 event-related potential (ERP) signature to identify target images during a rapid serial visual presentation (RSVP) task. We further perform unsupervised outlier removal to achieve an F1-score of 0.88 on the test set. The proposed system does not depend on category-specific EEG signatures, enabling the annotation of any new image category without model pre-training.
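
A minimal sketch of the P300-based target detection underlying the annotation system is given below. It assumes EEG epochs have already been extracted and time-locked to image onsets, and trains a plain linear discriminant classifier on windowed amplitude features, a standard P300 detection recipe; the full system (consumer EEG hardware, unsupervised outlier removal) is more involved, and all names and feature choices here are illustrative.

# Sketch of P300-based target detection for RSVP image annotation.
# `epochs` is a (n_images, n_channels, n_samples) array of EEG segments
# time-locked to image onsets; names and feature choices are illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def epoch_features(epochs):
    # Crude features: mean amplitude per channel in consecutive time
    # windows, flattened into one vector per image.
    n, c, t = epochs.shape
    windows = np.array_split(np.arange(t), 8)
    feats = [epochs[:, :, w].mean(axis=2) for w in windows]
    return np.concatenate(feats, axis=1)            # (n, c * 8)

def train_p300_detector(train_epochs, is_target):
    clf = LinearDiscriminantAnalysis()
    clf.fit(epoch_features(train_epochs), is_target)
    return clf

def annotate(clf, epochs, threshold=0.5):
    # Images whose epochs show a P300-like response are taken as targets.
    scores = clf.predict_proba(epoch_features(epochs))[:, 1]
    return scores > threshold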

 

Year of completion: July 2018
Advisors: Prof. C.V. Jawahar and Ramanathan Subramanian

Related Publications

  • Viral Parekh, Ramanathan Subramanian, Dipanjan Roy, C.V. Jawahar - An EEG-based Image Annotation System, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017. [PDF]


Downloads

thesis

Exemplar based approaches on Face Fiducial Detection and Frontalization


Mallikarjun BR

Abstract

Computer vision solutions such as face detection and recognition, facial reenactment, facial expression analysis and gender detection have found fruitful applications in domains such as security, surveillance, social media and animation. Many of these solutions share common pre-processing steps such as fiducial detection, appearance modeling and face structure modeling. These steps can be considered fundamental problems to be solved in building any computer vision solution concerning face images. In this thesis, we propose exemplar based approaches to two such fundamental problems: face fiducial detection and face frontalization. Exemplar based approaches have proven effective in various computer vision problems, such as object detection, image inpainting, object removal, action recognition and gesture recognition. Instead of building a model that represents all the examples, this approach directly utilizes the information residing in the examples themselves to achieve a given objective.

Face fiducial detection involves detecting key points on faces, such as eye corners, the nose tip and mouth corners. It is one of the main pre-processing steps in face recognition, facial animation, gender detection, gaze identification and expression recognition systems. A number of approaches, including active shape models, regression based methods, cascaded neural networks, tree based methods and exemplar based approaches, have been proposed in the recent past. Many of these algorithms address only part of the problems in this area. We propose an exemplar based approach that takes advantage of the complementarity of different approaches and obtains consistently superior performance over state-of-the-art methods, supported by extensive experiments over three popular datasets (a sketch of the exemplar-consensus pattern appears after this abstract).

Face frontalization is the process of synthesizing a frontal view of the face from a non-frontal view. Frontalization methods can be used in intelligent photo editing tools and also help improve the accuracy of face recognition systems. Previously proposed methods involve estimating a 3D model of the face or assuming a generic one. Estimating an accurate 3D face model is not a completely solved problem, and assuming a generic 3D model results in the loss of crucial shape cues. We propose an exemplar based approach that does not require a 3D model of the face, and show that it is efficient and performs consistently better than other approaches.
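
A toy sketch of the exemplar-consensus pattern referenced above, assuming precomputed appearance descriptors and landmark annotations for an exemplar set: retrieve the nearest exemplars and take a robust (median) consensus of their landmarks. The thesis builds its consensus over the outputs of multiple complementary detectors rather than raw exemplar landmarks, so this shows only the underlying idea.

# Toy exemplar-consensus fiducial prediction; all names are illustrative.
import numpy as np

def predict_fiducials(query_feat, exemplar_feats, exemplar_landmarks, k=10):
    # query_feat: (d,) descriptor of the aligned query face.
    # exemplar_feats: (n, d) descriptors of exemplar faces.
    # exemplar_landmarks: (n, p, 2) landmark coordinates per exemplar.
    dists = np.linalg.norm(exemplar_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    # Coordinate-wise median is robust to a few badly matched exemplars.
    return np.median(exemplar_landmarks[nearest], axis=0)   # (p, 2)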

 

Year of completion: May 2017
Advisor: C.V. Jawahar

Related Publications

  • Mallikarjun B R, Visesh Chari, C.V. Jawahar, Akshay Asthana - Face Fiducial Detection by Consensus of Exemplars, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2016. [PDF]

  • Mallikarjun B R, C.V. Jawahar - Efficient Face Frontalization in Unconstrained Images, Proceedings of the Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2015), 16-19 Dec 2015, Patna, India.


Downloads

thesis

Visual Perception Based Assistance for Fundus Image Readers


Rangrej Bharat

Abstract

Diabetic Retinopathy (DR) is a condition in which individuals with diabetes develop a disease of the inner wall of the eye, known as the retina. DR is a major cause of visual impairment, and early detection can prevent vision loss. The use of automatic systems for DR diagnosis is limited by their lower accuracy. As an alternative, reading centers are becoming popular in real-world scenarios: a reading center is a facility where retinal images coming from various sources are stored and analyzed by trained personnel (who might not be experts). In this thesis we look at techniques to increase the efficiency of DR image readers working in reading centers.

The first half of this thesis aims at identifying an efficient image-reading technique that is both fast and accurate. Towards this end we conducted an eye-tracking study with medical experts while they were reading images for DR diagnosis. The analysis shows that experts employ mainly two types of reading strategies: dwelling and tracing. The dwelling strategy appears to be accurate and faster than the tracing strategy. The eye movements of all the experts are combined in a novel way to extract an optimal image-scanning strategy, which can be recommended to image readers for efficient diagnosis.

To increase efficiency further, we propose a technique in which the saliency of lesions is boosted for better visibility. This Assistive Lesion Emphasis System (ALES) is demonstrated in the second half of the thesis. ALES is a two-stage system: saliency detection followed by lesion emphasis. Two biologically inspired saliency models, which mimic the human visual system, are designed using unsupervised and supervised techniques. The unsupervised saliency model achieves 10% higher recall than existing saliency models when compared against the average gaze map of 15 retinal experts. The supervised saliency model, a deep-learning implementation of the biologically inspired saliency model of Itti and Koch (Itti, L., Koch, C. and Niebur, E., 1998), achieves 10% to 20% higher AUC than existing saliency models when compared against manual markings. The saliency maps generated by these models are used to boost the prominence of lesions locally, via two selective-enhancement techniques: one uses multiscale fusion of the saliency map with the original image, and the other uses spatially varying gamma correction; both increase the contrast-to-noise ratio (CNR) of lesions by 30%. One saliency model and one selective-enhancement technique are combined to illustrate two complete ALES pipelines. DR diagnosis performed by analyzing ALES output with the optimal reading strategy should thus be both faster and more accurate.
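
Of the two selective-enhancement techniques, spatially varying gamma correction is simple to sketch. The fragment below raises each pixel to an exponent that grows with its saliency, so likely lesion regions are re-mapped more strongly than the background; the gamma bounds are illustrative guesses, not the tuned values used in the thesis.

# Saliency-driven, spatially varying gamma correction (illustrative).
# `image` and `saliency` are float arrays in [0, 1] of the same shape;
# the gamma bounds are placeholder values, not the thesis's settings.
import numpy as np

def selective_gamma(image, saliency, g_low=1.0, g_high=2.2):
    gamma = g_low + (g_high - g_low) * saliency   # per-pixel exponent
    return np.power(image, gamma)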

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Karthik G, Rangrej S and Jayanthi Sivaswamy - A deep learning framework for segmentation of retinal layers from OCT images, ACPR 2017, Nanjing. [PDF]

  • Rangrej Bharat and Jayanthi Sivaswamy - ALES: an assistive system for fundus image readers, Journal of Medical Imaging, 4(2), April 2017. [PDF]

  • Samrudhdhi B. Rangrej and Jayanthi Sivaswamy - A biologically inspired saliency model for color fundus images, Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), ACM, 2016. [PDF]


Downloads

thesis

A design for an automated Optical Coherence Tomography analysis system


Karthik Gopinath

Abstract

Age-related macular degeneration (AMD), cystoid macular edema (CME) and glaucoma are retinal diseases that affect vision and often lead to irreversible blindness. Many imaging modalities have been used for screening of retinal diseases. Optical coherence tomography (OCT) is one such modality, providing structural information about the retina; optical coherence tomography angiography (OCTA) additionally provides vascular information. With advances in OCT and OCTA technology, the number of patients scanned using these modalities is increasing exponentially. Manual analysis of such vast data fatigues human experts and extends the time to treat each patient, creating a demand for a fast and accurate automated OCT image analysis system. The system proposed in this thesis aims at analysing retinal anatomy and its diseases.

We approach the problem of automatically analysing retinal anatomy by segmenting its layers using a deep learning framework. In the literature, such algorithms usually require pre-processing steps, such as denoising, image flattening and edge detection, each with its own parameter tuning. We propose a deep learning technique that automates all these steps and handles the presence or absence of pathologies. The model combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network: the CNN extracts the layers of interest and their edges, while the LSTM traces the layer boundary. The model is trained on a mixture of normal and AMD cases using minimal data. Validation results on three public datasets show a pixel-wise mean absolute error of 1.30 ± 0.48, which is lower than the inter-marker error of 1.79 ± 0.76, and performance on par with existing methods.

We next propose three modules for disease analysis: diagnosis at the volume level, at the slice level, and finally at the pixel level by segmenting the abnormality. First, we propose a novel method for glaucoma assessment at the volume level using data from the OCTA modality. The goal of this module is to gain insights about glaucoma indicators from both structural information (OCT volume) and vascular information (angioflow images). The proposed method achieves a mean sensitivity of 94%, specificity of 91% and accuracy of 92% on 49 normal and 18 glaucomatous volumes. Second, for slice-level disease (AMD and CME) classification from OCT volumes, we learn a decision boundary using a CNN in a new extremal representation space of an image. Evaluated on four publicly available datasets, with a training set of 3500 OCT slices and a test set of 1995 OCT slices, this module achieves a mean sensitivity of 96%, specificity of 97% and accuracy of 96%. Finally, we propose an automated cyst segmentation algorithm for OCT volumes: a biologically inspired method based on selective enhancement of the cysts, achieved by inducing motion to a given OCT slice. A CNN learns a mapping function that combines the results of multiple such motions to produce a probability map of cyst locations in a given slice; the final segmentation is obtained via a simple clustering of the detected locations. The proposed method is trained on the OPTIMA Cyst Segmentation Challenge (OCSC) train set and achieves a mean Dice Coefficient (DC) of 0.71, 0.69 and 0.79 on the OCSC test set, the DME dataset and the AEI dataset, respectively. The complete system can thus be employed in screening scenarios to aid retinal experts in accurate and faster analysis of retinal anatomy and disease.
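
The "induced motion" step of the cyst segmentation module can be sketched as follows: several translated copies of an OCT slice are stacked as channels, from which a CNN can then learn the per-pixel cyst probability map described above. The shift set below is an illustrative guess, not the thesis's configuration.

# Stack translated copies of a slice as CNN input channels (illustrative).
import numpy as np

def induce_motion(slice_2d, shifts=((0, 2), (0, -2), (2, 0), (-2, 0))):
    stack = [slice_2d]
    for dy, dx in shifts:
        stack.append(np.roll(np.roll(slice_2d, dy, axis=0), dx, axis=1))
    return np.stack(stack, axis=0)   # (n_shifts + 1, H, W)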

 

Year of completion: July 2017
Advisor: Jayanthi Sivaswamy

Related Publications

  • Karthik G and Jayanthi Sivaswamy - Segmentation of retinal cysts from Optical Coherence Tomography volumes via selective enhancement, IEEE Journal of Biomedical and Health Informatics, 2018. [PDF]

  • Karthik G, Rangrej S and Jayanthi Sivaswamy - A deep learning framework for segmentation of retinal layers from OCT images, ACPR 2017, Nanjing. [PDF]

  • Karthik Gopinath, Jayanthi Sivaswamy and Tarannum Mansoori - Automatic Glaucoma Assessment from Angio-OCT Images, Proc. of the IEEE International Symposium on Biomedical Imaging (ISBI), 13-16 April 2016, Prague. [PDF]


Downloads

thesis

Active Learning & its Applications


Priyam Bakliwal 

Abstract

Active learning, also known as query learning, is a sub-field of machine learning. It relies on the assumption that if a learning algorithm is allowed to choose the data from which it learns, it will perform better with less training. Active learning is predominantly used in areas where obtaining a large amount of annotated training data is infeasible or extremely expensive. Active learning models aim to overcome the annotation bottleneck by posing queries, in the form of unlabelled instances, to be labelled by a human. In this way, the framework aims to achieve high accuracy using very few labelled instances, thereby minimizing annotation cost.

In the first part of our work, we propose an active learning based image annotation model. Automatic image annotation is the computer vision task of assigning a set of appropriate textual tags to a novel image, with the aim of eventually bridging the semantic gap between visual and textual representations. The advantages of the proposed model are: (a) it outputs a variable number of tags per image, which improves accuracy; and (b) it effectively chooses the difficult samples that need to be manually annotated, thereby reducing human annotation effort. Studies on the Corel and IAPR TC-12 datasets validate the effectiveness of this model.

In the second part of the thesis, we propose an active learning based solution for efficient, scalable and accurate annotation of objects in video sequences. We focus on reducing human annotation effort while simultaneously increasing tracking accuracy, to obtain precise, tight bounding boxes around an object of interest. We use a novel combination of two different tracking algorithms to track an object through the whole video sequence, and propose a sampling strategy to select the most informative frame for human annotation. The newly annotated frame is used to update the previous annotations; thus, through the collaborative efforts of both human and system, we obtain accurate annotations with minimal effort. We have quantitatively and qualitatively validated the results on eight different datasets.

Active learning is effective on natural language documents as well. Multilingual processing tasks like statistical machine translation and cross-language information retrieval rely mainly on the availability of accurate parallel corpora. In the third section we propose a simple yet efficient method to generate a large amount of reasonably accurate parallel corpus using OCR, with minimal user effort. We show the performance of our proposed method on a manually aligned dataset of 300 Hindi-English sentences and 100 English-Malayalam sentences.

In the last section we utilise active learning for model updating in community question answering (cQA) systems. cQA platforms like Yahoo! Answers, Baidu Zhidao, Quora and StackOverflow allow experts to give precise and targeted answers to any question posted by a user, and these sites form huge repositories of information in the form of questions and answers. Retrieval of semantically relevant questions and answers from cQA forums has been an important research area for the past few years. Given the ever-growing nature of the data in cQA forums, these models cannot be kept stagnant: they need to be continuously updated to adapt to the changing patterns of questions and answers over time, and such update procedures are expensive and time consuming. We propose a novel topic-model based active sampler named Picky, which intelligently selects a small subset of the newly added question-answer pairs to feed to the existing model for updating. Evaluations on real-life cQA datasets show that our approach converges at a faster rate, giving performance comparable to baseline sampling strategies updated with ten times as much data.
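
At its core, the active learning loop described above repeatedly queries the unlabelled instance the current model is least certain about. The sketch below shows generic entropy-based uncertainty sampling; it illustrates the paradigm only, not the specific samplers (the informative-frame selector or Picky) proposed in the thesis.

# Generic entropy-based uncertainty sampling (illustrative).
import numpy as np

def entropy(probs):
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def query_most_informative(model, unlabelled_X):
    # `model` is any classifier exposing predict_proba (e.g. scikit-learn).
    scores = entropy(model.predict_proba(unlabelled_X))
    return int(np.argmax(scores))   # index of the sample to label next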

 

Year of completion: July 2017
Advisor: C.V. Jawahar

Related Publications

  • Priyam Bakliwal, Guruprasad M. Hegde and C.V. Jawahar - Collaborative Contributions for Better Annotations, VISIGRAPP (6: VISAPP), 2017. [PDF]

  • Priyam Bakliwal, Devadath V V and C.V. Jawahar - Align Me: A framework to generate Parallel Corpus Using OCRs & Bilingual Dictionaries, WSSANLP 2016: 173. [PDF]

  • Priyam Bakliwal, C.V. Jawahar - Active Learning Based Image Annotation, Proceedings of the Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2015), 16-19 Dec 2015, Patna, India. [PDF]


Downloads

thesis
