
Road Topology Extraction from Satellite images by Knowledge Sharing

 


Anil Kumar

Abstract

Motivated by the human visual system, machine or computer vision technology has been revolutionized in the last few decades and has made a strong impact on a wide range of applications such as object recognition, face recognition and identification. However, despite much encouraging advancement, there are still many fields that do not utilize the full potential of computer vision techniques. One such field is the analysis of satellite images for geo-spatial applications. In the past, building and launching satellites into space was expensive, which was a big hurdle in acquiring low-cost images from satellites. However, with technological innovations, inexpensive satellites are capable of sending terabytes of images of our planet on a daily basis that can provide insights into global-scale economic, social and industrial processes. Significant applications of satellite imagery include urban planning, crop yield forecasting, mapping and change detection. The most obvious application of satellite imagery is to extract the topological road network from satellite images, as it plays an important role in planning mobility between multiple geographical locations of interest. The extraction of road topology from satellite images is formulated as a binary segmentation problem in the vision community. Despite the huge volume of satellite imagery, the fundamental hurdle in applying computer vision algorithms based on deep learning architectures is the unavailability of labeled data, which causes poor results. Another challenge in extracting roads from satellite imagery is the visual ambiguity in identifying roads and their occlusion by various objects. This challenge causes many standard algorithms in computer vision research to perform poorly and is a major concern. In this thesis, we develop deep learning based models and techniques that allow us to address the above challenges.
In the first part of our work, we attempt to perform road segmentation with less labeled data using existing unsupervised feature learning techniques. In particular, we use a self-supervised technique to learn visual representations with artificial supervision, followed by fine-tuning of the model on the labeled dataset. We use semantic image inpainting as the artificial/auxiliary task for supervision. The enhancement of road segmentation is directly related to the features the model captures by inpainting the erased regions of the image. To further enhance feature learning, we propose to inpaint the difficult regions of the image and develop a novel adversarial training scheme to learn the mask used for erasing the image. The proposed scheme gradually learns to erase regions that are difficult to inpaint. This increase in the difficulty of image inpainting leads to better road segmentation. Additionally, we study the proposed approach on scene parsing and land classification in satellite images.
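The core intuition of the erasing scheme, preferring regions that the inpainter handles poorly, can be sketched with a toy, non-learned stand-in. The patch-ranking heuristic below is purely illustrative; the thesis uses a learned adversarial mask generator, not this hand-coded selection:

```python
import numpy as np

def select_hard_patches(image, reconstruction, patch=8, k=4):
    """Toy stand-in for the learned erasing mask: rank non-overlapping
    patches by inpainting reconstruction error and return a binary mask
    that erases the k hardest patches for the next round."""
    h, w = image.shape
    errors = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            err = np.abs(image[i:i+patch, j:j+patch]
                         - reconstruction[i:i+patch, j:j+patch]).mean()
            errors.append((err, i, j))
    errors.sort(reverse=True)               # hardest patches first
    mask = np.ones((h, w), dtype=np.float32)
    for _, i, j in errors[:k]:
        mask[i:i+patch, j:j+patch] = 0.0    # 0 = erased region
    return mask

rng = np.random.default_rng(0)
img = rng.random((32, 32))
recon = img.copy()
recon[:8, :8] += 0.5                        # one poorly inpainted patch
mask = select_hard_patches(img, recon, patch=8, k=1)
print(mask[:8, :8].sum())                   # the hard patch is erased -> 0.0
```

In the actual adversarial scheme, a network produces this mask and is trained against the inpainter, so the selection improves jointly with the features.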

 

Year of completion:  June 2019
 Advisor : C V Jawahar

Related Publications


    Downloads

    thesis

    Towards Scalable Applications for Handwritten Documents

     


    rowtula Vijay

    Abstract

Even in today's world, a large number of documents are generated as handwritten documents. This is especially true when knowledge or expertise is captured conveniently with the availability of electronic gadgets. Information extraction from handwritten document images has numerous applications, especially in the digitization of archived handwritten documents, assessing patient medical records and automated evaluation of student handwritten assessments, to mention a few. Document categorization and targeted information extraction from such sources can help in designing better search and retrieval systems for handwritten document images. Information extraction from handwritten medical records written in an ambulance for a doctor's interpretation at the hospital, and reading postal addresses to automate letter sorting, are examples where a document image workflow helped in scaling the system with minimal human intervention. In such workflow systems, images flow across subjects who can be in different locations. Our work is motivated by the success of these document image workflow systems, which were put into practice when handwriting recognition accuracy was unacceptably low. Our goal is to bring scalability to handwritten document processing, enhancing the throughput of the analysis by employing the multitude of developments in the document image space. In this thesis, we initially focus on presenting a document image workflow system that helps in scaling handwritten student assessments in a typical university setting. We observed that this improves efficiency, since the bookkeeping time as well as physical paper movement is minimized. An electronic workflow can make anonymization easy, alleviating the fear of biases in many cases. Also, parallel and distributed assessment by multiple instructors is straightforward in an electronic workflow system.
At the heart of our solution, we have (i) a distributed image capture module with a mobile phone, (ii) image processing algorithms that improve quality and readability, and (iii) an image annotation module that processes the evaluations/feedback as a separate layer. Further, we extend our work by proposing an approach to detect POS and named entity tags directly from offline handwritten document images without explicit character/word recognition. We observed that POS tagging on handwritten text sequences increases the predictability of named entities and also brings a linguistic aspect to handwritten document analysis. As a pre-processing step, the document image is binarized and segmented into word images. The proposed approach, comprising a CNN-LSTM model trained on word image sequences, produces encouraging results on the challenging IAM dataset. Finally, we describe an effective method for automatically evaluating short descriptive handwritten answers from digitized images. Automated evaluation of handwritten answers has been a challenging problem in scaling the education system for many years. Speeding up the evaluation still remains the major bottleneck for enhancing throughput. Our goal is to assign an evaluation score that is comparable to human-assigned scores. Our solution is based on the observation that a human evaluator judges the relevance of an answer using a set of keywords and their semantics. Since reliable handwriting recognizers are not yet available, we attempt this problem in the image space. We model this problem as a self-supervised, feature-based classification problem, which can fine-tune itself for each question without any explicit supervision. We conduct experiments on three different datasets obtained from students. Experiments show that our method performs comparably to human evaluators.
With these works, we attempt to bring state-of-the-art enhancements in handwritten document analysis and deep learning into scalable applications that can be helpful in the field of education.
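The keyword-based scoring intuition behind the answer-evaluation method can be sketched as a toy in text space. The real system operates on word images with self-supervised features rather than recognized text; the function name and marks scale below are illustrative assumptions:

```python
def keyword_score(answer_words, keywords, max_marks=5):
    """Toy illustration of keyword-based relevance scoring: the fraction
    of expected keywords found among the answer tokens, scaled to the
    question's maximum marks. The thesis performs the matching in the
    image space, without explicit handwriting recognition."""
    found = sum(1 for kw in keywords if kw in set(answer_words))
    return round(max_marks * found / len(keywords), 2)

answer = ["osmosis", "is", "movement", "of", "water", "across", "membrane"]
print(keyword_score(answer, ["osmosis", "water", "membrane", "gradient"]))
# 3 of 4 keywords found -> 3.75
```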

     

    Year of completion:  June 2019
     Advisor : C V Jawahar

    Related Publications

• Vijay Rowtula, Subba Reddy Oota and C. V. Jawahar - Towards Automated Evaluation of Handwritten Assessments, The 15th International Conference on Document Analysis and Recognition (ICDAR), 20-25 September 2019, Australia. [PDF]

• Vijay Rowtula, Praveen Krishnan and C. V. Jawahar - POS Tagging and Named Entity Recognition on Handwritten Documents, ICON, 2018. [PDF]


    Downloads

    thesis

    Analyzing Racket Sports From Broadcast Videos

     


    Anurag Ghosh

    Abstract

Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large-scale data mining and analytics. However, sports videos have an inherent temporal structure due to the nature of the sports themselves. For instance, tennis comprises points, games and sets played between two players/teams. Recent attempts at sports analytics are not fully automatic for finer details or have a human in the loop for high-level understanding of the game and, therefore, have limited practical applications to large-scale data analysis. Many of these applications depend on specialized camera setups and wearable devices, which are costly and unwieldy for players and coaches, especially in resource-constrained environments like India. Utilizing very simple and non-intrusive sensors (like a single camera) along with computer vision models is necessary to build indexing and analytics systems. Such systems can be used to sort through huge swathes of data, help coaches look at interesting video segments quickly, mine player data and even generate automatic reports and insights for a coach to monitor. Firstly, we demonstrate a score-based indexing approach for broadcast video data. Given a broadcast sport video, we index all the video segments with their scores to create a navigable and searchable match. Even though our method is extensible to any sport with scores, we evaluate our approach on broadcast tennis videos. Our approach temporally segments the rallies in the video and then recognizes the scores from each of the segments, before refining the scores using knowledge of the tennis scoring system. We finally build an interface to effortlessly retrieve and view the relevant video segments by also automatically tagging the segmented rallies with human-accessible tags such as 'fault' and 'deuce'. The efficiency of our approach is demonstrated on broadcast tennis videos from two major tennis tournaments.
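The score-refinement step, rejecting recognized scores that violate the tennis scoring system, can be illustrated with a minimal validity check. Deuce and advantage are omitted for brevity; this is a simplified sketch, not the thesis's refinement algorithm:

```python
def valid_next_point(server, receiver, new_server, new_receiver):
    """Toy consistency check between two consecutively recognized scores:
    exactly one player's point score advances one step along the tennis
    sequence 0 -> 15 -> 30 -> 40 while the other stays unchanged."""
    seq = ["0", "15", "30", "40"]
    def advanced(a, b):
        return a in seq and b in seq and seq.index(b) == seq.index(a) + 1
    return (advanced(server, new_server) and receiver == new_receiver) or \
           (advanced(receiver, new_receiver) and server == new_server)

print(valid_next_point("15", "30", "30", "30"))   # True: server won a point
print(valid_next_point("15", "30", "40", "30"))   # False: score skipped 30
```

A refinement pass can use such checks to discard or correct OCR outputs that form an impossible score sequence.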
Secondly, we propose an end-to-end framework for automatic attribute tagging and analysis of broadcast sport videos. We use commonly available broadcast videos of badminton matches and, unlike previous approaches, we do not rely on special camera setups or additional sensors. We propose a method to analyze a large corpus of broadcast videos by segmenting the points played, tracking and recognizing the players in each point and annotating their respective strokes. We evaluate the performance on 10 Olympic badminton matches with 20 players and achieve 95.44% point segmentation accuracy, 97.38% player detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke segmentation edit score of 80.48%. We further show that the automatically annotated videos alone can enable gameplay analysis and inference by computing understandable metrics such as a player's reaction time, speed, and footwork around the court. Lastly, we adapt our proposed framework to tennis games to mine spatiotemporal and event data from a large set of broadcast videos. Our broadcast videos include all Grand Slam matches played between Roger Federer, Rafael Nadal and Novak Djokovic. Using this data, we demonstrate that we can infer the playing styles and strategies of tennis players. Specifically, we study the evolution of the famous rivalries of Federer, Nadal, and Djokovic across time. We compare and validate our inferences with expert opinions of their playing styles.
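One of the gameplay metrics mentioned above, player speed, can be sketched directly from tracker output. The frame rate and pixel-to-metre scale below are assumed values; in practice the scale would come from a court homography:

```python
import numpy as np

def average_speed(track_px, fps=25.0, metres_per_px=0.01):
    """Toy version of one gameplay metric: average player speed (m/s)
    computed from per-frame (x, y) court positions produced by the
    tracker. Both fps and metres_per_px are illustrative assumptions."""
    track = np.asarray(track_px, dtype=float)
    step_px = np.linalg.norm(np.diff(track, axis=0), axis=1)  # per-frame displacement
    return step_px.sum() * metres_per_px * fps / (len(track) - 1)

# 5 frames of (x, y) positions moving 10 px per frame along x
print(average_speed([(0, 0), (10, 0), (20, 0), (30, 0), (40, 0)]))  # 2.5 m/s
```

Reaction time and footwork metrics follow the same pattern: simple arithmetic over the automatically annotated positions and stroke events.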

     

    Year of completion:  June 2019
     Advisor : C V Jawahar

    Related Publications

• Anurag Ghosh, Suriya Singh and C. V. Jawahar - Towards Structured Analysis of Broadcast Badminton Videos, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF]

• Anurag Ghosh and C. V. Jawahar - SmartTennisTV: Automatic Indexing of Tennis Videos, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017. [PDF]

• Anurag Ghosh, Yash Patel, Mohak Sukhwani and C. V. Jawahar - Dynamic Narratives for Heritage Tour, 3rd Workshop on Computer Vision for Art Analysis (VisART), European Conference on Computer Vision (ECCV), 2016. [PDF]


    Downloads

    thesis

    Efficient Annotation and Knowledge Distillation for Semantic Segmentation

     


    Tejaswi Kasarla

    Abstract

Scene understanding is a crucial task in computer vision. With the recent advances in autonomous driving research, road scene segmentation has become a very important problem to solve. A few dedicated datasets have also been proposed to address this research problem. Advances in other areas of deep learning gave rise to deeper and computationally heavy models, which were subsequently adapted to solve the problem of semantic segmentation. In general, larger datasets and heavier models are a trademark of segmentation problems. This thesis proposes methods to reduce the annotation effort of datasets and the computational effort of models in semantic segmentation. In the first part of the thesis, we introduce the general sub-domain of semantic segmentation and explain the challenges of dataset annotation effort. We also explain a sub-field of machine learning called active learning, which relies on a smart selection of data to learn from, thereby performing better with less training. Since obtaining labeled data for semantic segmentation is extremely expensive, active learning can play a huge role in reducing the effort of obtaining that labeled data. We also discuss the computational complexity of the training methods and give an introduction to knowledge distillation, which relies on transferring knowledge from the complex objective functions of deeper and heavier models to improve the performance of a simpler model. Thus, we are able to deploy better-performing simple models on computationally constrained systems such as self-driving cars or mobile phones. The second part is a detailed work on active learning, in which we propose a region-based active learning method to efficiently obtain labeled annotations for semantic segmentation.
The advantages of this method include: (a) using less labeled data to train a semantic segmentation model; (b) proposing efficient models to obtain the labeled data, thereby reducing the annotation effort; and (c) transferring a model trained on one dataset to another unlabeled dataset with minimal annotation cost, which significantly improves accuracy. Studies on the Cityscapes and Mapillary datasets show that this method is effective in reducing dataset annotation cost. The third part is our work on knowledge distillation, where we propose a method to improve the performance of faster segmentation models, which usually suffer from low accuracy due to model compression. This method is based on distilling knowledge during the training phase from high-performance models to low-performance models through a teacher-student architecture. Knowledge distillation allows the student to mimic the performance of the teacher network, thereby improving its performance during inference. We quantitatively and qualitatively validate the results with three different networks on the Cityscapes dataset. Overall, in this thesis, we propose methods to reduce the bottlenecks of semantic segmentation tasks, specifically related to road scene understanding: the high cost of obtaining labeled data and the high computational requirement for better performance. We show that smart annotation of partial regions in the data can reduce the effort of obtaining labels. We also show that extra supervision from a better-performing model can improve the accuracy of computationally faster models, which are useful for real-time deployment.
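A standard teacher-student distillation objective of the kind described above can be sketched per pixel as follows. The temperature, weighting and function names are common conventions, not the thesis's exact formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over a logit vector."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Per-pixel teacher-student objective: cross-entropy with the
    ground-truth label plus KL divergence to the teacher's softened
    distribution, scaled by T^2 as is conventional."""
    p_s, p_t = softmax(student_logits, T), softmax(teacher_logits, T)
    ce = -np.log(softmax(student_logits)[label])                     # hard-label term
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s)))) * T * T    # soft-label term
    return alpha * ce + (1 - alpha) * kl

print(round(distillation_loss([2.0, 0.5, 0.1], [2.0, 0.5, 0.1], 0), 3))
# 0.158 -- the KL term vanishes when the student matches the teacher
```

For segmentation, this loss is averaged over all pixels, so the student mimics the teacher's full per-pixel distributions rather than only the hard labels.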

     

    Year of completion:  June 2019
Advisor: Dr. Vineeth N. Balasubramanian, C V Jawahar

    Related Publications

• Tejaswi Kasarla, G. Nagendar, Guruprasad M Hegde, Vineeth N. Balasubramanian and C. V. Jawahar - Region-based Active Learning for Efficient Labeling in Semantic Segmentation, IEEE Winter Conference on Applications of Computer Vision (WACV 2019), Hilton Waikoloa Village, Hawaii.


    Downloads

    thesis

    Handwritten Word Recognition for Indic & Latin scripts using Deep CNN-RNN Hybrid Networks

     


    Kartik Dutta

    Abstract

Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties of the scripts, the cursive nature of the handwriting and the similar shapes of the characters. Though a lot of research has been done in the field of text recognition, the focus of the vision community has primarily been on English. Furthermore, the lack of publicly available handwriting datasets in Indic scripts has also affected the development of handwritten word recognizers and made direct comparisons across different methods impossible in the field. Also, due to this lack of annotated data, it becomes challenging to train deep neural networks, which contain millions of parameters. These facts are quite surprising considering that there are over 400 million Devanagari speakers in India alone. We first tackle the problem of the lack of annotated data using various approaches. We describe a framework for annotating handwritten word images at large scale without the need for manual segmentation and label re-alignment. Two new word-level handwritten datasets for Telugu and Devanagari, created using the above-mentioned framework, are released. We synthesize datasets containing millions of realistic images with a large vocabulary for the purpose of pre-training, using publicly available Unicode fonts. Later on, pre-training with data from the Latin script is also shown to be useful in overcoming the shortage of data. Capitalizing on the success of the CNN-RNN hybrid architecture, we propose various improvements to the architecture and its training pipeline to make it even more robust for handwriting recognition. We change the network to use a ResNet-18-like structure for the convolutional part and add a spatial transformer network layer. We also use an extensive data augmentation scheme involving multi-scale, elastic, affine and test-time distortion.
We outperform the previous state-of-the-art methods on existing benchmark datasets for both Latin and Indic scripts by a significant margin. We perform an ablation study to empirically show how the various changes we made to the original CNN-RNN hybrid network have improved its performance with respect to handwriting recognition. We dive deeper into the workings of our network's convolutional layers and verify the robustness of the convolutional features through layer visualizations. We hope the release of the two datasets mentioned in this work, along with the architecture and training techniques we have used, will instill interest among fellow researchers in the field.
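The final decoding step of CNN-RNN recognizers of this kind is typically CTC-style greedy decoding over per-frame predictions, which can be sketched as follows (a simplified illustration; the thesis's decoder details may differ):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Toy CTC greedy decoding: collapse consecutive repeated per-frame
    label predictions, then drop the blank symbol. The RNN emits one
    label distribution per image column; frame_ids are their argmaxes."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# per-frame argmax ids with blanks (0) separating repeated characters
print(ctc_greedy_decode([3, 3, 0, 1, 1, 0, 0, 20, 20]))  # -> [3, 1, 20]
```

Note how the blank between two identical ids (e.g. `[1, 0, 1]`) preserves genuinely doubled characters while plain repeats collapse.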

     

    Year of completion:  Mar 2019
     Advisor : C V Jawahar

    Related Publications

• Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Improving CNN-RNN Hybrid Networks for Handwriting Recognition, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA. [PDF]

• Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Spotting and Recognition of Handwritten Words in Indic Scripts, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA. [PDF]

• Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Localizing and Recognizing Text in Lecture Videos, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA. [PDF]

• Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Word Spotting and Recognition using Deep Embedding, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria. [PDF]

• Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria. [PDF]

• Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Accurate Handwritten Word Recognition for Hindi and Bangla, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017. [PDF]


    Downloads

    thesis
