Efficient Annotation and Knowledge Distillation for Semantic Segmentation


Tejaswi Kasarla


Scene understanding is a crucial task in computer vision. With recent advances in autonomous driving research, road scene segmentation has become an important problem to solve, and a few dedicated datasets have been proposed to address it. Advances in deep learning have given rise to deeper, computationally heavier models, which were subsequently adapted to semantic segmentation. In general, large datasets and heavy models are a trademark of segmentation problems. This thesis proposes methods to reduce both the annotation effort for datasets and the computational cost of models in semantic segmentation.

In the first part of the thesis, we introduce the general sub-domain of semantic segmentation and explain the challenges of dataset annotation effort. We also explain active learning, a sub-field of machine learning that relies on a smart selection of the data to learn from, thereby performing better with less training. Since obtaining labeled data for semantic segmentation is extremely expensive, active learning can play a major role in reducing the effort of obtaining that data. We also discuss the computational complexity of the training methods and introduce knowledge distillation, a method that transfers knowledge from the complex objective functions of deeper, heavier models to improve the performance of a simpler model. This makes it possible to deploy better-performing simple models on computationally constrained systems such as self-driving cars or mobile phones.

The second part is a detailed study of active learning, in which we propose a Region-Based Active Learning method to efficiently obtain labeled annotations for semantic segmentation.
The advantages of this method include: (a) using less labeled data to train a semantic segmentation model; (b) proposing efficient models for obtaining labeled data, thereby reducing the annotation effort; and (c) transferring a model trained on one dataset to another, unlabeled dataset at minimal annotation cost, which significantly improves accuracy. Studies on the Cityscapes and Mapillary datasets show that this method is effective in reducing dataset annotation cost.

The third part covers knowledge distillation. We propose a method to improve the performance of faster segmentation models, which usually suffer from low accuracy due to model compression. The method distills knowledge during the training phase from high-performing models to low-performing models through a teacher-student architecture. Knowledge distillation allows the student to mimic the teacher network, thereby improving the student's performance during inference. We validate the results quantitatively and qualitatively on three different networks on the Cityscapes dataset.

Overall, this thesis proposes methods to reduce two bottlenecks of semantic segmentation, specifically in road scene understanding: the high cost of obtaining labeled data and the high computational requirement for better performance. We show that smart annotation of partial regions in the data can reduce the effort of obtaining labels. We also show that extra supervision from a better-performing model can improve the accuracy of computationally faster models, which are useful for real-time deployment.
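
The teacher-student idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss used in the thesis, so the code below swaps in the generic soft-target formulation of knowledge distillation (temperature-scaled softmax plus a KL term), applied per pixel. The function name `distillation_loss` and the parameters `temperature` and `alpha` are illustrative assumptions, not names taken from the thesis.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic per-pixel distillation loss (an assumption, not the thesis's loss).

    student_logits, teacher_logits: float arrays of shape (H, W, C).
    labels: integer array of shape (H, W) holding ground-truth class ids.
    """
    eps = 1e-12

    # Hard-label term: cross-entropy between student predictions and labels.
    p_student = softmax(student_logits)
    picked = np.take_along_axis(p_student, labels[..., None], axis=-1)[..., 0]
    hard = -np.log(picked + eps).mean()

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    q_teacher = softmax(teacher_logits / temperature)
    q_student = softmax(student_logits / temperature)
    kl = (q_teacher * (np.log(q_teacher + eps) - np.log(q_student + eps)))
    soft = kl.sum(axis=-1).mean()

    # The T^2 factor keeps the soft term's gradient scale comparable across T.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

In this sketch, `alpha` balances supervision from the ground-truth labels against supervision from the teacher. When the student's outputs match the teacher's exactly, the soft term vanishes and only the label term remains.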


Year of completion: June 2019
Advisor: Dr. Vineeth N. Balasubramanian, C V Jawahar

Related Publications