Towards Enhancing Semantic Segmentation in Resource Constrained Settings

Ashutosh Mishra

Abstract

Understanding the semantics of the scene to automate the decision process for self-driving cars completely is becoming a crucial task to solve in computer vision. Due to the recent progress in the state of autonomous driving, added with a lot of semantic segmentation datasets for road scene understanding being proposed, semantic segmentation of road scenes has recently evolved to be an important problem to tackle. But training semantic segmentation models becomes a resource-intensive task since it requires multi-GPU training and therefore becomes the bottleneck to reproducing results for better understanding quickly. This thesis introduces challenges and provides solutions to reduce the training time of segmentation models by introducing two small-scale datasets. Additionally, the thesis explores the potential of employing neural architecture search and automatic pruning techniques to create efficient segmentation modules in resource-constrained settings. Chapter2 of the thesis introduces the problem of semantic segmentation and discusses some deep learning approaches to solve supervised semantic segmentation. We briefly discuss the different metrics used and also touch upon the statistics of various datasets that are available in the literature to train semantic segmentation models. Chapter 3 of the thesis explains the need of having a dataset based on the Indian road scenario. Most of the datasets in the literature are captured in Western settings having well-defined traffic participants, delineated boundaries, etc, which seldom mold in the Indian setting. We describe the annotation pipeline, along with the quality check framework used to annotate the dataset. Now, though the IDD dataset [121] caters to the Indian setting, this dataset is still quite resource intensive in terms of GPU computation. Hence, there is a need to have a small resolution, less label-sized dataset for rapid prototyping. We introduce our proposed datasets and provide a detailed set of experiments, and statistical comparisons with the existing datasets to substantiate our claim regarding the usefulness of the proposed solution. We also show through experiments that the models trained using our datasets can be deployed on low-resource hardware such as Raspberry Pi. At the end of this chapter, we also look into the significance of the proposed datasets in facilitating challenges at two prominent conferences: the International Conference on Computer Vision (ICCV) and the National Conference on Pattern Recognition, Image Processing, and Graphics (NCVPRIPG) in 2019. These challenges aimed to address semantic segmentation in resource-constrained settings, inviting innovative architectures capable of achieving decent accuracy on these proposed datasets. We also discuss the potential application of these datasets in teaching semantic segmentation through a course of notebooks introducing traditional as well as deep learning-based methods to perform segmentation. These notebooks are plug-and-play, where the first three notebooks can run on laptop CPU, while the fourth notebook requires GPU access.

Year of completion:	January 2024
Advisor :	C V Jawahar,Girish Varma