On Designing Efficient Deep Neural Networks for Semantic Segmentation

Nikitha Vallurupalli


Semantic segmentation is an essential primitive in real-time systems such as autonomous navigation, which require processing at high frame rates. For models to be practically applicable, they must therefore be compact and fast while achieving high prediction accuracy. Previous research into semantic segmentation has focused on creating high-performance deep learning architectures; these best-performing models are typically complex and deep, have long processing times, and demand significantly more processing capacity. Another relevant line of research is model compression, through which lightweight models can be obtained. Since existing lightweight semantic segmentation models typically sacrifice performance, we design models that strike a desirable balance between accuracy and latency: methods and architectures that deliver high performance while running in real time in resource-constrained settings.

We identify redundancies in existing state-of-the-art approaches and propose a compact architecture family, ESSNet, with accuracy comparable to the state of the art while using only a fraction of the memory and computation of those networks. We propose convolutional module designs with sparse coding theory as a premise, and we present two real-time encoder backbones built from the proposed modules. We empirically evaluate the efficacy of the proposed layers and compare them with existing approaches. Secondly, we explore the need for optimization during the training phase in the proposed models and present a novel training method called Gradual Grouping that yields improved implementation-efficiency vs. accuracy trade-offs.
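The abstract does not spell out the mechanics of Gradual Grouping, but the structure it anneals toward is the grouped convolution, which is also the standard building block of compact encoders. As a rough illustration of why grouping reduces cost, here is a minimal pure-Python sketch (names and shapes are our own, not from the thesis) of a grouped pointwise (1x1) convolution on a single pixel's channel vector: each output channel attends only to the input channels in its group, cutting parameters and multiply-adds by the group factor.

```python
def grouped_pointwise_conv(x, weights, groups):
    """Grouped 1x1 convolution applied to one pixel's channel vector.

    x        -- list of in_channels activations
    weights  -- per-group weight matrices; weights[g][o][i] maps the g-th
                input-channel slice to the g-th output-channel slice
    groups=1 -- recovers a dense layer; larger group counts divide the
                parameter count and multiply-add count by `groups`
    """
    in_per_group = len(x) // groups
    out = []
    for g in range(groups):
        x_slice = x[g * in_per_group:(g + 1) * in_per_group]
        for row in weights[g]:  # one row per output channel in this group
            out.append(sum(w * v for w, v in zip(row, x_slice)))
    return out
```

With 4 input and 4 output channels and `groups=2`, each group holds a 2x2 weight block, so the layer stores 8 weights instead of the 16 a dense layer would need.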
Additionally, we conduct extensive experiments varying architectural hyper-parameters such as network depth, kernel sizes, dilation rates, split branching, and additional context-extraction modules. We also present a compact architecture using multi-branch separable convolutional layers with different dilation rates.
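To make the multi-branch dilated design concrete, the sketch below shows the idea in its simplest form: the same small depthwise kernel is applied at several dilation rates and the branch outputs are fused, so the receptive field grows without adding parameters. This is a hedged 1-D, single-channel illustration of the general technique (function names, the 'same' padding choice, and summation as the fusion rule are our assumptions, not details taken from the thesis).

```python
def dilated_conv1d(signal, kernel, dilation):
    """Single-channel 1-D convolution with implicit zero padding ('same'
    output length). Dilation spaces the kernel taps apart, enlarging the
    receptive field at no extra parameter cost."""
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < n:  # taps falling outside the signal read zeros
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_branch_dilated(signal, kernel, rates):
    """Run the same depthwise kernel at several dilation rates (one per
    branch) and sum the branches, mixing context at multiple scales."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) for vals in zip(*branches)]
```

For example, a unit impulse filtered by a 3-tap box kernel at rates 1 and 2 responds at the immediate neighbors (rate 1) and at offsets of two (rate 2), illustrating how the branches cover different context sizes.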

Year of completion: November 2022
Advisors: C V Jawahar, Girish Varma
