Computer Vision on Road Scenes: Benchmarking, and Open World Object Detection


Deepak Kumar Singh

Abstract

In autonomous driving, multiple computer vision tasks such as object detection, semantic segmentation, and instance segmentation play a crucial role in perceiving the environment around the vehicle. Understanding the behaviour and performance of such tasks helps identify and address the key issues that are inherent in the system. These issues can be latent both in the deep learning architecture and in the datasets on which the deep learning models are trained and tested. In this thesis, we benchmark the performance of various popular deep learning models on road scene datasets for various computer vision tasks, and we also formulate open-world object detection on road scenes by addressing the inherent issues present in road scene datasets.

In the first part of the work, we aim to understand the performance and behaviour of various deep learning models on road scene datasets: Cityscapes, IDD, and BDD. Object detection, semantic segmentation, and instance segmentation form the basis for many computer vision tasks in autonomous driving. The complexity of these tasks increases as we shift from object detection to instance segmentation. The state-of-the-art models are evaluated on standard datasets such as PASCAL-VOC and MS-COCO, which do not consider the dynamics of road scenes. Driving datasets such as Cityscapes and Berkeley Deep Drive (BDD) are captured in a structured environment with better road markings and fewer variations in the appearance of objects and background. However, the same does not hold for Indian roads. The Indian Driving Dataset (IDD) is captured in unstructured driving scenarios and is highly challenging for a model due to its diversity. This work presents a comprehensive evaluation of state-of-the-art models on object detection, semantic segmentation, and instance segmentation on road scene datasets. We present our analyses and compare the models' quantitative and qualitative performance on the structured driving datasets (Cityscapes and BDD) and the unstructured driving dataset (IDD). Understanding the behaviour on these datasets helps in addressing various practical issues and in creating real-life applications.

 

Year of completion: March 2025
Advisor: Jawahar C V
