Computer Vision based Large Scale Urban Mobility Audit and Parametric Road Scene Parsing

Durga Nagendra Raghava Kumar Modhugu

Abstract

The footprint of partially or fully autonomous vehicles is increasing gradually with time. The existence and availability of the necessary modern infrastructure are crucial for the widespread use of autonomous navigation. One of the most critical efforts in this direction is to build and maintain HD maps efficiently and accurately. The information in HD maps is organized in various levels: 1) geometric layer, 2) semantic layer, and 3) map priors layer. The conventional approaches to capturing and extracting information at different HD map levels rely heavily on huge sensor networks and manual annotation. This does not scale to creating HD maps for massive road networks. We propose two novel solutions to address these problems in this work. The first generates the geometric layer with parametric information about the road scene, and the second updates information on road infrastructure and traffic violations in the semantic layer.

Firstly, the creation of the geometric layer of the HD map requires understanding the road layout in terms of structure, number of lanes, lane width, curvature, etc. Predicting these attributes as part of a generalizable parametric model with which the road layout can be rendered would suit the creation of a geometric layer. Many previous works that tried to solve this problem rely only on ground imagery and are limited by the narrow field of view of the camera, occlusions, and perspective shortening. This work demonstrates the effectiveness of using aerial imagery as an additional modality to overcome the above challenges. We propose a novel architecture, Unified, that combines aerial and ground imagery features to infer scene attributes. We quantitatively evaluate on the KITTI dataset and show that our Unified model outperforms prior works. Since this dataset is limited to road scenes close to the vehicle, we supplement the publicly available Argoverse dataset with scene attribute annotations and evaluate far-away scenes.
We quantitatively and qualitatively show the importance of aerial imagery in understanding road scenes, especially in regions farther away from the ego-vehicle.

Finally, we also propose a simple mobile imaging setup to address and audit several common problems in urban mobility and road safety, which can enrich the information in the semantic layer of HD maps. Recent computer vision techniques are used to identify street irregularities (including missing lane markings and potholes), absence of street lights, and defective traffic signs using videos obtained from a camera mounted on a moving vehicle. Beyond the inspection of static road infrastructure, we also demonstrate the applicability of mobile imaging solutions to spot traffic violations. We validate our proposal on long stretches of unconstrained road scenes covering over 2000 km and discuss practical challenges in applying computer vision techniques at such a scale. Exhaustive evaluation is carried out on 257 long stretches with unconstrained settings and 20 condition-based hierarchical frame-level labels for different timings, weather conditions, road type, traffic density, and state of road damage. For the first time, we demonstrate that large-scale analytics of irregular road infrastructure is feasible with existing computer vision techniques.

Year of completion:	December 2022
Advisor:	C V Jawahar