Behaviour Detection of Vehicles Using Graph Convolutional Networks with Attention
Sravan Mylavarapu
Abstract
Autonomous driving has evolved with the advent of deep learning and increased computational capacity over the past two decades. While many tasks have emerged that aid autonomous navigation, understanding on-road vehicle behaviour from a continuous sequence of sensor data (camera or radar) is an important and useful one. Determining the state and motion of other vehicles helps the ego-vehicle (our own vehicle) make decisions. Such systems can be part of driver-assistance systems or even path planners that help vehicles avoid obstacles. One instance of such decision making is applying the brakes when an external agent moves into our lane or approaches head-on; another is keeping a safe distance from an aggressive driver who changes lanes and overtakes frequently. Specifically, our proposed methods classify the behaviour of each on-road vehicle into one of the following categories: {Moving Away, Moving Towards Us, Parked/Stationary, Lane Change, Overtake}.

Many current methods leverage 3D depth information obtained from expensive equipment such as LiDAR, together with complex algorithms. In this thesis, we propose a simpler pipeline for understanding vehicle behaviour from a monocular (single-camera) image sequence or video, which eases equipment installation and suits driver-assistance systems. A video, along with camera parameters and scene semantics (objects detected in the images), is used to obtain information about the objects of interest (vehicles) and other static objects in the scene, such as lanes and poles. We treat object detection across frames as a pre-processing pipeline and propose two main methods for behaviour detection of vehicles: (1) Spatio-temporal MRGCN (Chapter 3) and (2) Relational-Attentive GCN (Chapter 4). We divide the process of identifying behaviours into first learning positional information of vehicles and then observing them across time to make a prediction. In both methods, the positional information is encoded by a Multi-Relational Graph Convolutional Network (MR-GCN); temporal information is encoded by recurrent networks in Method 1, while it is formulated inside the graph in Method 2. Our experiments also show that attention is an important component of both methods.

The proposed frameworks classify a variety of vehicle behaviours with high fidelity on diverse datasets that include European, Chinese and Indian on-road scenes. They also allow seamless transfer of models across datasets without re-annotation, retraining, or even fine-tuning. We report performance gains over baselines and detail a variety of ablations to showcase the efficacy of the framework.
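The two-stage design described above (a multi-relational GCN for per-frame spatial context, followed by a recurrent network over time, as in Method 1) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the thesis implementation: the layer sizes, the number of relations, the five-class output, and the assumption of a fixed set of nodes across frames are all choices made here for clarity.

```python
# Minimal sketch (not the thesis code) of MR-GCN spatial encoding followed
# by a GRU temporal encoder, as in the Method 1 description. All dimensions
# and the relation/class counts are illustrative assumptions.
import torch
import torch.nn as nn


class MRGCNLayer(nn.Module):
    """One multi-relational graph convolution: a separate weight per relation."""

    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)]
        )
        self.self_loop = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (num_nodes, in_dim) node features for one frame
        # adj: (num_relations, num_nodes, num_nodes), one adjacency per relation
        out = self.self_loop(x)
        for r, lin in enumerate(self.rel_weights):
            out = out + adj[r] @ lin(x)  # aggregate neighbours per relation
        return torch.relu(out)


class BehaviourClassifier(nn.Module):
    """Spatial MR-GCN per frame, then a GRU over the frame sequence."""

    def __init__(self, in_dim=32, hid_dim=64, num_relations=4, num_classes=5):
        super().__init__()
        self.gcn = MRGCNLayer(in_dim, hid_dim, num_relations)
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, num_classes)  # 5 behaviour classes

    def forward(self, frames):
        # frames: list of (x, adj) pairs, one per video frame; assumes the
        # same node set (tracked objects) appears in every frame.
        per_frame = torch.stack([self.gcn(x, adj) for x, adj in frames], dim=1)
        _, h = self.gru(per_frame)       # h: (1, num_nodes, hid_dim)
        return self.head(h.squeeze(0))   # per-vehicle behaviour logits
```

Method 2 instead folds the temporal dimension into the graph itself (edges across frames) and, like Method 1, benefits from an attention mechanism; those variants are developed in Chapters 3 and 4.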
Year of completion: February 2021
Advisors: Anoop M Namboodiri, Madhava Krishna