Towards Visual 3D Scene Understanding and Prediction for ADAS

 

Abstract:

Modern advanced driver assistance systems (ADAS) rely on a range of sensors, including radar, ultrasound, cameras, and LIDAR. Active sensors such as radar are primarily used for detecting traffic participants (TPs) and measuring their distance. More expensive LIDAR sensors are used for estimating both traffic participants and scene elements (SEs). However, camera-based systems have the potential to achieve the same capabilities at a much lower cost, while enabling new ones such as determining TP and SE types as well as their interactions in complex traffic scenes.

In this talk, we present several technological developments for ADAS. A common theme is overcoming the challenges posed by the lack of large-scale annotations in deep learning frameworks. We introduce approaches to correspondence estimation that are trained on purely synthetic data but adapt well to real data at test time. Posing the problem in a metric learning framework with fully convolutional architectures yields estimation accuracies that surpass the state of the art by large margins. We introduce object detectors that are light enough for ADAS, trained with knowledge distillation to retain the accuracies of deeper architectures. Our semantic segmentation methods are trained with weak supervision that requires only a tenth of the conventional annotation time. We propose methods for 3D reconstruction that use deep supervision to recover fine object part locations, yet rely on purely synthetic 3D CAD models. Further, we develop generative adversarial frameworks for reconstruction that alleviate the need to even align the 3D CAD models with images at training time. Finally, we present a framework for TP behavior prediction in complex traffic scenes that utilizes the above as inputs to predict future trajectories that fully account for TP-TP and TP-SE interactions. Our approach allows the prediction of diverse, uncertain outcomes and is trained with inverse optimal control to predict long-term strategic behaviors in complex scenes.

Bio:

Manmohan Chandraker is an assistant professor in the CSE department of the University of California, San Diego, and leads the computer vision research effort at NEC Labs America in Cupertino. He received a B.Tech. in Electrical Engineering from the Indian Institute of Technology, Bombay, and a PhD in Computer Science from the University of California, San Diego. His principal research interests are sparse and dense 3D reconstruction, including structure-from-motion, 3D scene understanding, and dense modeling under complex illumination or material behavior, with applications to autonomous driving, robotics, and human-computer interfaces. His work on provably optimal algorithms for structure and motion estimation received the Marr Prize Honorable Mention for Best Paper at ICCV 2007 and the 2009 CSE Dissertation Award for Best Thesis at UC San Diego, and was a nominee for the 2010 ACM Dissertation Award. His work on shape recovery from motion cues for complex material and illumination received the Best Paper Award at CVPR 2014.