

Towards Visual 3D Scene Understanding and Prediction for ADAS

 

Manmohan Chandraker

NEC Labs America

Date : 23/12/2016

 

Abstract:

Modern advanced driver assistance systems (ADAS) rely on a range of sensors including radar, ultrasound, cameras and LIDAR. Active sensors such as radar are primarily used for detecting traffic participants (TPs) and measuring their distance. More expensive LIDAR sensors are used for estimating both traffic participants and scene elements (SEs). However, camera-based systems have the potential to achieve the same capabilities at a much lower cost, while enabling new ones such as determining TP and SE types as well as their interactions in complex traffic scenes.

In this talk, we present several technological developments for ADAS. A common theme is overcoming the challenges posed by the lack of large-scale annotations in deep learning frameworks. We introduce approaches to correspondence estimation that are trained on purely synthetic data but adapt well to real data at test time. Posing the problem in a metric learning framework with fully convolutional architectures allows estimation accuracies that surpass the state of the art by large margins. We introduce object detectors that are light enough for ADAS, trained with knowledge distillation to retain the accuracies of deeper architectures. Our semantic segmentation methods are trained on weak supervision that requires only a tenth of conventional annotation time. We propose methods for 3D reconstruction that use deep supervision to recover fine object part locations, but rely on purely synthetic 3D CAD models. Further, we develop generative adversarial frameworks for reconstruction that alleviate the need to even align the 3D CAD models with images at train time. Finally, we present a framework for TP behavior prediction in complex traffic scenes that utilizes the above as inputs to predict future trajectories that fully account for TP-TP and TP-SE interactions. Our approach allows prediction of diverse uncertain outcomes and is trained with inverse optimal control to predict long-term strategic behaviors in complex scenes.
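The knowledge-distillation idea mentioned above, training a light student model against a deeper teacher's soft outputs so that it retains the teacher's accuracy, can be sketched as a loss function. This is a generic textbook formulation (the temperature `T`, mixing weight `alpha`, and logit values below are illustrative assumptions), not the speaker's actual training setup:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    """Weighted sum of the soft-target cross-entropy against the teacher's
    temperature-softened outputs and the usual hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_loss = -np.sum(p_teacher * np.log(p_student + 1e-12)) * T * T
    hard_loss = -np.log(softmax(student_logits)[true_label] + 1e-12)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

loss = distillation_loss([2.0, 1.0, 0.1], [3.0, 1.5, 0.2], true_label=0)
```

In practice the student minimizes this loss over a whole dataset; the `T * T` factor keeps the soft-target gradients on the same scale as the hard-label ones.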

Bio:

Manmohan Chandraker is an assistant professor at the CSE department of University of California, San Diego and leads the computer vision research effort at NEC Labs America in Cupertino. He received a B.Tech. in Electrical Engineering at the Indian Institute of Technology, Bombay and a PhD in Computer Science at the University of California, San Diego. His principal research interests are sparse and dense 3D reconstruction, including structure-from-motion, 3D scene understanding and dense modeling under complex illumination or material behavior, with applications to autonomous driving, robotics or human-computer interfaces. His work on provably optimal algorithms for structure and motion estimation received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UC San Diego and was a nominee for the 2010 ACM Dissertation Award. His work on shape recovery from motion cues for complex material and illumination received the Best Paper Award at CVPR 2014.


Learning without exhaustive supervision

 


Ishan Misra

Carnegie Mellon University

Date : 15/12/2016

 

Abstract:

Recent progress in visual recognition can be attributed to large datasets and high-capacity learning models. Sadly, these data-hungry models tend to be supervision-hungry as well. In my research, I focus on algorithms that learn from large amounts of data without exhaustive supervision. The key to making algorithms "supervision efficient" is to exploit the structure or prior properties available in the data or labels, and model them in the learning algorithm. In this talk, I will focus on three axes for getting around exhaustive annotation: 1) finding structure and natural supervision in data to reduce the need for manual labels: unsupervised and semi-supervised learning from video [ECCV'16, CVPR'15]; 2) sharing information across tasks so that tasks which are easier or "free" to label can help other tasks [CVPR'16]; and 3) finding structure in the labels and the labeling process so that one can utilize labels in the wild [CVPR'16].
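The idea of "natural supervision" from video can be illustrated with a toy generator of temporal-order labels: correctly ordered frame triples are positives and shuffled ones negatives, with no manual annotation at all. This is a simplified sketch (integer frame indices stand in for real frames), not the method from the cited papers:

```python
import random

def temporal_order_pairs(frames, n_pairs=4, seed=0):
    """Generate 'free' training pairs from a video: consecutive frame
    triples kept in order are positives (label 1), shuffled triples are
    negatives (label 0). No human labeling is needed."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        i = rng.randrange(len(frames) - 2)
        triple = [frames[i], frames[i + 1], frames[i + 2]]
        if rng.random() < 0.5:
            pairs.append((tuple(triple), 1))   # in temporal order
        else:
            shuffled = triple[:]
            while shuffled == triple:          # ensure it is actually reordered
                rng.shuffle(shuffled)
            pairs.append((tuple(shuffled), 0))  # out of order
    return pairs

pairs = temporal_order_pairs(list(range(10)))
```

A classifier trained to tell the two classes apart must learn features that track how scenes evolve over time, which is the kind of supervision-free signal the talk describes.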

Bio:

Ishan Misra is a PhD student at Carnegie Mellon University, working with Martial Hebert and Abhinav Gupta. His research interests are in Computer Vision and Machine Learning, particularly in visual recognition. Ishan got his BTech in Computer Science from IIIT-Hyderabad where he worked with PJ Narayanan. He got the Siebel fellowship in 2014, and has spent two summers as an intern at Microsoft Research, Redmond.


Understanding Stories by Joint Analysis of Language and Vision

 


Makarand Tapaswi

KIT Germany

Date : 14/09/2016

 

Abstract:

Humans spend a large amount of time listening to, watching, and reading stories. We argue that the ability to model, analyze, and create new stories is a stepping stone towards strong AI. We thus work on teaching AI to understand stories in films and TV series. To obtain a holistic view of the story, we align videos with novel sources of text such as plot synopses and books. Plots contain a summary of the core story and give a high-level overview. Books, on the contrary, provide rich details about characters, scenes and interactions, allowing us to ground visual information in corresponding textual descriptions. We also test machine understanding of stories by asking the machine to answer questions. To this end, we create a large benchmark dataset of almost 15,000 questions from 400 movies and explore its characteristics with several baselines.

Bio:

Makarand Tapaswi received his undergraduate education from NITK Surathkal in Electronics and Communications Engineering. Thereafter he pursued an Erasmus Mundus Masters program in Information and Communication Technologies at UPC Barcelona and KIT Germany. He continued with the Computer Vision lab at Karlsruhe Institute of Technology in Germany and recently completed his PhD. He will join the University of Toronto as a post-doctoral fellow starting in October.


 

Computer Vision @ Facebook


Manohar Paluri

Facebook AI Research

Date : 26/07/2016

Abstract:

Over the past 5 years the community has made significant strides in the field of Computer Vision. Thanks to large-scale datasets, specialized computing in the form of GPUs, and many breakthroughs in convnet architectures, Computer Vision systems that work in the wild and at scale are becoming a reality. At Facebook AI Research we want to embark on the journey of making breakthroughs in the field of AI and using them for the benefit of connecting people and removing barriers to communication. In that regard Computer Vision plays a significant role, as the media content coming to Facebook is ever increasing, and building models that understand this content is crucial to our mission of connecting everyone. In this talk I will give an overview of how we think about Computer Vision problems at Facebook and touch on various aspects of supervised, semi-supervised and unsupervised learning. I will move between various research efforts involving representation learning and predictive learning as well. Throughout the talk I will highlight some large-scale applications and discuss the limitations of current systems, to motivate why these problems are worth tackling.

Bio:

Manohar Paluri is currently a Research Lead at Facebook AI Research and manages the Computer Vision team in the Applied Research organization. He is passionate about Computer Vision and the longer-term goal of building systems that can perceive the way humans do. While working on various perception efforts throughout his career, he spent considerable time on Computer Vision problems in industry and academia. He worked at renowned places like Google Research, IBM Watson Research Labs, and Stanford Research Institute before helping co-found Facebook AI Research, directed by Dr. Yann LeCun. Manohar spent his formative years at IIIT Hyderabad, where he finished his undergraduate studies with Honors in Computer Vision, and joined Georgia Tech to pursue his Ph.D. For over a decade he has been working on various problems related to Computer Vision and perception in general, and has made contributions through his publications at CVPR, NIPS, ICCV, ECCV, ICLR, KDD, IROS, ACCV, etc. He is passionate about building real-world systems that are used by billions of people and bring positive value at scale. Some of these systems are running at Facebook and already have tremendous impact on how people communicate using Facebook.

 

Object detection methods for Common Objects in Context


Soumith Chintala

Facebook AI Research

Date : 19/01/2016

 

Abstract:

In this talk, we discuss this year's COCO object detection challenge and Facebook's entry, which placed 2nd in the competition using deep convnets and specially designed cost functions. We shall also discuss the DeepMask segmentation system that forms the core of our detection pipeline. http://mscoco.org/dataset/#detections-leaderboard
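Detection entries on the COCO leaderboard are scored by intersection-over-union (IoU) overlap between predicted and ground-truth boxes, thresholded at several levels. A minimal sketch of that standard metric (the box coordinates below are illustrative, not from the challenge):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

score = iou((0, 0, 10, 10), (5, 5, 15, 15))   # intersection 25, union 175
```

COCO averages precision over IoU thresholds from 0.5 to 0.95, which rewards detectors whose boxes are tightly localized, not just roughly overlapping.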

Bio:

Soumith Chintala is a well-known face in the deep learning community. He works at Facebook AI Research (FAIR). Prior to that, he worked at MuseAmi and in Prof. Yann LeCun's group at NYU. He is well known as the lead maintainer of the Torch7 machine learning library and of other open-source projects such as convnet-benchmarks and eblearn.

 

Bayesian Nonparametric Modeling of Temporal Coherence for Entity-driven Video Analytics


Adway Mitra

PhD, IISc Bangalore

Date : 19/01/2016

 

Abstract:

Due to the advent of video-sharing sites like YouTube, online user-generated video content is increasing very rapidly. To simplify the search for meaningful information in such a huge volume of content, Computer Vision researchers have started to work on problems like Video Summarization and Scene Discovery from videos. People understand videos in terms of high-level semantic concepts, but most current research in video analytics makes use of low-level features and descriptors, which may have no semantic interpretation. We aim to fill this gap by modeling implicit structural information about videos, such as spatio-temporal properties. We represent videos as a collection of semantic visual concepts which we call "entities", such as the persons in a movie. To aid these tasks, we model the important property of "temporal coherence", which means that adjacent frames are likely to have similar visual features and contain the same set of entities. Bayesian nonparametrics is a natural way to model all of this, but it also gives rise to the need for new models and algorithms.

A tracklet is a spatio-temporal fragment of a video: a set of spatial regions in a short sequence of consecutive frames, each of which encloses a particular entity. We first find a representation of tracklets, to aid tracking of entities across videos, using region descriptors such as covariance matrices of spatial features that exploit temporal coherence. Next, we model temporal coherence at a semantic level. Each tracklet is associated with an entity. Spatio-temporally close but non-overlapping tracklets are likely to belong to the same entity, while tracklets that overlap in time can never belong to the same entity. The aim is to cluster the tracklets based on the entities associated with them, with the goal of discovering the entities in the video along with all their occurrences. We represent an entity with a mixture component, and propose a temporally coherent version of the Chinese Restaurant Process (TC-CRP) that encodes these constraints easily. TC-CRP shows excellent performance on person discovery from TV-series videos. We also discuss semantic video summarization based on entity discovery. Finally, we consider entity-driven temporal segmentation of the video into scenes, where each scene is modeled as a sparse distribution over entities. We propose EntScene, a generative model for videos based on entities and scenes, together with an inference algorithm based on Dynamically Blocked Gibbs Sampling. We find experimentally that it yields significant improvements in segmentation and scene discovery compared to alternative inference algorithms.
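The cannot-link constraint described above, that tracklets overlapping in time can never belong to the same entity, can be illustrated with a toy CRP-style sequential assignment. This is a simplified illustration with made-up tracklet intervals and greedy sampling, not the actual TC-CRP inference from the talk:

```python
import random

def tc_crp_assign(tracklets, alpha=1.0, seed=0):
    """Sequential CRP-style clustering of (start, end) tracklets with a
    cannot-link constraint: temporally overlapping tracklets never share
    an entity. Cluster sizes act as the usual rich-get-richer prior."""
    rng = random.Random(seed)
    clusters = []   # each cluster is a list of tracklet indices (one entity)
    labels = []
    for i, (start, end) in enumerate(tracklets):
        weights = []
        for members in clusters:
            overlaps = any(not (end <= tracklets[j][0] or tracklets[j][1] <= start)
                           for j in members)
            weights.append(0.0 if overlaps else float(len(members)))
        weights.append(alpha)   # weight for opening a new entity
        candidates = [k for k, w in enumerate(weights) if w > 0]
        r = rng.random() * sum(weights[k] for k in candidates)
        chosen = candidates[-1]
        for k in candidates:
            r -= weights[k]
            if r <= 0:
                chosen = k
                break
        if chosen == len(clusters):
            clusters.append([i])
        else:
            clusters[chosen].append(i)
        labels.append(chosen)
    return labels

# Tracklets 0 and 1 overlap in time, so they must be assigned different entities.
labels = tc_crp_assign([(0, 5), (2, 7), (10, 15)])
```

Zeroing out the weight of any cluster containing a co-occurring tracklet is what encodes the constraint; the real model resolves such constraints inside a full posterior inference procedure rather than a single greedy pass.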

 

Recent Advances in the Field of Missing Data: Missingness Graphs, Recoverability and Testability


Karthika Mohan

UCLA

Date : 06/01/2016

 

Abstract:

The talk will discuss recent advances in the field of missing data, including: 1) the graphical representation called the "Missingness Graph", which portrays the causal mechanisms responsible for missingness; 2) the notion of recoverability, i.e. deciding whether there exists a consistent estimator for a given query; 3) graphical conditions (necessary and sufficient) for recovering joint and conditional distributions, and algorithms for detecting these conditions in the missingness graph; 4) the question of testability, i.e. whether an assumed model can be subjected to statistical tests, given the missingness in the data; and 5) the indispensability of causal assumptions for large sets of missing data problems.
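The recoverability question can be illustrated with a toy simulation: when missingness is independent of the value (missing completely at random, MCAR), the complete-case mean remains a consistent estimator; when larger values are more likely to go missing (MNAR), the same estimator is biased. The distributions and drop probabilities below are illustrative assumptions, not from the talk:

```python
import random
from statistics import mean

def simulate_means(n=100_000, seed=1):
    """Complete-case means under two missingness mechanisms on N(0, 1) data:
    MCAR (drop independent of value) vs MNAR (drop depends on value)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # MCAR: every value is dropped with the same probability 0.5;
    # the observed mean is still a consistent estimator of the true mean.
    mcar = [x for x in xs if rng.random() < 0.5]

    # MNAR: positive values are dropped with probability 0.8;
    # the observed mean is pulled downward and cannot be recovered
    # from the observed data alone without causal assumptions.
    mnar = [x for x in xs if not (x > 0.0 and rng.random() < 0.8)]

    return mean(xs), mean(mcar), mean(mnar)

full_mean, mcar_mean, mnar_mean = simulate_means()
```

The missingness-graph machinery in the talk formalizes exactly this distinction: it tells you, from the assumed causal mechanism, which queries (like the mean here) are recoverable and which are not.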

Bio:

Karthika Mohan is a PhD Candidate in the Department of Computer Science at UCLA, majoring in Artificial Intelligence. She is advised by Prof. Judea Pearl. Her research interests span areas of Artificial Intelligence and Statistics that intersect with Probabilistic Graphical Models, Causal Inference and Missing Data.