Scene Interpretation in Images and Videos.


Chetan Jakkoju (homepage)

Scene interpretation is a fundamental task in both computer vision and robotic systems. We deal with two important aspects of scene interpretation: scene reconstruction and scene recognition. Scene reconstruction is the task of determining the 3D positions of world points and recovering camera poses from images. It has several applications, such as virtual building editing in computer-aided architecture, video augmentation in the film industry, and planning and navigation in mobile robotics. Among the several approaches to modeling a scene, we deal with piecewise planar modeling because of its advantages: man-made environments are often piecewise planar, and a planar model has a compact representation that can be easily modified. We propose a convex-optimization-based approach for piecewise planar reconstruction. We show that the task of reconstructing a piecewise planar environment can be set in an L∞-based homographic framework that iteratively computes scene plane and camera pose parameters. Instead of image points, the algorithm optimizes over inter-image homographies. The resulting objective function is minimized using Second Order Cone Programming algorithms. Apart from showing the convergence of the algorithm, we also empirically verify its robustness to errors in initialization through various experiments on synthetic and real data. We intend this algorithm to sit between initialization approaches such as decomposition methods and iterative non-linear minimization methods such as Bundle Adjustment.
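To make the link between plane parameters, camera pose, and inter-image homographies concrete, the following is a minimal sketch of the standard plane-induced homography for two calibrated views. It is not the thesis's optimization, and the convention is an assumption made here for illustration: the plane satisfies n·X = d in the first camera frame, the second camera sees X' = R X + t, and image points are in normalized coordinates. All function names are hypothetical.

```python
# Sketch only: standard plane-induced homography, not the thesis's algorithm.
# Assumed convention: plane n.X = d in the first camera frame,
# second-camera coordinates X' = R X + t, normalized image coordinates.

def plane_induced_homography(R, t, n, d):
    """Return H = R + t n^T / d as a 3x3 nested list."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]


def apply_homography(H, x):
    """Map a homogeneous image point x = (u, v, 1) and renormalize."""
    y = [sum(H[i][j] * x[j] for j in range(3)) for i in range(3)]
    return [y[0] / y[2], y[1] / y[2], 1.0]


def project(X):
    """Project a 3D point in a camera frame to normalized image coordinates."""
    return [X[0] / X[2], X[1] / X[2], 1.0]


def transform(R, t, X):
    """Apply the rigid motion X' = R X + t."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]


if __name__ == "__main__":
    # Hypothetical setup: 90-degree rotation about the optical axis plus a
    # small translation, observing the plane z = 2.
    R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.2, -0.1, 0.3]
    n, d = [0.0, 0.0, 1.0], 2.0
    X = [1.0, 1.0, 2.0]  # a point lying on the plane

    H = plane_induced_homography(R, t, n, d)
    print(apply_homography(H, project(X)))   # point transferred by H
    print(project(transform(R, t, X)))       # point projected directly
```

For any point on the plane, the two printed points agree up to rounding. This is why a plane's parameters and the relative camera pose can be tied together through homographies between images rather than through individual image points.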

Scene recognition in robotics, specifically terrain scene recognition, is one of the fundamental tasks of autonomous navigation. Navigable terrains are examples of planar scenes. The goal of terrain recognition is to recognize, in an automated fashion, the various terrains that occur in urban and rural environments. It has applications in domains such as advanced driver assistance systems and remote sensing. Various sensing modalities, such as ladars, lasers, accelerometers, stereo cameras, omni-directional cameras, or combinations of them, are used in the literature. This thesis attacks the problem of scene interpretation using a single camera. This investigation is especially important because cameras are relatively low in cost, consume little power, are lightweight, and have the potential to provide very rich information about the environment. Recent advances in computer vision and machine learning, together with improvements in hardware capabilities, have greatly increased the scope of monocular cameras, even in unstructured, real-world environments. In this thesis, we start with an empirical study of promising color and texture features, and their combination, with classifiers such as Support Vector Machines (SVM) and Random Forests. We present a comparison across features and classifiers. We then present a monocular-camera-based terrain recognition scheme called the Partition based classifier. The uniqueness of the proposed scheme is that it inherently incorporates spatial smoothness while segmenting an image, without requiring any additional post-processing. The algorithm is fast because it is built on top of a Random Forest classifier. The efficacy of the proposed solution is seen in the low error rates we reach on both our dataset and other publicly available datasets.

Further, the partition based classifier is extended to be online and adaptive. The new scheme consists of two underlying classifiers: one is learnt over a bootstrapped or offline dataset, and the other adapts to changes on the fly. The posterior probabilities of the static and online classifiers are fused to assign the final label to the online image data. The online classifier learns at frequent intervals through a sparse and stable set of tracked patches, which makes it lightweight and real-time friendly. This learning, carried out at frequent intervals during the sojourn, significantly improves the performance of the classifier compared with a scheme that uses only the classifier learnt offline. The method finds immediate application in outdoor autonomous driving, where the classifier needs to be updated frequently based on what has recently appeared on the terrain, without deviating greatly from what was learnt offline.
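The fusion step can be illustrated with a small sketch. The abstract does not state the fusion rule, so the convex combination below, the weight `w_online`, and the function names are all assumptions made for illustration rather than the thesis's actual method.

```python
# Sketch only: the fusion rule (a weighted average) and the weight w_online
# are assumptions; the abstract does not specify how posteriors are fused.

def fuse_posteriors(p_static, p_online, w_online=0.5):
    """Combine two class-posterior dicts by a convex combination.

    p_static, p_online: dicts mapping class label -> posterior probability.
    w_online: weight in [0, 1] given to the online classifier.
    Returns a renormalized dict over the union of class labels.
    """
    if not 0.0 <= w_online <= 1.0:
        raise ValueError("w_online must lie in [0, 1]")
    classes = set(p_static) | set(p_online)
    fused = {
        c: (1.0 - w_online) * p_static.get(c, 0.0)
        + w_online * p_online.get(c, 0.0)
        for c in classes
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("posteriors sum to zero")
    return {c: v / total for c, v in fused.items()}


def fused_label(p_static, p_online, w_online=0.5):
    """Return the class with the highest fused posterior.

    Ties are broken by alphabetical order of the labels so that the
    result is deterministic.
    """
    fused = fuse_posteriors(p_static, p_online, w_online)
    return max(sorted(fused), key=fused.get)


if __name__ == "__main__":
    # Hypothetical posteriors for one image patch.
    offline = {"road": 0.6, "grass": 0.4}
    online = {"road": 0.2, "grass": 0.8}
    print(fused_label(offline, online, w_online=0.5))
    print(fused_label(offline, online, w_online=0.0))
```

With equal weights the online classifier's stronger evidence for "grass" wins, while `w_online=0.0` falls back to the offline decision. A weight of this kind is one simple way to let recent observations influence the label without discarding what was learnt offline.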

 

Year of completion: July 2012
Advisors: Dr. C. V. Jawahar and Dr. Madhava Krishna

Related Publications

  • Chetan J., Madhava Krishna and C. V. Jawahar - Fast and Spatially-smooth Terrain Classification using Monocular Camera. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR'10), 23-26 Aug. 2010, Istanbul, Turkey. [PDF]

  • Chetan J., Madhava Krishna and C. V. Jawahar - An Adaptive Outdoor Terrain Classification Methodology using Monocular Camera. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS 2010).

  • Visesh Chari, Anil Nelakanti, Chetan Jakkoju and C. V. Jawahar - Piecewise Planar Reconstruction using Convex Optimization. In Proceedings of the Asian Conference on Computer Vision (ACCV 2009).

Downloads

thesis

ppt