View-graph Selection Framework for Structure from Motion


Abstract

The view-graph is an essential input to large-scale structure-from-motion (SfM) pipelines, and the accuracy and efficiency of large-scale SfM depend crucially on it. Inconsistent or inaccurate edges can lead to inferior or wrong reconstruction. Most SfM methods remove 'undesirable' images and pairs using several fixed heuristic criteria, and propose tailor-made solutions to achieve specific reconstruction objectives such as efficiency, accuracy, or disambiguation. In contrast to these disparate solutions, we propose an optimization-based formulation that can be used to achieve these different reconstruction objectives with task-specific cost modeling, and we construct a very efficient network-flow-based formulation for its approximate solution. The abstraction brought in by this selection mechanism separates the challenges specific to datasets and reconstruction objectives from the standard SfM pipeline and improves its generalization. This paper mainly focuses on the application of this framework with a standard SfM pipeline for accurate and ghost-free reconstructions of highly ambiguous datasets. To model selection costs for this task, we introduce new disambiguation priors based on local geometry. We further demonstrate the versatility of the method by using it for the general objective of accurate and efficient reconstruction of large-scale Internet datasets, using costs based on well-known SfM priors.
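To make the selection idea concrete, below is a minimal, illustrative Python sketch (using networkx) of cost-based edge selection on a view-graph. The input format, the inlier-count prior, and the keep_fraction parameter are all hypothetical; the paper's actual method solves an optimization via a network-flow formulation, which this toy example does not reproduce.

import networkx as nx

def select_view_graph(pairs, keep_fraction=0.5):
    """Keep a spanning tree plus the cheapest fraction of the remaining edges.

    pairs: iterable of (img_i, img_j, num_inliers) tuples (hypothetical input).
    """
    G = nx.Graph()
    for i, j, inliers in pairs:
        # A simple SfM prior: pairs with more geometric inliers are 'cheaper'.
        G.add_edge(i, j, cost=1.0 / max(inliers, 1))
    # A minimum-cost spanning tree keeps every image connected.
    mst = {frozenset(e) for e in
           nx.minimum_spanning_edges(G, weight="cost", data=False)}
    extras = sorted((e for e in G.edges if frozenset(e) not in mst),
                    key=lambda e: G.edges[e]["cost"])
    selected = nx.Graph()
    selected.add_edges_from(e for e in G.edges if frozenset(e) in mst)
    selected.add_edges_from(extras[:int(keep_fraction * len(extras))])
    return selected

# Example: a 4-image graph where the weak, possibly ambiguous pair (0, 2)
# with only 12 inliers is dropped while the graph stays connected.
pairs = [(0, 1, 500), (1, 2, 450), (2, 3, 400), (0, 3, 300), (0, 2, 12)]
print(sorted(select_view_graph(pairs).edges))  # [(0, 1), (0, 3), (1, 2), (2, 3)]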

 


Related Publications

  • Rajvi Shah, Visesh Chari, and P. J. Narayanan, View-graph Selection Framework for SfM, 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018 [pdf]

Code:

The code will be made available soon.

Bibtex

 @InProceedings{Shah_2018_ECCV,
    author = {Shah, Rajvi and Chari, Visesh and Narayanan, P. J.},
    title = {View-graph Selection Framework for SfM},
    booktitle = {The European Conference on Computer Vision (ECCV)},
    month = {September},
    year = {2018}
}

City-scale Road Audit using Deep Learning


Abstract

Road networks in cities are massive and are a critical component of mobility. Fast response to defects, which can occur not only due to regular wear and tear but also because of extreme events like storms, is essential. Hence, there is a need for an automated system that is quick, scalable, and cost-effective for gathering information about defects. We propose a system for city-scale road audit using some of the most recent developments in deep learning and semantic segmentation. For building and benchmarking the system, we curated a dataset with the annotations required for road defects. However, many of the labels required for a road audit are highly ambiguous, which we overcome by proposing a label hierarchy. We also propose a multi-step deep learning model that segments the road, subdivides the road further into defects, tags each frame with the defects it contains, and finally localizes the defects on a map gathered using GPS. We analyze and evaluate the models on image tagging as well as segmentation at different levels of the label hierarchy.
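As a concrete illustration of the label-hierarchy idea, the sketch below groups fine-grained defect labels under coarser parents so that tagging and segmentation can be evaluated at whichever level is unambiguous. The label names here are hypothetical placeholders, not the dataset's actual label set.

# Hypothetical label hierarchy: ambiguous fine-grained labels roll up to
# coarser parents that are easier to annotate and evaluate consistently.
LABEL_HIERARCHY = {
    "road": {
        "defect": ["pothole", "crack", "water_logging"],  # illustrative leaves
        "non_defect": ["clean_road"],
    },
    "non_road": {},
}

def coarsen(label):
    """Map a fine-grained label to its coarser parent ('defect'/'non_defect')."""
    for parent, leaves in LABEL_HIERARCHY["road"].items():
        if label in leaves:
            return parent
    return "non_road"

# Evaluated at the coarse level, 'pothole' and 'crack' both count as 'defect'.
assert coarsen("pothole") == coarsen("crack") == "defect"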

 


Related Publications

  • Sudhir Yarram, Girish Varma, and C.V. Jawahar, City-Scale Road Audit System using Deep Learning, International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018 [pdf]

Code

The code for City-scale Road Audit using Deep Learning is available here: [Code]


Dataset

The dataset for City-scale Road Audit using Deep Learning is available here: [Dataset]

 

Word-level Handwritten Datasets for Indic Scripts


Abstract

Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties in the scripts, the cursive nature of the handwriting, and the similar shapes of the characters. The lack of publicly available handwriting datasets in Indic scripts has hindered the development of handwritten word recognizers. To help resolve this problem, we release two handwritten word datasets: IIIT-HW-Dev, a Devanagari dataset, and IIIT-HW-Telugu, a Telugu dataset.

 

[Figure: sample word images from IIIT-HW-Dev (Hindi) and IIIT-HW-Telugu (Telugu)]

Major Contributions

  • A Devanagari dataset comprising over 95K handwritten words.
  • A Telugu dataset comprising over 120K handwritten words.
  • Both datasets are benchmarked using existing state-of-the-art HWR methods (a minimal sketch of one such recognizer follows below).
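For context, benchmarks of this kind are typically run with CNN-RNN hybrid recognizers trained with CTC. Below is a minimal PyTorch sketch of such a word recognizer; the layer sizes and architecture are illustrative assumptions, not the exact configuration used in the papers below.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN feature extractor + bidirectional LSTM + per-timestep classifier."""
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_h // 4                       # height after two 2x2 pools
        self.rnn = nn.LSTM(128 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                         # x: (B, 1, img_h, W)
        f = self.cnn(x)                           # (B, 128, img_h/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # time axis = width
        out, _ = self.rnn(f)                      # (B, T, 512)
        return self.fc(out).log_softmax(-1)       # (B, T, num_classes + 1)

logits = CRNN(num_classes=110)(torch.randn(2, 1, 32, 128))
print(logits.shape)  # torch.Size([2, 32, 111])

The log-probabilities it returns can be fed to torch.nn.CTCLoss after transposing to the (T, B, C) layout that loss expects.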

Related Publications

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar, Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, International Workshop on Document Analysis Systems (DAS), 2018.
  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar, Towards Spotting and Recognition of Handwritten Words in Indic Scripts, International Conference on Frontiers of Handwriting Recognition (ICFHR), 2018.

Dataset:

The download link is available on request; please write to the authors.

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{IIITHWDev2018,
    title={Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset},
    author={Dutta, Kartik and Krishnan, Praveen and Mathew, Minesh and Jawahar, C.~V.},
    booktitle={DAS},
    year={2018}
}

@inproceedings{IIITHWTelugu2018,
    title={Towards Spotting and Recognition of Handwritten Words in Indic Scripts},
    author={Dutta, Kartik and Krishnan, Praveen and Mathew, Minesh and Jawahar, C.~V.},
    booktitle={ICFHR},
    year={2018}
}

Learning Human Poses from Actions


Abstract

We consider the task of learning to estimate human pose in still images. In order to avoid the high cost of full supervision, we propose to use a diverse data set, which consists of two types of annotations: (i) a small number of images are labeled using the expensive ground-truth pose; and (ii) other images are labeled using the inexpensive action label. As action information helps narrow down the pose of a human, we argue that this approach can help reduce the cost of training without significantly affecting the accuracy. To demonstrate this, we design a probabilistic framework that employs two distributions: (i) a conditional distribution to model the uncertainty over the human pose given the image and the action; and (ii) a prediction distribution, which provides the pose of an image without using any action information. We jointly estimate the parameters of the two aforementioned distributions by minimizing their dissimilarity coefficient, as measured by a task-specific loss function. During both training and testing, we only require an efficient sampling strategy for both the aforementioned distributions. This allows us to use deep probabilistic networks that are capable of providing accurate pose estimates for previously unseen images. Using the MPII data set, we show that our approach outperforms baseline methods that either do not use the diverse annotations or rely on pointwise estimates of the pose.
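For reference, the dissimilarity coefficient mentioned above generally takes the following form (a sketch following Rao's definition, as popularized by DISCO Nets; the paper's exact instantiation of the loss \Delta and the weight \gamma may differ):

% Expected task-specific loss \Delta between samples of two pose distributions:
\[
  \mathrm{DIV}_{\Delta}(P_1, P_2) \;=\; \mathbb{E}_{y \sim P_1,\; y' \sim P_2}\big[\Delta(y, y')\big]
\]
% Dissimilarity coefficient between the two distributions, with \gamma \in [0, 1]:
\[
  \mathrm{DISC}_{\Delta}(P_1, P_2) \;=\; \mathrm{DIV}_{\Delta}(P_1, P_2)
    \;-\; \gamma\, \mathrm{DIV}_{\Delta}(P_2, P_2)
    \;-\; (1 - \gamma)\, \mathrm{DIV}_{\Delta}(P_1, P_1)
\]

Since every term is an expectation over pairs of samples, the coefficient can be estimated with the efficient sampling strategy the abstract refers to.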

 




Bibtex

If you use this work or dataset, please cite:

 @inproceedings{posesfromactions2018,
    title={ Learning Human Poses from Actions},
    author={Arun, Aditya and Jawahar, C.~V. and Kumar, M. Pawan},
    booktitle={BMVC},
    year={2018}
}

LectureVideoDB - A Dataset for Text Detection and Recognition in Lecture Videos


Abstract

Lecture videos are rich in textual information, and being able to understand this text is quite useful for larger video understanding/analysis applications. Though text recognition from images has been an active research area in computer vision, text in lecture videos has mostly been overlooked. In this work, we investigate the efficacy of state-of-the-art handwritten and scene text recognition methods on text in lecture videos. To this end, we introduce a new dataset, LectureVideoDB, compiled from frames of multiple lecture videos. Our experiments show that the existing methods do not fare well on the new dataset. The results demonstrate the need to improve the existing methods for robust performance on lecture videos.

 


Major Contributions

  • A new dataset comprising over 5000 frames from lecture videos, annotated for text detection and recognition.
  • The dataset is benchmarked using existing state-of-the-art methods for scene text detection and scene text/handwriting recognition (a sketch of a typical evaluation metric follows below).
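Benchmarks like this are usually reported in terms of word and character error rates. Below is a small, self-contained Python helper for the character error rate; it is an illustrative sketch of the standard metric, not necessarily the papers' exact evaluation protocol.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def char_error_rate(preds, gts):
    """Total edit distance divided by total ground-truth characters."""
    errs = sum(levenshtein(p, g) for p, g in zip(preds, gts))
    return errs / max(sum(len(g) for g in gts), 1)

preds = ["fourier", "transfrm"]
gts = ["Fourier", "transform"]
print(round(char_error_rate(preds, gts), 3))  # 0.125 (2 errors / 16 characters)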

Related Publications

  • Kartik Dutta, Minesh Mathew, Praveen Krishnan and C.V. Jawahar, Localizing and Recognizing Text in Lecture Videos, International Conference on Frontiers of Handwriting Recognition (ICFHR), 2018.

Dataset:

The download link is available on request; please write to the authors.

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{lectureVideoDB2018,
    title={Localizing and Recognizing Text in Lecture Videos},
    author={Dutta, Kartik and Mathew, Minesh and Krishnan, Praveen and Jawahar, C.~V.},
    booktitle={ICFHR},
    year={2018}
}