Word-level Handwritten Datasets for Indic Scripts


Abstract

Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties of the scripts, the cursive nature of the handwriting, and the similar shapes of many characters. The lack of publicly available handwriting datasets in Indic scripts has hindered the development of handwritten word recognizers. To help resolve this problem, we release two handwritten word datasets: IIIT-HW-Dev, a Devanagari dataset, and IIIT-HW-Telugu, a Telugu dataset.

 

Sample word images from IIIT-HW-Dev (Hindi/Devanagari) | Sample word images from IIIT-HW-Telugu

Major Contributions

  • A Devanagari dataset comprising over 95K handwritten words.
  • A Telugu dataset comprising over 120K handwritten words.
  • Both datasets are benchmarked using existing state-of-the-art HWR methods.
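Benchmarks on handwritten word datasets are typically reported in terms of character and word error rates. The following minimal sketch shows how these metrics are commonly computed from predicted and ground-truth transcriptions; the function names and exact metric definitions here are ours for illustration, not necessarily those used in the papers below.

```python
def levenshtein(a, b):
    """Edit distance between two sequences via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(predictions, references):
    """Character Error Rate: total character edits over total reference characters."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / chars

def wer(predictions, references):
    """Word Error Rate for isolated-word recognition: fraction of words transcribed incorrectly."""
    wrong = sum(p != r for p, r in zip(predictions, references))
    return wrong / len(references)
```

For word-level datasets such as these, word accuracy is simply `1 - wer`.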

Related Publications

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar, Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, International Workshop on Document Analysis Systems (DAS), 2018.
  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar, Towards Spotting and Recognition of Handwritten Words in Indic Scripts, International Conference on Frontiers of Handwriting Recognition (ICFHR), 2018.

Dataset:

The download link is available on request. Please write to us to obtain access.

Bibtex

If you use this work or dataset, please cite:

@inproceedings{IIITHWDev2018,
    title={Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset},
    author={Dutta, Kartik and Krishnan, Praveen and Mathew, Minesh and Jawahar, C.~V.},
    booktitle={DAS},
    year={2018}
}

@inproceedings{IIITHWTelugu2018,
    title={Towards Spotting and Recognition of Handwritten Words in Indic Scripts},
    author={Dutta, Kartik and Krishnan, Praveen and Mathew, Minesh and Jawahar, C.~V.},
    booktitle={ICFHR},
    year={2018}
} 

LectureVideoDB - A Dataset for Text Detection and Recognition in Lecture Videos


Abstract

Lecture videos are rich in textual information, and the ability to understand this text is useful for larger video understanding/analysis applications. Though text recognition from images has been an active research area in computer vision, text in lecture videos has mostly been overlooked. In this work, we investigate the efficacy of state-of-the-art handwritten and scene text recognition methods on text in lecture videos. To this end, we introduce a new dataset, LectureVideoDB, compiled from frames of multiple lecture videos. Our experiments show that existing methods do not fare well on the new dataset. The results underscore the need to improve existing methods for robust performance on lecture videos.

 


Major Contributions

  • A new dataset comprising over 5,000 frames from lecture videos, annotated for text detection and recognition.
  • The dataset is benchmarked using existing state-of-the-art methods for scene text detection and scene text/handwritten text recognition.

Related Publications

  • Kartik Dutta, Minesh Mathew, Praveen Krishnan and C.V. Jawahar, Localizing and Recognizing Text in Lecture Videos, International Conference on Frontiers of Handwriting Recognition (ICFHR), 2018.

Dataset:

The download link is available on request. Please write to us to obtain access.

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{lectureVideoDB2018,
    title={Localizing and Recognizing Text in Lecture Videos},
    author={Dutta, Kartik and Mathew, Minesh and Krishnan, Praveen and Jawahar, C.~V.},
    booktitle={ICFHR},
    year={2018}
}

Word Spotting in Silent Lip Videos


Abstract

Our goal is to spot words in silent speech videos without explicitly recognizing the spoken words, where the lip motion of the speaker is clearly visible and audio is absent. Existing work in this domain has mainly focused on recognizing a fixed set of words in word-segmented lip videos, which limits the applicability of the learned model due to limited vocabulary and high dependency on the model's recognition performance.
Our contribution is two-fold:
  1. We develop a pipeline for recognition-free retrieval, and show its performance against recognition-based retrieval on a large-scale dataset and on a set of out-of-vocabulary words.
  2. We introduce a query expansion technique using pseudo-relevance feedback, and propose a novel re-ranking method based on maximizing the correlation between spatio-temporal landmarks of the query and the top retrieval candidates.

Our word spotting method achieves 35% higher mean average precision than the recognition-based method on the large-scale LRW dataset. Finally, we demonstrate an application of the method by spotting words in a popular speech video ("The Great Dictator" by Charlie Chaplin), showing that word retrieval can be used to understand what was spoken, even in silent movies.
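Recognition-free retrieval can be pictured as embedding the query and every database clip into a common fixed-length space, ranking the database by similarity to the query, and scoring the ranking with mean average precision. The sketch below illustrates only that retrieval-and-evaluation step: `cosine_rank` and `average_precision` are our stand-ins, and the paper's actual lip-motion embedding network is omitted entirely.

```python
import numpy as np

def cosine_rank(query_vec, db_vecs):
    """Return database indices sorted by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)          # best match first

def average_precision(ranked_labels):
    """AP for a ranked list of 0/1 relevance labels; mean over queries gives mAP."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_labels, 1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / max(hits, 1)
```

In this picture, query expansion and landmark-based re-ranking modify the ranked list before the average precision is computed.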

 


Related Publications

  • Abhishek Jha, Vinay Namboodiri and C.V. Jawahar, Word Spotting in Silent Lip Videos, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF] [Supp] [Code: to be released soon] [Models: to be released soon]

Bibtex

If you use this work or dataset, please cite:

@inproceedings{jha2018word,
    title={Word Spotting in Silent Lip Videos},
    author={Jha, Abhishek and Namboodiri, Vinay and Jawahar, C.~V.},
    booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA},
    pages={10},
    year={2018}
}
 

Dataset

Lip Reading in the Wild Dataset: [External Link]
Grid Corpus: [External Link]
Charlie Chaplin, The Great Dictator speech: [YouTube]


Human Shape Capture and Tracking at Home


Abstract

Human body tracking typically requires specialized capture set-ups. Although pose tracking is available in consumer devices like Microsoft Kinect, it is restricted to stick figures visualizing body-part detection. In this paper, we propose a method for full 3D human body shape and motion capture of arbitrary movements from the depth channel of a single Kinect, when the subject wears casual clothes. We do not use the RGB channel or an initialization procedure that requires the subject to move around in front of the camera. This makes our method applicable for arbitrary clothing textures and lighting environments, with minimal subject intervention. Our method consists of 3D surface feature detection and articulated motion tracking, which is regularized by a statistical human body model. We also propose the idea of a Consensus Mesh (CMesh), the 3D template of a person created from a single viewpoint. We demonstrate tracking results on challenging poses and argue that using the CMesh along with statistical body models can improve tracking accuracy. Quantitative evaluation of our dense body tracking shows that our method has very little drift, which is further reduced by the use of the CMesh.

 


Major Contributions

  • Temporal tracking on monocular depth input for subjects wearing casual clothes, using a statistical human body model.
  • Creation of a 3D template mesh (Consensus Mesh) of a person using a single Kinect.
  • A simple capture setup consisting of one Kinect; the captured dataset and code will be released soon.

Related Publications

  • Gaurav Mishra, Saurabh Saini, Kiran Varanasi and P.J. Narayanan, Human Shape Capture and Tracking at Home, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF] [Supp] [WACV Presentation]

Code and Dataset:

We will release code and captured dataset for this project soon.

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{Mishra2018,
    title={Human Shape Capture and Tracking at Home},
    author={Mishra, Gaurav and Saini, Saurabh and Varanasi, Kiran and Narayanan, P.~J.},
    booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA},
    pages={10},
    year={2018}
}

SmartTennisTV: Automatic indexing of tennis videos


Abstract

In this paper, we demonstrate a score-based indexing approach for tennis videos. Given a broadcast tennis video (BTV), we index all the video segments with their scores to create a navigable and searchable match. Our approach temporally segments the rallies in the video and then recognizes the scores from each of the segments, before refining the scores using knowledge of the tennis scoring system. We finally build an interface to effortlessly retrieve and view the relevant video segments, also automatically tagging the segmented rallies with human-accessible tags such as 'fault' and 'deuce'. The effectiveness of our approach is demonstrated on BTVs from two major tennis tournaments.
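The refinement step exploits the fact that tennis scoring rules tightly constrain which scores can follow one another, so a recognized score that breaks the rules can be flagged and corrected. The sketch below is our illustration of that idea, not the paper's implementation: it models point scores within a single game only, omitting games, sets, and tiebreaks.

```python
POINTS = ["0", "15", "30", "40"]

def win_point(score, winner):
    """Score after `winner` (0 = first player, 1 = second) takes a point."""
    s, r = score
    w, l = (s, r) if winner == 0 else (r, s)
    if w == "Ad" or (w == "40" and l not in ("40", "Ad")):
        return "game"                       # game won outright
    if l == "Ad":
        w, l = "40", "40"                   # opponent's advantage erased: back to deuce
    elif w == "40":                         # deuce -> advantage
        w = "Ad"
    else:
        w = POINTS[POINTS.index(w) + 1]     # 0 -> 15 -> 30 -> 40
    return (w, l) if winner == 0 else (l, w)

def is_legal_transition(prev, curr):
    """True if `curr` can follow `prev` after exactly one point."""
    return curr in {win_point(prev, 0), win_point(prev, 1)}
```

In a full system, a transition flagged as illegal would trigger a correction, e.g. replacing the recognized score with the most likely legal alternative given the neighbouring segments.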

 


Related Publications

  • Anurag Ghosh and C.V. Jawahar, SmartTennisTV: Automatic indexing of tennis videos, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2017), IIT Mandi, HP, India, 2017. [PDF] [Slides] (Best Paper Award Winner)

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{ghosh2017smart,
    title={SmartTennisTV: Automatic indexing of tennis videos},
    author={Ghosh, Anurag and Jawahar, C.~V.},
    booktitle={National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2017), IIT Mandi, HP, India},
    pages={10},
    year={2017}
}

Dataset

We will be releasing the dataset soon. The link will be provided here.