Online Handwriting Recognition using Depth Sensors

 



Abstract

In this work, we propose an online handwriting recognition solution in which the data is captured with the help of depth sensors. The user writes in the air, and our method recognizes the writing online, in real time, using a novel feature representation. Our method uses an efficient fingertip tracking approach and reduces the need for explicit pen-up/pen-down switching. We validate our method on two depth sensors, the Kinect and the Leap Motion Controller, and use state-of-the-art classifiers to recognize characters. On a sample dataset collected from 20 users, we report 97.59% accuracy for character recognition. We also demonstrate how this system can be extended to lexicon recognition with 100% accuracy. We have prepared a dataset containing 1,560 characters and 400 words with the intention of providing a common benchmark for air handwriting character recognition and allied research.


Dataset


Character samples from Dataset

To evaluate the performance of our approach, we created the 'Dataset for AIR Handwriting' (DAIR). The dataset was created using 20 subjects; each user stands in front of the sensor and writes in the air with one finger extended. Users are allowed to write at their own speed and in their own writing style. The dataset contains two sections, DAIR I and DAIR II. DAIR I consists of 1,248 character samples from 16 users, with 3 samples per character per user. DAIR II consists of words from a 40-word lexicon. Words in the lexicon are taken from the names of the most populous cities and vary in length from 3 to 5 characters. It contains 400 words, totalling 1,490 characters.
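The counts above can be sanity-checked with a short script (assuming the 26-letter English alphabet; the constants are taken directly from the figures stated above):

```python
# Sanity-check the stated DAIR dataset sizes.

USERS_DAIR1 = 16          # users contributing to DAIR I
CHARS = 26                # assuming the 26-letter English alphabet
SAMPLES_PER_CHAR = 3      # samples per character per user

dair1_total = USERS_DAIR1 * CHARS * SAMPLES_PER_CHAR
print(dair1_total)        # 1248, matching the reported DAIR I size

# DAIR II: 400 words of length 3-5, totalling 1,490 characters.
WORDS = 400
TOTAL_WORD_CHARS = 1490
avg_word_len = TOTAL_WORD_CHARS / WORDS
assert 3 <= avg_word_len <= 5   # consistent with the stated length range
print(avg_word_len)       # 3.725
```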

Our datasets can be downloaded from the following links:

Please mail us at {sirnam.swetha@research, rajat.aggarwal@students}.iiit.ac.in for any queries.


Results


Labels show the predicted and expected characters/words for each pair. Each pair shows the input trajectory and the trajectory after normalization, respectively. (a) correctly classified samples, (b) mis-classified samples, (c) correctly classified words.
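As a rough illustration of the normalization step shown above, a trajectory can be translated and isotropically scaled into the unit box before classification (a minimal sketch under that assumption; the paper's actual preprocessing may differ):

```python
# Normalize a 2D fingertip trajectory: translate to the origin and
# scale isotropically so the larger side fits in the unit square.

def normalize_trajectory(points):
    """points: list of (x, y) samples along the air-written stroke."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # One scale for both axes, so the aspect ratio is preserved.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in points]

traj = [(120, 340), (160, 380), (200, 340)]   # raw sensor coordinates
print(normalize_trajectory(traj))             # → [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]
```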

 


Accuracy comparison for characters using the Kinect and the Leap Motion Controller


Related Publications

Rajat Aggarwal, Sirnam Swetha, Anoop M. Namboodiri, Jayanthi Sivaswamy and C. V. Jawahar - Online Handwriting Recognition using Depth Sensors, Proceedings of the 13th IAPR International Conference on Document Analysis and Recognition (ICDAR), 23-26 Aug 2015, Nancy, France. [PDF]


Associated People

 

 

The IIIT 5K-word dataset

[Project page]

About

The IIIT 5K-word dataset is harvested from Google image search. Query words like billboard, signboard, house number, house name plate and movie poster were used to collect the images. The dataset contains 5,000 cropped word images from scene text and born-digital images, and is divided into train and test parts. It can be used for large-lexicon cropped-word recognition. We also provide a lexicon of more than 0.5 million dictionary words with this dataset.
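Large-lexicon cropped-word recognition is commonly evaluated by snapping a raw recognition output to the closest lexicon entry. A minimal sketch of that idea follows (the toy lexicon and plain edit distance here are illustrative assumptions; the accompanying BMVC 2012 method instead uses higher-order language priors):

```python
# Snap a raw recognition string to the nearest word in a lexicon
# using Levenshtein (edit) distance.

def edit_distance(a, b):
    """Levenshtein distance between strings a and b (single-row DP)."""
    if len(a) < len(b):
        a, b = b, a
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            cur = row[j]
            row[j] = min(row[j] + 1,          # deletion
                         row[j - 1] + 1,      # insertion
                         prev + (ca != cb))   # substitution (0 if equal)
            prev = cur
    return row[-1]

def lexicon_correct(raw, lexicon):
    """Return the lexicon word closest to the raw recognition output."""
    return min(lexicon, key=lambda w: edit_distance(raw, w))

lexicon = ["house", "horse", "mouse"]        # toy lexicon, for illustration
print(lexicon_correct("hause", lexicon))     # → house
```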


Downloads

IIIT 5K-word (106 MB)
README
UPDATES


Publications

Anand Mishra, Karteek Alahari and C. V. Jawahar.
Scene Text Recognition using Higher Order Language Priors
BMVC 2012 [PDF]


Bibtex

If you use this dataset, please cite:
@InProceedings{MishraBMVC12,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Scene Text Recognition using Higher Order Language Priors",
  booktitle = "BMVC",
  year      = "2012",
}

Contact

For any queries about the dataset feel free to contact Anand Mishra by email.

The IIIT Scene Text Retrieval (STR) Dataset

[Project Page]



About

The IIIT STR dataset is harvested from Google and Flickr image search. Query words like coffee shop, motel, post office, high school and department were used to collect the images. Additionally, query words like sky and building were used on Flickr to collect random distractors (images not containing text). The dataset contains 10,000 images in all. Each image is manually annotated with whether it contains a query word or not, and annotations are available for all 50 query words used in our paper. Each query word appears 10-50 times in the dataset.
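Per-image binary annotations of this kind are typically used to score a retrieval system with average precision. A minimal sketch of that computation (the image identifiers are made up; this is the standard AP formula, not code from the paper):

```python
# Average precision of a ranked retrieval list given binary relevance labels.

def average_precision(ranked_ids, relevant_ids):
    """ranked_ids: image ids in retrieved order; relevant_ids: set of
    ids annotated as containing the query word."""
    if not relevant_ids:
        return 0.0
    hits, ap = 0, 0.0
    for rank, img in enumerate(ranked_ids, 1):
        if img in relevant_ids:
            hits += 1
            ap += hits / rank        # precision at each relevant hit
    return ap / len(relevant_ids)

# Toy ranking for one query word: two of four images are annotated relevant.
ranked = ["img_07", "img_21", "img_03", "img_14"]
relevant = {"img_07", "img_03"}
print(average_precision(ranked, relevant))   # (1/1 + 2/3) / 2 ≈ 0.833
```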


Downloads

IIIT STR (758 MB)
README


Publications

Anand Mishra, Karteek Alahari and C. V. Jawahar.
Image Retrieval using Textual Cues
ICCV 2013 [PDF]


Bibtex

If you use this dataset, please cite:

@InProceedings{MishraICCV13,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Image Retrieval using Textual Cues",
  booktitle = "ICCV",
  year      = "2013",
}



Contact

For any queries about the dataset feel free to contact Anand Mishra by email.

 

Sports-10K and TV Series-1M Video Datasets

[Project page]




 

About

We introduce two large video datasets, Sports-10K and TV Series-1M, to demonstrate scene text retrieval in the context of video sequences. The first is from sports video clips containing many advertisement signboards, and the second is from four popular TV series: Friends, Buffy, Mr. Bean, and Open All Hours. TV Series-1M contains more than 1 million frames. Words such as central, perk, pickles, news and SLW27R (a car number) appear frequently in the TV Series-1M dataset. All image frames extracted from this dataset are manually annotated with the query text they may contain; annotation was done by a team of three people over about 150 man-hours. We use 10 and 20 query words to demonstrate retrieval performance on the Sports-10K and TV Series-1M datasets, respectively.


Downloads

Please mail us for a copy of these two datasets (to be used for research purposes only).


Publications

Anand Mishra, Karteek Alahari and C. V. Jawahar.
Image Retrieval using Textual Cues
ICCV 2013 [PDF]


Bibtex

If you use this dataset, please cite:

@InProceedings{MishraICCV13,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Image Retrieval using Textual Cues",
  booktitle = "ICCV",
  year      = "2013",
}



Contact

For any queries about the dataset feel free to contact Anand Mishra by email.

 

Image Retrieval using Textual Cues

Anand Mishra, Karteek Alahari and C V Jawahar

ICCV 2013


Abstract

We present an approach to the text-to-image retrieval problem based on the textual content present in images. Given recent developments in understanding text in images, an appealing approach to this problem is to localize and recognize the text, and then query the database as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, in which we find approximate locations of the characters in the text query and then impose spatial constraints to generate a ranked list of images in the database. Retrieval performance is evaluated on public scene text datasets as well as on three large datasets we introduce, namely IIIT Scene Text Retrieval, Sports-10K and TV Series-1M.
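The query-driven idea can be caricatured in a few lines: keep approximate, scored character detections per image and rank images by the best left-to-right chain that spells out the query. Everything below (the detection tuples, the single left-to-right ordering constraint, the exhaustive search) is an illustrative toy, not the paper's actual indexing pipeline:

```python
# Toy query-driven retrieval: each image has approximate character
# detections (character, x-position, detection score); an image is scored
# by the best chain of detections spelling the query left to right.

def query_score(query, detections):
    """Best total detection score over chains spelling `query` with
    strictly increasing x-positions; 0.0 if no complete chain exists."""
    best = 0.0

    def search(qi, min_x, acc):
        nonlocal best
        if qi == len(query):
            best = max(best, acc)
            return
        for ch, x, score in detections:
            if ch == query[qi] and x > min_x:
                search(qi + 1, x, acc + score)

    search(0, float("-inf"), 0.0)
    return best

images = {
    "cafe_sign": [("c", 10, 0.9), ("o", 25, 0.8), ("f", 40, 0.7),
                  ("f", 55, 0.6), ("e", 70, 0.9), ("e", 85, 0.8)],
    "pet_shop":  [("c", 5, 0.9), ("a", 20, 0.7), ("t", 35, 0.8)],
}
ranked = sorted(images, key=lambda name: query_score("coffee", images[name]),
                reverse=True)
print(ranked)   # → ['cafe_sign', 'pet_shop']
```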


Contributions

  • A query-driven approach for scene text based image retrieval
  • Scene text indexing without explicit text localization and recognition
  • Both category and instance retrieval
  • The largest datasets for this problem

Paper

ICCV 2013 Paper / Poster


Datasets

IIIT Scene Text Retrieval (IIIT STR)

Video Scene Text Retrieval Datasets (TV series-1M and Sports-10K)


Extended results

TBA 


BibTeX

@InProceedings{Mishra13,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Image Retrieval using Textual Cues",
  booktitle = "Proceedings of IEEE International Conference on Computer Vision",
  year      = "2013"
}

Acknowledgements

Anand Mishra is partly supported by the MSR India PhD Fellowship 2012.


Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.