Scene Text Understanding

Motivation

Recognizing scene text is a challenging problem, even more so than recognizing text in scanned documents. With camera-based applications now readily available on mobile phones, understanding scene text is more important than ever. One could, for instance, foresee an application that answers questions such as “What does this sign say?”. This is related to the problem of Optical Character Recognition (OCR), which has a long history in the computer vision community. However, the success of OCR systems is largely restricted to text in scanned documents. Scene text exhibits large variability in appearance and can be challenging even for state-of-the-art OCR methods. Many scene understanding methods successfully recognize objects and regions such as roads, trees and sky, but tend to ignore the text on sign boards. Our goal is to fill this gap in scene understanding.

Highlights

Binarization as a labelling problem (see our ICDAR'11 paper)
Both open and closed vocabulary word recognition (see our CVPR'12 paper and BMVC'12 paper)
Use of both top-down (lexicons) and bottom-up cues (character detection)
Holistic approach to lexicon-driven scene text recognition (see our ICDAR'13 paper)
Applications in image retrieval (see our ICCV'13 paper)
Word recognition for large lexicons and cropped word image retrieval (see our ACCV'14 paper)
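To make "binarization as a labelling problem" concrete, here is a minimal sketch. It labels every pixel as text (1) or background (0) by minimizing an energy over a Markov random field: a data term plus a smoothness term over 4-neighbours. Several assumptions apply. The input is a toy grayscale image with values in [0, 1]. The data term is the distance to a fixed, hypothetical mean intensity per label (`mu_bg`, `mu_text`). The smoothness term is a simple Potts penalty weighted by a hypothetical `smoothness` parameter. The optimizer is iterated conditional modes (ICM). This is not the model or solver used in the ICDAR'11 paper; it only illustrates the labelling formulation. The function names `unary_cost` and `binarize_icm` are likewise hypothetical.

```python
from typing import List

Image = List[List[float]]   # grayscale intensities in [0, 1]
Labels = List[List[int]]    # 1 = text, 0 = background


def unary_cost(intensity: float, label: int, mu_bg: float, mu_text: float) -> float:
    """Data term (assumed form): distance of a pixel's intensity from the mean of its label."""
    mean = mu_text if label == 1 else mu_bg
    return abs(intensity - mean)


def binarize_icm(img: Image, mu_bg: float = 0.9, mu_text: float = 0.1,
                 smoothness: float = 0.3, max_iters: int = 10) -> Labels:
    """Greedily minimise
    E(L) = sum_p unary(p, L_p) + smoothness * sum_{(p,q) 4-neighbours} [L_p != L_q]
    with iterated conditional modes (ICM); all parameters are illustrative assumptions."""
    h, w = len(img), len(img[0])
    # Initialise with the label of lowest data cost, i.e. plain thresholding.
    labels = [[1 if unary_cost(v, 1, mu_bg, mu_text) < unary_cost(v, 0, mu_bg, mu_text) else 0
               for v in row] for row in img]
    for _ in range(max_iters):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]
                # Local energy of each candidate label, given the current neighbour labels.
                energies = [unary_cost(img[y][x], l, mu_bg, mu_text)
                            + smoothness * sum(l != n for n in neighbours)
                            for l in (0, 1)]
                best = 0 if energies[0] <= energies[1] else 1
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:  # no pixel changed: ICM has reached a local minimum
            break
    return labels


# Toy example: a dark vertical stroke on a light background, plus one noisy pixel.
img = [[0.9, 0.9, 0.1, 0.9, 0.9] for _ in range(5)]
img[4][0] = 0.45
for row in binarize_icm(img):
    print(row)  # prints [0, 0, 1, 0, 0] for every row
```

With `smoothness=0.0` the energy reduces to the data term alone. The result is then plain thresholding, and the noisy pixel at the bottom-left stays labelled as text. A positive smoothness weight removes such isolated labels. Setting it too high can also erase thin strokes, which is one reason richer data terms and exact or near-exact optimizers are preferred in practice over this greedy sketch.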

Code

Generating exemplars (based on our ICDAR'13 paper) README

Coming Soon:

Scene Text Binarization
Scene Character Recognition

Datasets

Lexicons (ACCV'14)

IIIT 5K-word

SVT-CHAR (9.9MB) README

IIIT Scene Text Retrieval (IIIT STR)

Video Scene Text Retrieval Datasets (Sports-10K and TV series-1M)

Related Publications

Anand Mishra, Karteek Alahari and C. V. Jawahar - Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues. Computer Vision and Image Understanding (CVIU), volume 145, pages 30–42, 2016. [PDF]
Udit Roy, Anand Mishra, Karteek Alahari and C. V. Jawahar - Scene Text Recognition and Retrieval for Large Lexicons. Proceedings of the 12th Asian Conference on Computer Vision (ACCV), 1-5 Nov. 2014, Singapore. [PDF] [Abstract] [Poster] [Lexicons] [bibtex]
Anand Mishra, Karteek Alahari and C. V. Jawahar - Image Retrieval using Textual Cues. Proceedings of the International Conference on Computer Vision (ICCV), 1-8 Dec. 2013, Sydney, Australia. [PDF] [Abstract] [Project page] [bibtex]
Vibhor Goel, Anand Mishra, Karteek Alahari and C. V. Jawahar - Whole is Greater than Sum of Parts: Recognizing Scene Text Words. Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), 25-28 Aug. 2013, Washington DC, USA. [PDF] [Abstract] [bibtex]
Anand Mishra, Karteek Alahari and C. V. Jawahar - Scene Text Recognition using Higher Order Language Priors. Proceedings of the British Machine Vision Conference (BMVC), 3-7 Sep. 2012, Guildford, UK. [PDF] [Abstract] [Slides] [bibtex]
Anand Mishra, Karteek Alahari and C. V. Jawahar - Top-down and Bottom-up Cues for Scene Text Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 16-21 June 2012, pp. 2287-2294, Providence RI, USA. [PDF] [Abstract] [Poster] [bibtex]
Anand Mishra, Karteek Alahari and C. V. Jawahar - An MRF Model for Binarization of Natural Scene Text. Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR), 18-21 Sep. 2011, Beijing, China. [PDF] [Abstract] [Slides] [bibtex]

People

Acknowledgements

Anand Mishra is partly supported by the MSR India PhD Fellowship 2012.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.