Learning Representations for Word Images

Praveen Krishnan

Abstract

Reading and writing documents are among the primary skills with which we gather and communicate information. With the emergence of artificial intelligence (AI), researchers are in constant pursuit of intelligent algorithms that can bring our physical and digital worlds closer to each other. One important domain is document image analysis, where we delve into the problem of understanding content from scanned document image collections. Considering “words” as the basic unit in understanding a document, in this thesis we address the problem of finding the best possible representation for word images. Representation learning has been a key area of investigation for AI problems. The primary goal of this thesis is to learn efficient representations for word images that encode their content. An ideal representation should be invariant to multiple fonts and handwriting styles, and less sensitive to noise and distortions. In the past, representations have been handcrafted, specific to modalities (printed, handwritten), and sensitive to the complexities of handwriting in multi-writer scenarios. In this work, we choose the paradigm of learning from data using deep neural networks. We take our inspiration from the fact that, given large amounts of annotated data, modern deep neural networks can inherently learn better representations. In this thesis, we also relax the need for large annotated datasets by capitalizing heavily on synthetically generated images. We also introduce a novel problem of learning semantic representations for word images, which encode the semantics of the word and reduce the vocabulary gap that exists between the query and the retrieved results. The first contribution of this thesis is a simple technique to generate large amounts of synthetic data, useful for pre-training deep neural networks. This led to the creation of the IIIT-HWS dataset, which is now widely used in the document analysis community.
The other major contributions of this thesis are: (a) the design of a deep convolutional architecture (named HWNet) for learning an efficient holistic representation for word images, (b) a joint embedding scheme to project word images and textual strings onto a common subspace, and (c) a novel form of word image representation which respects the word form along with its semantic meaning. The learned representations are evaluated on the tasks of word spotting and word recognition. We report state-of-the-art performance on popular datasets covering both modern and historical, and both handwritten and printed, document images, while keeping the representation compact. Finally, to validate the proposed representations, we present some interesting use cases: (i) finding the similarity between a pair of handwritten document images, (ii) searching for keywords in online lecture videos, and (iii) building a word retrieval system for Indic scripts.

Year of completion:	November 2020
Advisor:	C V Jawahar

Related Publications

Siddhant Bansal, Praveen Krishnan and C. V. Jawahar - Improving Word Recognition using Multiple Hypotheses and Deep Embeddings, The 25th International Conference on Pattern Recognition (ICPR 2021), Milano.
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Improving CNN-RNN Hybrid Networks for Handwriting Recognition, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA.
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Spotting and Recognition of Handwritten Words in Indic Scripts, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA.
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Localizing and Recognizing Text in Lecture Videos, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA.
Vijay Rowtula, Praveen Krishnan and C. V. Jawahar - POS Tagging and Named Entity Recognition on Handwritten Documents, ICON, 2018.
Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Word Spotting and Recognition using Deep Embedding, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria.
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria.
Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Accurate Handwritten Word Recognition for Hindi and Bangla, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017.
Praveen Krishnan and C. V. Jawahar - Matching Handwritten Document Images, The 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016.
Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016.
Anshuman Majumdar, Praveen Krishnan and C. V. Jawahar - Visual Aesthetic Analysis for Handwritten Document Images, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 2016.
Praveen Krishnan, Naveen Sankaran, Ajeet Kumar Singh and C. V. Jawahar - Towards a Robust OCR System for Indic Scripts, Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 7-10 April 2014, Tours-Loire Valley, France.
Praveen Krishnan and C. V. Jawahar - Bringing Semantics in Word Image Retrieval, Proceedings of the 12th International Conference on Document Analysis and Recognition, 25-28 Aug. 2013, Washington DC, USA.
Praveen Krishnan, Ravi Sekhar and C. V. Jawahar - Content Level Access to Digital Library of India Pages, Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India.