Towards Boosting the Accuracy of Non-Latin Scene Text Recognition


Sanjana Gunna, Rohit Saluja, and C.V. Jawahar

[Video]   [Paper]   [Code]

Clockwise from top-left: examples from Gujarati, Hindi, Bangla, Tamil, Telugu, and Malayalam. In each example, the top shows annotated scene-text images and the bottom shows the baselines’ predictions (row 1) and the transfer-learning models’ predictions (row 2). Green, red, and “ ” represent correct predictions, errors, and missing characters, respectively.

Abstract

Scene-text recognition is remarkably better in Latin languages than in non-Latin languages due to several factors such as multiple fonts, simplistic vocabulary statistics, updated data generation tools, and writing systems. This work examines the possible reasons for the low accuracy by comparing English datasets with non-Latin ones. We perform several controlled experiments on English, varying (i) the number of fonts used to create the synthetic data and (ii) the number of word images created. We find that both factors are critical for scene-text recognition systems. We share 55 additional fonts in Arabic and 97 new fonts in Devanagari, which we found using a region-wise online search. These fonts were not used in previous scene-text recognition works. The font details can be found in the GitHub repo. We apply our learnings to improve the state-of-the-art results for two non-Latin languages, Arabic and Devanagari, and achieve Word Recognition Rate (WRR) gains on the IIIT-ILST and MLT datasets.
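The gains above are reported in WRR. As a minimal sketch, assuming WRR is the fraction of word images whose predicted string exactly matches the ground-truth label, the metric and a gain between two models could be computed as follows. The function name, the sample words, and the predictions are hypothetical and are not taken from the paper or its datasets.

```python
def word_recognition_rate(predictions, ground_truths):
    """Fraction of words whose prediction matches the label exactly.

    Assumption: WRR is an exact-match rate over whole words.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


# Hypothetical Devanagari labels and model outputs, for illustration only.
labels = ["नमस्ते", "भारत", "दिल्ली", "स्कूल"]
baseline = ["नमस्त", "भारत", "दिल्ल", "स्कूल"]    # two words have a missing character
improved = ["नमस्ते", "भारत", "दिल्ल", "स्कूल"]   # one word has a missing character

baseline_wrr = word_recognition_rate(baseline, labels)
improved_wrr = word_recognition_rate(improved, labels)

# WRR gain expressed in percentage points.
gain = (improved_wrr - baseline_wrr) * 100
print(f"baseline WRR = {baseline_wrr:.2%}, improved WRR = {improved_wrr:.2%}, gain = {gain:.2f}")
```

A single missing character makes the whole word count as wrong, which is why WRR is a strict measure for scripts with many combining marks. A real evaluation may also need to apply a consistent Unicode normalization to both strings before comparing. That step is omitted here.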


Paper

  • Sanjana Gunna, Rohit Saluja, C.V. Jawahar
    Towards Boosting the Accuracy of Non-Latin Scene Text Recognition, International Workshop on Arabic and derived Script Analysis and Recognition (ASAR 2021), 2021.
    [PDF]

    @inproceedings{gunnaNonLatin2021,
    title={Towards {B}oosting the {A}ccuracy of {N}on-{L}atin {S}cene {T}ext {R}ecognition},
    author={Sanjana Gunna and Rohit Saluja and C V Jawahar},
    booktitle={2021 International Conference on Document Analysis and Recognition Workshops (ICDARW)},
    year={2021}
    }

Contact

  1. Sanjana Gunna
  2. Rohit Saluja