CVIT Projects

Transfer Learning for Scene Text Recognition in Indian Languages

Sanjana Gunna, Rohit Saluja, and C.V. Jawahar

[Video] [Paper] [Code]

Clockwise from top-left: examples from Gujarati, Hindi, Bangla, Tamil, Telugu, and Malayalam. Top: annotated scene-text images; bottom: baselines' predictions (row 1) and transfer learning models' predictions (row 2). Green, red, and “ ” represent correct predictions, errors, and missing characters, respectively.

Abstract

Scene text recognition in low-resource Indian languages is challenging because of complexities such as multiple scripts, fonts, text sizes, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks, from English to two common Indian languages. We show that transferring English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose to apply transfer learning techniques among Indian languages, and we study transfer learning among six Indian languages with varying complexities in fonts and word-length statistics. We achieve gains in Word Recognition Rate (WRR) on the IIIT-ILST Hindi, Telugu, and Malayalam datasets compared to previous works. On the MLT-19 Hindi and Bangla datasets and on our Gujarati and Tamil datasets, we observe similar WRR gains over our baseline models.
We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words.
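To make the idea of transferring "all the layers" from one language's recogniser to another more concrete, here is a minimal sketch using plain NumPy arrays as stand-ins for network weights. It assumes (this is not stated above) that layers whose name and shape match are copied, while layers whose shape depends on the target script's character set, such as the output layer, must be re-initialised. The layer names, sizes, and the N(0, 0.02²) initialisation are hypothetical.

```python
import numpy as np

def transfer_weights(source, target_shapes, seed=0):
    """Initialise a target-language recogniser from a trained source model.

    source:        dict of layer name -> np.ndarray (trained weights)
    target_shapes: dict of layer name -> shape tuple of the target model
    Layers present in both with identical shapes are copied (transferred);
    every other layer is freshly initialised from N(0, 0.02^2).
    """
    rng = np.random.default_rng(seed)
    target, copied, fresh = {}, [], []
    for name, shape in target_shapes.items():
        w = source.get(name)
        if w is not None and w.shape == tuple(shape):
            target[name] = w.copy()  # transferred from the source language
            copied.append(name)
        else:
            target[name] = rng.normal(0.0, 0.02, size=shape)  # re-initialised
            fresh.append(name)
    return target, copied, fresh


# Hypothetical layer names and sizes: a Hindi model with a 110-symbol output
# layer transferred to a Gujarati model with a 90-symbol output layer.
hindi = {
    "cnn.conv1": np.ones((32, 3, 3, 3)),
    "rnn.lstm": np.ones((256, 512)),
    "head.fc": np.ones((512, 110)),
}
gujarati_shapes = {"cnn.conv1": (32, 3, 3, 3), "rnn.lstm": (256, 512), "head.fc": (512, 90)}
gujarati, copied, fresh = transfer_weights(hindi, gujarati_shapes)
print(copied)  # ['cnn.conv1', 'rnn.lstm']
print(fresh)   # ['head.fc']
```

After this initialisation, the whole target model would be fine-tuned on the target-language data. Copying by name and shape keeps the sketch independent of any particular deep learning framework.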

Paper

Transfer Learning for Scene Text Recognition in Indian Languages

Sanjana Gunna, Rohit Saluja, C.V. Jawahar
Transfer Learning for Scene Text Recognition in Indian Languages, ICDAR 2021 Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2021, 9th edition), 2021.
[PDF] | [BibTeX]

@inproceedings{Gunna2021TransferLF,
  title={Transfer Learning for Scene Text Recognition in Indian Languages},
  author={Sanjana Gunna and Rohit Saluja and C. V. Jawahar},
  booktitle={ICDAR Workshops},
  year={2021}
}

Dataset

The dataset contains nearly 440 real scene images, with 500 cropped scene-text images for the Gujarati script and 2535 for the Tamil script, annotated with scene text bounding boxes and transcriptions.

Terms and Conditions: All images provided as part of the dataset have been collected from freely accessible internet sites. As such, they are copyrighted by their respective authors. The images are hosted on this site, physically located in India. Specifically, the images are provided with the sole purpose of being used for research in developing new models for scene text recognition. You are not allowed to redistribute any images in this dataset. By downloading any of the files below, you explicitly state that your final purpose is to perform research under the conditions mentioned above and you agree to comply with these terms and conditions.

Download

Dataset Link

Contact

Sanjana Gunna
Rohit Saluja

More Parameters? No Thanks!

Zeeshan Khan, Kartheek Akella, Vinay Namboodiri, C.V. Jawahar

CVIT, IIIT-H; CVIT, IIIT-H; Univ. of Bath; CVIT, IIIT-H

[Code] [Presentation Video] [TED Dataset]

Illustration of the evolution of model parameters. (a) shows the multilingual parameters in grey. Through 60% pruning and retraining, we arrive at (b), where white represents the free weights with value 0. The surviving weights in grey are fixed for the rest of the method. We then train the free parameters on the first bilingual pair (L-1) and arrive at (c), which shows the initial parameters of L-1 in orange; these share weights with the previously trained multilingual parameters in grey. Again, with 50% pruning and retraining on the current L-1-specific weights in orange, we get the final parameters for L-1 shown in (d) and extract the final mask for L-1 in (f). We repeat the same procedure for all the bilingual pairs and extract the masks for each pair.

Abstract

This work studies the long-standing problems of model capacity and negative interference in multilingual neural machine translation (MNMT). Using network pruning techniques, we observe that pruning 50-70% of the parameters of a trained MNMT model causes a drop of only 0.29-1.98 BLEU, suggesting that large redundancies exist even in MNMT models. These observations motivate us to use the redundant parameters to counter the interference problem efficiently. We propose a novel adaptation strategy in which we iteratively prune and retrain the redundant parameters of an MNMT model to improve bilingual representations while retaining multilinguality. Negative interference severely affects high-resource languages, and our method alleviates it without any additional adapter modules. Hence, we call it a parameter-free adaptation strategy, paving the way for the efficient adaptation of MNMT. We demonstrate the effectiveness of our method on a 9-language MNMT model trained on TED talks and report an average improvement of +1.36 BLEU on high-resource pairs. Code will be released here.
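The mask bookkeeping in the figure caption and abstract above (prune the multilingual model, freeze the survivors, give the freed weights to one bilingual pair, prune again, repeat) can be sketched on a single weight matrix. This is a minimal illustration under stated assumptions: magnitude-based pruning is assumed as the pruning criterion (the text above does not name one), retraining is replaced by random values, and each pair's mask is assumed to combine the frozen multilingual weights with that pair's own surviving weights. The function names `prune_mask` and `pair_masks` are hypothetical.

```python
import numpy as np

def prune_mask(weights, free, ratio):
    """Return a boolean mask of the `free` positions that survive pruning.

    Assumes magnitude pruning: the `ratio` fraction of free weights with the
    smallest |w| is released. Positions outside `free` (frozen by earlier
    stages) are never touched.
    """
    idx = np.flatnonzero(free)
    n_prune = int(round(ratio * idx.size))
    order = idx[np.argsort(np.abs(weights.ravel()[idx]), kind="stable")]
    keep = np.zeros(weights.size, dtype=bool)
    keep[order[n_prune:]] = True
    return keep.reshape(weights.shape)


def pair_masks(weights, n_pairs, multi_ratio=0.6, pair_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # (a) -> (b): prune the multilingual weights; survivors stay fixed
    multi = prune_mask(weights, np.ones(weights.shape, dtype=bool), multi_ratio)
    w = np.where(multi, weights, 0.0)  # released weights are set to 0
    free = ~multi
    masks = []
    for _ in range(n_pairs):
        # (c): stand-in for retraining the free weights on this bilingual pair
        w = np.where(free, rng.normal(size=w.shape), w)
        own = prune_mask(w, free, pair_ratio)  # (d): prune this pair's weights
        masks.append(multi | own)              # (f): shared + pair-specific mask
        w = np.where(free & ~own, 0.0, w)      # release the pruned weights again
        free &= ~own                           # next pair trains only what is left
    return multi, masks


w0 = np.random.default_rng(1).normal(size=(10, 10))  # stand-in for trained MNMT weights
multi, masks = pair_masks(w0, n_pairs=2)
print(int(multi.sum()), [int(m.sum()) for m in masks])  # 40 [70, 55]
```

With 100 weights, 60% pruning leaves 40 frozen multilingual weights; pair 1 keeps half of the 60 freed weights (mask size 70), and pair 2 keeps half of the remaining 30 (mask size 55). No parameters are added at any step, which is the sense in which the strategy is parameter-free.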

Paper

More Parameters? No Thanks!

Zeeshan Khan, Kartheek Akella, Vinay Namboodiri and C.V. Jawahar
More Parameters? No Thanks!, Findings of ACL, 2021.
[PDF] | [BibTeX]

Official Presentation

Our presentation at Findings of ACL 2021 is available on YouTube.

Contact

Zeeshan Khan
Kartheek Akella

MediTables: A New Dataset and Deep Network for Multi-category Table Localization in Medical Documents

Abstract

We will update soon.

[Code]

DocVisor: A Multi-purpose Web-based Interactive Visualizer for Document Image Analytics

Abstract

We will update soon.

[Code]

Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition

Abstract

We introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL). Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual, language). The inter-modal constraints are defined between action sequence embedding and embeddings of Parts of Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for the task of skeleton-based action sequence recognition. It has been accepted for publication at the 2021 IEEE ICIP.
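The role of Parts of Speech in the paragraph above can be illustrated with a toy zero-shot classifier. This is not SynSE: SynSE learns generative embedding spaces, whereas this sketch swaps in fixed toy word vectors and a generic nearest-neighbour zero-shot rule. It assumes that each unseen action's description is split into verb and noun words, that their averaged vectors are concatenated into a class embedding, and that a skeleton sequence has already been mapped into the same space. All vectors, tags, class names, and the helper names `pos_embedding` and `zero_shot_predict` are hypothetical.

```python
import numpy as np

def pos_embedding(tagged_words, vectors):
    """Build a class embedding as [mean verb vector, mean noun vector]."""
    dim = len(next(iter(vectors.values())))
    parts = []
    for tag in ("VERB", "NOUN"):
        vs = [vectors[w] for w, t in tagged_words if t == tag and w in vectors]
        parts.append(np.mean(vs, axis=0) if vs else np.zeros(dim))
    return np.concatenate(parts)


def zero_shot_predict(visual_emb, class_embs):
    """Return the unseen class whose embedding has the highest cosine similarity."""
    names = list(class_embs)
    M = np.stack([class_embs[n] for n in names])
    sims = M @ visual_emb / (np.linalg.norm(M, axis=1) * np.linalg.norm(visual_emb) + 1e-12)
    return names[int(np.argmax(sims))]


# Toy 2-d word vectors and PoS-tagged descriptions of two unseen actions.
vectors = {"kick": np.array([1.0, 0.0]), "drink": np.array([0.0, 1.0]),
           "ball": np.array([1.0, 0.0]), "water": np.array([0.0, 1.0])}
unseen = {
    "kick ball": pos_embedding([("kick", "VERB"), ("ball", "NOUN")], vectors),
    "drink water": pos_embedding([("drink", "VERB"), ("water", "NOUN")], vectors),
}
skeleton_emb = np.array([0.9, 0.1, 0.8, 0.2])  # assumed already in the joint space
print(zero_shot_predict(skeleton_emb, unseen))  # kick ball
```

Keeping the verb and noun parts in separate slots of the embedding is what lets a constraint act on each PoS group individually, which is the intuition behind syntactic guidance.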

Architecture

Citation

    @misc{gupta2021syntactically,
        title={Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition},
        author={Pranay Gupta and Divyanshu Sharma and Ravi Kiran Sarvadevabhatla},
        year={2021},
        eprint={2101.11530},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }
