Retrieval from Large Document Image Collections


Extracting the relevant information out of a large number of documents is quite a challenging and tedious task.The quality of results generated by the traditionally available full-text search engine and text-based image retrieval systems is not very optimal. These Information Retrieval(IR) tasks become more challenging with the non-traditional language scripts for example in the case of Indic scripts. We have developed OCR (Optical Character Recognition) Search Engine which is an attempt to make an Information Retrieval and Extraction (IRE) system that replicates the current state-of-the-art methods using the IRE and basic Natural Language Processing (NLP) techniques. In this project we have tried to demonstrate the study of the methods that are being used for performing search and retrieval tasks. We also present the small descriptions of the functionalities supported in our system along with the statistics of the dataset. We use Indic-OCR developed at CVIT, for generating the text for the OCR Search Engine.

pipeline This is the basic pipeline followed by the OCR Search Engine. Digitised images after the preprocessing steps, ex. denoising, are passed through a segmentation pipeline, which generates line level segmentations for the document. These together are passed in the Indic OCR developed at CVIT that generates the text output . This text output, segmented output (line level segmentations) and document images form the database of the IRE system.

Functionalities implemented in the IRE system

The developed IRE system performs the following functionalities on the user end:

  • Search is based on the keywords entered by the users in this case i.e whether they are single or multiple word query or the query with double quotes.
  • Search is based on the language chosen. It also supports multilingual search which enables users to search the books with multiple languages, for instance in the case of "Bhagawadgitha" where a shlokaand its translation is provided.
  • Search is based on how relevant the search results are and provide the users with the highlighted line/lines containing searched queries within the complete text.
  • Additionally, primary (language, type, author, etc) and secondary(genre and source) filters for the books have been added to ease up the task of search.
  • Our system also supports transliteration which enables users to take the benefit of the phonetics of the other language while typing.

Live Demo

--- We will update the link ---

Code available on : [Code]


datasets The datasets (NDLI data & British library data) contains document images of the digitised books in the Indic languages

The dataset provided for use consists of more than 1.5 million documents in languages Hindi, Telugu and Tamil (~ 5 lacs each). This dataset consists of the books from various genres such as religious texts, historical texts, science and philosophy. Dataset statistics are presented in the table given below.

statistics Statistics of the NDLI dataset and the British Library dataset.

In addition to the existing Indic languages, we are also working on the Bangla data which is provided to us by the British Library. Our OCR model is trying to improve the text predictions for the same which will help in achieving better search results.


This work was supported, in part, by the National Digital Library of India (IIT Kharagpur) who provided us with more than 1.5 million document images in Indian languages (Hindi, Tamil and Telugu). We also thanks British Library for providing us with the ancient document images in Bangla language. I would also like to acknowledge Krishna Tulsyan for assisting the project, initially.


For further information about this project please feel free to contact :
Riya Gupta - This email address is being protected from spambots. You need JavaScript enabled to view it.
Dataset and Server Management: Varun Bhargavan, Aradhana Vinod [varun.bhargavan;aradhana.vinod]

Indiscapes: Instance segmentation networks for layout parsing of historical indic manuscripts


Historical palm-leaf manuscript and early paper documents from Indian subcontinent form an important part of the world’s literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first ever dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenge of large diversity in scripts and presence of dense, irregular layout elements (e.g. text lines, pictures, multiple documents per image), we adapt a Fully Convolutional Deep Neural Network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of proposed architecture on images from the OpalNet dataset. For annotation flexibility and keeping the non-technical nature of domain experts in mind, we also contribute a custom, webbased GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word-spotting in historical Indic manuscripts at scale.



To access the code and paper click here

N2NSkip: Learning Highly Sparse Networks using Neuron-to-Neuron Skip Connections

Owing to the overparametrized nature and high memory requirements of classical DNNs, there has been a renewed interest in network sparsification. Is it possible to prune a network at initialization (prior to training) while maintaining rich connectivity, and also ensure faster convergence? We attempt to answer this question by emulating the pattern of neural connections in the brain.

Figure: After a preliminary pruning step, N2NSkip connections are added to the pruned network while maintaining the overall sparsity of the network.


The over-parametrized nature of Deep Neural Networks (DNNs) leads to considerable hindrances during deployment on low-end devices with time and space constraints. Network pruning strategies that sparsify DNNs using iterative prune-train schemes are often computationally expensive. As a result, techniques that prune at initialization, prior to training, have become increasingly popular. In this work, we propose neuron-to-neuron skip (N2NSkip) connections, which act as sparse weighted skip connections, to enhance the overall connectivity of pruned DNNs. Following a preliminary pruning step, N2NSkip connections are randomly added between individual neurons/channels of the pruned network, while maintaining the overall sparsity of the network. We demonstrate that introducing N2NSkip connections in pruned networks enables significantly superior performance, especially at high sparsity levels, as compared to pruned networks without N2NSkip connections. Additionally, we present a heat diffusion-based connectivity analysis to quantitatively determine the connectivity of the pruned network with respect to the reference network. We evaluate the efficacy of our approach on two different preliminary pruning methods which prune at initialization, and consistently obtain superior performance by exploiting the enhanced connectivity resulting from N2NSkip connections.


In this work, inspired by the pattern of skip connections in the brain, we propose sparse, learnable neuron-to-neuron skip (N2NSkip) connections, which enable faster convergence and superior effective connectivity by improving the overall gradient flow in the pruned network. N2NSkip connections regulate overall gradient flow by learning the relative importance of each gradient signal, which is propagated across non-consecutive layers, thereby enabling efficient training of networks pruned at initialization (prior to training). This is in contrast with conventional skip connections, where gradient signals are merely propagated to previous layers. We explore the robustness and generalizability of N2NSkip connections to different preliminary pruning methods and consistently achieve superior test accuracy and higher overall connectivity. Additionally, our work also explores the concept of connectivity in deep neural networks through the lens of heat diffusion in undirected acyclic graphs. We propose to quantitatively measure and compare the relative connectivity of pruned networks with respect to the reference network by computing the Frobenius norm of their heat diffusion signatures at saturation.

N2NSkip1 N2NSkip

Figure: As opposed to conventional skip connections, N2NSkip connections introduce skip connections between non-consecutive layers of the network, and are parametrized by sparse learnable weights.


  • We propose N2NSkip connections which significantly improve the effective connectivity and test performance of sparse networks across different datasets and network architectures. Notably, we demonstrate the generalizability of N2NSkip connections to different preliminary pruning methods and consistently obtain superior test performance and enhanced overall connectivity.
  • We propose a heat diffusion-based connectivity measure to compare the overall connectivity of pruned networks with respect to the reference network. To the best of our knowledge, this is the first attempt at modeling connectivity in DNNs through the principle of heat diffusion.
  • We empirically demonstrate that N2NSkip connections significantly lower performance degradation as compared to conventional skip connections, resulting in consistently superior test performance at high compression ratios

Visualizing Adjacency Matrices

Considering each network as an undirected graph, we construct an n × n adjacency matrix, where n is the total number of neurons in the MLP. To verify the enhanced connectivity resulting from N2NSkip connections, we compare the heat diffusion signature of the pruned adjacency matrices with the heat diffusion signature of the reference network.

Figure: Binary adjacency matrices for (a) Reference network (MLP) (b) Pruned network at a compression of 5x (randomized pruning) (c) N2NSkip network at a compression of 5x (10% N2NSkip connections + 10% sequential connections).

Experimental Results:


Test Accuracy of pruned ResNet50 and VGG19 on CIFAR-10 and CIFAR-100 with either RP or CSP as the preliminary pruning step. The addition of N2NSkip connections leads to a significant increase in test accuracy. Additionally, there is a larger increase in accuracy at network densities of 5% and 2%, as compared to 10%. This observation is consistent for both N2NSkip-RP and N2NSkip-CSP, which indicates that N2NSkip connections can be used as a powerful tool to enhance the performance of pruned networks at high compression rates.

Related Publication:

  • Arvind Subramaniam, Avinash Sharma - N2NSkip: Learning Highly Sparse Networks using Neuron-to-Neuron Skip Connections, British Machine Vision Conference (BMVC 2020).

Quo Vadis, Skeleton Action Recognition ?


n this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition.To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide multi-layered assessment of the results. Toexamine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videossourced from Kinetics-700, a large-scale action dataset. The results from benchmarking the top performers of NTU-120 onSkeletics-152 reveal the challenges and domain gap induced by actions 'in the wild'. We extend our study to include out-of-contextactions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. Finally, as a new frontier foraction recognition, we introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game DumbCharades and interpretative dance performances. Overall, our work characterizes the strengths and limitations of existing approachesand datasets. It also provides an assessment of top-performing approaches across a spectrum of activity settings and via theintroduced datasets, proposes new frontiers for human action recognition.


To access the code and paper click here

An OCR for Classical Indic Documents Containing Arbitrarily Long Words


OCR for printed classical Indic documents written inSanskrit is a challenging research problem. It involves com-plexities such as image degradation, lack of datasets andlong-length words. Due to these challenges, the word ac-curacy of available OCR systems, both academic and in-dustrial, is not very high for such documents. To addressthese shortcomings, we develop a Sanskrit specific OCRsystem. We present an attention-based LSTM model forreading Sanskrit characters in line images. We introduce adataset of Sanskrit document images annotated at line level.To augment real data and enable high performance for ourOCR, we also generate synthetic data via curated font se-lection and rendering designed to incorporate crucial glyphsubstitution rules. Consequently, our OCR achieves a worderror rate of 15.97% and a character error rate of 3.71%on challenging Indic document texts and outperforms strongbaselines. Overall, our contributions set the stage for ap-plication of OCRs on large corpora of classic Sanskrit textscontaining arbitrarily long and highly conjoined words.

To access the code and paper click here


If you find our work useful in your research, please consider citing:

author = {Dwivedi, Agam and Saluja, Rohit and Kiran Sarvadevabhatla, Ravi},
title = {An OCR for Classical Indic Documents Containing Arbitrarily Long Words},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}