Bringing Semantics into Word Image Representation


In the domain of document images, the need for learning a holistic word-level representation has long been appreciated, and such representations are popularly used for the task of word spotting. However, past work has restricted itself to representations that respect only the word form, while completely ignoring its meaning. In this work, we attempt to bridge this gap by encoding the notion of semantics through two novel forms of word image representation. The first learns an inflection-invariant representation, thereby focusing on the root of the word, while the second is built along the lines of recent textual word embedding techniques such as Word2Vec and has a much broader scope for semantic relationships. We observe that such representations are useful for traditional word spotting and also enrich the search results by accounting for the semantic nature of the task.
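The semantic retrieval idea above reduces, at query time, to ranking word images by the similarity of their learned embeddings. Below is a minimal sketch of that step, assuming hypothetical precomputed embeddings (the function names and the toy vectors are illustrative, not the paper's actual model):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_rank(query_emb, gallery_embs):
    """Rank gallery word images by semantic similarity to the query.

    query_emb:    (d,) embedding of the query word image
    gallery_embs: dict mapping word-image id -> (d,) embedding
    Returns ids sorted from most to least similar.
    """
    scores = {wid: cosine_similarity(query_emb, emb)
              for wid, emb in gallery_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: the query "play" should retrieve its inflection
# "playing" before the semantically unrelated word "table".
rng = np.random.default_rng(0)
base = rng.normal(size=64)
gallery = {
    "playing": base + 0.05 * rng.normal(size=64),  # near the query embedding
    "table": rng.normal(size=64),                  # unrelated word
}
ranking = semantic_rank(base, gallery)
print(ranking[0])  # "playing" ranks first
```

An inflection-invariant representation would place all surface forms of a root word close together in this space, so the same ranking code covers both proposed representations.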




  • We propose a normalized word representation that is invariant to word-form inflections. The representation achieves state-of-the-art performance compared to a conventional verbatim-based representation.
  • We introduce a novel semantic representation for word images that respects both the form and the meaning of a word, thereby reducing the vocabulary gap between a query and its retrieved results.
  • We demonstrate the semantic word spotting task and evaluate the proposed representation on standard IR measures such as word analogy and word similarity.
  • The proposed representation is successfully demonstrated on both historical and modern document collections, in printed and handwritten domains, across Latin and Indic scripts.

Codes and Dataset :

Will be uploaded soon.


Spectral Analysis of Brain Graphs

Our research objective is to analyze neuroimaging data using graph-spectral techniques to understand how the static anatomical structure of the brain gives rise to dynamic function during rest. We approach the problem by viewing the human brain as a complex graph of cortical regions connected by anatomical fibres. We employ a range of modelling techniques drawing on spectral graph theory, time series analysis, information theory, and graph deep learning.


(2) Temporal Multiple Kernel Learning (tMKL) model for predicting resting-state FC via characterizing fMRI connectivity dynamics



It is widely accepted that rs-fMRI signals show non-stationary dynamics, in terms of state transitions, over a relatively fixed structural connectome. We partition the fMRI time-series signals using sliding time windows, resulting in time-varying windowed functional connectivity (wFC) matrices. Clusters of these wFCs are treated as distinct states constituting functional connectivity dynamics (FCD), and transitions between states are modeled as a first-order Markov process. We use a graph-spectral approach, the temporal Multiple Kernel Learning (tMKL) model, which characterizes FCD and also predicts the grand average FC, i.e., the FC over the entire scanning session of a subject.
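The preprocessing pipeline described above (sliding-window FC, clustering of wFCs into states, and a first-order Markov transition matrix) can be sketched as follows. This is a simplified illustration on synthetic data, with a minimal k-means standing in for whatever clusterer the actual model uses; window length, step, and state count are arbitrary choices here:

```python
import numpy as np

def windowed_fc(ts, win=30, step=5):
    """Sliding-window functional connectivity.
    ts: (T, R) fMRI time series over R regions -> (n_windows, R, R)."""
    return np.array([np.corrcoef(ts[s:s + win].T)
                     for s in range(0, len(ts) - win + 1, step)])

def kmeans_labels(X, k, iters=50, seed=0):
    """Minimal k-means (a stand-in for any off-the-shelf clusterer)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def markov_transitions(labels, n_states):
    """First-order Markov transition matrix between FCD states."""
    P = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        P[a, b] += 1.0
    rows = P.sum(axis=1, keepdims=True)
    return np.divide(P, rows, out=np.zeros_like(P), where=rows > 0)

# Toy run: 200 time points, 10 regions of synthetic data
rng = np.random.default_rng(1)
ts = rng.normal(size=(200, 10))
wfcs = windowed_fc(ts)                      # 35 windows of 10x10 wFC
iu = np.triu_indices(10, k=1)               # upper triangle as features
states = kmeans_labels(np.array([w[iu] for w in wfcs]), k=3)
P = markov_transitions(states, n_states=3)  # rows sum to 1 (0 if unvisited)
```

Only the upper triangle of each wFC is used as the clustering feature, since correlation matrices are symmetric with a unit diagonal.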

Key Findings

  • Using the Pearson correlation coefficient between empirical and predicted FCs as the measure of model performance, we compared the proposed model with two existing approaches: our earlier Multiple Kernel Learning (MKL) model and the non-linear dynamic mean field (DMF) model. Our model outperforms both (see figure 1).
  • The grand average FC predicted by our model also shows a network topology similar to that of the empirical FC in terms of corresponding community structures (see figure 2).
  • Community structures in each state show distinct interaction patterns, whose temporal alignment, along with the communities themselves, is recovered by our model; this plays a major role in the model's superior performance over the other models (see figure 3).

Related Publications

Surampudi, Sriniwas Govinda, Joyneel Misra, Gustavo Deco, Raju Bapi Surampudi, Avinash Sharma, and Dipanjan Roy. "Resting state dynamics meets anatomical structure: Temporal multiple kernel learning (tMKL) model." NeuroImage (2018). [Slides] [Code] [Paper]

If you use this work, please cite:

@article{surampudi2018resting,
  title={Resting state dynamics meets anatomical structure: Temporal multiple kernel learning (tMKL) model},
  author={Surampudi, Sriniwas Govinda and Misra, Joyneel and Deco, Gustavo and Bapi, Raju Surampudi and Sharma, Avinash and Roy, Dipanjan},
  journal={NeuroImage},
  year={2018}
}


(1) Generalized Multiple Kernel Learning (MKL) model for relating SC and FC




A challenging problem in cognitive neuroscience is to relate the structural connectivity (SC) to the functional connectivity (FC), to better understand how the large-scale network dynamics underlying human cognition emerge from the relatively fixed SC architecture.

We highlight the shortcomings of the existing diffusion models and propose a multi-scale diffusion scheme. Our multi-scale model is formulated as a reaction-diffusion system giving rise to spatio-temporal patterns on a fixed topology. The model is analytically tractable, yet complex enough to capture the details of the underlying biological phenomena. We hypothesize the presence of inter-regional co-activations (latent parameters) that combine diffusion kernels at multiple scales to resemble FC from SC.
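The multi-scale diffusion idea can be illustrated with graph heat kernels exp(-βL) of the SC Laplacian, combined across scales. The sketch below uses a scalar weight per scale for brevity; the actual MKL model learns a richer latent parameter matrix, and the ring graph is purely a toy SC:

```python
import numpy as np

def graph_laplacian(sc):
    """Symmetric normalized Laplacian of a structural connectivity matrix."""
    d = sc.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(len(sc)) - d_inv_sqrt @ sc @ d_inv_sqrt

def diffusion_kernels(L, scales):
    """Heat kernels exp(-beta * L) at several diffusion scales,
    computed via the eigendecomposition of the Laplacian."""
    w, V = np.linalg.eigh(L)
    return [V @ np.diag(np.exp(-b * w)) @ V.T for b in scales]

def predict_fc(kernels, weights):
    """Predicted FC as a weighted combination of multi-scale kernels.
    (A scalar weight per scale is a simplification of the model's
    latent inter-regional co-activation parameters.)"""
    return sum(w * K for w, K in zip(weights, kernels))

# Toy SC: a ring graph of 6 regions
n = 6
sc = np.zeros((n, n))
for i in range(n):
    sc[i, (i + 1) % n] = sc[(i + 1) % n, i] = 1.0

L = graph_laplacian(sc)
kernels = diffusion_kernels(L, scales=[0.5, 1.0, 2.0])
fc_hat = predict_fc(kernels, weights=[0.2, 0.3, 0.5])  # symmetric, 6x6
```

Small β captures fast, local diffusion while large β captures slow, global diffusion; the learned combination selects how much each scale contributes to the predicted FC.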

Key Findings

  • We established a biophysical insight into the spatio-temporal aspects of graph diffusion kernels.
  • Right amount of model complexity: given the strength of the analytical approach and its tractability with biological insights, the proposed model could be a suitable method for predicting task-based functional connectivity across different age groups.
  • Multi-scale analysis: the latent parameter matrix appears to successfully encapsulate inter- and intra-regional co-activations learned from the training cohort, and enables reasonably good prediction of subject-specific empirical functional connectivity (FC) from the corresponding SC matrix.
  • Generalized diffusion scheme: the proposed MKL framework encompasses previously proposed diffusion schemes.



  1. MKL model's FC prediction is better as compared to that of dynamic mean field (DMF) and single diffusion kernel (SDK) (see figure 1a, 1b & 1c).
  2. Similarity of the FC predicted by the MKL model and the empirical FC in terms of community assignment (see figures 2 & 3).
  3. Model structure supports rich-club organization (see figure 4).
  4. Model passes various perturbation tests suggesting its robustness (see figures 5 & 6).


Related Publications

Sriniwas Govinda Surampudi, Shruti Naik, Raju Bapi Surampudi, Viktor K. Jirsa, Avinash Sharma and Dipanjan Roy. "Multiple Kernel Learning Model for Relating Structural and Functional Connectivity in the Brain." Scientific Reports (2018). doi:10.1038/s41598-018-21456-0. [Slides] [Code] [Paper]

If you use this work, please cite:

@article{surampudi2018multiple,
  title={Multiple Kernel Learning Model for Relating Structural and Functional Connectivity in the Brain},
  author={Surampudi, Sriniwas Govinda and Naik, Shruti and Surampudi, Raju Bapi and Jirsa, Viktor K and Sharma, Avinash and Roy, Dipanjan},
  journal={Scientific Reports},
  year={2018},
  publisher={Nature Publishing Group}
}



Dipanjan Roy

Cognitive Brain Dynamics Lab, NBRC


City-scale Road Audit using Deep Learning


Road networks in cities are massive and are a critical component of mobility. Fast response to defects, which can occur not only due to regular wear and tear but also because of extreme events like storms, is essential. Hence there is a need for an automated system that is quick, scalable, and cost-effective for gathering information about defects. We propose a system for city-scale road audit using some of the most recent developments in deep learning and semantic segmentation. For building and benchmarking the system, we curated a dataset with the annotations required for road defects. However, many of the labels required for road audit are highly ambiguous, which we overcome by proposing a label hierarchy. We also propose a multi-step deep learning model that segments the road, subdivides it further into defects, tags the frame for each defect, and finally localizes the defects on a map gathered using GPS. We analyze and evaluate the models on image tagging as well as segmentation at different levels of the label hierarchy.
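A label hierarchy of the kind described above can be represented as a simple mapping from fine labels to coarse parents, so that ambiguous fine-grained predictions can still be evaluated at the coarser level. The class names below are hypothetical placeholders, not the dataset's actual label set:

```python
# Hypothetical two-level label hierarchy for road-defect annotation.
# Ambiguous fine labels roll up to an unambiguous coarse level.
HIERARCHY = {
    "road": ["clean_road", "wet_road"],
    "defect": ["pothole", "crack", "raveling"],
    "obstruction": ["debris", "water_logging"],
}

# Invert the hierarchy to map each fine label to its coarse parent.
FINE_TO_COARSE = {fine: coarse
                  for coarse, fines in HIERARCHY.items()
                  for fine in fines}

def coarsen(labels):
    """Map per-pixel or per-frame fine labels up the hierarchy."""
    return [FINE_TO_COARSE[label] for label in labels]

result = coarsen(["pothole", "clean_road", "debris"])
print(result)  # ['defect', 'road', 'obstruction']
```

Evaluating segmentation and tagging at both levels of such a hierarchy lets a model get credit for a correct coarse decision even when the fine label is genuinely ambiguous.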


Related Publications

  • Sudhir Yarram, Girish Varma and C.V. Jawahar. City-Scale Road Audit System using Deep Learning, International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018. [pdf]


City-scale Road Audit using Deep Learning code available here [Code]


City-scale Road Audit using Deep Learning dataset available here [Dataset]


View-graph Selection Framework for Structure from Motion


The view-graph is an essential input to large-scale structure from motion (SfM) pipelines, and the accuracy and efficiency of large-scale SfM depend crucially on it. Inconsistent or inaccurate edges can lead to inferior or wrong reconstruction. Most SfM methods remove `undesirable' images and pairs using several fixed heuristic criteria, and propose tailor-made solutions to achieve specific reconstruction objectives such as efficiency, accuracy, or disambiguation. In contrast to these disparate solutions, we propose an optimization-based formulation that can achieve these different reconstruction objectives through task-specific cost modeling, and we construct a very efficient network-flow-based formulation for its approximate solution. The abstraction brought in by this selection mechanism separates the challenges specific to datasets and reconstruction objectives from the standard SfM pipeline and improves its generalization. This paper mainly focuses on the application of this framework with a standard SfM pipeline for accurate and ghost-free reconstructions of highly ambiguous datasets. To model selection costs for this task, we introduce new disambiguation priors based on local geometry. We further demonstrate the versatility of the method by using it for the general objective of accurate and efficient reconstruction of large-scale Internet datasets, using costs based on well-known SfM priors.
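To make the selection problem concrete, here is a heavily simplified stand-in: given pairwise match scores between images, keep a maximum spanning tree (so the reconstruction stays connected) plus a few extra high-scoring edges. This greedy sketch is not the paper's network-flow formulation; it only illustrates what "selecting a subgraph of reliable edges" means:

```python
def select_view_graph(edges, n_extra=2):
    """Greedy stand-in for view-graph selection: keep a maximum
    spanning tree of match-score edges, then add the n_extra
    highest-scoring remaining edges for redundancy.

    edges: list of (score, u, v) tuples, higher score = more reliable.
    Returns the selected (u, v) pairs."""
    parent = {}

    def find(x):
        # Union-find with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    selected, leftover = [], []
    for score, u, v in sorted(edges, reverse=True):  # best edges first
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv          # spanning-tree edge: join components
            selected.append((u, v))
        else:
            leftover.append((score, u, v))
    # leftover is already in descending score order
    selected += [(u, v) for _, u, v in leftover[:n_extra]]
    return selected

# Toy view-graph: 4 images, pairwise match scores
edges = [(0.9, "A", "B"), (0.8, "B", "C"), (0.2, "A", "C"),
         (0.7, "C", "D"), (0.1, "B", "D")]
chosen = select_view_graph(edges, n_extra=1)
print(chosen)  # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('A', 'C')]
```

The paper's formulation replaces both the greedy rule and the raw match scores: edges get task-specific costs (e.g. disambiguation priors from local geometry), and the subgraph is chosen by an efficient network-flow optimization rather than a spanning tree.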



Related Publications

  • Rajvi Shah, Visesh Chari, and P J Narayanan. View-graph Selection Framework for SfM, 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018. [pdf]


Will be made available soon


@inproceedings{shah2018viewgraph,
  author = {Shah, Rajvi and Chari, Visesh and Narayanan, P J},
  title = {View-graph Selection Framework for SfM},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month = {September},
  year = {2018}
}

Learning Human Poses from Actions


We consider the task of learning to estimate human pose in still images. In order to avoid the high cost of full supervision, we propose to use a diverse data set, which consists of two types of annotations: (i) a small number of images are labeled using the expensive ground-truth pose; and (ii) other images are labeled using the inexpensive action label. As action information helps narrow down the pose of a human, we argue that this approach can help reduce the cost of training without significantly affecting the accuracy. To demonstrate this we design a probabilistic framework that employs two distributions: (i) a conditional distribution to model the uncertainty over the human pose given the image and the action; and (ii) a prediction distribution, which provides the pose of an image without using any action information. We jointly estimate the parameters of the two aforementioned distributions by minimizing their dissimilarity coefficient, as measured by a task-specific loss function. During both training and testing, we only require an efficient sampling strategy for both the aforementioned distributions. This allows us to use deep probabilistic networks that are capable of providing accurate pose estimates for previously unseen images. Using the MPII data set, we show that our approach outperforms baseline methods that either do not use the diverse annotations or rely on pointwise estimates of the pose.
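The training objective above hinges on a sample-based estimate of the dissimilarity coefficient between the conditional and prediction distributions. A minimal sketch of that estimate is below, using random 2D keypoint arrays as toy "pose samples" and a mean per-joint distance as a stand-in task loss (the loss and the sample generator are illustrative assumptions, not the paper's exact choices):

```python
import numpy as np

def disc_coefficient(samples_p, samples_q, loss, gamma=0.5):
    """Simple (biased) sample-based estimate of a dissimilarity
    coefficient between two distributions p and q:

        DISC = E_{p,q}[loss] - gamma * E_{p,p'}[loss]
                             - (1 - gamma) * E_{q,q'}[loss]

    samples_p, samples_q: i.i.d. samples from each distribution.
    The self terms include identical pairs, so this estimator is
    biased; it is enough to show the structure of the objective."""
    def mean_loss(xs, ys):
        return float(np.mean([loss(x, y) for x in xs for y in ys]))
    cross = mean_loss(samples_p, samples_q)
    self_p = mean_loss(samples_p, samples_p)
    self_q = mean_loss(samples_q, samples_q)
    return cross - gamma * self_p - (1 - gamma) * self_q

# Toy poses: 14 joints in 2D; loss = mean per-joint distance
def joint_loss(a, b):
    return float(np.linalg.norm(a - b, axis=1).mean())

rng = np.random.default_rng(0)
cond_samples = [rng.normal(0.0, 0.1, size=(14, 2)) for _ in range(20)]
pred_samples = [rng.normal(0.0, 0.1, size=(14, 2)) for _ in range(20)]
d = disc_coefficient(cond_samples, pred_samples, joint_loss)
```

Because the estimate only requires drawing samples from the two distributions and evaluating the task loss on pairs, it fits the paper's requirement that training and testing need nothing more than an efficient sampling strategy.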




If you use this work or dataset, please cite:

    title={ Learning Human Poses from Actions},
    author={Arun, Aditya and Jawahar, C.~V and Kumar, M. Pawan},