3D Computer Vision


We are interested in computer vision and machine learning, with a focus on 3D scene understanding and reconstruction. In particular, we work on problems where the human body is reconstructed from 2D images and analysed in 3D, and on registration of point clouds of indoor scenes captured with commodity sensors as well as of large outdoor scenes captured with LiDAR scanners. Since 3D data comes in various formats (meshes, point clouds, and volumetric representations, to name a few), we also design novel algorithms for making 3D data compatible with deep learning frameworks. We investigate how complex prior knowledge can be incorporated into computer vision algorithms to make them robust to the variations of our complex 3D world. The links below give detailed descriptions of our individual works.

Human Body Capture

Surface Reconstruction

People Involved: Abbhinav Venkat, Chaitanya Patel, Yudhik Agrawal, Avinash Sharma

Recovering a 3D human body shape from a monocular image is an ill-posed problem in computer vision with great practical importance for many applications, including virtual and augmented reality platforms, animation industry, e-commerce domain, etc. [See More]

Volumetric Reconstruction

People Involved: Dr. Avinash Sharma, Sai Sagar J, Abbhinav Venkat, Chaitanya Patel, Anubhab Sen, Saurabh Rajguru, Neeraj Bhattan, Yudhik Agrawal, Himansh Sheron

Since humans are often the subject of photographs, detecting them and analysing their shape and pose in 3D is significant for applications such as AR/VR and motion capture. Our research objective in this work is to recover the 3D human body from a single RGB image. [See More]
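Volumetric approaches discretize the body into an occupancy grid so that standard 3D convolutions apply. A minimal sketch of that representation, assuming only a generic (N, 3) point cloud and an illustrative grid resolution (this is not the project's actual pipeline):

```python
import numpy as np

def voxelize(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Points are normalized to the unit cube, then each point marks
    its containing voxel as occupied.
    """
    pts = np.asarray(points, dtype=np.float64)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    scale = np.where(maxs > mins, maxs - mins, 1.0)  # avoid div-by-zero on flat axes
    idx = ((pts - mins) / scale * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Toy example: 1000 random points on a sphere surface
rng = np.random.default_rng(0)
v = rng.normal(size=(1000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
grid = voxelize(v, resolution=16)
```

The resulting grid can feed a 3D CNN directly; the trade-off, as noted in the classification work below, is the cubic memory cost of the volumetric representation.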


Classification of 3D data

People Involved: Dr. Avinash Sharma, Sai Sagar J and Raj Manvar

Apart from the acquisition of 3D shapes, analysing them is also an important task. Shape analysis covers tasks such as retrieval, classification, dense and sparse correspondence between two shapes, and segmentation. At the core of 3D shape analysis lie 3D shape descriptors. Descriptor construction is usually application dependent: one expects a descriptor to be discriminative, invariant to certain transformations or noise, and compact, i.e. low dimensional. Recently, a few deep learning models for shape descriptors have appeared. Nevertheless, many existing deep learning works do not exploit the fact that 3D shapes are boundary based, i.e. much of the information essential for classification hinges on the boundary, and performing convolution in volumetric space hinders computational efficiency. In the following work, we address these drawbacks and construct a novel descriptor that provides a local-geometry-aware global characterization of rigid 3D shapes. [See More]
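The invariance and compactness requirements above can be illustrated with the classical D2 shape distribution, a histogram of random pairwise point distances that is invariant to rotation and translation. This is a generic textbook descriptor, not the one proposed in the work:

```python
import numpy as np

def d2_descriptor(points, n_pairs=5000, bins=32, rng=None):
    """D2 shape distribution: histogram of distances between random point
    pairs, normalized by the largest sampled distance. Invariant to
    rotation and translation; the normalization adds scale invariance."""
    if rng is None:
        rng = np.random.default_rng(0)
    pts = np.asarray(points, dtype=np.float64)
    i = rng.integers(0, len(pts), n_pairs)
    j = rng.integers(0, len(pts), n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    hist, _ = np.histogram(d / d.max(), bins=bins, range=(0, 1))
    return hist / hist.sum()

# Rigid-invariance check: descriptor unchanged under rotation + translation
rng = np.random.default_rng(1)
cloud = rng.normal(size=(500, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
moved = cloud @ R.T + np.array([5.0, -2.0, 1.0])
h1 = d2_descriptor(cloud, rng=np.random.default_rng(2))
h2 = d2_descriptor(moved, rng=np.random.default_rng(2))
```

Because pairwise distances are preserved by rigid motions, `h1` and `h2` agree, which is exactly the kind of invariance a shape descriptor is expected to provide.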


Point Cloud Analysis

People Involved: Dr. Avinash Sharma, Ashish Kubade and Nikhilendra

Working with point clouds involves tackling inherent challenges such as sparsity, the lack of order among points, and invariance to viewpoint. For LiDAR scans we additionally have to handle the scale of the data, as a typical outdoor LiDAR scan contains points in the order of millions. Breaking such a scene into smaller chunks might be a first guess; however, working on such small chunks ultimately results in the loss of the global semantics of the scene. [See More]
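The chunking baseline mentioned above can be sketched as partitioning the scan into fixed-size XY tiles; the tile size and the toy data below are illustrative assumptions:

```python
import numpy as np
from collections import defaultdict

def tile_scan(points, tile_size=50.0):
    """Partition a large outdoor scan into square XY tiles.

    Returns a dict mapping (tile_x, tile_y) -> (M, 3) array of points.
    Easy to process tile-by-tile, but, as noted above, each tile loses
    the global context of the scene.
    """
    pts = np.asarray(points, dtype=np.float64)
    keys = np.floor(pts[:, :2] / tile_size).astype(int)
    tiles = defaultdict(list)
    for k, p in zip(map(tuple, keys), pts):
        tiles[k].append(p)
    return {k: np.array(v) for k, v in tiles.items()}

# Toy "outdoor" scan: 10k points over a 200 m x 200 m extent
rng = np.random.default_rng(0)
scan = rng.uniform(0, 200, size=(10000, 3))
tiles = tile_scan(scan, tile_size=50.0)
```

Each tile can then be fed to a point network independently; recovering cross-tile semantics is precisely the difficulty the passage above points out.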


LIDAR scan of a valley

Bringing Semantics in Word Image Representation


In the domain of document images, the need for a holistic word-level representation, popularly used for the task of word spotting, has long been appreciated. However, past representations respect only the word's form while completely ignoring its meaning. In this work, we attempt to bridge this gap by encoding the notion of semantics through two novel forms of word image representation. The first form learns an inflection-invariant representation, thereby focusing on the root of the word, while the second form is built along the lines of recent textual word embedding techniques such as Word2Vec and has a much broader scope for semantic relationships. We observe that such representations are useful for traditional word spotting and also enrich search results by accounting for the semantic nature of the task.




  • We propose a normalized word representation that is invariant to word-form inflections. It achieves state-of-the-art performance compared to a conventional verbatim-based representation.
  • We introduce a novel semantic representation for word images that respects both form and meaning, thereby reducing the vocabulary gap between a query and its retrieved results.
  • We demonstrate the semantic word spotting task and evaluate the proposed representation on standard IR measures such as word analogy and word similarity.
  • The proposed representation is successfully demonstrated on both historical and modern document collections, in printed and handwritten domains, across Latin and Indic scripts.
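The word-analogy measure mentioned above can be sketched on toy embeddings: answer "a is to b as c is to ?" by the nearest cosine neighbour of vec(b) - vec(a) + vec(c). The 2-D vectors below are hypothetical, not the representations learned in this work:

```python
import numpy as np

def analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by the nearest cosine neighbour
    of vec(b) - vec(a) + vec(c), excluding the query words themselves."""
    q = emb[b] - emb[a] + emb[c]
    q = q / np.linalg.norm(q)
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = float(np.dot(q, v / np.linalg.norm(v)))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Toy embeddings where the inflection offset is a consistent direction
emb = {
    "walk":   np.array([1.0, 0.0]),
    "walked": np.array([1.0, 1.0]),
    "jump":   np.array([2.0, 0.0]),
    "jumped": np.array([2.0, 1.0]),
}
```

With embeddings structured this way, `analogy(emb, "walk", "walked", "jump")` recovers "jumped", which is the property the word-analogy evaluation measures.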

Codes and Dataset :

Will be uploaded soon. Please contact the authors for access.

Qualitative Results


Related Papers



View-graph Selection Framework for Structure from Motion


A view-graph is an essential input to large-scale structure-from-motion (SfM) pipelines, and the accuracy and efficiency of large-scale SfM depend crucially on it. Inconsistent or inaccurate edges can lead to an inferior or wrong reconstruction. Most SfM methods remove `undesirable' images and pairs using several fixed heuristic criteria, and propose tailor-made solutions to achieve specific reconstruction objectives such as efficiency, accuracy, or disambiguation. In contrast to these disparate solutions, we propose an optimization-based formulation that can achieve these different reconstruction objectives through task-specific cost modelling, and construct a very efficient network-flow based formulation for its approximate solution. The abstraction brought by this selection mechanism separates the challenges specific to datasets and reconstruction objectives from the standard SfM pipeline and improves its generalization. This paper mainly focuses on applying the framework with a standard SfM pipeline for accurate and ghost-free reconstruction of highly ambiguous datasets. To model the selection costs for this task, we introduce new disambiguation priors based on local geometry. We further demonstrate the versatility of the method by using it for the general objective of accurate and efficient reconstruction of large-scale Internet datasets, using costs based on well-known SfM priors.
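The fixed heuristic criteria that this framework generalizes can be sketched as simple match-count thresholding over view-graph edges. The image names, match counts, and threshold below are illustrative; the paper's contribution is precisely to replace such fixed rules with an optimized, network-flow based selection:

```python
def prune_view_graph(edges, min_matches=100):
    """Baseline heuristic edge selection for a view graph.

    edges: dict mapping (img_i, img_j) -> number of verified feature
    matches. Keeps only pairs with enough matches -- the kind of fixed
    criterion the optimization-based selection framework generalizes.
    """
    return {pair: m for pair, m in edges.items() if m >= min_matches}

# Toy view graph over four images
edges = {
    ("img0", "img1"): 450,
    ("img1", "img2"): 80,   # weak pair: likely unstable geometry
    ("img2", "img3"): 300,
    ("img0", "img3"): 20,   # very weak pair: possibly an ambiguous match
}
kept = prune_view_graph(edges, min_matches=100)
```

A fixed threshold like this cannot trade accuracy against efficiency or disambiguate repeated structures; casting selection as an optimization with task-specific edge costs is what allows one framework to serve all of those objectives.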



Related Publications

  • Rajvi Shah, Visesh Chari, and P J Narayanan, View-graph Selection Framework for SfM, 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018 [pdf]


Will be made available soon


@inproceedings{shah2018viewgraph,
  author    = {Shah, Rajvi and Chari, Visesh and Narayanan, P J},
  title     = {View-graph Selection Framework for SfM},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month     = {September},
  year      = {2018}
}

Spectral Analysis of Brain Graphs

Our research objective is to analyze neuroimaging data using graph-spectral techniques in understanding how the static brain anatomical structure gives rise to dynamic functions during rest. We approach the problem by viewing the human brain as a complex graph of cortical regions and the anatomical fibres which connect them. We employ a range of modelling techniques which utilize methods from spectral graph theory, time series analysis, information theory, graph deep learning etc.
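A minimal sketch of the graph-spectral starting point shared by the projects below: the eigen-decomposition of the normalized Laplacian of a connectome-like adjacency matrix (the toy matrix here is illustrative, standing in for an anatomical connectivity matrix):

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return np.eye(len(A)) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

# Toy structural connectome: two weakly coupled 3-node modules
A = np.array([
    [0,   1,   1,   0,   0, 0],
    [1,   0,   1,   0,   0, 0],
    [1,   1,   0,   0.1, 0, 0],
    [0,   0,   0.1, 0,   1, 1],
    [0,   0,   0,   1,   0, 1],
    [0,   0,   0,   1,   1, 0],
], dtype=float)
L = normalized_laplacian(A)
eigvals, eigvecs = np.linalg.eigh(L)
```

The low-frequency eigenvectors of `L` encode smooth activity patterns over the anatomical graph, which is the common ingredient in the diffusion-kernel models described next.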


(2) Temporal Multiple Kernel Learning (tMKL) model for predicting resting-state FC by characterizing fMRI connectivity dynamics



It is widely accepted that rs-fMRI signals show non-stationary dynamics, in terms of state transitions, over a relatively fixed structural connectome. We partition the fMRI time-series signals using sliding time windows, resulting in time-varying windowed functional connectivity (wFC) matrices. Clusters of these wFCs are treated as distinct states constituting the functional connectivity dynamics (FCD), and transitions between states are modelled as a first-order Markov process. We use a graph-spectral approach, the temporal Multiple Kernel Learning (tMKL) model, which characterizes the FCD and predicts the grand average FC, i.e. the FC over the entire scanning session of a subject.
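The windowing and state-transition steps described above can be sketched as follows, with toy time series and hand-assigned cluster labels standing in for real rs-fMRI data and actual clustering output:

```python
import numpy as np

def windowed_fc(ts, win=30, step=5):
    """Sliding-window functional connectivity from a (T, R) timeseries:
    one (R, R) correlation matrix per window."""
    T = ts.shape[0]
    return np.stack([np.corrcoef(ts[s:s + win].T)
                     for s in range(0, T - win + 1, step)])

def transition_matrix(states, n_states):
    """First-order Markov transition matrix estimated by counting
    consecutive state pairs and row-normalizing."""
    P = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        P[a, b] += 1
    row = P.sum(axis=1, keepdims=True)
    return np.divide(P, row, out=np.zeros_like(P), where=row > 0)

rng = np.random.default_rng(0)
ts = rng.normal(size=(200, 10))       # toy: 200 time points, 10 regions
wfc = windowed_fc(ts, win=30, step=5)
states = [0, 0, 1, 1, 0, 2, 2, 2, 1]  # toy cluster label per window
P = transition_matrix(states, n_states=3)
```

In the actual model, the states come from clustering the wFC matrices, and `P` captures the FCD that tMKL characterizes alongside predicting the grand average FC.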

Key Findings

  • Using the Pearson correlation coefficient between empirical and predicted FCs as the measure of model performance, we compared the proposed model with two existing approaches: our earlier Multiple Kernel Learning (MKL) model and the non-linear dynamic mean field (DMF) model. Our model performs better than both (see figure 1).
  • The grand average FC predicted by our model also has a network topology similar to that of the empirical FC, in terms of the corresponding community structures (see figure 2).
  • The community structures in each state show distinct interaction patterns among themselves; their temporal alignment, along with the communities themselves, is recovered by our model, which plays a major role in its superior performance over the other models (see figure 3).

Related Publications

Surampudi, Sriniwas Govinda, Joyneel Misra, Gustavo Deco, Raju Bapi Surampudi, Avinash Sharma, and Dipanjan Roy. "Resting state dynamics meets anatomical structure: Temporal multiple kernel learning (tMKL) model." NeuroImage (2018). Slides Code Paper.

If you use this work, please cite:

@article{surampudi2018tmkl,
  title={Resting state dynamics meets anatomical structure: Temporal multiple kernel learning (tMKL) model},
  author={Surampudi, Sriniwas Govinda and Misra, Joyneel and Deco, Gustavo and Bapi, Raju Surampudi and Sharma, Avinash and Roy, Dipanjan},
  journal={NeuroImage},
  year={2018}
}


(1) Generalized Multiple Kernel Learning (MKL) model for relating SC and FC




A challenging problem in cognitive neuroscience is to relate the structural connectivity (SC) to the functional connectivity (FC) to better understand how large-scale network dynamics underlying human cognition emerges from the relatively fixed SC architecture.

We highlight the shortcomings of the existing diffusion models and propose a multi-scale diffusion scheme. Our multi-scale model is formulated as a reaction-diffusion system giving rise to spatio-temporal patterns on a fixed topology. Our model is analytically tractable and complex enough to capture the details of the underlying biological phenomena. We hypothesize the presence of inter-regional co-activations (latent parameters) that combine diffusion kernels at multiple scales to resemble FC from SC.
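A minimal sketch of the multi-scale diffusion idea on a toy connectome: heat kernels exp(-t L) at several scales, combined with fixed weights. In the actual MKL model the combination is learned (via the latent inter-regional parameters described above); the uniform scalar weights here are purely illustrative:

```python
import numpy as np

def diffusion_kernels(A, scales):
    """Heat-diffusion kernels K_t = exp(-t L) on a graph, computed from
    the eigen-decomposition of the graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    lam, V = np.linalg.eigh(L)
    return [V @ np.diag(np.exp(-t * lam)) @ V.T for t in scales]

def predict_fc(A, scales, weights):
    """Toy multi-scale predictor: FC approximated as a weighted sum of
    diffusion kernels at several scales."""
    return sum(w * K for w, K in zip(weights, diffusion_kernels(A, scales)))

# Toy 4-region structural connectome
A = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 1.0, 0.2],
              [0.2, 1.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])
fc = predict_fc(A, scales=[0.5, 2.0, 8.0], weights=[0.5, 0.3, 0.2])
```

Small scales capture local (intra-regional) spread and large scales capture global integration, which is why combining kernels across scales can resemble FC better than any single diffusion kernel.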

Key Findings

  • We established a bio-physical insight into the spatio-temporal aspect of the graph diffusion kernels.
  • Appropriate model complexity: given the strength of the analytical approach and its tractability alongside biological insight, the proposed model could be a suitable method for predicting task-based functional connectivity across different age groups.
  • Multi-scale analysis: Latent parameter matrix seems to successfully encapsulate inter- and intra-regional co-activations learned from the training cohort and enables reasonably good prediction of subject-specific empirical functional connectivity (FC) from the corresponding SC matrix.
  • Generalized diffusion scheme: Proposed MKL framework encompasses previously proposed diffusion schemes.



  1. The MKL model's FC prediction is better than that of the dynamic mean field (DMF) and single diffusion kernel (SDK) models (see figures 1a, 1b & 1c).
  2. Similarity of the FC predicted by the MKL model and the empirical FC in terms of community assignment (see figures 2 & 3).
  3. Model structure supports rich-club organization (see figure 4).
  4. Model passes various perturbation tests suggesting its robustness (see figures 5 & 6).


Related Publications

Sriniwas Govinda Surampudi, Shruti Naik, Raju Bapi Surampudi, Viktor K. Jirsa, Avinash Sharma and Dipanjan Roy. "Multiple Kernel Learning Model for Relating Structural and Functional Connectivity in the Brain." Scientific Reports. doi:10.1038/s41598-018-21456-0  Slide Code Paper

If you use this work, please cite :

@article{surampudi2018mkl,
  title={Multiple Kernel Learning Model for Relating Structural and Functional Connectivity in the Brain},
  author={Surampudi, Sriniwas Govinda and Naik, Shruti and Surampudi, Raju Bapi and Jirsa, Viktor K and Sharma, Avinash and Roy, Dipanjan},
  journal={Scientific Reports},
  year={2018},
  publisher={Nature Publishing Group},
  doi={10.1038/s41598-018-21456-0}
}



Dipanjan Roy

Cognitive Brain Dynamics Lab, NBRC

Personal Page: https://dipanjanr.com/


Research Assistant Position


City-scale Road Audit using Deep Learning


Road networks in cities are massive and are a critical component of mobility. Fast response to defects, which can occur not only through regular wear and tear but also because of extreme events like storms, is essential. Hence there is a need for an automated system that is quick, scalable, and cost-effective for gathering information about defects. We propose a system for city-scale road audit using some of the most recent developments in deep learning and semantic segmentation. For building and benchmarking the system, we curated a dataset with the annotations required for road defects. However, many of the labels required for a road audit are highly ambiguous, which we overcome by proposing a label hierarchy. We also propose a multi-step deep learning model that segments the road, subdivides it further into defects, tags each frame for the defects it contains, and finally localizes the defects on a map gathered using GPS. We analyse and evaluate the models on image tagging as well as segmentation at different levels of the label hierarchy.
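The label-hierarchy idea can be sketched as a lookup-table remap from fine, ambiguous defect labels to coarser parents, so that evaluation can be done at whichever level of the hierarchy is reliable. The label names and IDs below are hypothetical, not the dataset's actual taxonomy:

```python
import numpy as np

# Hypothetical hierarchy: fine, hard-to-distinguish defect labels are
# mapped to coarser parents that can be annotated and evaluated reliably.
HIERARCHY = {
    "pothole": "road_defect",
    "water_logging": "road_defect",
    "shoulder_drop": "road_side_defect",
    "road": "road",
    "background": "background",
}

def coarsen(mask, fine_ids, coarse_ids):
    """Remap an integer segmentation mask from fine to coarse labels
    via a lookup table built from the hierarchy."""
    lut = np.array([coarse_ids[HIERARCHY[name]] for name in fine_ids])
    return lut[mask]

fine_ids = ["background", "road", "pothole", "water_logging", "shoulder_drop"]
coarse_ids = {"background": 0, "road": 1, "road_defect": 2, "road_side_defect": 3}
mask = np.array([[0, 1, 2],
                 [3, 4, 1]])  # toy fine-label segmentation mask
coarse = coarsen(mask, fine_ids, coarse_ids)
```

Evaluating the same prediction at both levels (fine mask vs. `coarse`) is what allows the system to report meaningful metrics even where fine labels are ambiguous.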


Related Publications

  • Sudhir Yarram, Girish Varma and C.V. Jawahar, City-Scale Road Audit System using Deep Learning, International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018 [pdf]


City-scale Road Audit using Deep Learning code available here [Code]


City-scale Road Audit using Deep Learning dataset available here [Dataset]