Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks

People Involved : Mohit Jain, Minesh Mathew, C. V. Jawahar

Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. Intricacies of the script and the absence of ample annotated data further complicate the task. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, a task typically known as Urdu OCR. The proposed solution is not bound by any language-specific lexicon, with the model following a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features from an input image to a sequence of target labels.
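Sequence-to-sequence transcription models of this kind are typically trained with a CTC-style objective; at test time, the simplest decoding collapses repeated labels and drops the blank symbol. A minimal sketch of that greedy decoding step (our illustration, not the project's code; names are ours):

```python
def ctc_greedy_decode(label_seq, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks.

    `label_seq` is the per-timestep argmax over the network's output
    labels; `blank` is the CTC blank index (illustrative convention).
    """
    decoded = []
    prev = None
    for label in label_seq:
        if label != prev and label != blank:
            decoded.append(label)      # new non-blank label
        prev = label
    return decoded

# per-frame predictions for a word with a repeated character:
# the blank between the two 2s keeps them from being collapsed
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2]))  # -> [1, 2, 2]
```

The blank symbol is what lets the model emit genuinely repeated characters, which matters for scripts with doubled letters.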


Unconstrained Scene Text and Video Text Recognition for Arabic Script

People Involved : Mohit Jain, Minesh Mathew, C. V. Jawahar

Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state-of-the-art on two publicly available video text datasets, ALIF and AcTiV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data. We overcome this by synthesizing millions of Arabic text images from a large vocabulary of Arabic words and phrases.


Pose-Aware Person Recognition

People Involved : Vijay Kumar, Anoop Namboodiri, Manohar Paluri, C. V. Jawahar

Person recognition methods that use multiple body regions have shown significant improvements over traditional face-based recognition. One of the primary challenges in full-body person recognition is the extreme variation in pose and viewpoint. In this work, (i) we present an approach that tackles pose variations utilizing multiple models that are trained on specific poses and combined using pose-aware weights during testing. (ii) For learning a person representation, we propose a network that jointly optimizes a single loss over multiple body regions. (iii) Finally, we introduce new benchmarks to evaluate person recognition in diverse scenarios and show significant improvements over previously proposed approaches on all the benchmarks, including the photo album setting of PIPA.
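The test-time combination in (i) can be sketched as a weighted sum of the identity scores from the pose-specific models, with weights coming from a pose estimator. A minimal sketch under those assumptions (function and variable names are ours, not the paper's):

```python
def pose_aware_score(model_scores, pose_weights):
    """Fuse identity scores from pose-specific models.

    model_scores: one list of per-identity scores per pose model.
    pose_weights: estimated probability of each pose for the test
                  image (assumed to sum to 1).
    """
    n_ids = len(model_scores[0])
    fused = [0.0] * n_ids
    for scores, w in zip(model_scores, pose_weights):
        for i, s in enumerate(scores):
            fused[i] += w * s          # pose-aware weighted sum
    return fused

# two pose models (frontal, profile) scoring three identities
scores = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
weights = [0.8, 0.2]                   # test image looks mostly frontal
print(pose_aware_score(scores, weights))  # -> [0.76, 0.22, 0.02]
```

The fused score favours the model trained on the pose the test image actually exhibits, rather than averaging all models uniformly.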


The IIIT-CFW dataset

People Involved : Ashutosh Mishra, Shyam Nandan Rai, Anand Mishra, C. V. Jawahar

The IIIT-CFW is a database of cartoon faces in the wild, harvested from Google image search. Query words such as "Obama + cartoon", "Modi + cartoon", and so on were used to collect cartoon images of 100 public figures. The dataset contains 8928 annotated cartoon faces of famous personalities of the world from varying professions. Additionally, we provide 1000 real faces of the public figures to study cross-modal retrieval tasks such as Photo2Cartoon retrieval. The IIIT-CFW can be used to study a spectrum of problems, as discussed in our ECCVW paper.


Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text

People Involved : Praveen Krishnan, Kartik Dutta and C. V. Jawahar

We propose a deep convolutional feature representation that achieves superior performance for word spotting and recognition in handwritten images. We focus on: (i) enhancing the discriminative ability of the convolutional features using a reduced feature representation that can scale to large datasets, and (ii) enabling query-by-string by learning a common subspace for image and text using the embedded attribute framework. We present our results on popular datasets such as the IAM corpus and historical document collections from the Bentham and George Washington pages.


Matching Handwritten Document Images

People Involved : Praveen Krishnan and C. V. Jawahar

We address the problem of predicting similarity between a pair of handwritten document images written by different individuals. This has applications related to matching and mining in image collections containing handwritten content. A similarity score is computed by detecting patterns of text re-usages between document images irrespective of the minor variations in word morphology, word ordering, layout and paraphrasing of the content.



Visual Aesthetic Analysis for Handwritten Document Images

People Involved : Anshuman Majumdar, Praveen Krishnan and C. V. Jawahar

We present an approach for analyzing the visual aesthetic property of a handwritten document page which matches with human perception. We formulate the problem at two independent levels: (i) a coarse level, which deals with the overall layout and the space usage between lines, words and margins, and (ii) a fine level, which analyses the construction of each word and deals with the aesthetic properties of writing styles. We present our observations on multiple local and global features which can extract the aesthetic cues present in handwritten documents.


First Person Action Recognition

People Involved : Suriya Singh, Chetan Arora, C. V. Jawahar






Panoramic Stereo Videos With A Single Camera

People Involved : Rajat Aggarwal, Amrisha Vohra, Anoop M. Namboodiri

We present a practical solution for generating 360° stereo panoramic videos using a single camera. Current approaches either use a moving camera that captures multiple images of a scene, which are then stitched together to form the final panorama, or use multiple cameras that are synchronized. A moving camera limits the solution to static scenes, while multi-camera solutions require dedicated calibrated setups. Our approach improves upon the existing solutions in two significant ways: It solves the problem using a single camera, thus minimizing the calibration problem and providing us the ability to convert any digital camera into a panoramic stereo capture device. It captures all the light rays required for stereo panoramas in a single frame using a compact custom designed mirror, thus making the design practical to manufacture and easier to use. We analyze several properties of the design as well as present panoramic stereo and depth estimation results.


Face Fiducial Detection by Consensus of Exemplars

People Involved : Mallikarjun B R, Visesh Chari, C. V. Jawahar, Akshay Asthana

We present an exemplar-based approach to detect facial landmarks. We show that by using a very simple SIFT and HOG based descriptor, it is possible to identify the most accurate fiducial outputs from a set of results produced by regression and mixture-of-trees based algorithms (which we call candidate algorithms) on any given test image. Our approach manifests as two algorithms, one based on optimizing an objective function with quadratic terms and the other based on simple kNN.
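The kNN variant can be sketched as follows: among the candidate outputs, select the one whose descriptor is best supported by its nearest exemplars. This is a simplified sketch with illustrative 2-D points; the actual method works on SIFT/HOG descriptors of annotated exemplars:

```python
def pick_best_candidate(candidates, exemplars, k=2):
    """Select the candidate closest (on average) to its k nearest
    exemplars. Each point is a flat feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best, best_cost = None, float("inf")
    for idx, cand in enumerate(candidates):
        nearest = sorted(dist(cand, e) for e in exemplars)[:k]
        cost = sum(nearest) / len(nearest)   # avg distance to support set
        if cost < best_cost:
            best, best_cost = idx, cost
    return best

exemplars = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]   # well-annotated outputs
candidates = [[0.05, 0.0], [0.9, 0.2]]             # two algorithms' results
print(pick_best_candidate(candidates, exemplars))  # -> 0
```

The intuition is that a candidate whose configuration resembles many real exemplars is more likely to be the accurate fiducial output.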



Fine-Tuning Human Pose Estimation in Videos

People Involved : Digvijay Singh, Vineeth Balasubramanian, C. V. Jawahar

We present a semi-supervised self-training method for fine-tuning human pose estimations in videos that provides accurate estimations even for complex sequences.




Fine-Grained Descriptions for Domain Specific Videos

People Involved : Mohak Kumar Sukhwani, C. V. Jawahar

Generating human-like natural descriptions for multimedia content poses an interesting challenge for the vision community. In our current work we tackle the challenge of generating descriptions for videos. The proposed method demonstrates considerable success in generating syntactically and pragmatically correct text for lawn tennis videos and is notably effective in capturing the majority of the video content. Unlike previous work, our method focuses on generating exhaustive, richer human-like descriptions. We aim to provide reliable descriptions that facilitate the task of video analysis and help understand the ongoing events in the video. Large volumes of text data are used to compute associated text statistics, which are thereafter used along with computer vision algorithms to produce relevant descriptions.


Learning relative attributes using parts

People Involved : Ramachandruni N Sandeep, Yashaswi Verma, C. V. Jawahar

Our aim is to learn relative attributes using local parts that are shared across categories. First, instead of using a global representation, we introduce a part-based representation combining a pair of images that specifically compares corresponding parts. Then, with each part we associate a locally adaptive "significance coefficient" that represents its discriminative ability with respect to a particular attribute. For each attribute, the significance coefficients are learned simultaneously with a max-margin ranking model in an iterative manner. Compared to the baseline method, the new method is shown to achieve significant improvements in relative attribute prediction accuracy. Additionally, it is also shown to improve relative feedback based interactive image search.
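The part-based comparison can be sketched as a significance-weighted sum of per-part ranking scores over an image pair. In this sketch the weights and significance coefficients are given; in the actual method they are learned jointly with the max-margin ranker (all names here are illustrative):

```python
def relative_attribute_score(parts_a, parts_b, w, significance):
    """Score how strongly image A exhibits an attribute relative to B.

    parts_a, parts_b: per-part feature vectors for the two images.
    w: per-part ranking weight vectors.
    significance: per-part significance coefficients
                  (assumed learned elsewhere).
    """
    score = 0.0
    for pa, pb, wp, alpha in zip(parts_a, parts_b, w, significance):
        diff = sum(wi * (xa - xb) for wi, xa, xb in zip(wp, pa, pb))
        score += alpha * diff      # weight by the part's significance
    return score                   # > 0 means A ranks higher than B

a = [[1.0, 0.0], [0.5, 0.5]]       # two parts, 2-D features each
b = [[0.0, 0.0], [0.5, 0.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
sig = [0.7, 0.3]                   # first part is more discriminative
print(relative_attribute_score(a, b, w, sig))  # -> 0.85
```

A part with a near-zero significance coefficient contributes almost nothing, which is how uninformative parts are suppressed per attribute.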


Motion Trajectory Based Video Retrieval

People Involved : Koustav Ghosal & Anoop Namboodiri

Extracting robust motion trajectories from videos is an active problem in computer vision. The task is more challenging in unconstrained videos, in the presence of dynamic backgrounds, blur, camera motion and affine irregularities. Once extracted, however, trajectories allow videos to be retrieved by giving an online (temporal) sketch as a query. But different individuals perceive motion trajectories differently, and hence sketch differently. Through this work we extract robust trajectories from unconstrained videos and then model the sketch in a way that removes the perceptual variability among different sketches.


Decomposing Bag of Words Histograms

People Involved : Ankit Gandhi, Karteek Alahari, C V Jawahar

We aim to decompose a global histogram representation of an image into histograms of its associated objects and regions. This task is formulated as an optimization problem, given a set of linear classifiers, which can effectively discriminate the object categories present in the image. Our decomposition bypasses harder problems associated with accurately localizing and segmenting objects.
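As a toy illustration of the idea (a greedy stand-in, not the paper's optimization), each bin of the global histogram can be attributed to the object class whose linear classifier weights it most, yielding per-object histograms that sum back to the original:

```python
def decompose_histogram(hist, classifiers):
    """Split a global bag-of-words histogram among object classes.

    hist: global BoW histogram (list of bin counts).
    classifiers: per-class linear weight vectors over the same bins.
    Each bin's mass goes to the class whose classifier weights it
    highest -- a simplified, greedy version of the decomposition.
    """
    n_classes = len(classifiers)
    parts = [[0.0] * len(hist) for _ in range(n_classes)]
    for b, count in enumerate(hist):
        winner = max(range(n_classes), key=lambda c: classifiers[c][b])
        parts[winner][b] = count
    return parts

hist = [3, 1, 0, 2]
clfs = [[0.9, 0.1, 0.0, 0.2],   # hypothetical "cat" classifier
        [0.1, 0.8, 0.3, 0.7]]   # hypothetical "grass" classifier
parts = decompose_histogram(hist, clfs)
print(parts)  # per-class histograms; bins sum to the original
```

The decomposition constraint, that the per-object histograms add up to the global one, is what the real optimization enforces while using the classifiers to keep each part discriminative.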



Action Recognition using Canonical Correlation Kernels

People Involved : G Nagendar, C V Jawahar

Action recognition has gained significant attention from the computer vision community in recent years. This is a challenging problem, mainly due to the presence of significant camera motion, viewpoint transitions, varying illumination conditions and cluttered backgrounds in the videos. A wide spectrum of features and representations has been used for action recognition in the past. Recent advances in action recognition are propelled by (i) the use of local as well as global features, which have significantly helped in object and scene recognition, computed over 2D frames or over a 3D video volume, and (ii) the use of factorization techniques over video volume tensors and similarity measures defined over the resulting lower-dimensional factors. In this project, we take advantage of both these approaches by defining a canonical correlation kernel that is computed from tensor representations of the videos. This also enables seamless feature fusion by combining multiple feature kernels.

Image Annotation

People Involved : Yashaswi Verma, C V Jawahar

In many real-life scenarios, an object can be categorized into multiple categories. E.g., a newspaper column can be tagged as "political", "election", "democracy"; an image may contain "tiger", "grass", "river"; and so on. These are instances of multi-label classification, which deals with the task of associating multiple labels with a single data instance. Automatic image annotation is a multi-label classification problem that aims at associating a set of textual labels with an image to describe its semantics.
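A common baseline for automatic image annotation transfers labels from visually similar training images. A minimal nearest-neighbour sketch (data and names are illustrative, not from the project):

```python
from collections import Counter

def knn_annotate(query, train_feats, train_tags, k=2, n_tags=3):
    """Annotate a query image with the most frequent tags among its
    k nearest training images (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    order = sorted(range(len(train_feats)),
                   key=lambda i: dist(query, train_feats[i]))
    votes = Counter(tag for i in order[:k] for tag in train_tags[i])
    return [tag for tag, _ in votes.most_common(n_tags)]

feats = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0]]          # toy features
tags = [["tiger", "grass"], ["grass", "river"], ["city"]]
print(knn_annotate([0.05, 0.0], feats, tags))
```

Tags shared by several neighbours ("grass" above) accumulate more votes and are ranked first, which is the core intuition behind label-transfer annotation.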


Computational Displays

People Involved : Parikshit Sakurikar, Revanth N R, Pawan Harish, Nirnimesh and P J Narayanan

Displays have seen many improvements over the years but still have many shortcomings. These include rectangular shape, low color gamut, low dynamic range, lack of focus and context in a scene, lack of 3D viewing, etc. We propose Computational Displays, which employ computation to economically alleviate some of the shortcomings of today's displays.



Scene Text Understanding

People Involved : Udit Roy, Anand Mishra, Karteek Alahari and C.V. Jawahar

Scene text recognition has gained significant attention from the computer vision community in recent years. Images often contain text which gives rich and useful information about their content. Recognizing such text is a challenging problem, even more so than the recognition of scanned documents. Scene text exhibits a large variability in appearance, and can prove to be challenging even for state-of-the-art OCR methods. Many scene understanding methods successfully recognize objects and regions like roads, trees, sky, etc. in the image, but tend to ignore the text on signboards. Our goal is to fill this gap in understanding the scene.

Parallel Computing using CPU and GPU

People Involved : Aditya Deshpande, Parikshit Sakurikar, Harshit Sureka, K. Wasif, Ishan Misra, Pawan Harish, Vibhav Vineet and P J Narayanan

Commodity graphics hardware has become a cost-effective parallel platform for solving many general problems. Graphics hardware from Nvidia offers an alternative programming model, CUDA, which can be used in more flexible ways than traditional GPGPU.


Exploiting SfM Data from Community Collections

People Involved : Aditya Singh, Apurva Kumar, Krishna Kumar Singh, Kaustav Kundu, Rajvi Shah, Aditya Deshpande, Shubham Gupta, Siddharth Choudhary and P J Narayanan

Progress in the field of 3D computer vision has led to the development of robust SfM (Structure from Motion) algorithms. These SfM algorithms allow us to build 3D models of monuments using community photo collections. We work on improving the speed and accuracy of these SfM algorithms and also on developing tools that use the rich geometry present in SfM datasets.


Ray Tracing on the GPU

People Involved : Srinath Ravichandran, Rohit Nigam, Sashidhar Guntury, Jag Mohan Singh, Suryakant Patidar and P J Narayanan

Ray tracing is used to render photorealistic images in computer graphics. We work on real time ray tracing of static, dynamic and deformable scenes composed of polygons as well as higher order surfaces on the GPU.



Medical Image Processing

People Involved : Gopal Datt Joshi, Mayank Chawla, Arunava Chakravarty, Akhilesh Bontala, Shashank Mujjumdar, Rohit Gautam, Subbu, Sushma

Digital medical images are widely used for diagnostic purposes. Our goal is to develop algorithms for medical image analysis focusing on enhancement, segmentation, multi-modal registration and classification.



Terrain Rendering and Information System

People Involved : Shiben Bhattacharjee, Suryakant Patidar and P J Narayanan

A Digital Terrain Model refers to a data model that attempts to provide a three-dimensional representation of a continuous surface, even though terrains are two-and-a-half dimensional rather than fully three-dimensional.




Data Generation Toolkit

People Involved : V Krishna

Synthetic data is very useful for validating algorithms developed for various computer vision and image-based rendering tasks.




Recognition of Indian Language Documents

People Involved : Million Meshesha, Balasubramanian Anand, Sesh Kumar, L. Jagannathan, Neeba N V, Venkat Rasagna

The present growth in the digitization of documents demands an immediate solution that makes the archived valuable materials searchable and usable by users, in order to achieve the objective of digitization.



Retrieval of Document Images

People Involved : Million Meshesha, Balasubramanian Anand, Pramod Sankar K, Anand Kumar

Our approach towards retrieval of document images avoids explicit recognition of the text. Instead, we perform word matching in the image space. Given a query word, we generate its corresponding word image and compare it against the words in the documents.
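Matching in the image space is commonly done by comparing sequences of column features (such as projection profiles) with dynamic time warping, which tolerates width variations between renderings of the same word. A minimal DTW sketch (the system's actual features and matcher may differ):

```python
def dtw_distance(profile_a, profile_b):
    """Dynamic-time-warping distance between two 1-D feature sequences,
    e.g. column projection profiles of two word images."""
    n, m = len(profile_a), len(profile_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(profile_a[i - 1] - profile_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch A
                                 cost[i][j - 1],      # stretch B
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]

# the same word rendered at different widths still matches exactly
print(dtw_distance([1, 4, 4, 2], [1, 4, 2]))  # -> 0.0
```

Because DTW aligns columns non-linearly, a rendered query word can match scanned words despite minor variations in word morphology.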



Retrieval from Video Databases

People Involved : Tarun Jain, Anurag Singh Rana, Balakrishna C., Pramod Sankar K., Saurabh Pandey, Balamanohar P., Natraj J.

Digital Libraries of broadcast videos could be easily built with existing technology. The storage, archival, search and retrieval of broadcast videos provide a large number of challenges for the research community. We address these challenges in different novel directions.



Robotic Vision

People Involved : Abdul Hafez, Visesh Uday Kumar, Supreeth, Anil, D. Santosh

Our research activity is primarily concerned with the geometric analysis of scenes captured by vision sensors and the control of a robot so as to perform set tasks by utilizing the scene interpretation.




Content-Based Image Retrieval (CBIR)

People Involved : Pradhee Tandon, P. Suman Karthik, Natraj J., Dhaval Mehta, E. S. V. N. L. S. Diwakar

We strive to enable machines with subjective perception capabilities on par with their human counterparts, especially with regard to images.




Contours, Textures, Homography and Fourier Domain

People Involved : Sujit Kuthirummal, Paresh Kumar Jain, M. Pawan Kumar, Saurabh Goyal

The aim of this study is to develop a Fourier representation of contours and then utilise it to estimate two-view relationships such as homography, as well as to derive novel invariants.




Biometrics

People Involved : Vandana Roy, Sachin Gupta

The aim of the work is to develop robust and accurate biometric recognition systems, primarily for use in civilian applications. We are currently working on enhancing soft biometric traits such as hand geometry and palm texture, and also on gathering identity information from online handwritten documents.



Online Handwriting Analysis

People Involved : Naveen Chandra Tewari, Sachin Gupta

Handwriting recognition refers to the mapping of meaningful handwritten lexemes to computer-understandable codes. There are many applications in which entering data using handwriting is more convenient than a keyboard, such as making notes or hand sketches.




Garuda: A Scalable, Geometry-Managed Display Wall

People Involved : Pawan Harish, Nirnimesh and P J Narayanan

Cluster-based tiled display walls simultaneously provide high resolution and large display area (Focus + Context) and are suitable for many applications. They are also cost-effective and scalable with low incremental costs. Garuda is a client-server based display wall solution designed to use off-the-shelf graphics hardware and standard Ethernet network.



Depth-Image Representations

People Involved : Pooja Verlani, Aditi Goswami, Shekhar Dwivedi, Sireesh Reddy K, Sashi Kumar Penta

Depth images are viable representations that can be computed from the real world using cameras and/or other scanning devices. The depth map provides a 2.5D structure of the scene: a visibility-limited model that can be rendered easily using graphics techniques.



Biological Vision

People Involved : Gopal Datt Joshi, Kartheek N V, Varun Jampani

The perceptual mechanisms used by different organisms to negotiate the visual world are fascinatingly diverse. Even if we consider only the sensory organs of vertebrates, such as the eye, there is much variety.




Learning Appearance Models

People Involved : Karteek Alahari, Paresh Jain, Ranjeeth Kumar, Manikandan

Our research focuses on learning appearance models from images and videos that can be used for a variety of tasks such as recognition, detection and classification.




Projected Texture for 3D Object Recognition

People Involved : Avinash Sharma, Nishant Shobhit

We address the problem of 3D object recognition. Instead of recovering the 3D structure of an object, we encode the depth variation of the object through the deformation induced in a pattern projected onto it from a known pose. We propose effective features that characterize the deformation of the projected pattern for the purpose of recognition.



Security and Privacy of Visual Data

People Involved : Maneesh Upmanyu, Narsimha Raju, Shashank Jagarlamudi, Kowshik Palivela

With the rapid development and acceptability of computer vision based systems in daily life, securing visual data has become imperative. Security issues in computer vision primarily originate from the storage, distribution and processing of personal data, whereas privacy concerns arise from the tracking of a user's activity. Through this work we address specific security and privacy concerns of visual data. We propose application-specific, computationally efficient and provably secure computer vision algorithms for the encrypted domain.


Semantic Classification of Boundaries of an RGBD Image

People Involved : Nishit Soni, Anoop M. Namboodiri, C. V. Jawahar, Srikumar Ramalingam





Retinal Image Analysis

People Involved : Arunava, Ujjwal, Gopal, Akhilesh, Sai, Yogesh

Analysis of retinal images for diagnostic purposes.





Histopathological Image Analysis

People Involved : Anisha

Automatic analysis of histopathological images.




Brain Image Analysis

People Involved :

Analysis of brain MRI and CT images for diagnostic purposes.




Image Reconstruction

People Involved : Neha, Kartheek

Image upsampling on square and rotated lattices.





Fine-Grain Annotation of Cricket Videos

People Involved : Rahul Anand Sharma, Pramod Sankar K, C. V. Jawahar






Fine Pose Estimation of Known Objects in Cluttered Scene Images

People Involved : Sudipto Banerjee, Sanchit Aggarwal, Anoop M. Namboodiri





Medical Image Perception

People Involved : Varun & Samrudhdhi Rangrej


Currency Recognition on Mobile Phones

People Involved : Suriya Singh, Shushman Choudhury, Kumar Vishal and C.V. Jawahar

In this project, we present an application for recognizing currency bills using computer vision techniques that can run on a low-end smartphone. The application runs on the device without the need for any remote server. It is intended for robust, practical use by the visually impaired.