Robotic Vision


Introduction

Our research activity is primarily concerned with the geometric analysis of scenes captured by vision sensors, and with controlling a robot so that it performs set tasks using the resulting scene interpretation. The former problem is known in the literature as 'Structure from Motion', while the latter is often referred to as the 'Visual Servoing' problem.

Visual servoing consists of using the information provided by a vision sensor to control the movements of a dynamic system. This research topic lies at the intersection of Computer Vision and Robotics. Both fields have been the subject of productive research for many years and are particularly interesting for their very broad scientific and application spectrum. More specifically, we are concerned with enhancing visual servoing algorithms, both in performance and in applicability, so as to widen their use.
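To make the idea of "using vision to control motion" concrete, the sketch below implements the classic image-based visual servoing (IBVS) law v = -λ L⁺ (s - s*) for point features, as presented in standard visual servoing tutorials. It illustrates the general technique only, not the group's multi-cue algorithms; the function names, the gain value and the assumption that feature depths Z are known are choices made for this example.

```python
import numpy as np

def point_interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2x6 interaction (image Jacobian) matrix of a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(s, s_star, depths, gain=0.5):
    """Camera velocity screw (vx, vy, vz, wx, wy, wz) from v = -gain * pinv(L) @ (s - s*).

    s, s_star: current and desired normalized point coordinates, shape (N, 2).
    depths: assumed-known depth Z of each current point (an assumption of this sketch).
    """
    s = np.asarray(s, dtype=float).reshape(-1, 2)
    s_star = np.asarray(s_star, dtype=float).reshape(-1, 2)
    # Stack one 2x6 block per point feature, evaluated at the current features
    L = np.vstack([point_interaction_matrix(x, y, Z) for (x, y), Z in zip(s, depths)])
    error = (s - s_star).ravel()
    return -gain * np.linalg.pinv(L) @ error
```

With four or more well-spread points the stacked matrix has full column rank, and the commanded velocity makes the feature error decrease (to first order) at each control step.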


Performance Enhancement of Visual Servoing Techniques

Visual servoing is an interesting area of robotic vision that is increasingly being applied to real-world problems. Such applications, however, call for an in-depth analysis of robustness and performance issues in visual servoing tasks. Typically, robustness issues involve handling errors in feature correspondence and in pose and depth estimation. Performance issues, on the other hand, involve generating a consistent control input in spite of noisy or varying parameters. We have developed algorithms that incorporate multiple cues in order to achieve consistent performance in the presence of noisy features.


Visual Servoing in Unconventional Environments

Most robotic vision algorithms are designed for robots operating in structured environments, where the world is assumed to be rigid and planar. These algorithms fail to provide optimal behavior when the robot has to be controlled with respect to active, non-rigid, non-planar targets. We have developed a new framework for visual servoing that accomplishes the robot-positioning task even in such unconventional environments. We introduced a novel space-time representation scheme for modeling the deformations of a non-rigid object, and proposed a new vision-based approach that exploits the two-view geometry induced by the space-time features to perform the servoing task.
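The space-time method itself is not described here in enough detail to reproduce. As a hedged illustration of the kind of two-view geometry such an approach builds on, the sketch below estimates a fundamental matrix F (satisfying x2^T F x1 = 0) with the standard normalized eight-point algorithm. The inputs are assumed to be matched image points from two views, and the function names are hypothetical.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: move the centroid to the origin, mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    c = pts.mean(axis=0)
    d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2.0) / d
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ h.T).T, T

def eight_point(x1, x2):
    """Fundamental matrix F with x2^T F x1 = 0 from at least 8 correspondences (N x 2 arrays)."""
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    # Each correspondence gives one linear equation in the 9 entries of F (row-major)
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint of a fundamental matrix
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization: p2^T Fn p1 = x2^T (T2^T Fn T1) x1
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```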


Visual Tracking by Integration of Multiple Cues

Object tracking is an important task in robotic vision, particularly for visual servoing. In the robotics literature, tracking has been modeled as a motion estimation problem: 3D model-based tracking is treated as a pose estimation problem, and 2D planar object tracking as a homography estimation problem. Marker-less visual tracking uses two major sources of visual features: edges and texture. Each has advantages and disadvantages that make it suitable or unsuitable in different scenarios. We are designing a robust integration framework that uses both edge and texture features. This framework probabilistically integrates the visual information collected from contours and texture, based on probabilistic goodness weights for each type of feature.
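A minimal sketch of goodness-weighted fusion, assuming each cue (edges, texture) has already produced its own estimate of the same parameter vector (for example a small motion update that can reasonably be averaged) together with a mean residual. The Gaussian goodness score, the `sigma` parameter and the function names are assumptions for illustration; the framework described above defines its own probabilistic weights.

```python
import numpy as np

def goodness(residual, sigma=1.0):
    """Hypothetical goodness score in (0, 1]: Gaussian likelihood of a cue's mean residual."""
    return float(np.exp(-0.5 * (residual / sigma) ** 2))

def fuse_estimates(estimates, residuals, sigma=1.0):
    """Weighted combination of per-cue parameter vectors using normalized goodness weights."""
    g = np.array([goodness(r, sigma) for r in residuals])
    if g.sum() == 0.0:  # every cue looks implausible: fall back to equal weights
        g = np.ones_like(g)
    w = g / g.sum()
    fused = w @ np.asarray(estimates, dtype=float)  # (k,) @ (k, d) -> (d,)
    return fused, w
```

A cue whose residual is large relative to `sigma` receives a weight close to zero, so the fused estimate follows the more reliable cue.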

Probabilistic Robotic Vision

We are also currently investigating the utility of applying the rich literature available in the field of Probabilistic Robotics to Computer Vision problems. Computer Vision problems often involve processing noisy data. Probabilistic approaches are therefore appropriate, as they allow uncertainty to be modeled and propagated through the solution process.
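As a small example of modeling and propagating uncertainty, the sketch below implements a one-dimensional Kalman filter. Prediction adds motion noise and increases the variance, while a measurement update fuses a noisy observation and decreases it. It is a generic textbook filter, not a specific method from these projects, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gaussian1D:
    mean: float
    var: float

def predict(belief: Gaussian1D, motion: float, motion_var: float) -> Gaussian1D:
    """Propagate the belief through x_k = x_{k-1} + motion + noise; uncertainty grows."""
    return Gaussian1D(belief.mean + motion, belief.var + motion_var)

def update(belief: Gaussian1D, z: float, meas_var: float) -> Gaussian1D:
    """Fuse a noisy measurement z of x; uncertainty shrinks."""
    k = belief.var / (belief.var + meas_var)  # Kalman gain
    return Gaussian1D(belief.mean + k * (z - belief.mean), (1.0 - k) * belief.var)
```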


Related Publications

  • D. Santosh Kumar and C. V. Jawahar - Visual Servoing in Non-Rigid Environment: A Space-Time Approach, Proc. of IEEE International Conference on Robotics and Automation (ICRA'07), Roma, Italy, 2007.

  • A. H. Abdul Hafez and C. V. Jawahar - Probabilistic Integration of 2D and 3D Cues for Visual Servoing, 9th International Conference on Control, Automation, Robotics and Vision (ICARCV'06), Singapore, 5-8 December, 2006.

  • A. H. Abdul Hafez and C. V. Jawahar - Integration Framework for Improved Visual Servoing in Image and Cartesian Spaces, International Conference on Intelligent Robots and Systems (IROS'06), Beijing, China, October 9-15, 2006.

  • D. Santosh Kumar and C. V. Jawahar - Visual Servoing in Presence of Non-Rigid Motion, Proc. 18th IEEE International Conference on Pattern Recognition (ICPR'06), Hong Kong, Aug 2006.

  • A. H. Abdul Hafez and C. V. Jawahar - Target Model Estimation Using Particle Filters for Visual Servoing, Proc. 18th IEEE International Conference on Pattern Recognition (ICPR'06), Hong Kong, Aug 2006.

  • Abdul Hafez, Piyush Janawadkar and C. V. Jawahar - Novel view prediction for improved visual servoing, National Conference on Communications (NCC) 2006, New Delhi.

  • Abdul Hafez and C. V. Jawahar - Minimizing a Class of Hybrid Error Functions for Optimal Pose Alignment, International Conference on Control, Automation, Robotics and Vision (ICARCV) 2006, Singapore.

  • D. Santosh Kumar and C. V. Jawahar - Robust Homography-based Control for Camera Positioning in Piecewise Planar Environments, Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) 2006, Madurai.

  • Abdul Hafez, Visesh Chari and C. V. Jawahar - Combine Texture and Edges based on Goodness Weights for Planar Object Tracking, International Conference on Robotics and Automation (ICRA) 2007, Rome.

  • Abdul Hafez and C. V. Jawahar - A Stable Hybrid Visual Servoing Algorithm, International Conference on Robotics and Automation (ICRA) 2007, Rome.

Associated People

  • Abdul Hafez
  • Visesh
  • Supreeth
  • Anil
  • Santosh
  • Dr. C V Jawahar

 

Retrieval from Video Databases


Introduction

Broadcast television is one of the primary and most popular sources of information. With the advent of video recorders, TV programs could be recorded and stored locally, and digital libraries of broadcast videos can now easily be built with existing technology. The storage, archival, search and retrieval of broadcast videos pose a large number of challenges for the research community. The following projects address these challenges.


Building a Digital Library of Broadcast Videos

Digital libraries of broadcast videos allow one to archive television content for later viewing and reference. The significance of such a library is similar to that of a digital library of all books. The collection is built by recording TV broadcasts. Since it is difficult to record and store all channels simultaneously, a schedule is chosen to record programmes across various channels. A UI allows users to choose the schedule to be recorded for later viewing.

The videos are stored across multiple nodes that act as a storage cluster. An explicit file system structure is maintained for storing the videos. This structure incorporates metadata about each video, such as the date, time and channel it was recorded from, which allows users to easily browse and search the library for programs using these details.
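One possible way to encode the date, time and channel in the directory structure is sketched below. The layout `<node>/<channel>/<YYYY>/<MM>/<DD>/<HHMM>.mpg`, the CRC-based choice of storage node and all names are assumptions for illustration, not the library's actual file system structure; channel names are assumed not to contain "/".

```python
import zlib
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath

@dataclass(frozen=True)
class Recording:
    channel: str
    start: datetime

def storage_path(rec: Recording, nodes: list[str]) -> PurePosixPath:
    """Place a recording on a node and encode its metadata in the directory layout."""
    # crc32 is stable across runs (unlike hash() on str), so a channel always maps to one node
    node = nodes[zlib.crc32(rec.channel.encode()) % len(nodes)]
    return PurePosixPath(node, rec.channel, rec.start.strftime("%Y/%m/%d"),
                         rec.start.strftime("%H%M") + ".mpg")

def parse_path(path: PurePosixPath) -> Recording:
    """Recover channel, date and time from a path produced by storage_path."""
    *_, channel, year, month, day, fname = path.parts
    start = datetime.strptime(f"{year}{month}{day}{PurePosixPath(fname).stem}", "%Y%m%d%H%M")
    return Recording(channel, start)

def browse(paths, channel=None, day=None):
    """Filter stored recordings by channel and/or calendar date using only the paths."""
    recs = [parse_path(p) for p in paths]
    return [r for r in recs
            if (channel is None or r.channel == channel)
            and (day is None or r.start.date() == day)]
```

Because the metadata lives in the path, browsing by channel or date needs no separate database.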


Indexing Broadcast News

Broadcast news is a class of multimedia that is important to both the scientific community and the general public. For broadcast news, the requirement is to provide content-level search and retrieval, which is very challenging. Though many broadcast news datasets are available, there is no collection pertaining to news in the Indian context. We built a system to automatically record broadcasts and build a repository of Indian news.

Our present collection consists of more than a month's news telecasts, recorded from 5 different news channels and covering 3 languages. The collection is continually growing and is currently limited only by the available storage space.

The videos are first divided into stories by detecting the anchor-person. A keyframe is extracted from each shot in a story. We designed an effective and intuitive way of visualising the videos, so that users can get a feel for the content without having to stream and watch them. Each video is presented as a slide show of the thumbnails of its constituent shots, and users can zoom in on any thumbnail by hovering the mouse pointer over it. The adjacent figure shows a screenshot of this UI.
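Assuming an anchor-person detector has already labeled each shot, story segmentation and keyframe selection reduce to simple bookkeeping, as in the sketch below. Shots are hypothetical `(start_frame, end_frame)` pairs, a story is assumed to begin at the first shot of each run of anchor shots, and the middle frame is used as the keyframe; none of these choices is taken from the original system.

```python
def split_into_stories(shots, is_anchor):
    """Group shots into stories; a new story starts at the first shot of each run of anchor shots."""
    stories, current, prev_anchor = [], [], False
    for shot, anchor in zip(shots, is_anchor):
        if anchor and not prev_anchor and current:
            stories.append(current)
            current = []
        current.append(shot)
        prev_anchor = anchor
    if current:
        stories.append(current)
    return stories

def keyframe(shot):
    """Use the middle frame of a (start, end) shot as its keyframe (a simple, assumed choice)."""
    start, end = shot
    return (start + end) // 2
```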


Searching News Using Overlaid Text (Without Character Recognition)


There have been several approaches to video indexing and retrieval based on spatio-temporal visual features. However, they are not always reliable and do not allow for easy querying. News videos contain many clues about their content in the form of overlaid text, which is a reliable cue for video indexing and retrieval. However, recognising overlaid text is difficult due to the limited accuracy of Optical Character Recognisers (OCR). The inaccuracies are more pronounced for Indian languages, which have a complex script and writing style.

To avoid explicit recognition, we use a novel recognition-free approach that matches words in the image space. Each word extracted from the videos is represented as a feature vector, with features carefully chosen to provide invariance to font type, style and size. Given a textual query, the word is rendered into an image and features are extracted from it. The query features are compared to those in the database using a Dynamic Time Warping (DTW) based distance measure. The database words with high similarity to the query are found, and their source videos are retrieved for the user.
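The DTW comparison can be sketched as follows. Each word is a sequence of per-column feature vectors; DTW aligns two sequences of possibly different lengths and sums the local Euclidean costs along the best alignment, which tolerates stretching caused by font size and spacing. The normalization by n + m and the function names are assumptions, and feature extraction and query rendering are not shown.

```python
import numpy as np

def dtw_distance(a, b):
    """Length-normalized DTW distance between feature sequences a (n x d) and b (m x d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local Euclidean distance
            # Extend the cheapest of: vertical step, horizontal step, diagonal step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def rank_words(query, database):
    """Return database keys sorted from most to least similar to the query sequence."""
    return sorted(database, key=lambda k: dtw_distance(query, database[k]))
```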


Automatic Annotation of Cricket Videos


Sports videos are another popular class of multimedia. Sports videos are generally long, and only a small part of them is of real interest. A sports video is a sequence of action scenes that occur semi-regularly. For sports such as Cricket, detailed descriptions of these scenes are also available as textual commentary on websites such as www.cricinfo.com. These descriptions are excellent for annotation and retrieval.

However, there is no explicit synchronisation between the textual descriptions and the video segments they correspond to. Synchronisation is achieved by using the text to drive the segmentation of the video. The scene categories are modeled in a suitable (visual) feature space, and the text is used to obtain the category of each scene in the video. A hypothetical video is generated from the text and aligned with the real video using Dynamic Programming. The scenes in the real video are then segmented based on the known scene boundaries of the hypothetical video.
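A simplified sketch of the text-to-video alignment, assuming a classifier already gives per-frame log-likelihoods for each scene category. Instead of rendering an explicit hypothetical video, it aligns the ordered list of scene categories read from the commentary with the frames by Dynamic Programming, so that every scene covers a contiguous, non-empty run of frames in order. The function name and inputs are hypothetical.

```python
import numpy as np

def align_scenes(scene_labels, loglik):
    """Return the start frame of each text-derived scene.

    scene_labels[s]: category index of the s-th scene in the commentary.
    loglik[t, k]: log-likelihood of frame t under category k.
    """
    loglik = np.asarray(loglik, dtype=float)
    T, S = loglik.shape[0], len(scene_labels)
    if S > T:
        raise ValueError("more scenes than frames")
    score = np.full((S, T), -np.inf)        # best total score with frame t in scene s
    entered = np.zeros((S, T), dtype=bool)  # True if scene s starts at frame t
    score[0, 0] = loglik[0, scene_labels[0]]
    for t in range(1, T):
        for s in range(min(S, t + 1)):
            stay = score[s, t - 1]
            enter = score[s - 1, t - 1] if s > 0 else -np.inf
            entered[s, t] = enter > stay
            score[s, t] = loglik[t, scene_labels[s]] + max(stay, enter)
    # Backtrack from the last frame, which must belong to the last scene
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if entered[s, t]:
            starts[s] = t
            s -= 1
    return starts
```

The recovered start frames play the role of the scene boundaries, and each segment can then be labeled with its commentary text.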

Once segmented, the scenes are automatically annotated with the detailed textual descriptions, which allows us to build a Table-of-Contents representation for the entire video. This interface is very intuitive and allows easy browsing of the video using the corresponding text. The text can also be used to index the videos, enabling video retrieval using textual queries (of semantic concepts).


Ongoing Work

In ongoing work, we are addressing various novel directions for enabling retrieval from video collections. In one of the projects for annotating news videos, we are using the people in the news to identify the news content. Faces in the news are annotated with the name of the person, and the news stories can be queried based on the people involved in it.

In another ongoing project, we are exploring the use of compressed-domain techniques for video data retrieval and mining. Videos are generally stored in the MPEG format, which is a compressed-domain representation. The large number of existing compressed-domain techniques avoid explicit decoding of the video and can convey further information without visual recognition and understanding.


Related Publications

  • Pramod Sankar K., Saurabh Pandey and C. V. Jawahar - Text Driven Temporal Segmentation of Cricket Videos, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 433-444, 2006.

  • C. V. Jawahar, Balakrishna Chennupati, Balamanohar Paluri and Nataraj Jammalamadaka - Video Retrieval Based on Textual Queries, Proceedings of the Thirteenth International Conference on Advanced Computing and Communications, Coimbatore, December 2005.

 


Associated People

  • Tarun Jain
  • Anurag Singh Rana
  • Balakrishna C.
  • Pramod Sankar K.
  • Saurabh Pandey
  • Balamanohar P.
  • Natraj J.
  • Dr. C. V. Jawahar

Retrieval of Document Images


Introduction

Large collections of paper-based documents and books are being brought into the digital domain through the efforts of various digitisation projects. The Digital Library of India initiative has scanned and archived more than a million books. Automatic transcription of document images using Optical Character Recognition (OCR) is handicapped by the poor accuracy of OCRs, especially for Indian, African and oriental languages. Consequently, retrieval from document image collections is a challenging task.

Our approach to the retrieval of document images avoids explicit recognition of the text. Instead, we perform word matching in the image space. Given a query word, we generate its corresponding word image and compare it against the words in the documents. Documents that contain words similar in appearance to the query are retrieved for the user. This enables users to search and retrieve from document images using text queries.


Indexing Word Images

Matching words in the image space must handle the many variations in font size, font type, style, noise, degradation, etc. that are present in document images. The features and the matching technique were carefully designed to handle this variety. Further, morphological word variations, such as prefixes and suffixes of a given stem word, are identified using an innovative partial matching scheme. We use a Dynamic Time Warping (DTW) based matching algorithm, which efficiently exploits the information supplied by local features extracted by scanning vertical strips of the word images.
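As a hedged example of local features computed over vertical strips, the sketch below extracts four per-column values from a binary word image: the vertical projection profile, the upper and lower word profiles, and the number of background-to-ink transitions. These are common choices for word-image matching and are assumed here; the exact feature set used in this work may differ. The resulting sequence (one row per column) is what a DTW distance would compare.

```python
import numpy as np

def column_features(word_img):
    """Per-column features of a binary word image (nonzero = ink); returns a (width, 4) array."""
    img = (np.asarray(word_img) > 0).astype(int)
    h, w = img.shape
    feats = np.zeros((w, 4))
    for j in range(w):
        col = img[:, j]
        ink = np.flatnonzero(col)
        feats[j, 0] = col.sum() / h                                 # vertical projection profile
        feats[j, 1] = ink[0] / h if ink.size else 1.0               # upper profile: gap above first ink
        feats[j, 2] = (h - 1 - ink[-1]) / h if ink.size else 1.0    # lower profile: gap below last ink
        # Background-to-ink transitions, counting ink that starts at the top row
        feats[j, 3] = np.count_nonzero(np.diff(np.concatenate(([0], col))) == 1) / h
    return feats
```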

This matching scheme is used to build an index for the documents. Word images are matched and clustered such that each cluster contains all instances of a given word from all document images. These clusters are used to build an index for the document images, in the image space. Each index entry corresponds to a distinct word in the documents and points to the documents that contain it. The documents are ranked using the term frequency-inverse document frequency (TF-IDF) measure. Given a query word, its word image is rendered and matched against the word images in the index. The matching index entry is then used to immediately retrieve all documents that contain the query word.
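The inverted index and TF-IDF ranking can be sketched as below, assuming word images have already been clustered so that each document is a list of cluster ids, and that the rendered query has already been matched to one or more cluster ids. The smoothed IDF, log(1 + N/df), and all names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Map each word cluster to {doc_id: term frequency}; docs maps doc_id -> list of cluster ids."""
    index = defaultdict(dict)
    for doc_id, clusters in docs.items():
        for cluster, k in Counter(clusters).items():
            index[cluster][doc_id] = k / len(clusters)
    return index

def search(index, n_docs, query_clusters):
    """Rank documents by summed TF-IDF over the clusters matched by the query."""
    scores = defaultdict(float)
    for cluster in query_clusters:
        postings = index.get(cluster, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # smoothed IDF (an assumed variant)
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order
    return sorted(scores, key=lambda d: (-scores[d], d))
```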

This scheme was successfully demonstrated on tens of books. The search system retrieves efficiently and quickly from document images, given a textual query. It also allows cross-lingual search using transliteration and dictionary-based translation. The system is available online. Popular search queries include arjuna, devotion, said, poolan, etc.


Word Annotation for Search

Online matching of a query word against a search index is computationally intensive and thus time-consuming. This can be avoided by annotating the word images. Annotation assigns a corresponding text word to each word image, which allows further processing, such as indexing and retrieval, to happen in the text domain. Search and retrieval in the text domain allow us to build search systems with interactive response times.

Automatic annotation of word images is performed by the following procedure:

  • Cluster word images from the documents, as in indexing
  • Generate labeled keyword images by rendering them from text
  • Cluster the keyword images, using the same techniques as for word images
  • Form associations between clusters of word images and clusters of keyword images
  • Annotate each word image with the keyword most similar to it
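A compact sketch of the association and annotation steps, assuming fixed-length feature vectors (rather than DTW-compared sequences) and assuming `keyword_feats` holds the centroids of the keyword-image clusters together with their text labels. Each word-image cluster receives the label of the nearest keyword centroid, and every word in that cluster inherits it. The names are hypothetical.

```python
import numpy as np

def annotate(word_feats, word_cluster, keyword_feats, keyword_labels):
    """Return a text label for every word image via nearest keyword centroid per cluster."""
    word_feats = np.asarray(word_feats, dtype=float)
    word_cluster = np.asarray(word_cluster)
    keyword_feats = np.asarray(keyword_feats, dtype=float)
    cluster_label = {}
    for c in np.unique(word_cluster):
        centroid = word_feats[word_cluster == c].mean(axis=0)    # representative of the cluster
        d = np.linalg.norm(keyword_feats - centroid, axis=1)     # distance to each keyword centroid
        cluster_label[c] = keyword_labels[int(np.argmin(d))]
    return [cluster_label[c] for c in word_cluster]
```

Since the whole computation runs once offline, later queries only need a text lookup.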

The annotation procedure is a one-time offline computation that yields a text equivalent of the documents. A text-based search system was built over the annotated documents: we have built a search system for 500 books in the Telugu language, which retrieves documents in real time. The procedure scales to large collections while leaving the retrieval time unchanged.


Hashing of Word Images

In databases, hashing is considered an efficient alternative to indexing (and clustering). In hashing, a single function value is computed for each word image, and words with the same (or similar) hash values are placed in the same bin. Effectively, words with similar hash values are clustered together. With such a scheme, indexing the documents takes time linear in the number of word images, which is a significant advantage.

We are building hash functions that map similar word images to the same hash value. The features extracted from word images are hashed using Locality Sensitive Hashing (LSH), which ensures that, for each hash function, the probability of collision is much higher for words that are close to each other than for those that are far apart. In the first stage, the images are hashed into a temporary hash table using randomly generated hash functions. These functions may hash different words to the same index. To reduce such errors, we use a second level of hash functions that are learned from ground-truth data. When a query is given, the primary hash functions are applied to obtain the first-level hash value, and the learned hash functions then rehash the image into a second hash table. Finally, a linear search within the retrieved bin of the final hash table returns the relevant word images. Because only that bin is searched, LSH reduces the search time from linear to sublinear in the size of the collection.
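The first-level hashing can be illustrated with random-hyperplane LSH, a standard LSH family for angular (cosine) distance. Each bit records on which side of a random hyperplane a feature vector lies, so nearby vectors are likely to share a bucket, and a query is linearly searched only within its own bucket. The learned second-level hash functions are not shown, and the class name and parameters are assumptions.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Single-table random-hyperplane LSH over fixed-length word-image feature vectors."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))  # randomly generated hash functions
        self.table = defaultdict(list)                      # bucket key -> list of item ids
        self.vectors = []

    def _key(self, v):
        bits = (self.planes @ np.asarray(v, dtype=float)) >= 0
        return bits.tobytes()

    def add(self, v):
        idx = len(self.vectors)
        self.vectors.append(np.asarray(v, dtype=float))
        self.table[self._key(v)].append(idx)
        return idx

    def query(self, v, k=1):
        """Return up to k ids from the query's bucket, nearest first by cosine distance."""
        v = np.asarray(v, dtype=float)
        candidates = self.table.get(self._key(v), [])

        def cos_dist(i):
            u = self.vectors[i]
            return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

        # Linear search only inside the bucket, not over the whole collection
        return sorted(candidates, key=cos_dist)[:k]
```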


Related Publications

  • Million Meshesha and C. V. Jawahar - Matching word image for content-based retrieval from printed document images, International Journal on Document Analysis and Recognition (IJDAR) 11(1), pp. 29-38, 2008.

  • Pramod Sankar K. and C. V. Jawahar - Enabling Search over Large Collections of Telugu Document Images - An Automatic Annotation Based Approach, 5th Indian Conference on Computer Vision, Graphics and Image Processing, Madurai, India, LNCS 4338, pp. 837-848, 2006.

  • Pramod Sankar K., Million Meshesha and C. V. Jawahar - Annotation of Images and Videos based on Textual Content without OCR, Workshop on Computation Intensive Methods for Computer Vision (in conjunction with ECCV 2006), 2006.

  • A. Balasubramanian, Million Meshesha and C. V. Jawahar - Retrieval from Document Image Collections, Proceedings of Seventh IAPR Workshop on Document Analysis Systems, 2006 (LNCS 3872), pp. 1-12.

  • C. V. Jawahar, Million Meshesha and A. Balasubramanian - Searching in Document Images, Proceedings of the Indian Conference on Vision, Graphics and Image Processing (ICVGIP), Dec. 2004, Calcutta, India, pp. 622-627.

 


Associated People

  • Million Meshesha
  • Balasubramanian Anand
  • Pramod Sankar K.
  • Anand Kumar
  • Dr. C. V. Jawahar

Exploiting SfM Data from Community Collections


Introduction

On public photo-sharing sites such as Flickr, millions of photographs of monuments are easily available. These are popularly known as community photo collections. Progress in 3D Computer Vision has led to the development of robust SfM (Structure from Motion) algorithms, which allow us to build 3D models of monuments from community photo collections. We work on improving the speed and accuracy of these SfM algorithms. We have also developed tools that exploit the rich geometric information present in SfM datasets. This geometry allows us to perform fast image localization, feature triangulation and visualization of user photographs.


Projects involving GPUs and CUDA

1. Visibility Probability Structure from SfM Datasets


Large-scale reconstructions of camera matrices and point clouds have been created from community photo collections using structure from motion. Such a dataset is rich in information: it represents a sampling of the geometry and appearance of the underlying space. In this work, we encode the visibility information between and among points and cameras as visibility probabilities. We combine this visibility probability structure with a distance measure to prioritize points for fast guided search in the image localization problem. We also define the dual problem of feature triangulation as finding the 3D coordinates of a given image feature, and solve it efficiently, again using the visibility probability structure.
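One simple way to estimate visibility probabilities from an SfM dataset is from co-visibility counts: the probability that point j is visible given that point i is visible is approximated by the fraction of cameras seeing i that also see j. The sketch below does this and uses it to order candidate points for guided matching once some points are matched. The max-based scoring and all names are hypothetical, and the distance measure mentioned above is omitted.

```python
from collections import defaultdict
from itertools import combinations

def visibility_counts(cam_points):
    """cam_points: {camera_id: set of point ids}. Returns per-point and pairwise camera counts."""
    seen = defaultdict(int)    # point -> number of cameras seeing it
    covis = defaultdict(int)   # (i, j) with i < j -> number of cameras seeing both
    for pts in cam_points.values():
        for p in pts:
            seen[p] += 1
        for a, b in combinations(sorted(pts), 2):
            covis[(a, b)] += 1
    return seen, covis

def p_visible(seen, covis, j, i):
    """Estimate P(point j visible | point i visible) as covis(i, j) / seen(i)."""
    if seen.get(i, 0) == 0:
        return 0.0
    return covis.get((min(i, j), max(i, j)), 0) / seen[i]

def prioritize(seen, covis, matched, candidates):
    """Order candidates by their highest conditional visibility given any matched point."""
    score = {j: max((p_visible(seen, covis, j, i) for i in matched), default=0.0)
             for j in candidates}
    return sorted(candidates, key=lambda j: -score[j])
```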


2. Geometry Directed Photo-Browsing for Personal Photos

Browsers of personal digital photographs all essentially follow the slide show paradigm, sequencing through the photos in the order they are taken. A more engaging way to browse personal photographs, especially of a large space like a popular monument, should involve the geometric context of the space. In this work, we develop a geometry directed photo browser that enables users to browse their personal pictures with the underlying geometry of the space to guide the process. We believe personal photo browsers can provide an enhanced sense of one's own experience with a monument using the underlying 3D-geometric context of the monument.

3. Bundle Adjustment on GPU

Large-scale 3D reconstruction has received a lot of attention recently. Bundle adjustment is a key component of the reconstruction pipeline and is often its slowest and most computationally intensive step. Parallel implementations of it are rare. In this work, we developed a hybrid implementation of sparse bundle adjustment on the GPU using CUDA, with the CPU working in parallel. The algorithm is decomposed into smaller steps, each of which is scheduled on the GPU or the CPU. We developed efficient kernels for these steps and used existing libraries for additional performance. Our implementation significantly outperforms the standard CPU implementation, achieving a speedup of 30-40 times for datasets with up to 500 images on an Nvidia Tesla C2050 GPU.
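The linear system solved in each iteration of sparse bundle adjustment is commonly reduced with the Schur complement. Because the point block V is block-diagonal (one 3x3 block per point), it can be inverted cheaply and eliminated, leaving a smaller reduced camera system. The NumPy sketch below shows this reduction on dense arrays. It illustrates the standard technique only, not the CUDA kernels or the CPU/GPU scheduling of the implementation described above, and the names are hypothetical.

```python
import numpy as np

def block_diag_inverse(blocks):
    """Dense inverse of a block-diagonal matrix given as an (n, k, k) stack of blocks."""
    inv = np.linalg.inv(blocks)  # batched inverse of each small block
    n, k, _ = blocks.shape
    out = np.zeros((n * k, n * k))
    for i in range(n):
        out[i * k:(i + 1) * k, i * k:(i + 1) * k] = inv[i]
    return out

def solve_schur(U, W, V_blocks, e_c, e_p):
    """Solve [[U, W], [W^T, V]] [dc; dp] = [e_c; e_p] by eliminating the point block V."""
    V_inv = block_diag_inverse(np.asarray(V_blocks, dtype=float))
    WV = W @ V_inv
    S = U - WV @ W.T                          # reduced camera system (Schur complement)
    dc = np.linalg.solve(S, e_c - WV @ e_p)   # camera parameter updates
    dp = V_inv @ (e_p - W.T @ dc)             # back-substitute for point updates
    return dc, dp
```

The camera system S has size 6 x (number of cameras) regardless of how many points there are, which is what makes this reduction attractive for large reconstructions.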


Related Publications

  • Aditya Deshpande, Siddharth Choudhary, P J Narayanan, Krishna Kumar Singh, Kaustav Kundu, Aditya Singh, Apurva Kumar - Geometry Directed Browser for Personal Photographs, Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing, 16-19 Dec. 2012, Bombay, India.

  • Siddharth Choudhary and P J Narayanan - Visibility Probability Structure from SfM Datasets and Applications, Proceedings of 12th European Conference on Computer Vision (ECCV 2012), Part VI, LNCS 7577, 7-13 Oct. 2012, Firenze, Italy.

  • Siddharth Choudhary, Shubham Gupta and P. J. Narayanan - Practical Time Bundle Adjustment for 3D Reconstruction on GPU, Proceedings of ECCV Workshop on Computer Vision on GPU (CVGPU'10), 5-11 Sep. 2010, Crete, Greece.

 


Associated People

  • Aditya Singh
  • Kaustav Kundu
  • Shubham Gupta
  • Apurva Kumar
  • Rajvi Shah
  • Siddharth Choudhary
  • Krishna Kumar Singh
  • Aditya Deshpande
  • Prof. P J Narayanan

Data Generation Toolkit for CV and IBR


Introduction

Synthetic data is very useful for validating computer vision (CV) and image-based rendering (IBR) algorithms. Since all information about the scene is available for synthetic data, researchers can easily compare the results of their algorithms with the ideal (ground-truth) values. High-resolution real-world data can be acquired from natural scenes, but its accuracy is generally limited due to the physical limitations of imaging devices. These are the main reasons for the popularity of synthetic data in these fields (CV and IBR).

Algorithms that require only images and no other information can use standard 3D authoring tools to create complex scenes that mimic real-world scenes. However, most CV and IBR algorithms require additional information about the scene (for example depth maps, segmentation, alpha mattes, etc.). Standard 3D authoring tools do not give users enough flexibility to generate their own representations, because most popular 3D authoring tools are complex, commercial, closed-source software. Extending such tools through the plug-in architectures they support may not be intuitive. A simple open-source tool with a simple UI for generating various representations is therefore desirable for the research community.


Features & Representations


We aim to provide a tool that enables researchers to generate data sets on their own with ease. The following are some of the key features of our system:

  • An interface which is similar to standard 3D authoring tools.
  • Import facility for using available 3D models.
  • Information about the resolution at which the image is being rendered.
  • Simple click and drag interface for creating complex scenes.
  • A key frame interface for creating dynamic scenes with various moving, rotating objects.
  • Very intuitive interface for generating high resolution images.

The basic objective of DGTk is to enable easy generation of the complex representations required by CV and IBR algorithms. Our tool provides a simple, intuitive interface for generating the following representations of a scene created in it:

  • Point correspondences for points in different camera views.
  • Camera calibration information for all the cameras in the scene. This is provided in terms of K[R|t] matrices in a text file.
  • Layered Depth Images (LDI), a representation first proposed and used by Shade et al.
  • Depth-maps, i.e. the depth information as perceived by the camera.
  • Object-maps, in which each pixel in the image is assigned R, G, B values based on the unique id of the object it corresponds to.
  • Alpha-maps or the alpha matte information of each object in the scene.
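To illustrate how the K[R|t] calibration output relates to the other representations, the sketch below builds P = K[R|t], projects 3D points to pixels and returns their camera-frame depth; projecting the same 3D point into two cameras yields a point correspondence. It also shows one possible 24-bit encoding of object ids into R, G, B values for object-maps. That encoding and all names are assumptions, not DGTk's actual file format.

```python
import numpy as np

def projection_matrix(K, R, t):
    """3x4 camera matrix P = K [R | t]."""
    return np.asarray(K, dtype=float) @ np.hstack([np.asarray(R, dtype=float), np.reshape(t, (3, 1))])

def project(P, X):
    """Project N x 3 world points; returns N x 2 pixel coordinates and per-point depth."""
    X = np.asarray(X, dtype=float)
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    depth = x[:, 2]  # equals camera-frame Z when the last row of K is [0, 0, 1]
    return x[:, :2] / depth[:, None], depth

def object_id_to_rgb(obj_id):
    """Hypothetical object-map encoding: split a 24-bit id into (R, G, B) bytes."""
    return ((obj_id >> 16) & 255, (obj_id >> 8) & 255, obj_id & 255)

def rgb_to_object_id(r, g, b):
    """Inverse of object_id_to_rgb."""
    return (r << 16) | (g << 8) | b
```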

Please visit our project page to download the tool and some example data sets.


Related Publications

  • Vamsikrishna and P. J. Narayanan - Data Generation Toolkit for Image Based Rendering Algorithms, The 3rd International Conference on Visual Information Engineering, 26-28 September 2006, Bangalore, India.


Associated People

  • V. Vamsi Krishna
  • Prof. P. J. Narayanan
