
Unconstrained Arabic & Urdu Text Recognition using Deep CNN-RNN Hybrid Networks


Mohit Jain (Home Page)

Abstract

We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, a task typically known as Urdu OCR, and Arabic text embedded in videos and natural scenes. When dealing with low-resource languages like Arabic and Urdu, a major obstacle to developing a robust recognizer is the lack of large quantities of annotated data. We overcome this problem by synthesizing millions of images from a large vocabulary of words and phrases scraped from the Arabic and Urdu versions of Wikipedia, rendered in a wide variety of fonts downloaded from various online resources.

Building robust recognizers for Arabic and Urdu text has always been a challenging task. Although a lot of research has been done in the field of text recognition, the vision community has focused primarily on English. While Arabic script has started to receive some attention as far as text recognition is concerned, work on other languages which use the Nabataean family of scripts, such as Urdu and Persian, is very limited. Moreover, work in this field generally lacks a standardized structure, making it hard to reproduce and verify claims or results. This is quite surprising considering that Arabic is the fifth most spoken language in the world after Chinese, English, Spanish and Hindi, catering to 4.7% of the world's population, while Urdu has over 100 million speakers and is spoken widely in Pakistan, where it is the national language, and in India, where it is recognized as one of the 22 official languages.

In this thesis, we introduce the problems related to text recognition of low-resource languages, namely Arabic and Urdu, in various scenarios. We propose a language-independent hybrid CNN-RNN architecture which can be trained in an end-to-end fashion and demonstrate its superiority over simple RNN-based methods. Moreover, we dive deeper into the working of its convolutional layers and verify the robustness of convolutional features through layer visualizations. We also propose a method to synthesize artificial text images to do away with the need to annotate large amounts of training data. We outperform previous state-of-the-art methods on existing benchmarks by a significant margin and release two new benchmark datasets, for Arabic scene text and Urdu printed text recognition, to instill interest among fellow researchers in the field.
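Hybrid CNN-RNN recognizers of this kind typically emit one label distribution per image column and collapse the frame-wise output into a transcription. The abstract does not spell out the decoding step, so as a hedged illustration (assuming the common CTC-style greedy decoding used by such architectures, not necessarily the exact procedure in the thesis), the collapse of repeated labels and removal of blanks might look like:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence CTC-style:
    merge consecutive repeats, then drop blank symbols."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev:          # a new run starts here
            if label != blank:     # blanks only separate repeats, never emitted
                decoded.append(label)
            prev = label
    return decoded

# Frames [a, a, blank, a, b, b, blank] collapse to [a, a, b]:
# the blank between the two runs of 'a' keeps them distinct.
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

The blank symbol is what lets the network output genuinely doubled characters, which matters for cursive scripts with frequent repeated letters.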

 

Year of completion: June 2018
Advisor: Prof. C.V. Jawahar

Related Publications

  • Mohit Jain, Minesh Mathew and C. V. Jawahar - Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks, 4th Asian Conference on Pattern Recognition (ACPR 2017), Nanjing, China, 2017. [PDF]

  • Minesh Mathew, Mohit Jain and C. V. Jawahar - Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam, 6th International Workshop on Multilingual OCR, Kyoto, Japan, 2017. [PDF]

  • Mohit Jain, Minesh Mathew and C. V. Jawahar - Unconstrained Scene Text and Video Text Recognition for Arabic Script, 1st International Workshop on Arabic Script Analysis and Recognition (ASAR 2017), Nancy, France, 2017. [PDF]


Downloads

thesis

Studies in Recognition of Telugu Document Images


Venkat Rasagna (homepage)

Abstract

The rapid evolution of information technology (IT) has prompted massive growth in the digitization of books. Accessing these huge digital collections requires solutions which make the archived materials searchable, and such solutions can only come from research in document image understanding. In the last three decades, many significant developments have been made in the recognition of Latin-based scripts, but recognition systems for Indian languages lag far behind recognizers for European languages such as English. The diversity of archived printed documents poses an additional challenge to document analysis and understanding. In this work, we explore the recognition of printed text in Telugu, a south Indian language. We begin by building the Telugu script model for recognition and adapting an existing optical character recognition system for it. A comprehensive study of all the modules of the optical recognizer is done, with the focus mainly on the recognition module. We then evaluate the recognition module by testing it on synthetic and real datasets. We achieve an accuracy of 98% on the synthetic dataset, but the accuracy drops to 91% on 200 pages from scanned books (the real dataset). To analyze the drop in accuracy and the modules propagating errors, we create datasets of different qualities, namely a laser-print dataset, a good real dataset and a challenging real dataset. Analysis of these experiments revealed the major problems in the character recognition module. We observed that the recognizer is not robust enough to tackle the multifont problem. The classifier's component accuracy varied significantly on pages from different books. Also, there was a huge difference between component and word accuracies: even with a component accuracy of 91%, the word accuracy was just 62%. This motivated us to solve the multifont problem and improve word accuracies. Solving these problems would boost the OCR accuracy of any language.
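The gap between component and word accuracy follows from simple probability: if component errors were independent, a word of n components would be fully correct with probability p^n. With p = 0.91 and an assumed average of five components per word (the average word length is an assumption for illustration; the thesis does not state it), this predicts roughly the observed 62%:

```python
def expected_word_accuracy(component_accuracy, components_per_word):
    """Word accuracy under the (idealized) assumption that each
    component of a word is recognized independently."""
    return component_accuracy ** components_per_word

# 91% component accuracy, ~5 components per word -> ~62% word accuracy
print(round(expected_word_accuracy(0.91, 5), 2))  # 0.62
```

This is why even modest gains in component accuracy translate into large gains at the word level.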

A major requirement in the design of robust OCRs is a feature extraction scheme that is invariant to the popular fonts used in print. Many statistical and structural features have been tried for character classification in the past. In this work, motivated by recent successes in the object category recognition literature, we use a spatial extension of the histogram of oriented gradients (HOG) for character classification. We conducted experiments on 1.46 million Telugu character samples spanning 359 classes and 15 fonts. On this dataset, we obtain an accuracy of 96-98% with an SVM classifier.

A typical optical character recognizer (OCR) uses only local information about a particular character or word to recognize it. In this thesis, we also propose a document-level OCR which exploits the fact that multiple occurrences of the same word image should be recognized as the same word: whenever the OCR output differs for the same word, it must be due to recognition errors. We propose a method to identify such recognition errors and automatically correct them. First, multiple instances of the same word image are clustered using a fast clustering algorithm based on locality sensitive hashing. Three different techniques are then proposed to correct the OCR errors by looking at differences in the OCR output for the words in a cluster: character majority voting, an alignment technique based on dynamic time warping, and one based on progressive alignment of multiple sequences. We demonstrate the approach over hundreds of document images from English and Telugu books by correcting the output of the best-performing OCRs for English and Telugu. The recognition accuracy at word level improves from 93% to 97% for English and from 58% to 66% for Telugu. Our approach is applicable to documents in any language or script.
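The simplest of the three correction techniques, character majority voting, can be sketched in a few lines. This is a hedged illustration rather than the thesis implementation: it assumes all OCR outputs in a cluster have equal length (the DTW and progressive-alignment variants exist precisely to handle length mismatches), and ties go to the first candidate seen:

```python
from collections import Counter

def majority_vote(ocr_outputs):
    """Correct OCR errors within a cluster of outputs for the same
    word image by taking the per-position majority character.
    Assumes all outputs have equal length."""
    assert len(set(map(len, ocr_outputs))) == 1, "outputs must align"
    voted = []
    for position_chars in zip(*ocr_outputs):
        # most_common breaks ties by first appearance in the input
        voted.append(Counter(position_chars).most_common(1)[0][0])
    return "".join(voted)

# Three noisy readings of the same word image vote their errors away.
print(majority_vote(["hallo", "hello", "hellp"]))  # hello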

 

Year of completion: May 2013
Advisor: Prof. C.V. Jawahar

Related Publications


Downloads

thesis

 ppt

Optical Character Recognition as Sequence Mapping


Devendra Kumar Sahu (Homepage)

Abstract

Digitization can preserve the content of fragile originals, such as out-of-print books, by creating an accessible facsimile that puts less strain on the originals. The document analysis community formed to address this by digitizing content, thus making it easily shareable over the Internet, searchable, and amenable to language translation. In this thesis, we view optical character recognition as a sequence mapping problem. We propose extensions to two methods and reduce their limitations. First, we propose an application of a sequence-to-sequence learning architecture which removes two limitations of the previous state-of-the-art method based on a connectionist temporal classification output layer; this method also yields representations which can be used for efficient retrieval. Second, we propose an extension of profile features which retains the same idea while learning the features from data.

In the first work, we propose an application of the sequence-to-sequence learning approach to printed-text optical character recognition. In contrast to present state-of-the-art OCR solutions, which use a Connectionist Temporal Classification (CTC) output layer, our approach makes minimal assumptions about the structure and length of the sequence. We use a two-step encoder-decoder approach: (a) a recurrent encoder reads a variable-length printed text word image and encodes it into a fixed-dimensional embedding; (b) this fixed-dimensional embedding is subsequently decoded into a variable-length text output. The deep word image embedding learnt by the encoder can be used for printed-text retrieval systems. An expressive fixed-dimensional embedding for any variable-length input makes retrieval faster and more efficient, which is not possible with other recurrent neural network architectures. Thus a single model can make predictions, and the features learnt with supervision can be used for efficient retrieval.
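Retrieval over such fixed-dimensional embeddings reduces to nearest-neighbour search in vector space. As a hedged sketch (the vectors below are hypothetical toy values; in the thesis they would come from the trained recurrent encoder), ranking a database of word-image embeddings by cosine similarity to a query might look like:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query, database):
    """Return database keys ranked by cosine similarity to the query."""
    return sorted(database, key=lambda k: cosine(query, database[k]), reverse=True)

# Hypothetical 3-d word-image embeddings keyed by transcription.
db = {"cat": [0.9, 0.1, 0.0], "dog": [0.0, 1.0, 0.1], "car": [0.8, 0.2, 0.1]}
print(retrieve([1.0, 0.0, 0.0], db)[0])  # cat
```

Because every word image maps to the same dimensionality regardless of its width, a single index serves all queries, which is the efficiency argument made above.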

In the second work, we investigate the possibility of learning an appropriate set of features for designing an OCR for a specific language. We learn the language-specific features from the data with no supervision: we use unsupervised feature learning with stacked Restricted Boltzmann Machines (RBMs) and combine the learnt features with an RNN-based recognition solution. These features can be interpreted as a deep extension of projection profiles, and can be used as plug-and-play replacements wherever profile features are used. We validate them on five different languages. In addition, these novel features also result in a better convergence rate for the RNNs.
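The classical projection profile that these learnt features extend is simply a count of ink pixels per column (or row) of a binarized word image. A minimal sketch of the baseline feature:

```python
def vertical_profile(binary_image):
    """Column-wise ink counts of a binary image given as a list of rows
    (1 = ink, 0 = background) -- the classical projection profile."""
    return [sum(column) for column in zip(*binary_image)]

# A tiny 3x4 'image' with ink concentrated in the middle columns.
img = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
print(vertical_profile(img))  # [0, 3, 2, 0]
```

The RBM-learnt features replace this hand-crafted column sum with representations learnt from data, while keeping the same per-column, sequence-friendly layout that RNN recognizers consume.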

 

Year of completion: January 2017
Advisor: Prof. C.V. Jawahar

Related Publications

  • Devendra Sahu, C. V. Jawahar - Unsupervised Feature Learning For Optical Character Recognition, Proceedings of the 13th IAPR International Conference on Document Analysis and Recognition, 23-26 Aug 2015, Nancy, France. [PDF]


Downloads

thesis

 ppt

Human Pose Estimation: Extension and Application


Digvijay Singh (Homepage)

Abstract

Understanding human appearance in images and videos is one of the most fundamental and most explored areas in the field. Describing human appearance can be interpreted as a combination of smaller, more fundamental aspects such as posture, gesture and outlook; by composing them we try to extract holistic meaning from semantically lower-level information. The utility of understanding human appearance lies in the industrial demand for applications that analyze humans and their interaction with their surroundings. This thesis tackles two such aspects: (i) the more fundamental problem of human pose estimation, and (ii) a deeper level of understanding, cloth parsing based on pose estimation.

Determining the locations and configuration of human body joints is known as the human pose estimation problem. In this work we address human pose estimation for video sequences, exploiting the redundant information that video naturally provides. The proposed iterative method uses a generic base model to parse the data in its first iteration. Each following iteration runs a three-step pipeline: grabbing confidently positive detections from the previous iteration using our novel selection criteria, fine-tuning the external-to-base parameters to the local distribution by synthesizing exemplars from the picked detections, and enforcing the learned information using an updated amalgamation model. The resulting pipeline propagates correctness through the temporal neighborhoods of a video sequence. Previous methods that use the same base models have relied more on tracking strategies; in the unbiased experiments we conducted, our approach proved much more robust and better performing overall.

In the second half, we pursue a deeper understanding of human appearance, namely cloth parsing: predicting the types of clothes worn by humans and their segmented regions in images. Our work focuses on adding robustness to a previously formulated method. Conceivably, determining the likely regions for each cloth type depends on the underlying pose skeleton of the human, e.g. a hat will be worn on the head. Hence, pose information is key to the cloth parsing problem, but incorrect body-part estimates can likewise lead to false cloth detections and segmentations. The previous method uses pose information from a pictorial-structures model, whereas we update the formulation to incorporate information from more robust part detectors. The changed model performs better in unconstrained outdoor settings. However, the performance of both the existing and the proposed methods remains below par and appears non-viable for practical applications; to examine why, we report a set of experiments exploring the problem from different angles and record our observations.

 

Year of completion: January 2017
Advisor: Prof. C.V. Jawahar

Related Publications

  • Digvijay Singh, Vineeth Balasubramanian, C. V. Jawahar - Fine-Tuning Human Pose Estimations in Videos, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2016. [PDF]

  • Nataraj Jammalamadaka, Ayush Minocha, Digvijay Singh and C V Jawahar - Parsing Clothes in Unrestricted Images, Proceedings of the 24th British Machine Vision Conference (BMVC), 09-13 Sep. 2013, Bristol, UK. [PDF]


Downloads

thesis

 ppt

Script and Language Identification for Document Images and Scene Texts


Ajeet Kumar Singh

Abstract

In recent times, there has been an increase in optical character recognition (OCR) solutions for recognizing text from scanned document images and from scene text captured with mobile devices. Many of these solutions work very well for an individual script or language. But in a multilingual environment such as India, where a document image or scene image may contain more than one language, these individual OCRs fail significantly. Hence, to recognize the text in a multilingual document or scene image, we must manually specify the script or language of each text block, and only then apply the corresponding script- or language-specific OCR. This step prevents us from moving towards fully automated multilingual OCRs.

This thesis presents two effective solutions for automatically identifying the script and language of document images and scene text. Even though recognition of scene text has been researched heavily, the script identification problem in this area is relatively new. We present an approach which represents scene-text images using mid-level stroke-based features pooled from densely computed local features. These features are then classified into languages using an off-the-shelf classifier. The approach is efficient and requires very little labeled data for script identification. It has been evaluated on the recently introduced video script dataset (CVSI). We also introduce and benchmark a more challenging Indian Language Scene Text (ILST) dataset for evaluating the performance of our method.

For script and language identification in documents, we investigate the utility of recurrent neural networks (RNNs). These problems have been attempted in the past with representations computed from the distribution of connected components or characters (e.g. texture, n-grams) over a larger segment (a paragraph or a page). We argue that one can predict the script or language very accurately from minimal evidence (e.g. only a word or a line) with the help of a pre-trained RNN. We propose a simple and generic solution for the task of script and language identification without any special tuning, verified on a large corpus of more than 15.03M words across 55K documents comprising 15 scripts and languages.
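To see why script identification at the word level is tractable while language identification within a script needs a learned model, consider the trivial baseline below. It is emphatically not the RNN method of this thesis: it merely counts which Unicode block a word's characters fall into, which separates scripts like Latin, Devanagari and Telugu but can say nothing about, say, Hindi versus Marathi, both written in Devanagari. The block ranges are an illustrative subset:

```python
# Illustrative Unicode block ranges (start, end codepoint) for a few scripts.
SCRIPT_RANGES = {
    "Latin":      (0x0041, 0x024F),
    "Devanagari": (0x0900, 0x097F),
    "Telugu":     (0x0C00, 0x0C7F),
}

def guess_script(text):
    """Guess the dominant script of `text` by counting which Unicode
    block most of its characters fall into."""
    counts = {name: 0 for name in SCRIPT_RANGES}
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1
    return max(counts, key=counts.get)

print(guess_script("నమస్కారం"))  # Telugu
```

Distinguishing languages that share a script, and doing so robustly on noisy word images rather than clean text, is exactly where the pre-trained RNN approach above earns its keep.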

This thesis aims to provide better recognition solutions for the document and scene-text domains through two simple but effective solutions for script and language identification. The proposed algorithms can be used in multilingual settings, where the identification module first identifies the script or language of an incoming document or scene text before passing it to the corresponding recognition module.

Year of completion: January 2017
Advisor: Prof. C.V. Jawahar

Related Publications

  • Minesh Mathew, Ajeet Kumar Singh and C V Jawahar - Multilingual OCR for Indic Scripts, Proceedings of 12th IAPR International Workshop on Document Analysis Systems (DAS'16), 11-14 April 2016, Santorini, Greece. [PDF]

  • Ajeet Kumar Singh, Anand Mishra, Pranav Dabral and C V Jawahar - A Simple and Effective Solution for Script Identification in the Wild, Proceedings of 12th IAPR International Workshop on Document Analysis Systems (DAS'16), 11-14 April 2016, Santorini, Greece. [PDF]

  • Ajeet Kumar Singh, C. V. Jawahar - Can RNNs Reliably Separate Script and Language at Word and Line Level?, Proceedings of the 13th IAPR International Conference on Document Analysis and Recognition, 23-26 Aug 2015, Nancy, France. [PDF]


Downloads

thesis

 ppt
