
Analyzing Racket Sports From Broadcast Videos

 


Anurag Ghosh

Abstract

Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large-scale data mining and analytics. However, sports videos have an inherent temporal structure due to the nature of the sports themselves. For instance, tennis comprises points, games and sets played between two players/teams. Recent attempts at sports analytics are either not fully automatic for finer details or keep a human in the loop for high-level understanding of the game, and therefore have limited practical application to large-scale data analysis. Many of these applications depend on specialized camera setups and wearable devices which are costly and unwieldy for players and coaches, especially in resource-constrained environments like India. Utilizing very simple and non-intrusive sensors (like a single camera) along with computer vision models is necessary to build indexing and analytics systems. Such systems can be used to sort through huge swathes of data, help coaches look at interesting video segments quickly, mine player data and even generate automatic reports and insights for a coach to monitor.

Firstly, we demonstrate a score-based indexing approach for broadcast video data. Given a broadcast sport video, we index all the video segments with their scores to create a navigable and searchable match. Even though our method is extensible to any sport with scores, we evaluate our approach on broadcast tennis videos. Our approach temporally segments the rallies in the video and then recognizes the scores from each of the segments, before refining the scores using the knowledge of the tennis scoring system. We finally build an interface to effortlessly retrieve and view the relevant video segments, also automatically tagging the segmented rallies with human-accessible tags such as ‘fault’ and ‘deuce’. The efficacy of our approach is demonstrated on broadcast tennis videos from two major tennis tournaments.

Secondly, we propose an end-to-end framework for automatic attribute tagging and analysis of broadcast sport videos. We use commonly available broadcast videos of badminton matches and, unlike previous approaches, we do not rely on special camera setups or additional sensors. We propose a method to analyze a large corpus of broadcast videos by segmenting the points played, tracking and recognizing the players in each point and annotating their respective strokes. We evaluate the performance on 10 Olympic badminton matches with 20 players and achieve 95.44% point segmentation accuracy, 97.38% player detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke segmentation edit score of 80.48%. We further show that the automatically annotated videos alone enable gameplay analysis and inference by computing understandable metrics such as a player's reaction time, speed, and footwork around the court.

Lastly, we adapt our proposed framework to tennis to mine spatiotemporal and event data from a large set of broadcast videos. Our broadcast videos include all Grand Slam matches played between Roger Federer, Rafael Nadal and Novak Djokovic. Using this data, we demonstrate that we can infer the playing styles and strategies of tennis players. Specifically, we study the evolution of the famous rivalries of Federer, Nadal and Djokovic across time. We compare and validate our inferences against expert opinions of their playing styles.
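A minimal sketch of the scoring-system refinement idea described above: given two consecutively recognised (server, receiver) point scores, a rule check flags transitions that cannot occur in a tennis game, so likely recognition errors can be corrected. The function names and the simplified rule set below are illustrative assumptions, not the thesis implementation.

```python
# Hedged sketch: validating consecutive recognised tennis point scores.
# The rule set is deliberately simplified (single game, no tie-breaks).

POINTS = ["0", "15", "30", "40", "AD"]

def next_point(score):
    """Return the point value after winning one more rally (simplified)."""
    return POINTS[POINTS.index(score) + 1] if score != "AD" else "GAME"

def valid_transition(prev, curr):
    """True if (server, receiver) scores `curr` can legally follow `prev`."""
    ps, pr = prev
    cs, cr = curr
    server_scored = cs == next_point(ps) and cr == pr
    receiver_scored = cr == next_point(pr) and cs == ps
    back_to_deuce = prev in [("AD", "40"), ("40", "AD")] and curr == ("40", "40")
    return server_scored or receiver_scored or back_to_deuce

# A recognition error such as ("30", "15") -> ("15", "15") fails the check.
assert valid_transition(("30", "15"), ("40", "15"))
assert not valid_transition(("30", "15"), ("15", "15"))
```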

 

Year of completion:  June 2019
 Advisor : C V Jawahar

Related Publications

  • Anurag Ghosh, Suriya Singh and C.V. Jawahar - Towards Structured Analysis of Broadcast Badminton Videos, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018 [PDF]

  • Anurag Ghosh and C. V. Jawahar - SmartTennisTV: Automatic indexing of tennis videos, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017 [PDF]

  • Anurag Ghosh, Yash Patel, Mohak Sukhwani and C.V. Jawahar - Dynamic Narratives for Heritage Tour, 3rd Workshop on Computer Vision for Art Analysis (VisART), European Conference on Computer Vision (ECCV), 2016 [PDF]


Downloads

thesis

Efficient Annotation and Knowledge Distillation for Semantic Segmentation

 


Tejaswi Kasarla

Abstract

Scene understanding is a crucial task in computer vision. With recent advances in autonomous driving research, road scene segmentation has become a very important problem to solve, and a few dedicated datasets have been proposed to address it. Advances in deep learning gave rise to deeper and computationally heavier models, which were subsequently adapted to solve the problem of semantic segmentation. In general, larger datasets and heavier models are a trademark of segmentation problems. This thesis proposes methods to reduce the annotation effort of datasets and the computational cost of models in semantic segmentation.

In the first part of the thesis, we introduce the general sub-domain of semantic segmentation and explain the challenges of dataset annotation effort. We also introduce active learning, a sub-field of machine learning which relies on a smart selection of data to learn from, thereby performing better with less training data. Since obtaining labeled data for semantic segmentation is extremely expensive, active learning can play a huge role in reducing the effort of obtaining that labeled data. We also discuss the computational complexity of the training methods and give an introduction to knowledge distillation, a method which relies on transferring knowledge from the complex objective functions of deeper and heavier models to improve the performance of a simpler model. This allows better-performing simple models to be deployed on computationally constrained systems such as self-driving cars or mobile phones.

The second part is a detailed work on active learning, in which we propose a region-based active learning method to efficiently obtain labeled annotations for semantic segmentation. The advantages of this method include: (a) using less labeled data to train a semantic segmentation model, (b) proposing efficient models for obtaining the labeled data, thereby reducing the annotation effort, and (c) transferring a model trained on one dataset to another unlabeled dataset with minimal annotation cost, which significantly improves the accuracy. Studies on the Cityscapes and Mapillary datasets show that this method is effective in reducing the dataset annotation cost.

The third part is the work on knowledge distillation, in which we propose a method to improve the performance of faster segmentation models, which usually suffer from low accuracy due to model compression. The method distills knowledge during the training phase from high-performing models to low-performing models through a teacher-student architecture. Knowledge distillation allows the student to mimic the performance of the teacher network, thereby improving its performance during inference. We quantitatively and qualitatively validate the results on three different networks for the Cityscapes dataset.

Overall, in this thesis, we propose methods to reduce the bottlenecks of semantic segmentation tasks, specifically related to road scene understanding: the high cost of obtaining labeled data and the high computational requirement for better performance. We show that smart annotation of partial regions in the data can reduce the effort of obtaining labels. We also show that extra supervision from a better-performing model can improve the accuracy of computationally faster models, which is useful for real-time deployment.
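To make the teacher-student idea concrete, here is a minimal sketch of a pixel-wise distillation loss for semantic segmentation, assuming PyTorch and (N, C, H, W) logit tensors from the teacher and student networks. The temperature, loss weighting and ignore index are illustrative choices, not the settings used in the thesis.

```python
# Minimal sketch of pixel-wise knowledge distillation for segmentation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened class distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual per-pixel cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels, ignore_index=255)
    return alpha * soft + (1.0 - alpha) * hard
```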

 

Year of completion:  June 2019
 Advisor : Dr. Vineeth N. Balasubramanian, C V Jawahar

Related Publications

  • Tejaswi Kasarla, G. Nagendar, Guruprasad M Hegde, Vineeth N. Balasubramanian and C V Jawahar - Region-based Active Learning for Efficient Labeling in Semantic Segmentation, IEEE Winter Conference on Applications of Computer Vision (WACV 2019), Hilton Waikoloa Village, Hawaii


Downloads

thesis

Handwritten Word Recognition for Indic & Latin scripts using Deep CNN-RNN Hybrid Networks

 


Kartik Dutta

Abstract

Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties of the scripts, the cursive nature of the handwriting and the similar shapes of the characters. Though a lot of research has been done in the field of text recognition, the focus of the vision community has been primarily on English. Furthermore, a lack of publicly available handwriting datasets in Indic scripts has affected the development of handwritten word recognizers and made direct comparisons across different methods an impossible task in the field. This lack of annotated data also makes it challenging to train deep neural networks, which contain millions of parameters. These facts are quite surprising considering that there are over 400 million Devanagari speakers in India alone.

We first tackle the problem of the lack of annotated data using various approaches. We describe a framework for annotating large collections of handwritten word images without the need for manual segmentation and label re-alignment. Two new word-level handwritten datasets for Telugu and Devanagari, created using this framework, are released. We also synthesize datasets containing millions of realistic images with a large vocabulary, rendered from publicly available Unicode fonts, for the purpose of pre-training. Later, pre-training using data from the Latin script is also shown to be useful in overcoming the shortage of data.

Capitalizing on the success of the CNN-RNN hybrid architecture, we propose various improvements to the architecture and its training pipeline to make it even more robust for handwriting recognition. We change the network to use a ResNet-18-like structure for the convolutional part and add a spatial transformer network layer. We also use an extensive data augmentation scheme involving multi-scale, elastic, affine and test-time distortions. We outperform the previous state-of-the-art methods on existing benchmark datasets for both Latin and Indic scripts by a considerable margin. We perform an ablation study to empirically show how the various changes we made to the original CNN-RNN hybrid network improve its performance on handwriting recognition. We dive deeper into the workings of our network's convolutional layers and verify the robustness of the convolutional features through layer visualizations. We hope that the release of the two datasets mentioned in this work, along with the architecture and training techniques we have used, will instill interest among fellow researchers in the field.
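A rough sketch of a CNN-RNN hybrid word recogniser of the kind discussed above, assuming PyTorch: a convolutional backbone produces a sequence of column features, which a bidirectional LSTM maps to per-timestep character logits for CTC training. The layer sizes, input height and alphabet size are placeholders; the thesis architecture uses a deeper ResNet-18-style backbone and a spatial transformer layer.

```python
# Hedged sketch of a CNN-RNN hybrid recogniser trained with CTC.
import torch
import torch.nn as nn

class CNNRNNHybrid(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.cnn = nn.Sequential(                      # input: (N, 1, 32, W) grayscale word image
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(256 * 4, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes + 1)      # +1 for the CTC blank symbol

    def forward(self, x):
        f = self.cnn(x)                                # (N, 256, 4, W/4)
        n, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(n, w, c * h) # one feature vector per image column
        out, _ = self.rnn(f)
        return self.fc(out)                            # per-timestep class logits for CTC loss
```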

 

Year of completion:  Mar 2019
 Advisor : C V Jawahar

Related Publications

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar - Improving CNN-RNN Hybrid Networks for Handwriting Recognition, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA [PDF]

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar - Towards Spotting and Recognition of Handwritten Words in Indic Scripts, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA [PDF]

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C.V. Jawahar - Localizing and Recognizing Text in Lecture Videos, The 16th International Conference on Frontiers in Handwriting Recognition, Niagara Falls, USA [PDF]

  • Praveen Krishnan, Kartik Dutta and C. V. Jawahar - Word Spotting and Recognition using Deep Embedding, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria. [PDF]

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Offline Handwriting Recognition on Devanagari using a new Benchmark Dataset, Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 24-27 April 2018, Vienna, Austria. [PDF]

  • Kartik Dutta, Praveen Krishnan, Minesh Mathew and C. V. Jawahar - Towards Accurate Handwritten Word Recognition for Hindi and Bangla, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017 [PDF]


Downloads

thesis

Graph-Spectral Techniques For Analyzing Resting State Functional Neuroimaging Data


Srinivas Govinda Surampudi

Abstract

The human brain is undoubtedly the most magnificent yet delicate arrangement of tissues, serving as the seat of a wide span of cognitive functions and behaviors. Neuronal activities, collectively observed over networks of interconnected neurons, manifest themselves as patterns at multiple scales of observation. Brain imaging techniques such as fMRI, EEG and MEG measure these patterns as electro-magnetic responses. These patterns supposedly play the role of unique neuronal signatures of the vast repertoire of cognitive functions. Experimentally, it is observed that different neuronal populations participate coherently to generate a signature for a cognitive function. These signatures could be investigated at the micro-scale, corresponding to responses of individual neurons to external current stimuli; at the meso-scale, related to populations of neurons that show similar metabolic activities; and at the macro-scale, where these populations, also known as regions of interest (ROIs), communicate via a complex arrangement of anatomical fiber pathways. The holy grail of neuroscience is thus to computationally decipher the interplay of this complex anatomical network and the complex functional patterns corresponding to cognitive behaviors at various scales. Each scale of observation, depending on the instruments of measurement, has its own rich spatiotemporal dynamics that interacts with higher and lower levels in complex ways. Large-scale anatomical fiber pathways are represented in a matrix that accounts for inter-population fiber strength, known as the structural connectivity (SC) matrix. One of the popular modalities to capture large-scale functional dynamics is resting-state fMRI, and the statistical dependence between inter-population BOLD signals is captured in the functional connectivity (FC) matrix. Many models provide computational accounts of the relationship between these two matrices, as deciphering this relationship would reveal the mechanism by which cognitive functions arise over the structure. On one hand, there are non-linear dynamical models that describe the biological phenomenon well but are expensive and intractable; on the other hand, there are linear models that compromise on biological richness but are analytically feasible.

This thesis is concerned with the analysis of the temporal dynamics of observed resting-state fMRI signals over the large-scale human cortex. We provide a model that has a bio-physical explanation as well as an analytical expression for FC given SC. Reaction-diffusion systems provide a computational framework in which excitatory-inhibitory activities at the populations emerge as reactions and their interactions as diffusion over space and time. The spatio-temporal dynamics of the BOLD signal governed by this framework is constrained with respect to the anatomical connections, thereby separating the spatial and temporal dynamics. The covariance matrix of this signal is estimated, giving an estimate of the functional connectivity matrix. The covariance matrix, and the BOLD signal in general, is expressed in terms of graph diffusion kernels, forming an analytically elegant expression. Most importantly, the model for FC abstracts out biological details and works in the realm of spectral graph-theoretic constructs, providing the necessary ease for computational analysis.

As this model learns the combination parameters of multiple diffusion kernels and the kernels themselves, it is called the Multiple Kernel Learning (MKL) model. Apart from superior quantitative performance, the model parameters may act as biomarkers for various cognitive studies. Although the model parameters are learned for a cohort, the model preserves subject-specificity. These parameters can be used as a measure of inter-group differences and for dissimilarity identification, as demonstrated for age-group identification in this thesis. Essentially, the MKL model partitions FC into two constituents: the influence of the underlying anatomical structure, captured by the diffusion kernels, and the cognitive theme of the temporal structure, captured by the model parameters, thus predicting FCs specific to subjects within the cognitive conditions of the cohort. Even though MKL is a cohort-based model, it maintains sensitivity towards anatomy: the performance of the model drops drastically with alterations in SC and model parameters, yet it does not overfit to the cohort.

Resting-state fMRI BOLD signals have been observed to show non-stationary dynamics. Such multiple spatio-temporal patterns, represented as dynamic FC matrices, are observed to repeat cyclically in time, motivating the use of a generic clustering scheme to identify latent states of the dynamics. We propose a novel solution that learns parameters specific to the dynamic states using a graph-theoretic model (temporal Multiple Kernel Learning, tMKL) and finally predicts the grand average FC of unseen subjects by leveraging a state-transition Markov model. We discover the underlying lower-dimensional manifold of the temporal structure, which is further parameterized as a set of local density distributions, or latent transient states. tMKL thus learns a mapping between the anatomical graph and the temporal structure. Unlike MKL, the tMKL model obeys a state-specific optimization formulation and yet performs on par with or better than MKL in predicting the grand average FC. Like MKL, tMKL also shows sensitivity towards subject-specific anatomy. Finally, both tMKL and MKL outperform the state-of-the-art in their own ways while providing bio-physical insights.
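As a toy illustration of the multiple-kernel idea (not the thesis code), the sketch below predicts a functional connectivity matrix as a weighted combination of heat-diffusion kernels of the structural connectivity graph Laplacian. In the actual MKL model these combination parameters are learned; here the scales and weights are arbitrary placeholders.

```python
# Hedged sketch: FC predicted as a weighted sum of graph heat-diffusion kernels.
import numpy as np
from scipy.linalg import expm

def predicted_fc(sc, scales, weights):
    """sc: (R, R) structural connectivity; scales/weights: per-kernel parameters."""
    degree = np.diag(sc.sum(axis=1))
    laplacian = degree - sc                            # combinatorial graph Laplacian
    kernels = [expm(-t * laplacian) for t in scales]   # heat kernels exp(-t L)
    return sum(w * k for w, k in zip(weights, kernels))

# Toy symmetric SC matrix with zero diagonal, just to exercise the function.
rng = np.random.default_rng(0)
sc = rng.random((10, 10)); sc = (sc + sc.T) / 2; np.fill_diagonal(sc, 0)
fc_hat = predicted_fc(sc, scales=[0.5, 1.0, 2.0], weights=[0.2, 0.5, 0.3])
```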

 

Year of completion:  Sep 2018
 Advisor : Avinash Sharma and Dipanjan Roy

Related Publications

  • Viral Parekh, Ramanathan Subramanian, Dipanjan Roy and C.V. Jawahar - An EEG-based Image Annotation System, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2017 [PDF]


Downloads

thesis

Exploration of multiple imaging modalities for Glaucoma detection


Jahnavi Gamalapati S

Abstract

Glaucoma is an eye disease characterized by the weakening of nerve cells, often resulting in permanent loss of vision. Glaucoma progression can occur without any physical indication to patients; hence, early diagnosis is recommended to prevent permanent damage to vision. Early glaucoma is often characterized by thinning of the retinal nerve fiber layer, commonly called an RNFL defect (RNFLD). Computer-aided diagnosis (CAD) of eye diseases is popular and is based on automated analysis of fundus images. CAD solutions for diabetic retinopathy have reached greater maturity than those for glaucoma, as the latter is more difficult to detect from fundus images: nerve damage appears as a subtle change in the background around the optic disc. SD-OCT (Spectral Domain Optical Coherence Tomography), a recently introduced modality, helps capture 3D information of the retina and is hence more reliable for detecting nerve damage than fundus imaging. However, wide usage of OCT is limited by the cost per scan, acquisition time and ease of acquisition.

This thesis focuses on integrating information from multiple modalities (OCT and fundus) for improving the detection of retinal nerve fiber layer defects (RNFLD) from fundus images. We examine two key problems in the context of CAD development. (i) Spatial alignment, or registration, of the two imaging modalities, namely 2D fundus images and 3D OCT volumes. This can pave the way to integrating information across the modalities. Multimodal registration is challenging because of the varied field of view and noise levels across the modalities. We propose a computationally efficient registration algorithm capable of handling the complementary nature of the modalities, and extensive qualitative and quantitative evaluations demonstrate its robustness. (ii) Detection of RNFLD from fundus images with good accuracy. We propose a novel CAD solution which utilizes information from the two modalities to learn a model and uses it to predict the presence of RNFLD from fundus images. The problem is posed as learning from two modalities (fundus and OCT images) and predicting from only one (fundus images), with the other (OCT) treated as missing data. Our solution consists of a deep neural network architecture which learns modality-independent representations.

In the final part of the thesis, we explore the scope of a new imaging modality, angiography-Optical Coherence Tomography (A-OCT), in diagnosing glaucoma. Two case studies are reported which help in understanding the progression of retinal nerve fiber layer thickness and capillary density in normal and glaucoma-affected patients. The experiments on this new modality show its potential as a reliable biomarker alongside existing modalities.
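A hedged sketch of the "learn from two modalities, predict from one" idea, assuming PyTorch: separate fundus and OCT encoders are pulled toward a shared representation during training, so the fundus branch alone can drive RNFLD prediction at test time. The encoder architectures, input sizes and alignment loss are assumptions for illustration, not the network proposed in the thesis.

```python
# Illustrative sketch: shared representation across fundus and OCT branches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRepModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.fundus_enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, dim), nn.ReLU())
        self.oct_enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 8, dim), nn.ReLU())
        self.classifier = nn.Linear(dim, 2)     # RNFLD present / absent

    def forward(self, fundus, oct_vol=None):
        zf = self.fundus_enc(fundus)
        logits = self.classifier(zf)
        if oct_vol is None:                     # test time: fundus image only
            return logits, None
        zo = self.oct_enc(oct_vol)
        align = F.mse_loss(zf, zo)              # pull the two embeddings together
        return logits, align                    # training: classification + alignment losses
```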

 

Year of completion:  July 2018
 Advisor : Jayanthi Sivaswamy

Related Publications

  • Tarannum Mansoori, JS Gamalapati and Jayanthi Sivaswamy - Radial Peripapillary Capillary Density Measurement Using Optical Coherence Tomography Angiography in Early Glaucoma Journal of glaucoma 26.5 (2017): 438-443. [PDF]

  • Pujitha Appan K, Jahnavi Gamalapati S and Jayanthi Sivaswamy - Detection of neovascularization in retinal images using semi-supervised learning Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on. IEEE, 2017. [PDF]

  • T Mansoori, JS Gamalapati and Jayanthi Sivaswamy - Measurement of radial peripapillary capillary density in the normal human retina using optical coherence tomography angiography Journal of glaucoma 26.3 (2017): 241-246. [PDF]


Downloads

thesis
