CVIT Projects

Towards Structured Analysis of Broadcast Badminton Videos

Abstract

Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large-scale data mining and analytics. It can only be viewed sequentially or manually tagged with higher-level labels, which is time-consuming and error-prone. In this work, we propose an end-to-end framework for automatic attribute tagging and analysis of sports videos. We use commonly available broadcast videos of matches and, unlike previous approaches, do not rely on special camera setups or additional sensors. Our sport of interest is badminton.
We propose a method to analyze a large corpus of badminton broadcast videos by segmenting the points played, tracking and recognizing the players in each point, and annotating their respective badminton strokes. We evaluate the performance on 10 Olympic matches with 20 players and achieve 95.44% point segmentation accuracy, a 97.38% player detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke segmentation edit score of 80.48%. We further show that the automatically annotated videos alone can enable gameplay analysis and inference by computing interpretable metrics such as a player's reaction time, speed, and footwork around the court.
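The stroke segmentation result above is reported as an edit score. The page does not define it, so the following Python sketch assumes the segmental edit score commonly used in temporal action segmentation: frame-wise labels are collapsed into an ordered list of segments, and the Levenshtein distance between the predicted and ground-truth segment lists is normalized by the longer list. The function names and stroke labels are illustrative, not taken from the authors' code.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def to_segments(frame_labels: Sequence[Hashable]) -> List[Hashable]:
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (x != y))  # substitution
        prev = curr
    return prev[-1]


def segmental_edit_score(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Edit score in [0, 100]; 100 means the segment order matches exactly."""
    p, t = to_segments(pred), to_segments(truth)
    denom = max(len(p), len(t))
    if denom == 0:
        return 100.0
    return (1.0 - levenshtein(p, t) / denom) * 100.0


if __name__ == "__main__":
    truth = ["serve"] * 3 + ["smash"] * 2 + ["clear"] * 4
    pred = ["serve", "serve", "drop", "smash", "smash", "clear"]
    print(segmental_edit_score(pred, truth))  # 75.0
```

Because only segment order is compared, small boundary shifts do not lower this score, while spurious or missing strokes do: in the example, one spurious `drop` segment among four predicted segments gives 75.0.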

Related Publications

Anurag Ghosh, Suriya Singh and C.V. Jawahar, Towards Structured Analysis of Broadcast Badminton Videos, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF] [Supp]

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{ghosh2018towards,
    title={Towards Structured Analysis of Broadcast Badminton Videos},
    author={Ghosh, Anurag and Singh, Suriya and Jawahar, C.~V.},
    booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA},
    pages={9},
    year={2018}
}

Dataset

The annotations can be downloaded from here: Dataset

The annotations are provided as EAF files, which can be opened in the ELAN annotation tool (available at https://tla.mpi.nl/tools/tla-tools/elan/download/). Links to the YouTube videos, which are required to view the annotations alongside the video, are provided in the supplementary material.
We will be releasing the features soon!
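Because EAF is an XML format, the annotations can also be read programmatically without ELAN. The sketch below uses only Python's standard `xml.etree.ElementTree` and the standard EAF elements (`TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION`, `ANNOTATION_VALUE`). It does not assume any particular tier names, since the tiers used in the released files are not listed here; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Annotation = Tuple[int, int, str]  # (start_ms, end_ms, label)


def annotations_from_root(root: ET.Element) -> Dict[str, List[Annotation]]:
    """Group the time-aligned annotations of a parsed EAF document by tier ID.

    Only ALIGNABLE_ANNOTATION elements are read; symbolic REF_ANNOTATIONs
    are skipped in this sketch.
    """
    # TIME_SLOT_ID -> time in milliseconds (unaligned slots carry no TIME_VALUE)
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers: Dict[str, List[Annotation]] = {}
    for tier in root.iter("TIER"):
        rows: List[Annotation] = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # anchored to an unaligned slot, so no absolute time
            label = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((start, end, label))
        tiers[tier.get("TIER_ID")] = sorted(rows)
    return tiers


def read_eaf(path: str) -> Dict[str, List[Annotation]]:
    """Parse an .eaf file from disk and return its annotations by tier."""
    return annotations_from_root(ET.parse(path).getroot())
```

For example, `read_eaf("match.eaf")` (a hypothetical file name) returns a dictionary mapping each tier ID to a sorted list of `(start_ms, end_ms, label)` tuples, which can then be aligned with the frames of the corresponding video.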

Find me a sky : a data-driven method for color-consistent sky search & replacement

Abstract

Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. It is often necessary to color correct the foreground after replacement to make it consistent with the new sky. Methods have been proposed to automate the process of sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results. We propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the query image foreground. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 distance in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We used pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout, and natural illumination statistics, but observed color statistics to be the most effective for this task. We share our findings on feature selection and present qualitative results and a user-study-based evaluation that demonstrate the effectiveness of the proposed method.
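The retrieval step described above (nearest neighbours under L2 distance in feature space) can be sketched in plain Python. As a stand-in for the paper's color statistics, which are not specified here, the sketch assumes a six-dimensional feature: the per-channel mean and standard deviation of the foreground (non-sky) pixels. Pixels are given as `(R, G, B)` tuples with a parallel boolean mask, and the re-ranking for realism and diversity is not shown.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def color_stats(pixels: Sequence[Pixel], is_foreground: Sequence[bool]) -> List[float]:
    """Per-channel mean and standard deviation over foreground (non-sky) pixels."""
    fg = [p for p, keep in zip(pixels, is_foreground) if keep]
    if not fg:
        raise ValueError("mask selects no foreground pixels")
    feats: List[float] = []
    for c in range(3):
        vals = [p[c] for p in fg]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        feats += [mean, std]
    return feats


def nearest_skies(query: Sequence[float], database: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k database feature vectors closest to `query` in L2 distance."""
    return sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))[:k]


if __name__ == "__main__":
    query = color_stats([(10, 20, 30), (30, 40, 50), (200, 200, 255)], [True, True, False])
    database = [[0.0] * 6, [20.0, 10.0, 30.0, 10.0, 40.0, 10.0], [100.0] * 6]
    print(query)                                # [20.0, 10.0, 30.0, 10.0, 40.0, 10.0]
    print(nearest_skies(query, database, k=2))  # [1, 0]
```

The sky pixel in the query is excluded by the mask, so only the foreground drives retrieval, which is the property the method relies on to avoid color correction.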

Major Contributions

A novel approach for the aesthetic enhancement of an image with a dull background by replacing its sky, using a data-driven method for sky search and replacement.
A curated dataset of 1246 images with interesting and appealing skies.
Identification of the features that are most helpful for color-consistent sky search.
Quantitative and qualitative evidence for the hypothesis that images with matching foreground color properties can have interchangeable backgrounds.

Related Publications

Saumya Rawat, Siddhartha Gairola, Rajvi Shah, and P J Narayanan, Find me a sky : a data-driven method for color-consistent sky search & replacement, International Conference on Multimedia Modeling (MMM), February 5-7, 2018. [Paper] [Slides]

Dataset:

The dataset consists of 1246 images with corresponding binary masks indicating sky and non-sky regions, stored in the folders image and mask respectively. It was curated from 415 Flickr images with diverse skies (collected by [1]) and 831 outdoor images from the ADE20K dataset [2]. The ADE20K dataset consists of ∼22K images with 150 semantic categories such as sky, road, and grass. The images containing the sky category were first filtered to a set of ∼6K useful images in which the sky region made up more than 40% of the total image. These images were then rated from 1 to 5 for the aesthetic appeal of their skies by two human raters, and only the images with an average score higher than 3 were added to the final dataset.
Download: Dataset
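The two-stage filtering used to build the dataset can be expressed as a short filter. The sketch assumes a hypothetical record type holding sky-mask pixel counts and the two raters' scores; the thresholds (more than 40% sky, mean rating higher than 3) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """Hypothetical record for one candidate image."""
    name: str
    sky_pixels: int
    total_pixels: int
    ratings: Sequence[int]  # aesthetic scores from 1 to 5, one per rater


def curate(candidates: Sequence[Candidate],
           min_sky_fraction: float = 0.40,
           min_mean_rating: float = 3.0) -> List[str]:
    """Keep images whose sky covers more than min_sky_fraction of the frame
    and whose mean rating is strictly above min_mean_rating."""
    kept: List[str] = []
    for c in candidates:
        if c.sky_pixels / c.total_pixels <= min_sky_fraction:
            continue  # stage 1: sky region too small
        if sum(c.ratings) / len(c.ratings) <= min_mean_rating:
            continue  # stage 2: sky not rated as appealing enough
        kept.append(c.name)
    return kept


if __name__ == "__main__":
    pool = [
        Candidate("a.jpg", 50, 100, [4, 4]),  # kept
        Candidate("b.jpg", 40, 100, [5, 5]),  # exactly 40% sky: rejected
        Candidate("c.jpg", 60, 100, [3, 3]),  # mean rating 3: rejected
        Candidate("d.jpg", 70, 100, [3, 4]),  # kept
    ]
    print(curate(pool))  # ['a.jpg', 'd.jpg']
```

Both comparisons are strict, matching "> 40%" and "higher than 3" in the text.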

Bibtex

If you use this work or dataset, please cite:
Rawat S., Gairola S., Shah R., Narayanan P.J. (2018) Find Me a Sky: A Data-Driven Method for Color-Consistent Sky Search and Replacement. In: Schoeffmann K. et al. (eds) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham

Associated People

Saumya Rawat
Siddhartha Gairola
Rajvi Shah
P J Narayanan

References:

[1] Y.-H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, and M.-H. Yang. Sky is not the limit: Semantic-aware sky replacement. ACM Trans. Graph., 35(4), 2016.
[2] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Scene parsing through ADE20K dataset. In Proc. IEEE CVPR, 2017.

HWNet - An Efficient Word Image Representation for Handwritten Documents

Overview

We propose a deep convolutional neural network named HWNet v2 (successor to our earlier work [1]) for the task of learning efficient word level representation for handwritten documents which can handle multiple writers and is robust to common forms of degradation and noise. We also show the generic nature of our representation and architecture which allows it to be used as off-the-shelf features for printed documents and building state of the art word spotting systems for various languages.

t-SNE embedding of word image representation.

Major Highlights:

We show how synthetic data, which is available practically free of cost, can be used efficiently to pre-train a deep network. As part of this work, we built and released a synthetic handwritten dataset of 9 million word images, referred to as the IIIT-HWS dataset.
We present an adapted version of the ResNet-34 architecture with ROI pooling, which learns discriminative features from variable-sized word images.
We introduce realistic augmentation of the training data, including multiple scales and elastic distortion, which mimics the natural process of handwriting.
Our representation leads to state of the art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size.
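ROI pooling is what lets the network accept word images of different widths: the final feature map, whatever its size, is max-pooled into a fixed grid of bins. The following single-channel Python sketch illustrates that idea. The bin layout (floor/ceil boundaries) is an assumption of this sketch, and the grid size actually used by HWNet is not stated here.

```python
import math
from typing import List, Sequence


def roi_max_pool(feature_map: Sequence[Sequence[float]], out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool an H x W map into a fixed out_h x out_w grid.

    Bin edges are the floor/ceil of evenly spaced boundaries, so every input
    cell is covered and no bin is empty, even when H < out_h or W < out_w.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled: List[List[float]] = []
    for i in range(out_h):
        r0, r1 = math.floor(i * h / out_h), math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            c0, c1 = math.floor(j * w / out_w), math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    short_word = [[1, 2, 3, 4], [5, 6, 7, 8]]             # 2 x 4 map
    long_word = [[1, 9, 2, 3, 7, 4], [0, 0, 0, 0, 0, 5]]  # 2 x 6 map
    print(roi_max_pool(short_word, 1, 2))  # [[6, 8]]
    print(roi_max_pool(long_word, 1, 2))   # [[9, 7]]
```

Both inputs yield an output of the same shape, which is what allows a fixed-size fully connected layer to follow.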

Architecture

(Left) Flowchart showing the transfer learning process, where we first pre-train the network on synthetic data and later fine-tune it on a real corpus.
(Right) The HWNet features are taken from the penultimate layer of the network.

Code and Dataset:

For codes visit GitHub Project: https://github.com/kris314/hwnet
IIIT-HWS Dataset: http://cvit.iiit.ac.in/research/projects/cvit-projects/matchdocimgs#iiit-hws

Visualizations

(Left) The receptive fields with the maximum activations for a particular neuron at a given layer, in a controlled input setting.
(Right) Four possible reconstructions of sample word images, shown in columns. These are reconstructed from the final-layer representation of HWNet.

Qualitative results of word spotting. In each row, the leftmost image is the query and the remaining images are the top retrieved word images based on feature similarity.
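Ranking by feature similarity, as in the word-spotting results above, reduces to a nearest-neighbour search over word-image descriptors. The page does not name the similarity measure, so this Python sketch assumes cosine similarity and uses small hand-made vectors in place of real HWNet features.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def spot(query: Sequence[float], gallery: Sequence[Sequence[float]],
         top_k: int = 5) -> List[Tuple[int, float]]:
    """Rank gallery word-image features by cosine similarity to the query."""
    scored = [(i, cosine(query, g)) for i, g in enumerate(gallery)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = [1.0, 0.0]
    gallery = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
    print([i for i, _ in spot(query, gallery, top_k=2)])  # [1, 2]
```

Cosine similarity ignores vector magnitude, so `[2.0, 0.0]` ranks as an exact match for `[1.0, 0.0]`.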

Contact

Praveen Krishnan

Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks

Abstract

Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. The intricacies of the script and the absence of ample annotated data make the task harder still. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, a task typically known as Urdu OCR. The proposed solution is not bound to any language-specific lexicon; the model follows a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features from an input image to a sequence of target labels. This removes the need to segment the input image into its constituent characters/glyphs, which is often arduous for scripts like Urdu. Furthermore, the past and future contexts modelled by bidirectional recurrent layers aid the transcription. We outperform previous state-of-the-art techniques on the synthetic UPTI dataset. Additionally, we publish a new dataset curated by scanning printed Urdu publications in various writing styles and fonts, annotated at the line level. We also provide benchmark results of our model on this dataset.
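Segmentation-free transcription of this kind is commonly trained with a Connectionist Temporal Classification (CTC) output layer, as in the CRNN model. The abstract does not spell out the decoding step, so the Python sketch below assumes CTC with greedy (best-path) decoding: the best class is taken at each time step, consecutive repeats are merged, and blanks are removed. The blank index 0 and the toy two-letter alphabet are assumptions; a real Urdu model would map class indices to Urdu characters.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # class index reserved for the CTC blank symbol (assumption)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the score of class k at time step t, where class 0
    is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [k for k, _ in groupby(best)]                          # merge repeats
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)  # drop blanks


if __name__ == "__main__":
    probs = [
        [0.1, 0.8, 0.1],    # 'a'
        [0.2, 0.7, 0.1],    # 'a' (repeat, merged)
        [0.9, 0.05, 0.05],  # blank separates the two a's
        [0.1, 0.8, 0.1],    # 'a'
        [0.1, 0.1, 0.8],    # 'b'
        [0.2, 0.2, 0.6],    # 'b' (repeat, merged)
    ]
    print(ctc_greedy_decode(probs, "ab"))  # aab
```

The blank between two identical labels is what lets a doubled character survive the merge step; without it, the two `a` frames would collapse into one.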

Major Contributions

Establish new state-of-the-art for Urdu OCR.
Release the IIIT-Urdu OCR Dataset and show that it is more complex than previous benchmark datasets.
Provide benchmark dataset and results for IIIT-Urdu OCR Dataset.

Related Publications

Mohit Jain, Minesh Mathew and C.V. Jawahar, Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks, 4th Asian Conference on Pattern Recognition (ACPR 2017), Nanjing, China, 2017. [PDF]

Downloads

IIIT-Urdu OCR Dataset
- Curated by scanning multiple printed Urdu books and magazines.
- Contains 1610 Urdu OCR line images.
- Images are annotated at the line level.

Download: Dataset (49.2 MB) || Readme

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{jain2017unconstrained,
    title={Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks},
    author={Jain, Mohit and Mathew, Minesh and Jawahar, C.~V.},
    booktitle={4th Asian Conference on Pattern Recognition (ACPR 2017), Nanjing, China},
    pages={6},
    year={2017}
  }

Associated People

Unconstrained Scene Text and Video Text Recognition for Arabic Script

Abstract

Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and AcTiV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data. We overcome this by synthesizing millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the CRNN model, which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features from the input image to a sequence of target labels. This does away with the need to segment the input image into its constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.
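In CRNN-style models, the step that turns the convolutional feature map into the sequence consumed by the recurrent layers is often called map-to-sequence: each column of the feature map becomes one time step. The plain-Python sketch below shows that reshaping for a nested-list map indexed `[channel][row][column]`. CRNN usually reduces the height to 1 before this step, but the sketch handles any height; the names are illustrative.

```python
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed [channel][row][column]


def map_to_sequence(fmap: FeatureMap) -> List[List[float]]:
    """Convert a C x H x W feature map into W frame vectors of length C * H.

    Frame t stacks column t of every channel (rows top to bottom), and the
    frames are ordered by column index, left to right across the image.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [fmap[c][r][t] for c in range(channels) for r in range(height)]
        for t in range(width)
    ]


if __name__ == "__main__":
    fmap = [
        [[1, 2, 3], [4, 5, 6]],     # channel 0 (H = 2, W = 3)
        [[7, 8, 9], [10, 11, 12]],  # channel 1
    ]
    for frame in map_to_sequence(fmap):
        print(frame)
    # [1, 4, 7, 10]
    # [2, 5, 8, 11]
    # [3, 6, 9, 12]
```

A bidirectional recurrent layer then reads this frame sequence in both directions, so each frame receives context from both sides of its position.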

Fig 1: Examples of Arabic Scene Text and Video Text recognized by our model.
Our work deals only with the recognition of cropped words/lines. The bounding boxes were provided manually.

Major Contributions

Establish new state-of-the-art for Arabic Video Text Recognition.
Provide benchmark for Arabic Scene Text Recognition.
Release new dataset (IIIT-Arabic) for Arabic Scene Text Recognition task.

Related Publications

Mohit Jain, Minesh Mathew and C.V. Jawahar, Unconstrained Scene Text and Video Text Recognition for Arabic Script, Proceedings of 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 2017. [PDF]

Downloads

IIIT-Arabic Dataset
- Curated by downloading freely available images containing Arabic script from Google Images.
- Contains 306 full images with Arabic and English script.
- The full images are annotated at the word level, yielding 2198 Arabic and 2139 English word images.

Download: Dataset (108.2 MB) || Readme

Bibtex

If you use this work or dataset, please cite:

@InProceedings{JainASAR17,
  author    = "Jain, M. and Mathew, M. and Jawahar, C.~V.",
  title     = "Unconstrained Scene Text and Video Text Recognition for Arabic Script",
  booktitle = "1st International Workshop on Arabic Script Analysis and Recognition, Nancy, France",
  year      = "2017",
}
