Human Shape Capture and Tracking at Home


Abstract

Human body tracking typically requires specialized capture set-ups. Although pose tracking is available in consumer devices like the Microsoft Kinect, it is restricted to skeletal stick figures that visualize body-part detections. In this paper, we propose a method for full 3D human body shape and motion capture of arbitrary movements from the depth channel of a single Kinect, while the subject wears casual clothes. We do not use the RGB channel or an initialization procedure that requires the subject to move around in front of the camera. This makes our method applicable to arbitrary clothing textures and lighting environments, with minimal subject intervention. Our method consists of 3D surface feature detection and articulated motion tracking, regularized by a statistical human body model. We also propose the Consensus Mesh (CMesh), a 3D template of a person created from a single viewpoint. We demonstrate tracking results on challenging poses and argue that using the CMesh along with statistical body models can improve tracking accuracy. Quantitative evaluation of our dense body tracking shows that our method has very little drift, which is reduced further by the use of the CMesh.
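To make the per-frame tracking step concrete, the minimal sketch below registers a generic parametric body model to the Kinect point cloud by alternating nearest-neighbour correspondence search with a regularized least-squares parameter update. The body_model interface, the parameter vector theta, and the regularization weight are illustrative assumptions, not the formulation used in the paper.

import numpy as np
from scipy.spatial import cKDTree
from scipy.optimize import least_squares

def track_frame(point_cloud, body_model, theta_init, lambda_reg=0.1, iters=5):
    """Fit model parameters theta to one depth frame (ICP-style sketch).

    point_cloud : (N, 3) points from the Kinect depth channel
    body_model  : callable theta -> (M, 3) mesh vertices (placeholder interface)
    """
    theta = theta_init.copy()
    for _ in range(iters):
        # data association: match every observed point to its nearest model vertex
        tree = cKDTree(body_model(theta))
        _, idx = tree.query(point_cloud)

        def residuals(t):
            verts = body_model(t)
            data = (verts[idx] - point_cloud).ravel()        # point-to-vertex distances
            prior = np.sqrt(lambda_reg) * (t - theta_init)   # stay close to the prior pose
            return np.concatenate([data, prior])

        theta = least_squares(residuals, theta).x
    return theta

In practice the statistical body model also constrains shape and correspondences would be filtered for occlusion, but the alternation above captures the basic structure of model-based depth tracking.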

 


Major Contributions

  • Temporal tracking on monocular depth input for subjects wearing casual clothes, using a statistical human body model.
  • Creation of a 3D template mesh (Consensus Mesh) of a person using a single Kinect (a minimal sketch follows this list).
  • Simple capture setup of a single Kinect; the captured dataset and code will be released soon.
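A very simplified reading of the Consensus Mesh construction: per-frame surface estimates are brought into a canonical pose and averaged, with only the vertices actually observed from the single viewpoint contributing in each frame. The variable layout and weighting below are assumptions for illustration, not the paper's algorithm.

import numpy as np

def build_consensus_mesh(per_frame_verts, per_frame_visibility):
    """Average canonical-pose vertex estimates across frames.

    per_frame_verts      : (F, M, 3) vertex estimates per frame, already unposed
    per_frame_visibility : (F, M) 1 if the vertex was observed in that frame, else 0
    """
    w = per_frame_visibility[..., None].astype(float)   # (F, M, 1) per-vertex weights
    num = (w * per_frame_verts).sum(axis=0)              # weighted sum over frames
    den = np.clip(w.sum(axis=0), 1e-8, None)             # avoid division by zero
    return num / den                                      # (M, 3) consensus vertices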

Related Publications

  • Gaurav Mishra, Saurabh Saini, Kiran Varanasi and P.J. Narayanan, Human Shape Capture and Tracking at Home, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF] [Supp] [WACV Presentation]

Code and Dataset:

We will release code and captured dataset for this project soon.

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{Mishra2018,
    title={Human Shape Capture and Tracking at Home},
    author={Mishra, Gaurav and Saini, Saurabh and Varanasi, Kiran and Narayanan, P.~J.},
    booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA},
    pages={10},
    year={2018}
}

Word Spotting in Silent Lip Videos


Abstract

Our goal is to spot words in silent speech videos, where the lip motion of the speaker is clearly visible and the audio is absent, without explicitly recognizing the spoken words. Existing work in this domain has mainly focused on recognizing a fixed set of words in word-segmented lip videos, which limits the applicability of the learned model due to the restricted vocabulary and a strong dependence on the model's recognition performance.
Our contribution is two-fold:
  1. We develop a pipeline for recognition-free retrieval and show its performance against recognition-based retrieval on a large-scale dataset and on a separate set of out-of-vocabulary words.
  2. We introduce a query expansion technique using pseudo-relevance feedback and propose a novel re-ranking method based on maximizing the correlation between spatio-temporal landmarks of the query and the top retrieval candidates. Our word spotting method achieves 35% higher mean average precision than the recognition-based method on the large-scale LRW dataset. Finally, we demonstrate an application of the method by spotting words in a popular speech video ("The Great Dictator" by Charlie Chaplin), showing that word retrieval can be used to understand what was spoken, for example in silent movies.
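As a rough illustration of the query expansion step in contribution 2, the sketch below averages the query feature with its top-k pseudo-relevant neighbours and re-scores the gallery. The feature representation, k, and the mixing weight alpha are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def expand_and_rerank(query_feat, gallery_feats, k=5, alpha=0.5):
    """Pseudo-relevance feedback: mix the query with the mean of its top-k hits."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                                      # cosine similarity to the query
    topk = np.argsort(-sims)[:k]                      # indices of the top-k retrievals
    feedback = gallery_feats[topk].mean(axis=0)       # pseudo-relevant mean feature
    expanded = alpha * query_feat + (1 - alpha) * feedback
    e = expanded / np.linalg.norm(expanded)
    return np.argsort(-(g @ e))                       # gallery re-ranked by expanded query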

 


Related Publications

  • Abhishek Jha, Vinay Namboodiri and C.V. Jawahar, Word Spotting in Silent Lip Videos, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF] [Supp] [Code: to be released soon] [Models: to be released soon]

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{jha2018word,
    title={Word Spotting in Silent Lip Videos},
    author={Jha, Abhishek and Namboodiri, Vinay and Jawahar, C.~V.},
    booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA},
    pages={10},
    year={2018}

}
 

Dataset

Lip Reading in the Wild Dataset: [ External Link ]
Grid Corpus: [ External Link ]
Charlie Chaplin, The Great Dictator speech [ Youtube ]


Towards Structured Analysis of Broadcast Badminton Videos


Abstract

Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large-scale data mining and analytics. It can only be viewed sequentially or manually tagged with higher-level labels, which is time consuming and prone to errors. In this work, we propose an end-to-end framework for automatic attribute tagging and analysis of sport videos. We use commonly available broadcast videos of matches and, unlike previous approaches, do not rely on special camera setups or additional sensors. Our focus is on badminton as the sport of interest.
We propose a method to analyze a large corpus of badminton broadcast videos by segmenting the points played, tracking and recognizing the players in each point, and annotating their respective badminton strokes. We evaluate the performance on 10 Olympic matches with 20 players and achieve 95.44% point segmentation accuracy, 97.38% player detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke segmentation edit score of 80.48%. We further show that the automatically annotated videos alone enable gameplay analysis and inference by computing understandable metrics such as a player's reaction time, speed, and footwork around the court.
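For instance, once the players are localized on the court plane in every frame, a metric such as average movement speed reduces to simple geometry over the tracked positions. The toy function below illustrates this; the frame rate, court units, and exact metric definitions are assumptions, not taken from the paper.

import numpy as np

def average_speed(court_positions, fps=25.0):
    """Average speed (court units per second) from per-frame 2D court positions.

    court_positions : (T, 2) player position per frame on the court plane
    """
    steps = np.diff(court_positions, axis=0)       # per-frame displacement vectors
    path_length = np.linalg.norm(steps, axis=1).sum()
    duration = (len(court_positions) - 1) / fps    # elapsed time in seconds
    return path_length / duration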

 


Related Publications

  • Anurag Ghosh, Suriya Singh and C.V. Jawahar, Towards Structured Analysis of Broadcast Badminton Videos, IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA, 2018. [PDF] [Supp]

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{ghosh2018towards,
    title={Towards Structured Analysis of Broadcast Badminton Videos},
    author={Ghosh, Anurag and Singh, Suriya and Jawahar, C.~V.},
    booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, CA, USA},
    pages={9},
    year={2018}
}
 

Dataset

The annotations can be downloaded from here: Dataset

The annotations are provided as EAF files, which can be opened in the ELAN annotation tool (available at https://tla.mpi.nl/tools/tla-tools/elan/download/). Links to the YouTube videos are provided in the supplementary material (these are required to view the annotations alongside the videos).
We will be releasing the features soon!


SmartTennisTV: Automatic indexing of tennis videos


Abstract

In this paper, we demonstrate a score-based indexing approach for tennis videos. Given a broadcast tennis video (BTV), we index all the video segments with their scores to create a navigable and searchable match. Our approach temporally segments the rallies in the video and then recognizes the scores from each of the segments, before refining the scores using the knowledge of the tennis scoring system. We finally build an interface to effortlessly retrieve and view the relevant video segments by also automatically tagging the segmented rallies with human-accessible tags such as 'fault' and 'deuce'. The efficiency of our approach is demonstrated on BTVs from two major tennis tournaments.
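To give a flavour of the scoring-system refinement, the toy check below flags recognized point-score transitions that are impossible within a game; it ignores deuce handling, game-end resets, and tie-breaks, and the actual refinement in the paper may be formulated quite differently.

# Valid in-game point values in tennis, in increasing order (toy model).
POINTS = ["0", "15", "30", "40", "AD"]

def valid_transition(prev, curr):
    """prev, curr: (server_points, receiver_points) tuples of strings from POINTS.

    Returns True if exactly one player's score advanced by a single step."""
    if any(p not in POINTS for p in prev + curr):
        return False
    def advanced(a, b):
        return POINTS.index(b) == POINTS.index(a) + 1
    server_ok = advanced(prev[0], curr[0]) and prev[1] == curr[1]
    receiver_ok = advanced(prev[1], curr[1]) and prev[0] == curr[0]
    return server_ok != receiver_ok   # exactly one of the two scores advanced

A recognized score sequence can then be cleaned by discarding or re-recognizing segments whose transitions fail this kind of consistency check.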

 


Related Publications

  • Anurag Ghosh and C.V. Jawahar, SmartTennisTV: Automatic indexing of tennis videos, National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2017), IIT Mandi, HP, India, 2017. [PDF] [Slides] (Best Paper Award Winner)

Bibtex

If you use this work or dataset, please cite:

 @inproceedings{ghosh2017smart,
    title={SmartTennisTV: Automatic indexing of tennis videos},
    author={Ghosh, Anurag and Jawahar, C.~V.},
    booktitle={National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2017), IIT Mandi, HP, India, 2017},
    pages={10},
    year={2017}
}

Dataset

We will be releasing the dataset soon. The link will be provided here.

Find me a sky : a data-driven method for color-consistent sky search & replacement


Abstract

Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. It is often necessary to color correct the foreground after replacement to make it consistent with the new sky. Methods have been proposed to automate the process of sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results. We propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the query image foreground. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 similarity in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We use pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout, and natural illumination statistics, and observe color statistics to be the most effective for this task. We share our findings on feature selection and present qualitative results and a user-study based evaluation to demonstrate the effectiveness of the proposed method.
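As an illustration of the retrieval step, the sketch below builds simple foreground colour statistics and ranks database images by L2 distance in that feature space. The exact features, normalization, and the re-ranking for realism and diversity used in the paper are not reproduced here; function and variable names are placeholders.

import numpy as np

def color_stats(image_rgb, fg_mask):
    """Per-channel mean and standard deviation over foreground (non-sky) pixels."""
    fg = image_rgb[fg_mask.astype(bool)]                 # (P, 3) foreground pixels
    return np.concatenate([fg.mean(axis=0), fg.std(axis=0)])

def retrieve_candidates(query_feat, db_feats, top_k=10):
    """Rank database images by L2 distance to the query feature (ascending)."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)[:top_k]                     # indices of the nearest images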

 


Major Contributions

  • A novel approach for aesthetic enhancement of an image with a dull background by replacing its sky, using a data-driven method for sky search and replacement.
  • Curated dataset of 1246 images with interesting and appealing skies.
  • Identified relevant features helpful for color-consistent sky search.
  • Quantitative and qualitative evidence for the hypothesis that images with matching foreground color properties can have interchangeable backgrounds.

Related Publications

  • Saumya Rawat, Siddhartha Gairola, Rajvi Shah, and P J Narayanan, Find me a sky: a data-driven method for color-consistent sky search & replacement, International Conference on Multimedia Modeling (MMM), February 5-7, 2018. [ Paper ] [ Slides ]

Dataset:

The dataset consists of 1246 images with corresponding binary masks indicating sky and non-sky regions, stored in the folders image and mask respectively. It has been curated from 415 Flickr images with diverse skies (collected by [1]) and 831 outdoor images from the ADE20K dataset [2]. The ADE20K dataset consists of ∼22K images with 150 semantic categories such as sky, road, and grass. Images containing the sky category were first filtered to a set of ∼6K useful images in which the sky region covers more than 40% of the image. These images were manually rated between 1 and 5 for the aesthetic appeal of their skies by two human raters, and only images with an average score higher than 3 were added to the final dataset.
Download: Dataset
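The curation criteria described above amount to a filter of roughly this shape; the thresholds are the ones stated in the text, while the function signature and data layout are illustrative assumptions.

import numpy as np

def keep_image(sky_mask, ratings, area_thresh=0.4, score_thresh=3.0):
    """Keep an image if sky covers more than 40% of the pixels and the mean
    aesthetic rating from the human raters exceeds 3."""
    sky_fraction = sky_mask.astype(bool).mean()          # fraction of sky pixels
    return sky_fraction > area_thresh and float(np.mean(ratings)) > score_thresh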


Bibtex

If you use this work or dataset, please cite:
 @inproceedings{rawat2018find,
    title={Find Me a Sky: A Data-Driven Method for Color-Consistent Sky Search and Replacement},
    author={Rawat, Saumya and Gairola, Siddhartha and Shah, Rajvi and Narayanan, P.~J.},
    booktitle={MultiMedia Modeling (MMM 2018), Lecture Notes in Computer Science, vol. 10704},
    publisher={Springer, Cham},
    year={2018}
}


Associated People

  • Saumya Rawat
  • Siddhartha Gairola
  • Rajvi Shah
  • P J Narayanan

References:

  1. Y.-H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, and M.-H. Yang, Sky is not the limit: Semantic-aware sky replacement. ACM Trans. Graph., 35(4), 2016.
  2. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba, Scene parsing through ADE20K dataset. In Proc. IEEE CVPR, 2017.