Fine-Tuning Human Pose Estimation in Videos

Digvijay Singh, Vineeth Balasubramanian, C. V. Jawahar

Overview

We propose a semi-supervised self-training method for fine-tuning human pose estimations in videos that remains accurate even on complex sequences. The method surpasses the state of the art on most of the datasets used and also improves over the baseline on our new dataset of unrestricted sports videos. The self-training model has two components: a static Pictorial Structure based model and a dynamic ensemble of exemplars. We present a pose quality criterion that is used primarily for batch selection and automatic parameter selection; the same criterion also serves as a low-level pose evaluator in post-processing. We set a new challenge by introducing CVIT-SPORTS, a dataset of complex sports videos with full human body-part annotations. We demonstrate the strength of our method by adapting it to videos of complex activities such as cricket bowling, cricket batting, and football, as well as to available standard datasets.
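The role of the pose quality criterion in batch selection can be illustrated with a generic self-training loop. The Python sketch below is not the released MATLAB code and does not reproduce the models of [1]. The callables `estimate`, `quality`, and `update`, and the parameters `rounds`, `batch_size`, and `min_score`, are hypothetical placeholders chosen for illustration only.

```python
from typing import Any, Callable, List, Sequence, Set, Tuple


def select_batch(scores: Sequence[float], batch_size: int, min_score: float,
                 exclude: Set[int] = frozenset()) -> List[int]:
    """Pick up to batch_size frame indices with the highest quality scores.

    Frames scoring below min_score, or already listed in exclude, are skipped.
    """
    eligible = [i for i, s in enumerate(scores)
                if s >= min_score and i not in exclude]
    eligible.sort(key=lambda i: scores[i], reverse=True)
    return eligible[:batch_size]


def self_train(frames: Sequence[Any],
               estimate: Callable[[Any], Any],
               quality: Callable[[Any, Any], float],
               update: Callable[[List[Tuple[Any, Any]]], None],
               rounds: int = 3, batch_size: int = 2,
               min_score: float = 0.5) -> List[int]:
    """Generic self-training loop (illustrative assumption, not the method of [1]).

    Each round: estimate a pose per frame, score each estimate, and pass the
    best not-yet-used (frame, pose) pairs to update() as pseudo-labels.
    """
    used: Set[int] = set()
    for _ in range(rounds):
        poses = [estimate(f) for f in frames]
        scores = [quality(f, p) for f, p in zip(frames, poses)]
        batch = select_batch(scores, batch_size, min_score, used)
        if not batch:
            break  # no remaining frame clears the quality threshold
        update([(frames[i], poses[i]) for i in batch])
        used.update(batch)
    return sorted(used)


if __name__ == "__main__":
    # Toy stand-ins: each "frame" is just its own quality score.
    toy_frames = [0.9, 0.2, 0.7, 0.6]
    log = []
    picked = self_train(toy_frames, estimate=lambda f: f,
                        quality=lambda f, p: p, update=log.append)
    print(picked, len(log))
```

In this toy run the low-scoring frame is never used as a pseudo-label, which is the intent of thresholded batch selection. A real pipeline would replace the stand-ins with an actual pose estimator, a quality measure, and a model update step.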

Here we release our MATLAB implementation of [1]. For details of the method, see the paper [1].

Downloads

Filename	Description	Size
fine_tuning_pose.tar.gz	MATLAB code for fine-tuning human pose estimation in videos.	94 MB
README	Instructions for running the code and other information.	4.0 KB
cvit_sports_videos.tar.gz	CVIT-SPORTS-Videos dataset of 11 video sequences from the cricket domain.	66 MB

References

[1] D. Singh, V. Balasubramanian, C. V. Jawahar. Fine-Tuning Human Pose Estimations in Videos. WACV 2016.

[2] Y. Yang, D. Ramanan. Articulated Pose Estimation using Flexible Mixtures of Parts. CVPR 2011.

[3] A. Cherian, J. Mairal, K. Alahari, C. Schmid. Mixing Body-Part Sequences for Human Pose Estimation. CVPR 2014.

[4] B. Sapp, D. Weiss, B. Taskar. Parsing Human Motion with Stretchable Models. CVPR 2011.

[5] T. Malisiewicz, A. Gupta, A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. ICCV 2011.