Action Recognition using Canonical Correlation Kernels

Introduction

Action recognition has gained significant attention from the computer vision community in recent years. It is a challenging problem, mainly because videos contain significant camera motion, viewpoint changes, varying illumination and cluttered backgrounds. A wide spectrum of features and representations has been used for action recognition in the past. Recent advances have been driven by (i) local and global features, which have significantly helped in object and scene recognition, computed over 2D frames or over a 3D video volume, and (ii) factorization techniques applied to video volume tensors, with similarity measures defined over the resulting lower-dimensional factors. In this project, we try to take advantage of both approaches by defining a canonical correlation kernel that is computed from a tensor representation of the videos. This formulation also enables seamless feature fusion by combining multiple feature kernels.

[Figure: YouTube dataset example, volleyball spiking]

Canonical Correlation Kernel (CCK)

We represent each video as a 3D tensor, which is then flattened into three 2D matrices, one per mode. The CCK is based on canonical correlation analysis (CCA) and its kernelized version (KCCA). For two videos, it is defined as the sum of the correlations between their corresponding flattened matrices, obtained from both CCA and KCCA. An overview of the CCK computation is given below.

[Figure: flow chart of the CCK computation]
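The linear part of this computation can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the CCA term (the KCCA term is omitted), it assumes both videos have the same tensor shape, and it assumes each flattened matrix is summarised by its leading `dim` left singular vectors, with canonical correlations taken as the cosines of the principal angles between those subspaces. The names `unfold`, `canonical_correlations`, `linear_cck_sketch` and the parameter `dim` are hypothetical.

```python
import numpy as np


def unfold(video, mode):
    """Flatten a 3D tensor into a 2D matrix whose rows index the given mode."""
    return np.moveaxis(video, mode, 0).reshape(video.shape[mode], -1)


def canonical_correlations(a, b, dim):
    """Cosines of the principal angles between the leading subspaces of a and b.

    Assumption: each matrix is summarised by its first `dim` left singular
    vectors; the source does not specify how the subspaces are chosen.
    """
    basis_a = np.linalg.svd(a, full_matrices=False)[0][:, :dim]
    basis_b = np.linalg.svd(b, full_matrices=False)[0][:, :dim]
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.clip(cosines, 0.0, 1.0)  # guard against round-off above 1


def linear_cck_sketch(video_a, video_b, dim=3):
    """Sum the canonical correlations over the three unfoldings (linear CCA only)."""
    return float(sum(
        canonical_correlations(unfold(video_a, m), unfold(video_b, m), dim).sum()
        for m in range(3)
    ))


rng = np.random.default_rng(0)
video_a = rng.standard_normal((8, 9, 10))  # toy "height x width x time" volume
video_b = rng.standard_normal((8, 9, 10))
same = linear_cck_sketch(video_a, video_a)   # a video compared with itself
cross = linear_cck_sketch(video_a, video_b)  # two unrelated random volumes
```

With `dim=3`, a video compared with itself reaches the maximum value of 3 modes × 3 correlations, while two unrelated volumes score lower. Evaluating the function over all pairs of videos would give one kernel matrix per feature type.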

Results

We tested CCK on four popular action recognition datasets: Cambridge, UCF, KTH and Youtube. CCKs are computed over intensity values and over HOG, HOF, SIFT and MBH features. To combine the different CCK feature kernels, we used a simple weighted scheme. All kernels are used with an SVM classifier in a one-vs-rest setting.
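The fusion step can be sketched as a weighted sum of per-feature kernel matrices, followed by the label encoding used in a one-vs-rest setup. This is only an illustration under stated assumptions: the source does not give the weights or say whether they are normalised, so the normalisation to sum 1, the toy kernel values and the helper names `combine_kernels` and `one_vs_rest_targets` are all hypothetical.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of per-feature kernel matrices (weights normalised to sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != weights.size:
        raise ValueError("need exactly one weight per kernel")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # assumption: normalised weights
    return sum(w * np.asarray(k, dtype=float) for w, k in zip(weights, kernels))


def one_vs_rest_targets(labels):
    """Map each class to a +1/-1 target vector for its binary 'class vs rest' problem."""
    labels = np.asarray(labels)
    return {c: np.where(labels == c, 1, -1) for c in np.unique(labels).tolist()}


# Toy 2x2 kernels standing in for two feature kernels (values are made up).
k_hog = np.array([[1.0, 0.2], [0.2, 1.0]])
k_hof = np.array([[1.0, 0.6], [0.6, 1.0]])
combined = combine_kernels([k_hog, k_hof], weights=[1.0, 3.0])

targets = one_vs_rest_targets([0, 1, 0, 2])
```

The combined matrix could then be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier trained per entry of `targets`.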

	Cambridge CCK	Cambridge DT	UCF CCK	UCF DT	KTH CCK	KTH DT	Youtube CCK	Youtube DT
Pixels	93.1	-	93.5	-	97.5	-	82.5	-
HOG	89.0	-	83.8	83.8	98.3	86.5	83.2	74.5
SIFT	95.1	-	85.7	-	98.6	-	79.1	-
HOF	95.2	-	81.5	77.6	94.3	93.2	80.4	72.8
MBH	75.1	-	80.4	84.8	98.9	95.0	80.1	83.9
Combined	96.4	-	93.5	88.2	98.9	94.2	86.3	84.2

Comparison of CCK with DT (Dense trajectories) over different features.

	Cambridge	UCF	KTH	Youtube
TCCA [CVPR 2007]	82	-	95.3	-
Product Manifold [CVPR 2010]	88	-	97	-
Tangent Bundle [FG 2011]	91	88	97	-
Dense trajectories [CVPR 2011]	-	88.2	94.2	84.2
Le et al. [CVPR 2011]	-	86.5	93.9	75.8
Ikizler-Cinbis et al. [ECCV 2010]	-	-	-	75.2
Jiang Wang et al. [CVPR 2011]	-	-	93.8	-

Proposed (Using pixel values)	93.1	93.5	97.5	82.5
Proposed (Using multiple features)	96.4	93.5	98.9	86.3
Proposed (CCK feature kernels + DT feature kernels)	97.2	93.5	98.9	86.6

Comparison of CCK with other methods.

Related Publications

G Nagendar, Sai Ganesh, Mahesh Goud, C V Jawahar - Action Recognition using Canonical Correlation Kernels, The 11th Asian Conference on Computer Vision (ACCV), 5-9 Nov. 2012, Daejeon, Korea. [PDF]