Action Recognition using Canonical Correlation Kernels


Introduction

Action recognition has gained significant attention from the computer vision community in recent years. It is a challenging problem, mainly because videos contain significant camera motion, viewpoint changes, varying illumination conditions and cluttered backgrounds. A wide spectrum of features and representations has been used for action recognition in the past. Recent advances in action recognition are propelled by (i) the use of local as well as global features, which have significantly helped in object and scene recognition, computed over 2D frames or over a 3D video volume; and (ii) the use of factorization techniques over video volume tensors, with similarity measures defined over the resulting lower-dimensional factors. In this project, we try to take advantage of both approaches by defining a canonical correlation kernel that is computed from tensor representations of the videos. This also enables seamless feature fusion by combining multiple feature kernels.

[Figure: volleyball spiking example from the YouTube dataset]


Canonical Correlation Kernel (CCK)

We represent a video as a 3D tensor, which is then flattened (unfolded) into three 2D matrices, one along each tensor mode. The CCK is based on canonical correlation analysis (CCA) and its kernelized version (KCCA): the CCK between two videos is defined as the sum of the correlations between their corresponding flattened matrices, obtained from both CCA and KCCA. An overview of the CCK computation is given below.

[Figure: flow chart of the CCK computation]
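The CCK computation outlined above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Both videos are assumed to have been resampled to the same tensor size. "CCA between flattened matrices" is read in the subspace sense used by TCCA-style methods: the canonical correlations are taken as the cosines of the principal angles between the k-dimensional dominant column subspaces of corresponding unfoldings. The KCCA part is replaced by kernel principal angles in an RBF feature space (an uncentred kernel-PCA basis per video), used here only as a stand-in. The function names and the parameters `k` and `gamma` are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows follow axis `mode`, columns run over the other two axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def linear_correlations(a, b, k):
    """Cosines of the principal angles between the k-dimensional dominant
    column subspaces of `a` and `b` (subspace reading of CCA, an assumption)."""
    qa = np.linalg.svd(a, full_matrices=False)[0][:, :k]
    qb = np.linalg.svd(b, full_matrices=False)[0][:, :k]
    return np.clip(np.linalg.svd(qa.T @ qb, compute_uv=False), 0.0, 1.0)


def rbf(x, y, gamma):
    """RBF kernel between the columns of `x` and the columns of `y`."""
    sq = (x * x).sum(0)[:, None] + (y * y).sum(0)[None, :] - 2.0 * x.T @ y
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_basis(gram, k):
    """Coefficients of k orthonormal feature-space directions spanning the
    dominant (uncentred) kernel-PCA subspace of a Gram matrix."""
    vals, vecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs / np.sqrt(np.maximum(vals, 1e-12))


def kernel_correlations(a, b, k, gamma):
    """Hypothetical kernelized counterpart: principal angles between the dominant
    subspaces of the RBF-mapped columns of `a` and `b` (stand-in for KCCA)."""
    alpha_a = kernel_basis(rbf(a, a, gamma), k)
    alpha_b = kernel_basis(rbf(b, b, gamma), k)
    cross = alpha_a.T @ rbf(a, b, gamma) @ alpha_b
    return np.clip(np.linalg.svd(cross, compute_uv=False), 0.0, 1.0)


def cck(video_a, video_b, k=3, gamma=1.0):
    """Sum of linear and kernel canonical correlations over the three unfoldings.
    Both videos must be tensors of the same shape (e.g. height x width x frames)."""
    total = 0.0
    for mode in range(3):
        a, b = unfold(video_a, mode), unfold(video_b, mode)
        total += linear_correlations(a, b, k).sum()
        total += kernel_correlations(a, b, k, gamma).sum()
    return float(total)
```

With these conventions, a video compared with itself reaches the maximum score of 6k (3k from the linear and 3k from the kernel correlations, provided k does not exceed any tensor dimension). The RBF width `gamma` has to be tuned to the scale of the input values.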


Results

We tested CCK on four popular action recognition datasets: Cambridge, UCF, KTH and YouTube. CCKs are computed over intensity (pixel) values and over HOG, HOF, SIFT and MBH features. To combine the CCKs of different features, we use a simple weighted scheme. All kernels are used with an SVM classifier in a one-vs-rest setting.
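The kernel fusion and one-vs-rest classification described above can be sketched as follows. This assumes one precomputed Gram matrix per feature and non-negative weights; the weights actually used in the experiments are not given here, so any values passed in are placeholders. The helper names are hypothetical. Non-negative weights keep the combination positive semi-definite whenever every input kernel is, and the one-vs-rest decision picks the class whose binary classifier returns the highest score.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of per-feature Gram matrices of identical shape.

    Weights are normalised to sum to one; non-negative weights keep the result
    positive semi-definite whenever every input kernel is.
    """
    weights = np.asarray(weights, dtype=float)
    if len(weights) != len(kernels):
        raise ValueError("need exactly one weight per kernel")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()
    return sum(w * np.asarray(k, dtype=float) for w, k in zip(weights, kernels))


def one_vs_rest_labels(labels, positive):
    """Binary targets (+1 / -1) for the classifier of class `positive`."""
    return np.where(np.asarray(labels) == positive, 1, -1)


def one_vs_rest_predict(scores, classes):
    """For each test video, pick the class whose binary classifier scores highest.

    `scores` has shape (n_test, n_classes): one decision value per classifier.
    """
    return [classes[i] for i in np.argmax(np.asarray(scores), axis=1)]
```

In practice, the combined training matrix (n_train x n_train) and test matrix (n_test x n_train) would be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier trained per action class.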

Feature     Cambridge      UCF            KTH            YouTube
            CCK    DT      CCK    DT      CCK    DT      CCK    DT
Pixels      93.1   -       93.5   -       97.5   -       82.5   -
HOG         89.0   -       83.8   83.8    98.3   86.5    83.2   74.5
SIFT        95.1   -       85.7   -       98.6   -       79.1   -
HOF         95.2   -       81.5   77.6    94.3   93.2    80.4   72.8
MBH         75.1   -       80.4   84.8    98.9   95.0    80.1   83.9
Combined    96.4   -       93.5   88.2    98.9   94.2    86.3   84.2

Comparison of CCK with DT (dense trajectories) over different features. Values are recognition accuracies (%); "-" marks combinations for which no result is reported.

Method                                               Cambridge  UCF    KTH    YouTube
TCCA [CVPR 2007]                                     82         -      95.3   -
Product Manifold [CVPR 2010]                         88         -      97     -
Tangent Bundle [FG 2011]                             91         88     97     -
Dense trajectories [CVPR 2011]                       -          88.2   94.2   84.2
Le et al. [CVPR 2011]                                -          86.5   93.9   75.8
Ikizler-Cinbis et al. [ECCV 2010]                    -          -      -      75.2
Jiang Wang et al. [CVPR 2011]                        -          -      93.8   -
Proposed (using pixel values)                        93.1       93.5   97.5   82.5
Proposed (using multiple features)                   96.4       93.5   98.9   86.3
Proposed (CCK feature kernels + DT feature kernels)  97.2       93.5   98.9   86.6

Comparison of CCK with other methods. Values are recognition accuracies (%); "-" marks datasets for which no result is reported.


Related Publications

  • G. Nagendar, Sai Ganesh, Mahesh Goud and C. V. Jawahar. Action Recognition using Canonical Correlation Kernels. The 11th Asian Conference on Computer Vision (ACCV), 5-9 Nov. 2012, Daejeon, Korea.