From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips
Short internet video clips such as vines exhibit a significantly wilder distribution than traditional video datasets. In this paper, we focus on the problem of unsupervised action classification in wild vines using traditional labeled datasets. To this end, we use a simple, data-augmentation-based domain adaptation strategy. We utilise the semantic word2vec space as a common subspace in which to embed video features from both the labeled source domain and the unlabelled target domain. Our method incrementally augments the labeled source set with target samples and iteratively modifies the embedding function to bring the source and target distributions closer together. Additionally, we utilise a multi-modal representation that incorporates the noisy semantic information available in the form of hash-tags. We demonstrate the effectiveness of this simple adaptation technique on a test set of vines and achieve notable improvements in classification performance.
The video distribution we target is that of vine.co. These videos are recorded by users in unconstrained environments and contain significant camera shake, lighting variability, abrupt shot changes, etc. The challenge is to utilise an existing labeled action-video dataset to gather relevant vines without investing manual labour.
A further challenge is to merge the visual, textual, and hash-tag information of a vine to perform the segregation stated above.
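As a rough sketch of this fusion, one simple option is to average the word2vec vectors of a vine's hash-tags and take a convex combination with the visual embedding. The tiny lookup table, the vector values, and the mixing weight `alpha` below are illustrative assumptions, not the actual pretrained model or fusion used in the paper:

```python
import numpy as np

# Toy word2vec lookup standing in for a pretrained embedding (values are
# hypothetical; a real system would query a trained word2vec model).
WORD2VEC = {
    "surfing": np.array([0.9, 0.1, 0.0]),
    "beach":   np.array([0.7, 0.3, 0.1]),
    "fail":    np.array([0.1, 0.2, 0.8]),
}

def embed_hashtags(tags, table=WORD2VEC):
    """Average the word2vec vectors of the known hash-tags; ignore OOV tags."""
    vecs = [table[t] for t in tags if t in table]
    if not vecs:
        # No known tags: fall back to a zero vector of the right dimension.
        return np.zeros_like(next(iter(table.values())))
    return np.mean(vecs, axis=0)

def fuse(visual_vec, tag_vec, alpha=0.5):
    """Convex combination of visual and hash-tag embeddings (alpha is assumed)."""
    return alpha * visual_vec + (1.0 - alpha) * tag_vec
```

In practice the weighting between modalities would itself be tuned, since hash-tags are noisy and sometimes missing entirely.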
- We attempt to solve the problem of classifying actions in vines by adapting classifiers trained on a source dataset to the target dataset. We do this by iteratively selecting high-confidence target videos and modifying the learnt embedding function.
- We also release the 3000 vine videos used in our work, along with the hash-tags provided by the uploaders.
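The iterative selection-and-retraining loop described above can be sketched as a self-training procedure. The nearest-centroid model, the synthetic Gaussian "embeddings", and the confidence threshold below are simplifying assumptions standing in for the learnt embedding function and real video features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for embedded videos: a labeled source set and a
# slightly shifted, unlabelled target set (two action classes, 5-D features).
X_src = np.vstack([rng.normal(0.0, 0.5, (50, 5)), rng.normal(3.0, 0.5, (50, 5))])
y_src = np.array([0] * 50 + [1] * 50)
X_tgt = np.vstack([rng.normal(0.4, 0.5, (50, 5)), rng.normal(3.4, 0.5, (50, 5))])

def centroids(X, y):
    """Per-class mean embeddings: a minimal stand-in for the learnt model."""
    return np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def predict_with_conf(X, C):
    """Nearest-centroid labels plus a softmax-style confidence over distances."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
    return p.argmax(axis=1), p.max(axis=1)

def self_train(X_src, y_src, X_tgt, rounds=3, thresh=0.8):
    """Incrementally augment the labeled source with confident target samples."""
    X_lab, y_lab, pool = X_src, y_src, X_tgt
    C = centroids(X_lab, y_lab)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        labels, conf = predict_with_conf(pool, C)
        keep = conf >= thresh
        if not keep.any():
            break
        # Add high-confidence pseudo-labeled target samples to the labeled set...
        X_lab = np.vstack([X_lab, pool[keep]])
        y_lab = np.concatenate([y_lab, labels[keep]])
        pool = pool[~keep]
        # ...and refit the model on the enlarged set, pulling it toward the target.
        C = centroids(X_lab, y_lab)
    return C

C_adapted = self_train(X_src, y_src, X_tgt)
```

The threshold on confidence controls the trade-off between adapting quickly and amplifying pseudo-label noise; a stricter threshold admits fewer but cleaner target samples per round.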
| File | Link | Description |
|---|---|---|
| Code.tar | Link | Code for running the main classification, along with other utility programs |
| Vine.tar | Link | Download link for the vines |