Deep Learning Frameworks for Human Motion Analysis
The human body is a complex structure consisting of multiple organs, muscles, and bones. For modeling human motion, however, information about a fixed set of joint locations (with fixed bone-length constraints) is sufficient to express the temporal evolution of poses. Thus, one can represent a human motion/activity as a temporally evolving skeleton sequence. In this thesis, we primarily explore two key problems related to the modeling of human motion: the first is efficient indexing & retrieval of human motion sequences, and the second is automated generation/synthesis of novel human motion sequences given an input class prior.

3D human motion indexing & retrieval is an interesting problem due to the rise of several data-driven applications aimed at analyzing and/or re-utilizing 3D human skeletal data, such as data-driven animation, analysis of sports biomechanics, human surveillance, etc. Spatio-temporal articulations of humans, noisy/missing data, different speeds of the same motion, and similar factors make it challenging, and several existing state-of-the-art methods use hand-crafted features along with optimization-based or histogram-based comparison to perform retrieval. Further, they demonstrate results only on very small datasets with few classes. We make a case for using a learned representation that should recognize the motion as well as enforce a discriminative ranking. To that end, we propose a 3D human motion descriptor learned using a deep network. Our learned embedding is generalizable and applicable to real-world data, addressing the aforementioned challenges, and further enables sub-motion search in its embedding space using another network. Our model exploits inter-class similarity using trajectory cues and performs far superior in a self-supervised setting. State-of-the-art results on all these fronts are shown on two large-scale 3D human motion datasets: NTU RGB+D and HDM05.
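The retrieval idea above can be illustrated with a minimal sketch. The thesis learns the descriptor with a deep network; here, purely for illustration, a hypothetical `embed` function stands in for that network (mean-pooling joint coordinates over time and L2-normalising), so that ranking reduces to cosine similarity in the embedding space. All function names and the toy data below are assumptions, not the thesis implementation.

```python
import numpy as np

def embed(sequence):
    """Hypothetical stand-in for the learned deep embedding:
    mean-pool the flattened joint coordinates over time and
    L2-normalise, so ranking reduces to cosine similarity."""
    v = sequence.reshape(len(sequence), -1).mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def retrieve(query, database):
    """Rank database motions by cosine similarity to the query embedding
    (highest similarity first)."""
    q = embed(query)
    scores = [float(embed(m) @ q) for m in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])

# Toy skeleton sequences: 10 frames x 5 joints x 3 coordinates.
rng = np.random.default_rng(0)
walk = rng.normal(0.0, 0.1, (10, 5, 3)) + 1.0   # "walk"-like cluster
jump = rng.normal(0.0, 0.1, (10, 5, 3)) - 1.0   # "jump"-like cluster
query = rng.normal(0.0, 0.1, (10, 5, 3)) + 1.0  # another "walk"
ranking = retrieve(query, [jump, walk])
print(ranking)  # the walk-like motion (index 1) ranks first
```

A learned descriptor replaces the mean-pooling with a network trained to both recognize the motion class and enforce a discriminative ranking (e.g. via a margin-based ranking loss), but the retrieval step itself stays this simple: nearest neighbours in embedding space.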
In regard to the second research problem, human motion generation/synthesis, we aim at long-term human motion synthesis that can aid human-centric video generation, with potential applications in augmented reality, 3D character animation, pedestrian trajectory prediction, etc. Long-term human motion synthesis is a challenging task due to multiple factors: long-term temporal dependencies among poses, cyclic repetition across poses, bi-directional and multi-scale dependencies among poses, variable speed of actions, and a large as well as partially overlapping space of temporal pose variations across multiple classes/types of human activities. This thesis aims to address these challenges to synthesize a long-term (> 6000 ms) human motion trajectory across a large variety of human activity classes (> 50). We propose a two-stage motion synthesis method to achieve this goal: the first stage learns the long-term global pose dependencies in activity sequences by learning to synthesize a sparse motion trajectory, while the second stage synthesizes dense motion trajectories taking the output of the first stage as input. We demonstrate the superiority of the proposed method over state-of-the-art methods using various quantitative evaluation metrics on publicly available datasets.

In summary, this thesis successfully addresses two key research problems, namely, efficient human motion indexing & retrieval and long-term human motion synthesis. In doing so, we explore different machine learning techniques that analyze human motion in a sequence of poses (also called frames), generate human motion, detect human pose from an RGB image, and index human motion.
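The two-stage pipeline can be sketched in a few lines. Both stages below are hypothetical stand-ins for the thesis's learned models: stage one is mocked as a smooth random walk over key poses (where the thesis uses a network to capture long-term global pose dependencies), and stage two as linear interpolation between key poses (where the thesis learns the dense trajectory). Only the sparse-then-dense structure is taken from the text; the numbers and function names are illustrative assumptions.

```python
import numpy as np

def stage_one_sparse(n_key, pose_dim, rng):
    """Stand-in for stage one: synthesise a sparse trajectory of key
    poses capturing the long-term global structure of the activity.
    (Mocked here as a smooth random walk; the thesis learns this.)"""
    steps = rng.normal(0.0, 0.05, (n_key, pose_dim))
    return np.cumsum(steps, axis=0)

def stage_two_dense(sparse, frames_between):
    """Stand-in for stage two: densify the sparse trajectory into a
    full motion sequence. (Mocked here as linear interpolation
    between consecutive key poses; the thesis learns this too.)"""
    dense = []
    for a, b in zip(sparse[:-1], sparse[1:]):
        for t in np.linspace(0.0, 1.0, frames_between, endpoint=False):
            dense.append((1.0 - t) * a + t * b)
    dense.append(sparse[-1])
    return np.stack(dense)

rng = np.random.default_rng(0)
# 8 key poses of dimension 75 (e.g. 25 joints x 3 coordinates, assumed).
sparse = stage_one_sparse(n_key=8, pose_dim=75, rng=rng)
# 30 frames per segment -> 211 frames, i.e. > 6000 ms at 30 fps.
dense = stage_two_dense(sparse, frames_between=30)
print(dense.shape)  # (211, 75)
```

The design point the sketch conveys: by first committing to a sparse set of globally consistent key poses, the dense stage only has to solve a local in-filling problem between adjacent key poses, which sidesteps modelling long-range dependencies at full frame rate.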
Year of completion: March 2021
Advisors: Anoop M Namboodiri, Madhava Krishna