Monocular 3D Human Body Reconstruction
Monocular 3D human reconstruction is an important problem with applications in the entertainment industry, e-commerce, health care, mobile-based AR/VR platforms, etc. However, it is severely ill-posed due to self-occlusions from complex body poses and shapes, clothing obstructions, lack of surface texture, background clutter, the ambiguity of a single view, etc. Conventional approaches address these challenges by using different sensing systems: marker-based setups, marker-less multi-view cameras, inertial sensors, and 3D scanners. Although effective, such methods are often expensive and difficult to deploy at scale. In an attempt to produce scalable solutions, a few works fit statistical body models to monocular images, but they depend on a costly optimization process. Recent efforts use data-driven algorithms such as deep learning to learn priors directly from data. However, they focus on template model recovery or rigid object reconstruction, or propose paradigms that do not directly extend to recovering personalized models.
To predict accurate surface geometry, our first attempt is VolumeNet, which predicts a 3D occupancy grid from a monocular image. At the time, it was the first model of its kind for non-rigid human shapes. To circumvent the ill-posed nature of this problem (aggravated by an unbounded 3D representation), we follow the principle of providing maximal priors during training, through our training paradigms, so that testing requires only minimal information. As we do not impose any body-model-based constraint, we are able to recover deformations induced by free-form clothing. Further, we extend VolumeNet to PoShNet by decoupling Pose and Shape: we learn the volumetric pose first and use it as a prior for learning the volumetric shape, thereby recovering a more accurate surface. Although volumetric regression recovers a more accurate surface, it does so without an animatable skeleton.
Further, such methods yield low-resolution reconstructions at a high computational cost (regression over a cubic voxel grid) and often suffer from inconsistent topology in the form of broken or partial body parts. Hence, statistical body models become a natural choice to offset the ill-posed nature of this problem. Although they are low dimensional in theory, learning such models has been challenging due to the complex non-linear mapping from the image to the relative axis-angle representation. Hence, most solutions rely on different projections of the underlying mesh (2D/3D keypoints, silhouettes, etc.). To simplify the learning process, we propose the CR framework, which uses classification as a prior to guide the learning of the regression. Although recovering personalized models with high-resolution meshes is not possible in this space, the framework shows that learning such template models can be difficult without additional supervision. As an alternative to directly learning parametric models, we propose HumanMeshNet to learn an “implicitly structured point cloud”, in which we use the mesh topology as a prior to enable better learning. We hypothesize that, instead of learning the highly non-linear SMPL parameters, learning the corresponding point cloud (although high dimensional) and enforcing the same parametric template topology on it is an easier task. This paradigm can theoretically learn local surface deformations that the PCA space of the body model cannot capture. Going forward, producing high-resolution meshes (with accurate geometric details) is a natural extension that is easier in 3D space than in the parametric one. In summary, in this thesis, we attempt to address several of the aforementioned challenges and empower machines with the capability to interpret a 3D human body model (pose and shape) from a single image in a manner that is non-intrusive, inexpensive and scalable.
In doing so, we explore different 3D representations that are capable of producing accurate surface geometry, with a long-term goal of recovering personalized 3D human models.
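The trade-off between the three representations discussed above (occupancy grid, parametric body model, template point cloud) can be made concrete by counting how many values a network has to predict in each case. The following is only an illustrative sketch: the grid resolutions are arbitrary, and the SMPL sizes used as defaults (24 joints with 3 axis-angle values each, 10 shape coefficients, 6890 template vertices) are the commonly used configuration and are assumed here rather than taken from this abstract. The function names are hypothetical.

```python
# Illustrative only: output sizes of the three representations discussed above.
# Grid resolutions and SMPL sizes are assumptions, not values from the abstract.


def voxel_outputs(resolution: int) -> int:
    """Number of occupancy values in a cubic grid with the given side length."""
    return resolution ** 3


def parametric_outputs(num_joints: int = 24, num_shape: int = 10) -> int:
    """Axis-angle pose (3 values per joint) plus shape coefficients."""
    return 3 * num_joints + num_shape


def point_cloud_outputs(num_vertices: int = 6890) -> int:
    """One (x, y, z) coordinate per template vertex."""
    return 3 * num_vertices


if __name__ == "__main__":
    # Doubling the grid resolution multiplies the output size by eight.
    for res in (32, 64, 128):
        print(f"voxel grid {res}^3: {voxel_outputs(res)} values")
    print(f"parametric model: {parametric_outputs()} values")
    print(f"template point cloud: {point_cloud_outputs()} values")
```

Under these assumptions the parametric model is by far the most compact target, the voxel grid grows cubically with resolution, and the template point cloud sits in between while keeping a fixed vertex ordering that a mesh topology can be attached to.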
|Year of completion: