Refining 3D Human Digitization Using Learnable Priors
Routhu Snehith Goud
The 3D reconstruction of a human from a monocular RGB image is an interesting yet highly challenging research problem in computer vision, with applications in the movie and gaming industries and in AR/VR. Recovering detailed 3D geometry of humans is therefore important for enhancing realism. The problem is ill-posed owing to the high degrees of freedom of human pose, self-occlusions, loose clothing, camera viewpoint, and illumination. Although accurate geometric reconstructions can be obtained with structured-light scanners or multi-view camera rigs, these are expensive and require a dedicated setup, e.g., numerous cameras and controlled illumination. With recent advances in deep learning, the focus of the community has shifted to reconstructing people from monocular RGB images.

The goal of this thesis is the 3D digitization of people in loose clothing with accurate person-specific details. Reconstructing people in loose clothing is difficult because the topology of the clothing differs from that of the human body. PeeledHuman, a multi-layered shape representation, can handle loose clothing and occlusions, but it suffers from discontinuous or distorted body parts in occluded regions. To overcome this, we propose peeled semantic segmentation maps, which provide semantic information about the body parts across multiple peeled layers. These peeled semantic maps help the network predict consistent depth for pixels belonging to the same body part across different peeled maps, improving reconstruction both quantitatively and qualitatively. Additionally, the 3D semantic segmentation labels have further applications; for example, they can be used to extract the clothing from the reconstructed output.
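The idea behind a peeled representation can be illustrated with a toy example: each camera ray is intersected with the surface more than once, and the successive hit depths form the peeled depth layers. The sketch below (our own illustrative stand-in, not the thesis implementation; the sphere, orthographic rays, and function name are assumptions) computes two peeled depth layers for a sphere, whose rays hit the surface at most twice.

```python
import numpy as np

def peel_sphere(center, radius, H=9, W=9, layers=2):
    """Toy peeled-depth computation: orthographic rays along +z hit a
    sphere at most twice; the near and far hit depths give two peeled
    depth layers (illustrative stand-in for ray-mesh peeling)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing="ij")
    cx, cy, cz = center
    # Ray (x, y, t) meets the sphere where (t - cz)^2 = r^2 - (x-cx)^2 - (y-cy)^2
    disc = radius**2 - (xs - cx)**2 - (ys - cy)**2
    hit = disc >= 0                       # rays that intersect the sphere
    depth = np.full((layers, H, W), np.inf)
    sq = np.sqrt(np.clip(disc, 0.0, None))
    depth[0][hit] = (cz - sq)[hit]        # first peeled layer: visible surface
    depth[1][hit] = (cz + sq)[hit]        # second peeled layer: occluded back surface
    return depth

d = peel_sphere(center=(0, 0, 3), radius=0.8)
# Along the central ray the two layers are at depths 2.2 and 3.8.
```

In the thesis setting, a network predicts such per-pixel depth layers for a clothed human, and the proposed peeled segmentation maps attach a body-part label to each layer's pixels so that depths of the same part stay consistent across layers.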
The face plays an important role in 3D human digitization: it gives a person their identity, and high-frequency facial details enhance realism. However, the face occupies only a small region of a full-body image, which makes high-fidelity face reconstruction alongside the body even more challenging. We reconstruct person-specific facial geometry together with the complete body by incorporating a facial prior and refining it further with our proposed framework. Another common challenge for existing methods is surface noise, i.e., false geometric edges generated by textural edges in image space. We address this problem by incorporating a wrinkle-map prior that distinguishes geometric edges from textural edges in image space.

In summary, this thesis addresses the problem of 3D human digitization from monocular images. We evaluate our proposed solutions on various existing 3D human datasets and demonstrate that they outperform the existing state-of-the-art methods. We discuss the limitations of the proposed methods along with potential ways to address them, and outline future directions that build on the proposed solutions. Overall, we propose efficient and robust methods for recovering accurate and personalized 3D humans from images.
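The distinction between textural and geometric edges can be made concrete with a toy example (a minimal sketch under our own assumptions; the function names, thresholds, and the use of a depth map as the geometric signal are ours, not the learned wrinkle-map prior of the thesis): an image-space edge is kept as geometric only if the geometry also varies there, so an intensity jump caused purely by texture is suppressed.

```python
import numpy as np

def edge_mag(a):
    """Forward-difference gradient magnitude of a 2D array."""
    gx = np.zeros_like(a)
    gy = np.zeros_like(a)
    gx[:, 1:] = a[:, 1:] - a[:, :-1]
    gy[1:, :] = a[1:, :] - a[:-1, :]
    return np.hypot(gx, gy)

def geometric_edges(gray, depth, tex_thresh=0.1, geo_thresh=0.01):
    """Keep an image edge only where the depth map also changes,
    discarding purely textural edges (toy stand-in for a wrinkle map)."""
    img_e = edge_mag(gray) > tex_thresh    # all image-space edges
    geo_e = edge_mag(depth) > geo_thresh   # edges supported by geometry
    return img_e & geo_e

# Toy scene: a texture stripe (intensity jump, flat depth) plus a
# shaded fold (intensity and depth jump together).
gray = np.zeros((8, 8)); gray[:, 4:] += 1.0; gray[4:, :] += 0.5
depth = np.zeros((8, 8)); depth[4:, :] += 0.2
mask = geometric_edges(gray, depth)
# mask marks the fold (row 4) but suppresses the texture stripe (column 4).
```

A learned wrinkle-map prior plays the role of the depth gradient here, telling the network which image edges correspond to actual surface wrinkles rather than printed patterns on clothing.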
|Year of completion:||December 2022|
|Advisor:||Avinash Sharma|