On the Democratization of Realistic 3D Head Avatar Generation and Reconstruction

Pranav Manu

Abstract

The need for photorealistic head avatars has risen in the past decades, owing to the growing interest in AR/VR media formats. An accurate representation of the head is essential to facilitate communication between users and thereby enable telepresence. The need for a more in-person form of remote communication became clearer during the recent COVID-19 pandemic and the ensuing lockdown, when millions of people had to stay away from their families and workplaces for an extended period of time. Besides, realistic facial avatars have proved immensely helpful in the movie and gaming industries, where they are often used either to modify the actors’ appearance or to drive an entirely virtual but realistic-looking digital character, depending on the demands of the narrative.

Capturing and reconstructing a realistic-looking head avatar is not trivial: it requires an expensive setup of multiple synchronized cameras and lights, and a mathematical understanding of how light interacts with the skin, hair, cornea, etc. The capture of each subject is laborious and time-consuming. Creating digital faces that are indistinguishable from real ones is a formidable challenge because of the “uncanny valley” phenomenon, where even minor deviations from a realistic appearance can render a digital face unsettling to human observers. However, for realistic head avatars to be used in telepresence and AR/VR, the capture, and thus the creation, of realistic digital replicas must be made accessible. A need has therefore arisen for methods that can reconstruct and create digital replicas that are photorealistic but also cheap. Our thesis aims to tackle this problem in two ways: one from the perspective of generating a digital replica, and the other from the perspective of creating a digital replica through reconstruction.

Our initial approach to making the creation of digital replicas efficient is a textured head generation method conditioned on a descriptive text. We aim to create a method that can generate a realistic-looking head avatar from a text description efficiently, without requiring manual intervention by artists or the use of highly specialised software like Blender or Maya. The method can generate textured head assets within seconds. However, the texture-based synthesis approach suffers from reduced realism because of the effects of baked-in lighting. Therefore, an approach is required that can construct a head avatar along with accurate material properties, such that it can be placed in any environment.

Year of completion:	June 2025
Advisor 1 :	Dr. Avinash Sharma
Advisor 2 :	Prof. PJ Narayanan