Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild
Sindhu B Hegde*, K R Prajwal*, Rudrabha Mukhopadhyay*, Vinay Namboodiri and C.V. Jawahar
IIIT Hyderabad | University of Oxford | University of Bath
ACM-MM, 2022
[ Code ] | [ Paper ] | [ Demo Video ]
Abstract

We address the problem of generating speech from silent lip videos for any speaker in the wild. Previous works train either on large amounts of data from isolated speakers or in laboratory settings with a limited vocabulary. In contrast, our model generates speech for the lip movements of arbitrary identities in any voice, without any additional speaker-specific fine-tuning. Our new VAE-GAN approach allows us to learn strong audio-visual associations despite the ambiguous nature of the task.
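To make the VAE-GAN idea above concrete, here is a minimal sketch of how a lip-to-speech VAE-GAN can be wired up: visual features are encoded into a Gaussian latent, a sampled latent sequence is decoded into mel-spectrogram frames, and a discriminator scores the result. All layer sizes, module names, and the feature dimensions are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class LipToSpeechVAEGAN(nn.Module):
    """Illustrative VAE-GAN sketch (hypothetical sizes, not the paper's model)."""

    def __init__(self, video_dim=512, latent_dim=64, mel_dim=80):
        super().__init__()
        # Encoder: per-frame lip features -> parameters of a Gaussian latent
        self.encoder = nn.GRU(video_dim, 256, batch_first=True)
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        # Decoder: sampled latent sequence -> mel-spectrogram frames
        self.decoder = nn.GRU(latent_dim, 256, batch_first=True)
        self.to_mel = nn.Linear(256, mel_dim)
        # Discriminator: scores each mel frame as real vs. generated
        self.discriminator = nn.Sequential(
            nn.Linear(mel_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1)
        )

    def forward(self, video_feats):
        h, _ = self.encoder(video_feats)                 # (B, T, 256)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z ~ N(mu, sigma^2), differentiable sample
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        d, _ = self.decoder(z)
        return self.to_mel(d), mu, logvar                # mel: (B, T, mel_dim)

model = LipToSpeechVAEGAN()
video = torch.randn(2, 50, 512)          # 2 clips, 50 frames of visual features
mel, mu, logvar = model(video)
score = model.discriminator(mel)         # adversarial realism score per frame
```

Training would combine a reconstruction loss on `mel`, a KL term on `(mu, logvar)`, and an adversarial loss from the discriminator; those loss weights and the exact encoder/decoder backbones are where the actual method would differ from this sketch.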
Paper
Demo
--- COMING SOON ---
Contact
- Sindhu Hegde
- K R Prajwal
- Rudrabha Mukhopadhyay