
Efficient Identity-preserving Face Swapping in Scenic Sketches


Ankith Varun J

Abstract

We present an efficient framework for identity-preserving face swapping into scenic portrait sketches. First, we analyze latent identity representations via a StyleGAN-based face cartoonization pipeline. Our analysis reveals domain-dependent shifts in StyleGAN’s identity encodings when mapping photos to sketches, underscoring the need for cross-domain consistency. Second, we introduce Portrait Sketching StyleGAN (PS-StyleGAN), a novel GAN architecture with attentive affine-transform blocks. These blocks learn to modulate a pretrained StyleGAN’s style codes by joint content-and-style attention, thereby learning style-specific identity transformations. PS-StyleGAN requires only a few photo-sketch pairs and a short training run to learn each style, making it broadly adaptable. Thus, PS-StyleGAN produces editable, expressive portrait sketches that faithfully preserve the input identity. Third, we extend diffusion-based generative modeling by adapting InstantID for sketch synthesis. Our diffusion module integrates strong semantic identity embeddings and landmark guidance (inspired by InstantID’s IdentityNet) to steer high-fidelity portrait generation. This yields personalized sketches with markedly improved identity fidelity while enabling tuning-free (zero-shot) personalization. Finally, motivated by recent diffusion-based face-swapping advances, we build a real-time end-to-end pipeline. By encoding facial identity and pose as conditioning inputs and optimizing the diffusion process for speed, our system can seamlessly embed user faces into tourist-style sketches on the fly. Extensive evaluations on standard benchmarks confirm that our methods significantly enhance identity preservation and visual quality compared to prior approaches, advancing the state of the art in generative portrait stylization and face swapping.
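To make the style-code modulation concrete, the sketch below shows, in PyTorch, one plausible shape for an attentive affine-transform block: the style codes attend to content features of the input photo, and the attended context predicts a per-channel scale and shift applied to the codes. The class name, dimensions, and residual formulation are illustrative assumptions, not the actual PS-StyleGAN implementation.

# Illustrative sketch only: a generic "attentive affine" block that modulates
# pretrained StyleGAN style codes using attention over content features.
# Names, shapes, and the exact attention arrangement are assumptions.
import torch
import torch.nn as nn

class AttentiveAffineBlock(nn.Module):
    def __init__(self, w_dim=512, feat_dim=512, n_heads=8):
        super().__init__()
        # Cross-attention: style codes (queries) attend to content features (keys/values).
        self.attn = nn.MultiheadAttention(w_dim, n_heads, kdim=feat_dim,
                                          vdim=feat_dim, batch_first=True)
        self.norm = nn.LayerNorm(w_dim)
        # Predict a per-channel scale and shift (the "affine transform") for each style code.
        self.to_scale = nn.Linear(w_dim, w_dim)
        self.to_shift = nn.Linear(w_dim, w_dim)

    def forward(self, w, content_feats):
        # w:             (B, n_latents, w_dim) style codes from a pretrained StyleGAN encoder
        # content_feats: (B, n_tokens, feat_dim) features of the input photo
        ctx, _ = self.attn(query=self.norm(w), key=content_feats, value=content_feats)
        scale = self.to_scale(ctx)
        shift = self.to_shift(ctx)
        # Residual-style modulation keeps the codes near the pretrained manifold.
        return w * (1.0 + scale) + shift

A block of this form could be applied per group of StyleGAN layers, so that coarse and fine style codes receive separate identity-aware adjustments.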

 

Year of completion: October 2025
Advisor: Dr. Anoop Namboodiri

Related Publications


Downloads

Ankith Varun Thesis

A Computational Framework for Ink Bleed Suppression in Handwritten Document Images


Shrikant Baronia

Abstract

Ink bleed is a common form of degradation in handwritten document images, where ink from the reverse side or adjacent lines seeps into the visible content, impairing readability and adversely affecting downstream tasks such as Optical Character Recognition (OCR). Traditional image processing techniques often fall short in preserving the fine structure of handwritten strokes while removing bleed-through noise.

This thesis presents a machine-learning-based computational framework for ink bleed suppression using a layer-separation approach. The proposed method models the document image as a combination of content and bleed-through layers and employs a Dual Layer Markov Random Field (DL-MRF) architecture to learn the separation in a supervised setting. A synthetic dataset with controlled bleed artifacts was constructed for training, along with augmentation strategies to simulate various bleed intensities and patterns.
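As a rough illustration of how such training pairs could be synthesized, the sketch below composites a mirrored "reverse side" page onto a clean page as a faint, blurred bleed layer. The blending model, parameter values, and function name are assumptions for illustration only, not the dataset-construction procedure used in the thesis.

# Illustrative sketch only: synthesize a (degraded, clean) training pair
# by compositing a mirrored reverse-side page onto a clean front page.
import numpy as np
from scipy.ndimage import gaussian_filter

def add_synthetic_bleed(front, back, strength=0.35, blur_sigma=1.5):
    # front, back: grayscale pages as float arrays in [0, 1], where 0 is ink and 1 is paper.
    # Ink written on the reverse side appears horizontally mirrored when seen from the front.
    back_mirrored = np.fliplr(back)
    # Bleed-through is fainter and slightly diffused compared to true foreground ink.
    bleed = gaussian_filter(1.0 - back_mirrored, sigma=blur_sigma) * strength
    degraded = np.clip(front - bleed, 0.0, 1.0)
    # The clean front page serves as the supervision target for layer separation.
    return degraded, front

Pairs generated this way, with the strength and blur varied per sample, could supervise a separation model that recovers the content layer from the degraded input.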

The model was evaluated on both synthetic and real handwritten document images. The results demonstrate significant improvement over traditional filtering techniques and baseline learning models, preserving content integrity while effectively reducing ink bleed.

This work contributes towards robust document image restoration, with applications in digital archiving, historical manuscript preservation, and pre-processing pipelines for handwriting analysis systems.

Year of completion: June 2025
Advisor: Prof. Anoop M Namboodiri

Related Publications


Downloads

AI Assisted Screening of Oral Potentially Malignant Disorders Using Smartphone Photographic Images - An Indian Cohort Study


Talwar Vivek Jayant

Abstract

The escalating prevalence of Oral Potentially Malignant Disorders (OPMDs) and oral cancer in low- and middle-income countries presents a critical challenge, exacerbated by limited resources that hinder population screening in remote areas. This study evaluates the efficacy of artificial intelligence (AI) and digital imaging diagnostics as tools for OPMD detection in the Indian population, utilizing smartphone-captured oral cavity images. Trained front-line healthcare workers (FHWs) contributed a dataset comprising 1,120 suspicious and 1,058 non-suspicious images. Various deep-learning models, including DenseNets and Swin Transformers, were assessed for image-classification performance. The best-performing model was then tested on an independent external set of 440 images collected by untrained FHWs. DenseNet201 and Swin Transformer (base) models exhibited high classification accuracy on the internal test set, achieving F1-scores of 0.84 (CI 0.79–0.89) and 0.83 (CI 0.78–0.88), respectively. However, performance declined on the external set, which showed significant variation in image quality, with DenseNet201 yielding the highest F1-score of 0.73 (CI 0.67–0.78). The AI model demonstrates potential for identifying suspicious versus non-suspicious oral lesions via photographic images. This image-based solution holds promise for facilitating early screening, detection, and timely referral for OPMDs.
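For context, a minimal fine-tuning setup for one of the reported backbones might look like the sketch below, which adapts torchvision's DenseNet201 to the two-class suspicious / non-suspicious task. The hyperparameters, batch construction, and weights enum are assumptions (and presume a recent torchvision release), not the study's training protocol.

# Illustrative sketch only: fine-tune DenseNet201 for binary oral-lesion classification.
import torch
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes=2):
    # Start from ImageNet-pretrained weights and replace the final classification head.
    model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model

model = build_classifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One hypothetical training step on a batch of resized smartphone oral-cavity images.
images = torch.randn(8, 3, 224, 224)   # placeholder batch; real images would be loaded and normalized
labels = torch.randint(0, 2, (8,))     # 0 = non-suspicious, 1 = suspicious
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()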

       

Year of completion: June 2025
Advisor 1: Dr. P.K. Vinod
Advisor 2: Prof. C.V. Jawahar

Related Publications


Downloads

On the Democratization of Realistic 3D Head Avatar Generation and Reconstruction


Pranav Manu

Abstract

The need for photorealistic head avatars has risen over the past decades, owing to growing interest in AR/VR media formats. Accurate head representations will be essential in the near future to facilitate communication between users and ultimately enable telepresence. The need for a more in-person form of remote communication became especially clear during the recent COVID-19 pandemic and the ensuing lockdowns, when millions of people had to stay away from their families and workplaces for extended periods. Realistic facial avatars have also proved immensely helpful in the movie and gaming industries, where they are often used either to modify an actor’s appearance or to drive an entirely virtual but realistic-looking digital character, depending on the demands of the narrative.

Capturing and reconstructing a realistic-looking head avatar is not trivial: it requires an expensive setup of multiple synchronized cameras and lights, and a mathematical understanding of how light interacts with the skin, hair, cornea, and other facial structures. The capture of each subject is laborious and time-consuming. Creating digital faces that are indistinguishable from real ones is a formidable challenge due to the “uncanny valley” phenomenon, where even minor deviations from realistic appearance can render a digital face unsettling to human observers. However, to realize the applications of head avatars in telepresence and AR/VR, the capture and creation of realistic digital replicas must be made accessible. This motivates methods that can reconstruct and create digital replicas that are photorealistic yet inexpensive. This thesis tackles the problem from two directions: generating digital replicas, and creating a digital replica through reconstruction.

Our first approach to making the creation of digital replicas efficient is a textured head generation method conditioned on a descriptive text prompt. The goal is to generate a realistic-looking head avatar from a text description without manual intervention by artists or the use of highly specialised software such as Blender or Maya; the method produces textured head assets within seconds. However, texture-based synthesis suffers from reduced realism because of baked-in lighting, which motivates an approach that reconstructs a head avatar together with accurate material properties so that it can be placed in any environment.

         

Year of completion: June 2025
Advisor 1: Dr. Avinash Sharma
Advisor 2: Prof. PJ Narayanan

Related Publications


Downloads

Seeing, Describing and Remembering: A Study on Audio Descriptions and Video Memorability


Eshika Khandelwal

Abstract

The human brain undergoes continuous structural changes throughout the lifespan, driven by a complex interplay of aging processes, environmental influences, and disease-related mechanisms. Patterns of structural change, particularly atrophy associated with tissue loss and shrinkage, emerge gradually over time and are observable using medical imaging techniques. While these changes are shaped by common biological mechanisms, they are also highly individualized, influenced by factors such as lifestyle and neurological conditions like Alzheimer’s Disease (AD), Parkinson’s disease, tumors, and stroke. Understanding the progression of these changes, both at the individual level and across populations, is critical for advancing our knowledge of healthy aging and the dynamics of neurodegenerative disease.

To study how brain structure evolves over time, researchers rely on longitudinal neuroimaging: repeated imaging of the same individuals at multiple timepoints. Unlike cross-sectional imaging, which captures a single snapshot per subject, longitudinal scans provide a temporal sequence that enables direct observation of anatomical trajectories. These sequences allow for the measurement of rates of change, identification of early biomarkers, and modeling of disease progression in a subject-specific manner.

However, acquiring complete longitudinal datasets in practice remains challenging. Subject dropout, missed clinical visits, and protocol variability often result in missing scans, interrupting the temporal continuity required for accurate modeling. These gaps limit the effectiveness of methods that rely on temporally complete inputs and can bias downstream analyses. Imputing the missing scan to complete a subject’s imaging timeline is therefore a critical step toward enabling robust longitudinal modeling and improving our understanding of neurodegenerative processes.

           

Year of completion: June 2025
Advisor: Makarand Tapaswi

Related Publications


Downloads

More Articles …

1. The Anatomy of Synthesis: Simulating Changes in the Human Brain over Time through Diffeomorphic Deformations
2. Cinematic Video Editing: Integrating Audio-Visual Perception and Dialogue Interpretation
3. Towards understanding Compositionality in Vision-Language Models
4. Face Sketch Generation and Recognition