
Continual Learning in Interactive Medical Image Segmentation


Kushal Borkar

Abstract

Automated segmentation of medical image volumes holds immense potential to significantly reduce the time and effort required from medical experts for annotation. However, leveraging machine learning for this task remains a formidable challenge due to variations in imaging modalities, the inherent complexity of medical images, and the limited availability of labeled patient data. While existing interactive segmentation methods and foundational models incorporate user-provided prompts to iteratively refine segmentation masks, they often fail to learn the continuity and inter-related information between consecutive slices in a 3D medical image volume. This limitation leads to inconsistencies, spatial discontinuities, and loss of anatomical coherence, ultimately affecting the reliability of segmentation results in clinical applications.

This work proposes a novel interactive segmentation framework that dynamically updates model parameters during inference using a test-time training paradigm guided by user-provided scribbles. Unlike traditional approaches, our method preserves crucial spatial and contextual information from both previously processed slices within the same medical volume and the training dataset through a student-teacher learning mechanism. By leveraging sequential dependencies across slices, our approach ensures smoother and more anatomically consistent segmentation masks while integrating prior knowledge from the training distribution.
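To make the mechanism concrete, the sketch below is a simplified assumption of one scribble-guided test-time update step in PyTorch, not the thesis implementation: the student network is optimized on the user's scribbled pixels for the current slice plus a consistency term against an exponential-moving-average teacher, which retains knowledge from earlier slices and the training distribution. All names (model, scribble_mask, scribble_labels) are illustrative.

# Minimal sketch of scribble-guided test-time training with a student-teacher
# (EMA) scheme; names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def test_time_update(student, teacher, optimizer, image, scribble_labels,
                     scribble_mask, ema_decay=0.99, lambda_cons=0.5):
    """One adaptation step on a single slice.
    image: (1, C, H, W); scribble_labels: (1, H, W) long class ids;
    scribble_mask: (1, H, W) bool, True where the user drew a scribble."""
    student.train()
    logits = student(image)                                   # (1, K, H, W)
    # Supervised loss only on scribbled pixels.
    sup = F.cross_entropy(logits.permute(0, 2, 3, 1)[scribble_mask],
                          scribble_labels[scribble_mask])
    # Consistency with the frozen EMA teacher preserves prior knowledge.
    with torch.no_grad():
        teacher_prob = teacher(image).softmax(dim=1)
    cons = F.kl_div(logits.log_softmax(dim=1), teacher_prob, reduction="batchmean")
    loss = sup + lambda_cons * cons
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    # EMA update keeps the teacher as a slow-moving summary of past slices.
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema_decay).add_(sp, alpha=1.0 - ema_decay)
    return loss.item()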

We extensively evaluated our framework across diverse datasets, encompassing CT, MRI, and microscopic cell images, demonstrating its superior performance in both efficiency and accuracy. Our method significantly reduces user annotation time by a factor of 6.72× compared to manual annotation workflows and by a factor of 1.93× compared to state-of-the-art interactive segmentation methods. Furthermore, when benchmarked against foundational segmentation models, our framework achieves a Dice score of 0.9 within just 3–4 user interactions, substantially improving upon the 5–8 interactions required by existing models. This reduction in required interactions translates to a more streamlined and intuitive annotation process for volumetric CT and MRI scans.
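For reference, the Dice score quoted above measures overlap between a predicted mask and the ground truth; a minimal computation for binary masks is sketched below.

# Dice score for binary masks: 2|A ∩ B| / (|A| + |B|).
import numpy as np

def dice_score(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)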

 

Year of completion: January 2025
Advisor 1: Prof. C. V. Jawahar
Advisor 2: Prof. Chetan Arora

Related Publications


Downloads

thesis

Controllable Structured Object Generation


Amruta Priyadarshan Muthal

Abstract

While text-to-image (T2I) models have demonstrated remarkable improvements in spatial consistency and positioning of large scene components, they struggle with fine-grained structural control, particularly for anatomically complex objects with multiple interconnected parts. This limitation manifests as anatomical inconsistencies, missing or hallucinated components, and incorrect part connectivity: critical failures for applications requiring precise structural fidelity. Recent developments in controlled image generation have produced models capable of high-fidelity synthesis; however, these models rely on sophisticated conditioning inputs, such as detailed pose skeletons or segmentation masks, that require human expertise.

We introduce PLATO, a novel two-stage framework that bridges this gap by enabling precise, part-controlled object generation from simple, intuitive inputs: an object category and a list of constituent parts. The first stage employs PLayGen, our novel part layout generator built upon a Graph Convolutional Network Variational Autoencoder (GCN-VAE) architecture. PLayGen consists of a GCN Refinement Decoder that iteratively refines part placements through graph-based message passing, capturing inter-part spatial relationships during refinement. To address the challenge of training with non-intersecting bounding boxes and small parts, we introduce the Dynamic Margin IOU (DyI) loss, which dynamically adjusts margins based on box dimensions to ensure meaningful gradients throughout training. Additionally, our Differential Refinement loss scheme enables the model to learn both coarse positioning and fine-grained adjustments simultaneously. The overall loss formulation also has the effect of stabilizing the training process while improving structural coherence.
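As a rough illustration of the dynamic-margin idea (our assumption about the mechanism, not the thesis's exact DyI formulation), the PyTorch sketch below expands each box by a margin proportional to its size before computing an IoU-based loss, so that small or non-intersecting boxes still receive informative gradients.

# Illustrative sketch of a margin-adjusted IoU loss for axis-aligned boxes in
# (x1, y1, x2, y2) format; the margin scales with box size. This is an
# assumption about the general idea, not the thesis's DyI loss.
import torch

def dynamic_margin_iou_loss(pred, target, alpha=0.1, eps=1e-7):
    # pred, target: (N, 4) tensors of boxes.
    def expand(boxes):
        w = (boxes[:, 2] - boxes[:, 0]).clamp(min=0)
        h = (boxes[:, 3] - boxes[:, 1]).clamp(min=0)
        mx, my = alpha * w, alpha * h          # margin grows with box size
        return torch.stack([boxes[:, 0] - mx, boxes[:, 1] - my,
                            boxes[:, 2] + mx, boxes[:, 3] + my], dim=1)

    p, t = expand(pred), expand(target)
    # Intersection of the expanded boxes.
    ix1 = torch.max(p[:, 0], t[:, 0]); iy1 = torch.max(p[:, 1], t[:, 1])
    ix2 = torch.min(p[:, 2], t[:, 2]); iy2 = torch.min(p[:, 3], t[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()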

In the second stage, PLayGen’s synthesized layout guides a custom-tuned ControlNet-based diffusion model through a novel visual conditioning format using color-coded inscribed ellipses combined with part-aware text prompts. This design enforces both spatial constraints and semantic consistency, resulting in anatomically accurate, high-fidelity object generations that faithfully contain precisely the user-specified parts.
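The conditioning format could look roughly like the following sketch, where the palette, image size, and part names are illustrative assumptions: each part's bounding box is rendered as a color-coded inscribed ellipse that a ControlNet-style model consumes together with part-aware text prompts.

# A minimal sketch (assumed, not the thesis's exact format) of rendering a
# conditioning image from part bounding boxes using Pillow.
from PIL import Image, ImageDraw

def render_part_conditioning(part_boxes, palette, size=(512, 512)):
    """part_boxes: {part_name: (x1, y1, x2, y2)} in pixel coordinates.
    palette: {part_name: (r, g, b)} assigning a fixed color per part."""
    canvas = Image.new("RGB", size, (0, 0, 0))
    draw = ImageDraw.Draw(canvas)
    for name, box in part_boxes.items():
        draw.ellipse(box, fill=palette[name])   # ellipse inscribed in the box
    return canvas

# Example usage with hypothetical parts of a bird layout.
cond = render_part_conditioning(
    {"head": (200, 80, 300, 180), "body": (150, 160, 360, 340)},
    {"head": (255, 0, 0), "body": (0, 128, 255)})
cond.save("part_conditioning.png")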

 

Year of completion: December 2025
Advisor: Dr. Ravi Kiran Sarvadevabhatla

Related Publications


Downloads

Efficient Identity-preserving Face Swapping in Scenic Sketches


Ankith Varun J

Abstract

We present an efficient framework for identity-preserving face swapping into scenic portrait sketches. First, we analyze latent identity representations via a StyleGAN-based face cartoonization pipeline. Our analysis reveals domain-dependent shifts in StyleGAN’s identity encodings when mapping photos to sketches, underscoring the need for cross-domain consistency.

Second, we introduce Portrait Sketching StyleGAN (PS-StyleGAN), a novel GAN architecture with attentive affine-transform blocks. These blocks learn to modulate a pretrained StyleGAN’s style codes through joint content-and-style attention, thereby learning style-specific identity transformations. PS-StyleGAN requires only a few photo-sketch pairs and short training to learn each style, making it broadly adaptable, and it produces editable, expressive portrait sketches that faithfully preserve the input identity.

Third, we extend diffusion-based generative modeling by adapting InstantID for sketch synthesis. Our diffusion module integrates strong semantic identity embeddings and landmark guidance (inspired by InstantID’s IdentityNet) to steer high-fidelity portrait generation, yielding personalized sketches with markedly improved identity fidelity while remaining zero-shot (tuning-free).

Finally, motivated by recent diffusion-based face-swapping advances, we build a real-time end-to-end pipeline. By encoding facial identity and pose as conditioning inputs and optimizing the diffusion process for speed, our system can seamlessly embed user faces into tourist-style sketches on the fly. Extensive evaluations on standard benchmarks confirm that our methods significantly enhance identity preservation and visual quality compared to prior approaches, advancing the state of the art in generative portrait stylization and face swapping.
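As a loose illustration of the attentive affine-transform idea (a simplified assumption, not the actual PS-StyleGAN block), the PyTorch sketch below attends over content features from a style-code query and predicts a scale and shift that modulate a pretrained generator's style codes.

# Illustrative "attentive affine-transform" block: attention over content
# features produces a per-dimension affine modulation of style codes.
# Dimensions and structure are assumptions for the sketch only.
import torch
import torch.nn as nn

class AttentiveAffineBlock(nn.Module):
    def __init__(self, style_dim=512, content_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(style_dim, heads, batch_first=True)
        self.to_kv = nn.Linear(content_dim, style_dim)
        self.to_scale = nn.Linear(style_dim, style_dim)
        self.to_shift = nn.Linear(style_dim, style_dim)

    def forward(self, style_codes, content_tokens):
        # style_codes: (B, n_layers, style_dim) latent codes of a pretrained GAN.
        # content_tokens: (B, n_tokens, content_dim) content features.
        kv = self.to_kv(content_tokens)
        ctx, _ = self.attn(style_codes, kv, kv)      # content-and-style attention
        scale = 1.0 + self.to_scale(ctx)             # learned affine transform
        shift = self.to_shift(ctx)
        return style_codes * scale + shift           # modulated style codes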

 

Year of completion: October 2025
Advisor: Dr. Anoop Namboodiri

Related Publications


Downloads

Ankith Varun Thesis

A Computational Framework for Ink Bleed Suppression in Handwritten Document Images


Shrikant Baronia

Abstract

Ink-bleed is a common form of degradation in handwritten document images, where ink from the reverse side or adjacent lines seeps into the visible content, impairing readability and adversely affecting downstream tasks such as Optical Character Recognition (OCR). Traditional image processing techniques often fall short in preserving the fine structure of handwritten strokes while removing bleed-through noise.

This thesis presents a machine learning-based computational framework for ink bleed suppression using a layer separation approach. The proposed method models the document image as a combination of content and bleed-through layers and employs a Dual Layer Markov Random Field (DL-MRF) architecture to learn the separation in a supervised setting. A synthetic dataset with controlled bleed artifacts was constructed for training, along with augmentation strategies to simulate various bleed intensities and patterns.
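As a rough illustration of the layer-separation idea (not the supervised DL-MRF itself), the sketch below decomposes a grayscale page into content and bleed-through layers by minimizing a simple MRF-style energy with a data term and pairwise smoothness, using plain gradient descent.

# Illustrative two-layer separation: the observed page is explained as the
# minimum (darkest) of a content layer and a bleed-through layer, each
# regularized by pairwise smoothness. Hyperparameters are assumptions.
import torch

def separate_layers(observed, iters=300, lam=0.1, lr=0.1):
    """observed: (H, W) grayscale tensor in [0, 1] (0 = dark ink)."""
    content = observed.clone().requires_grad_(True)
    bleed = torch.ones_like(observed).requires_grad_(True)
    opt = torch.optim.Adam([content, bleed], lr=lr)
    for _ in range(iters):
        recon = torch.minimum(content, bleed)       # darker layer dominates
        data_term = ((recon - observed) ** 2).mean()
        smooth = sum(((x[1:, :] - x[:-1, :]) ** 2).mean() +
                     ((x[:, 1:] - x[:, :-1]) ** 2).mean()
                     for x in (content, bleed))     # pairwise (MRF-style) smoothness
        loss = data_term + lam * smooth
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            content.clamp_(0, 1); bleed.clamp_(0, 1)
    return content.detach(), bleed.detach()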

The model was evaluated on both synthetic and real handwritten document images. The results demonstrate significant improvement over traditional filtering techniques and baseline learning models, preserving content integrity while effectively reducing ink bleed.

This work contributes towards robust document image restoration, with applications in digital archiving, historical manuscript preservation, and pre-processing pipelines for handwriting analysis systems.

     

Year of completion: June 2025
Advisor: Prof. Anoop M Namboodiri

Related Publications


Downloads

AI Assisted Screening of Oral Potentially Malignant Disorders Using Smartphone Photographic Images - An Indian Cohort Study


Talwar Vivek Jayant

Abstract

The escalating prevalence of Oral Potentially Malignant Disorders (OPMDs) and oral cancer in low- and middle-income countries presents a critical challenge, exacerbated by limited resources that hinder population screening in remote areas. This study evaluates the efficacy of artificial intelligence (AI) and digital imaging diagnostics as tools for OPMD detection in the Indian population, utilizing smartphone-captured oral cavity images. Trained front-line healthcare workers (FHWs) contributed a dataset comprising 1,120 suspicious and 1,058 non-suspicious images. Various deep-learning models, including DenseNets and Swin Transformers, were assessed for image-classification performance. The best-performing model was then tested on an independent external set of 440 images collected by untrained FHWs. DenseNet201 and Swin Transformer (base) models exhibited high classification accuracy on the internal test set, achieving F1-scores of 0.84 (CI 0.79–0.89) and 0.83 (CI 0.78–0.88), respectively. However, performance declined on the external set, which showed significant variation in image quality, with DenseNet201 yielding the highest F1-score of 0.73 (CI 0.67–0.78). The AI model demonstrates potential for identifying suspicious versus non-suspicious oral lesions via photographic images. This image-based solution holds promise for facilitating early screening, detection, and timely referral for OPMDs.
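As an aside on how such figures are typically obtained, the sketch below (an assumption about the evaluation protocol, using scikit-learn) computes an F1-score with a 95% bootstrap confidence interval from binary labels and predictions; the study's exact procedure may differ.

# F1-score with a bootstrap confidence interval over resampled test sets.
import numpy as np
from sklearn.metrics import f1_score

def f1_with_ci(y_true, y_pred, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    point = f1_score(y_true, y_pred)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
        scores.append(f1_score(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return point, (lo, hi)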

       

Year of completion: June 2025
Advisor 1: Dr. P.K. Vinod
Advisor 2: Prof. C.V. Jawahar

Related Publications


Downloads
