Continual Learning in Interactive Medical Image Segmentation
Kushal Borkar
Abstract
Automated segmentation of medical image volumes holds immense potential to significantly reduce the time and effort required from medical experts for annotation. However, leveraging machine learning for this task remains a formidable challenge due to variations in imaging modalities, the inherent complexity of medical images, and the limited availability of labeled patient data. While existing interactive segmentation methods and foundational models incorporate user-provided prompts to iteratively refine segmentation masks, they often fail to exploit the continuity and inter-slice relationships across consecutive slices of a 3D medical image volume. This limitation leads to inconsistencies, spatial discontinuities, and loss of anatomical coherence, ultimately affecting the reliability of segmentation results in clinical applications.
This work proposes a novel interactive segmentation framework that dynamically updates model parameters during inference using a test-time training paradigm guided by user-provided scribbles. Unlike traditional approaches, our method preserves crucial spatial and contextual information from both previously processed slices within the same medical volume and the training dataset through a student-teacher learning mechanism. By leveraging sequential dependencies across slices, our approach ensures smoother and more anatomically consistent segmentation masks while integrating prior knowledge from the training distribution.
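To make the idea concrete, the following is a minimal sketch of scribble-guided test-time adaptation with an exponential-moving-average (EMA) teacher. It is not the thesis implementation; the network, loss weights, and function names (e.g., TinySegNet, adapt_on_slice) are illustrative assumptions. A student model is updated on each slice using a cross-entropy loss restricted to scribbled pixels, while a consistency term against an EMA teacher retains context from earlier slices and the training distribution.

```python
# Sketch of scribble-guided test-time adaptation with an EMA teacher.
# All names, architectures, and loss weights are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySegNet(nn.Module):
    """Stand-in 2D segmentation network (placeholder for the real model)."""
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)


@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    # Teacher weights track an exponential moving average of the student,
    # retaining knowledge from earlier slices and the training distribution.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)


def adapt_on_slice(student, teacher, slice_img, scribble_mask, scribble_labels,
                   steps=5, lr=1e-4):
    """One round of test-time adaptation on a single slice.

    scribble_mask:   bool tensor (H, W), True where the user scribbled.
    scribble_labels: long tensor (H, W), class index at scribbled pixels.
    """
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        logits = student(slice_img)                        # (1, C, H, W)
        with torch.no_grad():
            teacher_probs = F.softmax(teacher(slice_img), dim=1)

        # Supervised loss only on user-scribbled pixels.
        per_pixel_ce = F.cross_entropy(
            logits, scribble_labels.unsqueeze(0), reduction="none")
        scribble_loss = per_pixel_ce[scribble_mask.unsqueeze(0)].mean()

        # Consistency with the teacher anchors unscribbled regions
        # to previously learned context.
        consistency = F.mse_loss(F.softmax(logits, dim=1), teacher_probs)

        loss = scribble_loss + 0.1 * consistency
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        ema_update(teacher, student)

    return student(slice_img).argmax(dim=1)                # refined mask


if __name__ == "__main__":
    student = TinySegNet()
    teacher = copy.deepcopy(student)
    img = torch.randn(1, 1, 64, 64)
    scribble_mask = torch.zeros(64, 64, dtype=torch.bool)
    scribble_mask[30:34, 30:34] = True                     # a tiny user scribble
    scribble_labels = torch.zeros(64, 64, dtype=torch.long)
    scribble_labels[scribble_mask] = 1
    refined = adapt_on_slice(student, teacher, img, scribble_mask, scribble_labels)
    print(refined.shape)  # torch.Size([1, 64, 64])
```

In this sketch, carrying the adapted student and EMA teacher forward to the next slice is what propagates inter-slice context; the exact losses and update schedule in the thesis may differ.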
We extensively evaluated our framework across diverse datasets, encompassing CT, MRI, and microscopic cell images, demonstrating its superior performance in both efficiency and accuracy. Our method significantly reduces user annotation time, by a factor of 6.72× compared to manual annotation workflows and by a factor of 1.93× compared to state-of-the-art interactive segmentation methods. Furthermore, when benchmarked against foundational segmentation models, our framework achieves a Dice score of 0.9 within just 3–4 user interactions, substantially improving upon the 5–8 interactions required by existing models. This reduction in required interactions translates to a more streamlined and intuitive annotation process for volumetric CT and MRI scans.
| Year of completion: | January 2025 |
| Advisor 1: | Prof. C. V. Jawahar |
| Advisor 2: | Prof. Chetan Arora |