Targeted Segmentation: Leveraging Localization with DAFT for Improved Medical Image Segmentation
Samruddhi Shastri
Abstract
Medical imaging plays a pivotal role in modern healthcare, providing clinicians with crucial insights into the internal structures of the human body. However, extracting meaningful information from medical images such as X-rays and Computed Tomography (CT) scans remains challenging, particularly when accurate segmentation is required. This thesis presents a novel two-stage Deep Learning (DL) pipeline designed to address the limitations of existing single-stage models and to improve segmentation performance in two critical medical imaging tasks: pneumothorax segmentation in chest radiographs and multi-organ segmentation in abdominal CT scans.

The first stage of the proposed pipeline localizes the target organs or lesions within the image, using a specialized module tailored to the specific organ/lesion and image type. This stage outputs a “localization map” highlighting the most probable regions containing the target, which guides the next step. The second stage performs fine-grained segmentation, precisely delineating the organ/lesion boundaries. It combines UNet, known for its ability to capture both coarse and detailed features, with Dynamic Affine Feature-Map Transform (DAFT) modules that dynamically modulate feature maps within the network. Together, the two stages first roughly locate the target and then accurately trace its exact borders.

One application of the proposed pipeline is pneumothorax segmentation, which leverages not only the image data but also the accompanying free-text radiology reports. By incorporating text-guided attention and DAFT, the pipeline produces low-dimensional region-localization maps, significantly reducing false-positive predictions and improving segmentation accuracy.
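To make the DAFT mechanism concrete, the following is a minimal NumPy sketch of a FiLM-style dynamic affine transform: an auxiliary embedding (e.g. pooled text or localization features) predicts a per-channel scale and shift that modulate a convolutional feature map. The function and weight names are illustrative, not the thesis implementation.

```python
import numpy as np

def daft_transform(feature_map, aux_vector, w_scale, w_shift):
    """Apply a dynamic per-channel affine transform to a feature map.

    feature_map: (C, H, W) convolutional features.
    aux_vector:  (D,) auxiliary conditioning embedding.
    w_scale, w_shift: hypothetical (C, D) projections that predict the
        per-channel scale (alpha) and shift (beta) from aux_vector.
    """
    alpha = w_scale @ aux_vector                 # (C,) dynamic scales
    beta = w_shift @ aux_vector                  # (C,) dynamic shifts
    # Broadcast the per-channel affine parameters over H and W.
    return feature_map * alpha[:, None, None] + beta[:, None, None]
```

In practice such a block sits inside the segmentation network, so the auxiliary signal can re-weight channels differently for every input rather than using fixed normalization parameters.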
Extensive experiments on the CANDID-PTX dataset demonstrate the efficacy of the approach, achieving a Dice Similarity Coefficient (DSC) of 0.60 for positive cases and a False Positive Rate (FPR) of 0.052 for negative cases, with DSC ranging from 0.70 to 0.85 for medium and large pneumothoraces.

The second application of the proposed pipeline is multi-organ segmentation in abdominal CT scans, where accurate delineation of organ boundaries is crucial for many clinical tasks. The proposed Guided-nnUNet draws spatial guidance from a ResNet-50-based localization map in the first stage, followed by a DAFT-enhanced 3D U-Net (nnU-Net implementation) in the second. Evaluation on the AMOS and Beyond The Cranial Vault (BTCV) datasets demonstrates a significant improvement over baseline models, with average gains of 7% and 9% on the respective datasets. Moreover, Guided-nnUNet outperforms state-of-the-art (SOTA) methods, including MedNeXt, by 3.6% and 5.3% on AMOS and BTCV, respectively.

Overall, this thesis proposes a novel two-stage deep learning pipeline for medical image segmentation and demonstrates its effectiveness across a wide range of anatomical structures and image modalities (2D X-ray, 3D CT), for both single-organ tasks (e.g., pneumothorax segmentation in chest radiographs) and multi-organ tasks (e.g., abdominal CT scans). This comprehensive approach offers significant advancements in medical image analysis, potentially leading to better healthcare outcomes.
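The two-stage idea described above can be sketched as a simple gating scheme: a coarse localization map from stage one suppresses fine-grained stage-two predictions that fall outside the probable region, which is how false positives are reduced. The callables and threshold below are illustrative placeholders, not the actual networks.

```python
import numpy as np

def two_stage_segmentation(image, localize, segment, threshold=0.5):
    """Gate fine-grained segmentation with a coarse localization map.

    localize, segment: hypothetical callables returning per-pixel
        probabilities in [0, 1] with the same shape as `image`.
    Returns a binary mask restricted to the localized region.
    """
    loc_map = localize(image)                    # stage 1: coarse region map
    seg_map = segment(image)                     # stage 2: fine probabilities
    gated = seg_map * (loc_map >= threshold)     # zero out unlikely regions
    return gated >= threshold                    # final binary mask
```

Gating by the localization map means a spurious stage-two response far from the target region can never survive into the final mask, at the cost of depending on stage-one recall.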
Year of completion: June 2024
Advisor: Jayanthi Sivaswamy