Towards Trustworthy and Fair Medical Image Analysis Models


Abstract:

Although Deep Learning (DL) models have been shown to perform very well on various medical imaging tasks, inference in the presence of pathology presents several challenges that impede their integration into real clinical workflows. Deploying these models in real clinical contexts requires: (1) that the confidence in DL model predictions be accurately expressed in the form of uncertainties, and (2) that the models exhibit robustness and fairness across different sub-populations.

In this talk, we will look at our recent work, in which we developed an uncertainty quantification score for the task of brain tumour segmentation. We evaluated the score's usefulness during two consecutive Brain Tumour Segmentation (BraTS) challenges, BraTS 2019 and BraTS 2020. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analysis.

Next, we bring uncertainty estimates and fairness across demographic subgroups into the same picture. Through extensive experiments on multiple tasks, we show that popular ML methods for achieving fairness across subgroups, such as data balancing and distributionally robust optimization, succeed in terms of model performance on some tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions.

Finally, we discuss our ongoing work on a fairness mitigation framework for calibration. Although several methods have been shown to successfully mitigate biases across subgroups in terms of accuracy, they do not consider the calibration of these models across different subgroups. To this end, we propose a novel two-stage method, Cluster-Focal. Extensive experiments on two different medical image classification datasets show that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, outperforming recent baselines.
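To make the worst-subgroup calibration objective concrete, here is a minimal sketch of expected calibration error (ECE) computed per demographic subgroup. This is an illustration only: the binning scheme, function names, and toy data below are hypothetical, and this is not the Cluster-Focal method itself or the talk's datasets.

```python
# Hypothetical sketch: per-subgroup expected calibration error (ECE).
# A model can look well calibrated on average while being badly
# calibrated for one subgroup; the worst-subgroup ECE exposes this.

def ece(confidences, correct, n_bins=10):
    """Confidence-weighted gap between mean confidence and accuracy
    within equal-width confidence bins (standard ECE definition)."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(avg_conf - acc)
    return total

def worst_subgroup_ece(confidences, correct, groups):
    """Largest per-subgroup ECE - the kind of quantity a subgroup-aware
    calibration method would try to keep small."""
    return max(
        ece([c for c, g2 in zip(confidences, groups) if g2 == g],
            [y for y, g2 in zip(correct, groups) if g2 == g])
        for g in set(groups)
    )
```

On a toy example where both subgroups predict with 0.95 confidence but one subgroup is only 50% accurate, the overall ECE looks moderate while the worst-subgroup ECE is large, which is exactly the failure mode described above.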

Bio:

Raghav Mehta is a Ph.D. candidate in the Department of Electrical and Computer Engineering at McGill University, where he works with Prof. Tal Arbel in the Probabilistic Vision Group, Centre for Intelligent Machines. His primary research is in neuroimage analysis and machine learning; specifically, he works on quantifying and leveraging uncertainty in deep neural networks for the medical image analysis pipeline. Previously, Raghav completed his master's at IIIT Hyderabad, where he was one of the main students on the project The Construction of Brain Atlas for Young

Robustness and Safety. Raghav has received several awards, including the MEITA scholarship, a reviewer award at MIDL, and best paper awards at the DART and UNSURE workshops at MICCAI.