Explaining Deep Survival Analysis Models for Heterogenous Data

Title: Explaining Deep Survival Analysis Models for Heterogenous Data

 

Type: Master Thesis

Student: Moritz Wagner

Supervisor: Sebastian Pölsterl, Christian Wachinger

Status: Finished on 05.05.2021

 

Abstract:

The aim of this work is to predict the progression from mild cognitive impairment (MCI) to Alzheimer’s disease (AD) based on heterogeneous data provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Jack Jr et al., 2008). We consider a slice of the coronal plane of the 3D brain MRIs together with tabular biomarker data. To this end, we leverage state-of-the-art methods from deep survival analysis. While the predictive performance of these methods is already promising, integrating them into medical diagnosis systems remains challenging. This is due to a lack of transparency and interpretability of these algorithms and their predictions (Singh et al., 2020). To enhance interpretability, Shapley values (Shapley, 1953) are a prominent choice for determining which structures in the brain are responsible for an accelerated or decelerated disease progression. Shapley values, however, rely on a specified baseline against which the considered MRI and its corresponding prediction are compared. Identifying a suitable baseline has turned out to be challenging (Sturmfels et al., 2020); the literature refers to this as the baseline selection problem (Shih et al., 2020). We argue that the optimal baseline must represent a meaningful and contrasting example to the original MRI. To be contrasting, the baseline must contribute to a decelerated progression if the original MRI contributes to an accelerated progression, and vice versa. To ensure meaningfulness, we require the baseline to be a realistic sample that differs from the original MRI only in those features that are directly linked to AD progression. The latter criterion prevents us from selecting a sample from the available data and instead requires us to synthetically generate a hypothetical MRI. To do so, we rely on the general ideas of image-to-image translation.
We propose a novel framework, the baseline generator, that uniquely identifies an optimal baseline for each MRI. While similar methods have already been proposed for binary classification (Bass et al., 2020), our proposed framework is applicable to survival analysis. Over the course of this work, it becomes evident why this conceptual transfer is essential. Due to the limited scope of this thesis, we refrain from applying the baseline generator framework to the ADNI data. Instead, we construct a dedicated simulation setting to verify the functioning of the framework. From this, we conclude that the framework fills a non-negligible gap in making survival time predictions based on unstructured image data interpretable. We argue that this is a decisive step towards more interpretable predictions of AD progression.
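
The dependence of Shapley values on the chosen baseline can be illustrated with a minimal sketch. The code below is not taken from the thesis: the function `toy_risk`, the three-feature input, and both baselines are hypothetical stand-ins for a survival model, an MRI, and candidate reference images. It computes exact Shapley values by enumerating feature coalitions, replacing every feature outside a coalition with its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside a coalition take their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition with `size` members
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                z = list(baseline)
                for j in coalition:
                    z[j] = x[j]
                without_i = model(z)
                z[i] = x[i]
                with_i = model(z)
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_risk(z):
    # Hypothetical risk score with one interaction term (assumption)
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
zero_baseline = [0.0, 0.0, 0.0]
# Differs from x only in the second feature (assumption for illustration)
contrast_baseline = [1.0, 0.0, 3.0]

phi_zero = shapley_values(toy_risk, x, zero_baseline)
phi_contrast = shapley_values(toy_risk, x, contrast_baseline)

print(phi_zero)      # attributions relative to the all-zero baseline
print(phi_contrast)  # attributions relative to the contrasting baseline
```

For the same input and the same model, the two baselines yield different attributions. With the second baseline, which differs from the input in a single feature, the whole difference in predicted risk is assigned to that feature. This mirrors, in a toy setting, the requirement that the baseline should differ from the original MRI only in the features linked to disease progression.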