Explaining decisions of Graph Convolutional Neural Networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer

Chereda, Hryhorii; Bleckmann, Annalen; Menck, Kerstin; Perera-Bel, Júlia; Stegmaier, Philip; Auer, Florian; Leha, Andreas; Beißbarth, Tim

doi:10.1101/2020.08.05.238519

Cited by 5 publications (7 citation statements)

References 57 publications


“…Graph-CNNs, trained with word2vec-embedding-based networks, have shown a good enough performance to classify metastatic events vs. non-metastatic events. Predictions of Graph-CNN applied to the same gene expression data used in this study with the HPRD PPI were explained in a recent study and provided patient-specific subnetworks [27]. An interesting research question brought up by this study is whether patient-specific subnetwork genes predicted using an embedding-based gene-gene network would give different insights into the tumor biology of a patient than those predicted using PPI networks.…”

Section: Discussion (mentioning), confidence: 99%

“…One of the approaches for validation of the embedding networks is to analyze how the underlying molecular network influences performance of the machine learning method utilizing prior knowledge. The Graph-CNN [41] method was applied on the breast cancer dataset introduced in section 2.3 in recent studies [26][27]. We subtracted the minimal value (5.84847) of the data from each cell of the quantile normalized gene expression matrix to keep the gene expression values non-negative.…”

Section: Graph-Convolutional Neural Network (CNN) (mentioning), confidence: 99%
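The offset described in the second statement is one global shift. The smallest value in the whole quantile-normalized matrix is subtracted from every cell, which makes the minimum exactly zero and leaves all pairwise differences unchanged. Below is a minimal Python sketch that assumes the matrix is a list of rows. The helper name `shift_to_nonnegative` and the toy values are hypothetical, and the row/column layout does not matter for this step.

```python
from typing import List

Matrix = List[List[float]]  # e.g. rows = patients, columns = genes (layout is irrelevant here)


def shift_to_nonnegative(matrix: Matrix) -> Matrix:
    """Subtract the smallest value in the matrix from every cell.

    The global minimum becomes 0.0 and every other value keeps its
    distance to every other value, so relative differences are preserved.
    """
    global_min = min(min(row) for row in matrix)
    return [[value - global_min for value in row] for row in matrix]


# Toy values only; the citing study reports a global minimum of 5.84847
# for its own quantile-normalized expression matrix.
expression = [[5.84847, 7.2], [9.1, 6.0]]
shifted = shift_to_nonnegative(expression)
```

Because the same constant is removed everywhere, any downstream method that depends only on differences between expression values sees the same data as before the shift.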

“…CNN models were also effective for NLP tasks such as text classification [22-23]. They have been used in bioinformatics [24], namely in drug discovery and genomics [25], and motivated further progress on graph structured prior information with promising results on the prediction of metastatic events [26-27].…”

Section: Introduction (mentioning), confidence: 99%

Text mining-based word representations for biomedical data analysis and machine learning tasks

Alachram, Chereda, Beißbarth, et al., 2020 (preprint; self-citation)


Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted in the substitution of synonymous terms by their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on breast cancer gene expression data to predict the occurrence of metastatic events. Performances of resulting models were compared to Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed best for the metastatic event prediction task compared to other networks. Word representations as produced by text mining algorithms like word2vec therefore capture biologically meaningful relations between entities.
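The abstract relates known biological relations to higher cosine similarity, but it does not say exactly how the gene-gene networks were extracted from the embeddings. One common construction connects two genes whenever the cosine similarity of their word vectors reaches a threshold. That rule is assumed here purely for illustration and is not claimed to be the authors' method. The gene labels, vectors and threshold are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def similarity_network(
    embeddings: Dict[str, List[float]], threshold: float
) -> Set[Tuple[str, str]]:
    """Return undirected edges (alphabetically ordered name pairs) between
    terms whose cosine similarity is at least `threshold`.

    The thresholding rule is an illustrative assumption.
    """
    edges: Set[Tuple[str, str]] = set()
    for a, b in combinations(sorted(embeddings), 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            edges.add((a, b))
    return edges


# Hypothetical 3-dimensional vectors; real embeddings are much higher-dimensional.
toy = {
    "GENE_A": [1.0, 0.9, 0.0],
    "GENE_B": [0.9, 1.0, 0.1],
    "GENE_C": [0.0, 0.1, 1.0],
}
edges = similarity_network(toy, threshold=0.8)
```

Here only GENE_A and GENE_B point in nearly the same direction, so they form the single edge. Raising or lowering the threshold trades network density against the strength of the implied relations.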



“…Table 1 provides the GEO accession numbers of the samples used in this study, along with the sample statistics. Similar to the approach used in (Chereda et al, 2019), we used the RMA probe-summary algorithm (Irizarry et al, 2003) to process each dataset, after which they were combined based on the HG-U133A array probe names, and quantile normalization was applied across all datasets. In cases where multiple probes were mapped to one gene, the probe with the highest average value was taken.…”

Section: Gene Expression Datasets (mentioning), confidence: 99%
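The statement describes two steps that follow RMA summarization. Quantile normalization is applied across all combined datasets, and when several probes map to one gene, the probe with the highest average value is kept. The plain-Python sketch below covers only those two steps and does not reproduce RMA. It assumes probes as rows and samples as columns. The function names and toy probe and gene labels are hypothetical, and ties are broken by position rather than averaged, which simplifies what some reference implementations do.

```python
from statistics import mean
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = probes, columns = samples


def quantile_normalize(matrix: Matrix) -> Matrix:
    """Give every column (sample) the same value distribution.

    Each value is replaced by the mean, across all columns, of the values
    holding the same rank. Ties are broken by row position.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    # For each column, row indices ordered from smallest to largest value.
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all columns.
    reference = [
        mean(columns[c][order[c][k]] for c in range(n_cols)) for k in range(n_rows)
    ]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for rank, r in enumerate(order[c]):
            result[r][c] = reference[rank]
    return result


def collapse_probes(
    probe_values: Dict[str, List[float]], probe_to_gene: Dict[str, str]
) -> Dict[str, List[float]]:
    """For each gene, keep the probe whose values have the highest average."""
    best: Dict[str, Tuple[float, List[float]]] = {}
    for probe, values in probe_values.items():
        gene = probe_to_gene[probe]
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, values)
    return {gene: values for gene, (_, values) in best.items()}
```

After normalization, every sample shares the same set of values, so differences between the combined datasets in overall intensity distribution are removed before probes are collapsed to genes.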

MetastaSite: Predicting metastasis to different sites using deep learning with gene expression data

Albaradei, Albaradei, Alsaedi, et al., 2022, Frontiers in Molecular Biosciences (Front. Mol. Biosci.)

Deep learning has massive potential in predicting phenotype from different omics profiles. However, deep neural networks are viewed as black boxes, providing predictions without explanation. Therefore, the requirements for these models to become interpretable are increasing, especially in the medical field. Here we propose a computational framework that takes the gene expression profile of any primary cancer sample and predicts whether patients’ samples are primary (localized) or metastasized to the brain, bone, lung, or liver based on deep learning architecture. Specifically, we first constructed an AutoEncoder framework to learn the non-linear relationship between genes, and then DeepLIFT was applied to calculate genes’ importance scores. Next, to mine the top essential genes that can distinguish the primary and metastasized tumors, we iteratively added ten top-ranked genes based upon their importance score to train a DNN model. Then we trained a final multi-class DNN that uses the output from the previous part as an input and predicts whether samples are primary or metastasized to the brain, bone, lung, or liver. The prediction performance ranged from an AUC of 0.82 to 0.93. We further designed the model’s workflow to provide a second functionality beyond metastasis site prediction, i.e., to identify the biological functions that the DL model uses to perform the prediction. To our knowledge, this is the first multi-class DNN model developed for the generic prediction of metastasis to various sites.
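The gene-selection step in the abstract, which adds the ten top-ranked genes at a time, can be read as a forward search over nested prefixes of an importance ranking. The sketch below is one such reading under stated assumptions. The ranking is already sorted by decreasing importance (for example, by DeepLIFT score). `evaluate` is a hypothetical callback standing in for training and validating a DNN on the chosen genes. Keeping the best-scoring prefix is an assumed stopping rule that the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_gene_selection(
    ranked_genes: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 10,
) -> Tuple[List[str], float]:
    """Grow the gene set by `step` top-ranked genes at a time and return
    the prefix with the highest score.

    `ranked_genes` must be sorted by decreasing importance. `evaluate`
    is a hypothetical stand-in for training/validating a model.
    """
    if step < 1:
        raise ValueError("step must be positive")
    best_genes: List[str] = []
    best_score = float("-inf")
    # Prefix sizes step, 2*step, ...; the last prefix may be shorter than
    # a full multiple of step and then covers the whole ranking.
    for size in range(step, len(ranked_genes) + step, step):
        candidate = list(ranked_genes[:size])
        score = evaluate(candidate)
        if score > best_score:
            best_genes, best_score = candidate, score
    return best_genes, best_score


# Hypothetical ranking and a toy score that peaks at 20 genes.
genes = [f"gene_{i}" for i in range(25)]
selected, score = incremental_gene_selection(genes, lambda gs: -abs(len(gs) - 20))
```

In practice `evaluate` would be the expensive part. Each call trains a model, so a step of ten keeps the number of trainings proportional to the ranking length divided by ten.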


“…Graph Neural Networks (GNNs) [29], a powerful technology for learning knowledge from graph-structured data, are gaining increasing attention in today's world, where graph-structured data such as social networks [12,27], molecular structures [6,25], traffic flows [19,21,41,47], and knowledge graphs [32] are widely used. GNNs work by propagating and fusing messages from neighboring nodes on the graph using message-passing mechanisms.…”

Section: Introduction (mentioning), confidence: 99%
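The statement summarizes GNNs as propagating and fusing messages from neighboring nodes. A single round of that idea can be sketched with mean aggregation. Each node collects its neighbors' feature vectors, adds its own, and averages them element-wise. This is a generic illustration, not the Graph-CNN from the original study. Real GNN layers also apply learned weight matrices and a nonlinearity, which are omitted here. The graph and features are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list; must be symmetric for an undirected graph
Features = Dict[str, List[float]]


def message_passing_step(graph: Graph, features: Features) -> Features:
    """One round of mean aggregation over each node and its neighbours.

    Each node receives its neighbours' feature vectors (the "messages"),
    includes its own vector, and takes the element-wise average.
    """
    updated: Features = {}
    for node, neighbours in graph.items():
        messages = [features[node]] + [features[n] for n in neighbours]
        dim = len(features[node])
        updated[node] = [
            sum(m[i] for m in messages) / len(messages) for i in range(dim)
        ]
    return updated


# Path graph A - B - C with one scalar feature per node.
g = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
h = {"A": [0.0], "B": [3.0], "C": [6.0]}
h1 = message_passing_step(g, h)
```

Each round spreads information one hop further, so after k rounds a node's features depend on its k-hop neighborhood. That is how graph structure such as a PPI or embedding-derived network shapes the learned representation.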

MixupExplainer: Generalizing Explanations for Graph Neural Networks with Data Augmentation

Zhang, Luo, Wang, 2023, Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Graph Neural Networks (GNNs) have received increasing attention due to their ability to learn from graph-structured data. However, their predictions are often not interpretable. Post-hoc instance-level explanation methods have been proposed to understand GNN predictions. These methods seek to discover substructures that explain the prediction behavior of a trained GNN. In this paper, we shed light on the existence of the distribution shifting issue in existing methods, which affects explanation quality, particularly in applications on real-life datasets with tight decision boundaries. To address this issue, we introduce a generalized Graph Information Bottleneck (GIB) form that includes a label-independent graph variable, which is equivalent to the vanilla GIB. Driven by the generalized GIB, we propose a graph mixup method, MixupExplainer, with a theoretical guarantee to resolve the distribution shifting issue. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our proposed mixup approach over existing approaches. We also provide a detailed analysis of how our proposed approach alleviates the distribution shifting issue.

CCS Concepts: • Computing methodologies → Neural networks; Artificial intelligence; • Human-centered computing → Human computer interaction (HCI).
