Machine learning methods have been remarkably successful at extracting essential information from data across a wide range of application areas. An exciting and relatively recent development is the uptake of machine learning in the natural sciences, where the major goal is to obtain novel scientific insights and discoveries from observational or simulated data. A prerequisite for obtaining a scientific outcome is domain knowledge, which is needed to gain explainability but also to enhance scientific consistency. In this article we review explainable machine learning in view of applications in the natural sciences and discuss three core elements that we identified as relevant in this context: transparency, interpretability, and explainability. With respect to these core elements, we survey recent scientific works that incorporate machine learning and the ways in which explainable machine learning is used in combination with domain knowledge from the application areas.
R. Roscher and J. Garcke contributed equally to this work. arXiv:1905.08883v3 [cs.LG]
The Genome-Wide Association Study (GWAS) is the study design of choice for detecting common genetic risk factors for multifactorial diseases. Performing a full Genome-Wide Interaction Analysis (GWIA) has long been considered computationally challenging. Two-stage strategies that reduce the amount of numerical analysis require either the detection of single-marker effects or prior pathophysiological hypotheses before interactions are analyzed. This prevents the detection of pure epistatic effects. Our case-control study in idiopathic generalized epilepsy demonstrates that a full GWIA is feasible through the use of data compression, a specific data representation, interleaved data organization, and parallelization of the analysis on a multi-processor system. Following extensive quality control of the genotypes, our final list of top interaction hits contains only pairs of interacting SNPs with negligible marginal effects. The top hit was an interaction between a pair of SNPs located within the genes DNER (chr 2) and CTNNA3 (chr 10). Both of these genes are functionally involved in neuronal migration, synaptogenesis, and the formation of neuronal circuits. Our results therefore indicate a possible interaction between these two genes in epileptogenesis. Results from GWAS are beginning to reveal a ‘missing heritability’ in complex traits and diseases. Systematic, hypothesis-free analysis of epistatic interaction (GWIA) may help to close this increasingly recognized gap in heritability.
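The abstract names data compression and a specific data representation as what makes an exhaustive pairwise scan tractable, but it does not spell out the encoding. The Python sketch below illustrates one common bit-parallel approach, which may differ from what the authors actually implemented. Each SNP is stored as three bitmasks, one per genotype class, so that the 3x3 joint genotype table for a SNP pair reduces to bitwise ANDs and population counts. The function names (`encode_snp`, `pair_table`, `scan_pairs`) and the toy genotypes are hypothetical and serve only as illustration.

```python
from itertools import combinations


def encode_snp(genotypes):
    """Pack one SNP's genotypes (0, 1, or 2 minor alleles per individual)
    into three integer bitmasks, one per genotype class.
    Bit i of masks[g] is set when individual i carries genotype g."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pair_table(masks_a, masks_b, group_mask):
    """3x3 joint genotype counts for two SNPs, restricted to the
    individuals selected by group_mask (e.g. cases or controls)."""
    return [
        [popcount(masks_a[g] & masks_b[h] & group_mask) for h in range(3)]
        for g in range(3)
    ]


def scan_pairs(snps, is_case):
    """Yield (snp_a, snp_b, case_table, control_table) for every SNP pair.
    This is the hypothesis-free part: no marginal-effect prefilter is applied."""
    case_mask = sum(1 << i for i, flag in enumerate(is_case) if flag)
    control_mask = sum(1 << i for i, flag in enumerate(is_case) if not flag)
    encoded = {name: encode_snp(g) for name, g in snps.items()}
    for a, b in combinations(sorted(encoded), 2):
        yield (
            a,
            b,
            pair_table(encoded[a], encoded[b], case_mask),
            pair_table(encoded[a], encoded[b], control_mask),
        )


# Toy data (hypothetical): two SNPs genotyped in four individuals.
toy_snps = {"rsA": [0, 1, 2, 1], "rsB": [2, 1, 0, 1]}
toy_status = [True, True, False, False]  # first two individuals are cases
toy_results = list(scan_pairs(toy_snps, toy_status))
```

Each yielded pair of tables can then be passed to whatever interaction statistic the study uses. Because the pairs are independent of one another, the loop in `scan_pairs` is also the natural place to distribute work across processors.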