2022
DOI: 10.1101/2022.05.20.492818
Preprint

ExplaiNN: interpretable and transparent neural networks for genomics

Abstract: Sequence-based deep learning models, particularly convolutional neural networks (CNNs), have shown superior performance on a wide range of genomic tasks. A key limitation of these models is the lack of interpretability, slowing their broad adoption by the genomics community. Current approaches to model interpretation do not readily reveal how a model makes predictions, can be computationally intensive, and depend on the implemented architecture. Here, we introduce ExplaiNN, an adaptation of neural additive mod…


Cited by 10 publications (17 citation statements)
References: 81 publications
“…Emphasizing the neomorphic activity of IRF4^T95R, more than 35% of the peaks (versus <10% in IRF4^WT) corresponded to "non-ChIPable" IRF4 regions [i.e., they are not reported in the ReMap database (21), which aggregates IRF4 ChIP-seq data from B cells, T cells, plasmablasts, and various cell lines], and about 33% do not overlap any of the >1 million candidate cis-regulatory elements from ENCODE (22) (versus ~7% in IRF4^WT). Applying a new deep learning tool, ExplaiNN (explainable neural networks) (23), we separately identified motifs de novo in four different datasets, including the patient and healthy control ChIP-seq datasets and two custom datasets describing the binding of IRF4 to either AICE or EICE sites in GM12878 cells. Next, we used these motifs to initialize a "surrogate" ExplaiNN model in a process known as transfer learning with which to evaluate their importance toward the IRF4^T95R-specific, IRF4^WT-specific, or common component of the ChIP-seq data (fig.…”
Section: T95R Changes Both the Genome-wide Binding Landscape of IRF4…
Citation type: mentioning (confidence: 99%)
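The "surrogate" model described above relies on ExplaiNN's additive design: the prediction is a weighted sum of independent per-motif units, so each motif's contribution can be read off directly. A minimal pure-Python sketch of that additive idea follows; the toy motifs, weights, and max-over-positions PWM scoring are invented for illustration and are not ExplaiNN's actual implementation.

```python
BASES = "ACGT"

def pwm_score(seq, pwm):
    """Best (max over positions) score of a position weight matrix along a sequence."""
    w = len(pwm)
    best = float("-inf")
    for i in range(len(seq) - w + 1):
        s = sum(pwm[j][BASES.index(seq[i + j])] for j in range(w))
        best = max(best, s)
    return best

def additive_predict(seq, units):
    """Weighted sum of per-unit scores: each motif's contribution is transparent."""
    contributions = {name: weight * pwm_score(seq, pwm)
                     for name, (pwm, weight) in units.items()}
    return sum(contributions.values()), contributions

# Two toy 3-bp motifs (score per base in A,C,G,T order); weights are illustrative.
units = {
    "motif_GGG": ([[-1, -1, 2, -1]] * 3, 0.8),
    "motif_AAA": ([[2, -1, -1, -1]] * 3, 0.5),
}

total, contribs = additive_predict("ACGGGTAAA", units)
```

Because the model is a linear combination of unit outputs, "importance" here is just each unit's weighted score, which is the transparency property the citing authors exploit.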
“…Then, the enrichment of each 8-nucleotide oligomer was computed as the base-2 logarithm of the ratio between the number of occurrences of that oligomer in the last and first SELEX cycles. Motifs were obtained using ExplaiNN (23) (see the "Deep learning models" section in the Supplementary Materials and Methods).…”
Section: HT-SELEX
Citation type: mentioning (confidence: 99%)
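The enrichment computation quoted above is straightforward to sketch. The function names below and the pseudocount used to handle oligomers absent from one cycle are assumptions; the quoted methods do not specify how zero counts are treated.

```python
from collections import Counter
from math import log2

def count_kmers(seqs, k=8):
    """Count every k-nucleotide oligomer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def log2_enrichment(last_cycle, first_cycle, k=8, pseudocount=1):
    """log2(count in last cycle / count in first cycle) per k-mer,
    with a pseudocount (assumed here) to avoid division by zero."""
    last = count_kmers(last_cycle, k)
    first = count_kmers(first_cycle, k)
    return {km: log2((last[km] + pseudocount) / (first[km] + pseudocount))
            for km in set(last) | set(first)}

# Toy usage: one 8-mer seen 4 times in the last cycle vs once in the first.
enr = log2_enrichment(["AAAACCCC"] * 4, ["AAAACCCC"])
```

With the pseudocount of 1, the toy 8-mer's enrichment is log2(5/2); real SELEX reads would of course contain many overlapping 8-mers per sequence.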
“…Interpretation of trained models has been crucial for deciphering aspects of the cis-regulatory code and is a core aspect of the EUGENe workflow (interpret module, Figure 1c). There are many strategies for model interpretation in genomics [44][45][46][47][48][49][50][51] , but three categories are repeatedly used and thus implemented in EUGENe: filter visualization, feature attribution and in silico experimentation (Supplementary Figure 1). Filter visualization is applicable to model architectures that begin with a set of convolutional filters and involves using the set of sequences that significantly activate a given filter (maximally activating subsequences) to generate a position frequency matrix (PFM) (Supplementary Figure 1a).…”
Section: Results
Citation type: mentioning (confidence: 99%)
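The filter-visualization strategy described above — turning a filter's maximally activating subsequences into a position frequency matrix (PFM) — can be sketched as follows. The toy subsequences are invented; in practice they would be extracted from the inputs that most strongly activate one convolutional filter.

```python
from collections import Counter

BASES = "ACGT"

def pfm_from_subsequences(subseqs):
    """Build a position frequency matrix from equal-length subsequences
    that maximally activate a convolutional filter. Returns a list of
    per-position base counts in A,C,G,T order."""
    width = len(subseqs[0])
    assert all(len(s) == width for s in subseqs), "subsequences must share the filter width"
    pfm = []
    for pos in range(width):
        counts = Counter(s[pos] for s in subseqs)
        pfm.append([counts.get(b, 0) for b in BASES])
    return pfm

# Toy maximally activating subsequences for one filter (invented data).
pfm = pfm_from_subsequences(["GATA", "GATA", "GACA", "GATT"])
```

Normalizing each row of counts to frequencies would give the probability matrix typically rendered as a sequence logo.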
“…Similarly, though any user could develop their own methods for benchmarking EUGENe models against shallow machine learning models like gkm-SVMs 74 or random forests 75 , we plan on integrating functionality for automating this process. Finally, we plan on expanding EUGENe's dataset, model, metric and interpretation [45][46][47][48][49]51 library to encompass a larger portion of those available in the field.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
“…Intriguingly, novel applications may go beyond classic inferential tasks and include other aims, such as efficient data compression or generation of synthetic experimental data sets. Likewise, solutions for making neural networks a “transparent-box,” such as neural additive models ( Novakovsky et al 2022 ) and symbolic metamodeling ( Alaa and van der Schaar 2019 ), will facilitate the adoption of deep learning among empiricists.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)