2020
DOI: 10.1101/2020.01.31.927996
Preprint

A Self-Attention Model for Inferring Cooperativity between Regulatory Features

Abstract: Motivation: Deep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that provide the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena …
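As a rough illustration of the architecture class the abstract describes, the following is a minimal PyTorch sketch of a convolutional feature extractor followed by multi-head self-attention over sequence positions. The class name, layer sizes, and head count are placeholders for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CNNSelfAttention(nn.Module):
    """Minimal CNN + multi-head self-attention sketch for one-hot DNA input.

    Hyperparameters (filters, kernel size, heads) are illustrative only and
    do not reproduce the configuration used in the paper.
    """

    def __init__(self, num_filters=64, kernel_size=19, num_heads=4, num_tasks=1):
        super().__init__()
        # Convolutional filters act as motif scanners over the one-hot sequence.
        self.conv = nn.Conv1d(4, num_filters, kernel_size, padding=kernel_size // 2)
        self.pool = nn.MaxPool1d(4)
        # Self-attention over pooled positions; the attention weights between
        # positions are what one would inspect for putative feature interactions.
        self.attn = nn.MultiheadAttention(num_filters, num_heads, batch_first=True)
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_tasks))

    def forward(self, x):                       # x: (batch, 4, seq_len), one-hot DNA
        h = torch.relu(self.conv(x))            # (batch, filters, seq_len)
        h = self.pool(h).transpose(1, 2)        # (batch, positions, filters)
        ctx, attn_weights = self.attn(h, h, h)  # attn_weights: (batch, positions, positions)
        return self.head(ctx), attn_weights
```

Inspecting `attn_weights` for strongly attended position pairs is, in spirit, how attention-based models are probed for cooperativity between regulatory features.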


Cited by 3 publications (5 citation statements)
References 43 publications
“…saliency maps (Simonyan et al., 2013) and 4) higher-order interactions among sequence elements, which can be assessed e.g. by using association rule analysis (Naulaerts et al., 2015; Zrimec et al., 2020), second-order perturbations (Koo et al., 2018), self-attention networks (Ullah and Ben-Hur, 2020) or by visualizing kernels in deeper layers (Maslova et al., 2020) [interested readers are referred to (Eraslan et al., 2019a; Koo and Ploenzke, 2020a)]. Moreover, attention mechanisms were recently shown to be more effective in discovering known TF-binding motifs compared to non-attentive DNNs (Park et al., 2020), as the learned attention weights correlate with informative inputs, such as DNase-Seq coverage and DNA motifs (Chen et al., 2021), and they can provide better interpretation than other established feature visualization methods, such as saliency maps (Lanchantin et al., 2016; Singh et al., 2017).…”
Section: Learning the Protein-DNA Interactions Initiating Gene Expression
Citation type: mentioning (confidence: 99%)
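For comparison with the attention-based interpretation discussed in the statement above, a gradient saliency map in the style of Simonyan et al. (2013) can be sketched in a few lines. The helper name `saliency_map` and its input/output contract are assumptions for illustration; it presumes a model that maps a one-hot sequence batch to one score per sequence.

```python
import torch

def saliency_map(model, x):
    """Gradient saliency for one-hot DNA input (Simonyan et al., 2013, style).

    Assumes `model` maps a (batch, 4, seq_len) float tensor to one score per
    sequence; this helper is illustrative, not tied to any published model.
    """
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()        # one backward pass for the whole batch
    # gradient * input keeps only the contribution of the observed base
    return (x.grad * x).sum(dim=1)   # (batch, seq_len) per-position importance
```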
“…Whereas ECHO and ChromeGCN [13] explicitly leverage chromatin contacts, DNA interactions are implicitly captured by SATORI [12] and Basenji [11]. SATORI captures TF-TF interactions by combining CNN with self-attention mechanisms.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
“…According to whether 3D chromatin organization is utilized, we categorize the deep learning based computational works for predicting chromatin features into sequence-based and graph-based models. Well-known sequence-based models, such as DeepSEA [7], DanQ [8], DeepBind [9], Basset [10], Basenji [11], and SATORI [12], predict chromatin features only from DNA sequences and ignore the informative chromatin structures. To the best of our knowledge, the only graph-based chromatin feature prediction model ChromeGCN [13] uses a gated graph convolution network to leverage the neighborhood information from a 1kb resolution Hi-C contact map, but it does not fully characterize cooperation among chromatin features.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
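To make the graph-based idea concrete, here is a minimal sketch of a single gated graph-convolution step over a row-normalized Hi-C contact map, in the spirit of (but not identical to) the gated graph convolution described for ChromeGCN. The class name, gating choice, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedGraphLayer(nn.Module):
    """One gated graph-convolution step over a Hi-C contact map (sketch only).

    `adj` is a row-normalized (num_bins, num_bins) contact matrix; `x` holds a
    feature vector per genomic bin, e.g. produced by a sequence CNN. This is
    an illustrative layer, not the ChromeGCN implementation.
    """

    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)   # transform neighbor features
        self.gate = nn.GRUCell(dim, dim)     # gate how much the neighborhood updates each bin

    def forward(self, x, adj):               # x: (num_bins, dim), adj: (num_bins, num_bins)
        neighborhood = adj @ self.message(x) # aggregate messages from Hi-C neighbors
        return self.gate(neighborhood, x)    # gated update of each bin's representation
```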
“…However, extracting these interactions is not straightforward. Alternatively, MHA, a key component of transformers, can provide an "interpretable" attention map to reveal learned activity between pairs of convolutional filters (Ullah & Ben-Hur, 2021). However, such an attention map is only intrinsically interpretable if the convolutional layer(s) learn robust motif representations and are identifiable in the attention maps.…”
Citation type: mentioning (confidence: 99%)
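A sketch of the post-processing this implies: attribute position-to-position attention weights to the convolutional filters that dominate those positions, yielding a filter-by-filter interaction matrix. The function name, activation threshold, and aggregation rule below are illustrative assumptions, not the procedure published by Ullah & Ben-Hur.

```python
import torch

def filter_interaction_scores(activations, attention, threshold=0.5):
    """Aggregate a position-by-position attention map into filter-pair scores.

    activations: (positions, num_filters) convolutional-layer output for one sequence.
    attention:   (positions, positions) self-attention weights for the same sequence.
    A rough sketch of crediting attention between two positions to the filters
    that dominate them; threshold and normalization are illustrative choices.
    """
    num_filters = activations.shape[1]
    dominant = activations.argmax(dim=1)                 # best-matching filter per position
    active = activations.max(dim=1).values > threshold   # ignore weakly activated positions
    scores = torch.zeros(num_filters, num_filters)
    for i in torch.nonzero(active).flatten():
        for j in torch.nonzero(active).flatten():
            scores[dominant[i], dominant[j]] += attention[i, j]
    return scores                                        # (num_filters, num_filters)
```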
“…Recently, several hybrid networks that build upon convolutional layers with architectures developed for natural language processing, including bidirectional long-short-term memory (BiLSTM) (Quang & Xie, 2016; Minnoye et al., 2020), multi-head attention (MHA) (Li et al., 2020; Ullah & Ben-Hur, 2021), and transformer encoders (Ji et al., 2020; Avsec et al., 2021a), have demonstrated improved performance relative to pure CNNs. BiLSTMs can, in principle, capture long-range motif interactions.…”
Citation type: mentioning (confidence: 99%)
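For readers unfamiliar with these hybrids, the following is a minimal DanQ-style CNN + BiLSTM sketch in PyTorch; all layer sizes are placeholders rather than the settings of any published model.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Minimal CNN + bidirectional LSTM hybrid for one-hot DNA (DanQ-style sketch).

    Layer sizes are placeholders; no published model uses exactly these values.
    """

    def __init__(self, num_filters=64, kernel_size=19, lstm_dim=64, num_tasks=1):
        super().__init__()
        self.conv = nn.Conv1d(4, num_filters, kernel_size, padding=kernel_size // 2)
        self.pool = nn.MaxPool1d(4)
        # The BiLSTM scans pooled motif activations in both directions to
        # capture longer-range dependencies between motifs.
        self.lstm = nn.LSTM(num_filters, lstm_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * lstm_dim, num_tasks)

    def forward(self, x):                 # x: (batch, 4, seq_len)
        h = torch.relu(self.conv(x))
        h = self.pool(h).transpose(1, 2)  # (batch, positions, filters)
        out, _ = self.lstm(h)
        return self.head(out[:, -1, :])   # predict from the final time step
```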