2020
DOI: 10.1101/2020.11.23.394478
Preprint

Integrating long-range regulatory interactions to predict gene expression using graph convolutional networks

Abstract: Long-range spatial interactions among genomic regions are critical for regulating gene expression and their disruption has been associated with a host of diseases. However, when modeling the effects of regulatory factors on gene expression, most deep learning models either neglect long-range interactions or fail to capture the inherent 3D structure of the underlying biological system. This prevents the field from obtaining a more comprehensive understanding of gene regulation and from fully leveraging the stru…

Cited by 6 publications
(13 citation statements)
References 58 publications
“…In RNA-seq GEP, there exists two tasks, namely binary gene expression classification and gene expression value regression for each protein-coding gene. Since gene expression can be predicted from histone marks, several models [6, 5, 31] are proposed to predict RNA-seq gene expression from several histone mark profiles within-cell type (i.e., using the cell-type specific histone marks to predict gene expression in the same cell type). Compared to these models, EPCOT utilized less epigenomic feature data (EPCOT does not require histone modification profiles as inputs) and achieved comparable performances in within-cell type prediction.…”
Section: Results
confidence: 99%
“…In this way, GATs weigh enhancer-enhancer (E-E) and enhancer-promoter (E-P) interactions, based on the features learned in the promoter and enhancer bins, in order to predict CAGE values more accurately. However, non-attention-based GNNs such as graph convolutional networks (GCN) (Kipf and Welling, 2017; Bigness et al, 2021) fail to learn the importance of individual interactions. It has been shown that GATs outperform GCNs in other machine learning contexts as well (Veličković et al, 2018).…”
Section: Results
confidence: 99%
“…However, these previous linear models use a fixed assignment of regulatory elements to genes without incorporating 3D interaction data, while current deep learning models consider relatively local features, such as promoters and at most nearby enhancers, and therefore cannot capture the impact of distal regulatory elements, which can be 1Mb or farther away from gene promoters. There are two studies that tried to use 3D data to predict gene expression (Bigness et al, 2021; Zeng et al, 2019), but they failed to address important aspects of modeling gene regulation. The most directly relevant one, GC-MERGE (Bigness et al, 2021), uses histone modification and Hi-C data to predict gene expression (RNA-seq) using graph convolutional networks (GCN); however, they do not provide any insight about gene regulation rules, such as finding functional enhancers or revealing the role of TF binding motifs on gene regulation.…”
Section: Discussion
confidence: 99%
“…There are few emerging studies that explicitly take the 3D chromatin interactions into consideration to predict the gene expression. One such example is GC-MERGE [12], a graph neural network (GNN) to propagate information between interacting genomic regions to predict the expression levels of genes. Although it is a proof-of-concept model that cannot be applied to genes without any chromatin interactions and only performs 10kbp genomic bin-level predictions but not at gene-level, it still underscores the promise of modeling epigenomic contexts of distal genomic regions along with that of promoters.…”
Section: Introduction
confidence: 99%