2021
DOI: 10.3389/fmolb.2021.673363

Learning the Regulatory Code of Gene Expression

Abstract: Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecu…

Citations: cited by 25 publications (42 citation statements)
References: 279 publications (481 reference statements)
“…We believe our approach has the potential to generate alternative or better binders for these complex targets, as well as to unveil the sequence motifs that are enriched or avoided in these high-quality aptamers. The same approach can be also useful to model RNA and DNA regulatory sequences and their interaction with proteins in the key processes such as transcription regulation [52, 26, 56, 23]. Lastly, our modeling and design methods are also readily applicable to other selection-amplification protocols, such as phage display for antibody discovery [14, 24] or directed protein evolution studies [4, 42], which have much larger space of possible sequences (20^L for length L) compared to aptamers (4^L).…”
Section: Discussion (mentioning)
confidence: 99%
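
The sequence-space comparison in the statement above (20^L for proteins versus 4^L for aptamers) can be made concrete with a short calculation. This is only an illustrative sketch: the function name `sequence_space_size` and the length L = 10 are assumptions chosen for the example and do not come from the citing paper.

```python
def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Count the distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


# Hypothetical length, chosen only for illustration.
L = 10

# Aptamers are built from 4 nucleotides; proteins from 20 standard amino acids.
nucleotide_space = sequence_space_size(4, L)
protein_space = sequence_space_size(20, L)

# The protein space is larger by a factor of (20 / 4)^L = 5^L.
ratio = protein_space // nucleotide_space

print(f"4^{L}  = {nucleotide_space}")
print(f"20^{L} = {protein_space}")
print(f"ratio = {ratio}")
```

Because the ratio grows as 5^L, each additional position multiplies the gap between the two spaces by five, which is why the statement describes the protein space as much larger.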
“…Over the last decade, deep neural networks (DNN) have become a popular machine learning tool in many areas, such as image recognition or natural language processing, and are now increasingly applied in chemical and biological data processing workflow [22, 56, 23, 6]. However, training DNNs typically requires large datasets, which can be challenging and expensive to obtain from biological experiments.…”
Section: Introduction (mentioning)
confidence: 99%
“…This contributes to the awareness that gene expression regulation spans different coding and noncoding regions that include the enhancer, promoter, untranslated regions (UTRs), and terminator [2,8] (Figure 1A). It is affected by the enzymatic accessibility of DNA defined by chromatin and epigenetic states [3,10]. Since mRNA abundance is a result of both mRNA synthesis and degradation, it is controlled not only by TFs and core promoters, but by a more complex set of cis-acting elements carried mostly by UTRs [11].…”
Section: Gene Regulatory Structure Jointly Controls Expression Patterns (mentioning)
confidence: 99%
“…Since mRNA abundance is a result of both mRNA synthesis and degradation, it is controlled not only by TFs and core promoters, but by a more complex set of cis-acting elements carried mostly by UTRs [11]. Whereas promoter regions were found to explain up to 96% of the variation of gene expression according to DNNs, coding regions can explain up to 69% and 5′ and 3′ UTRs as much as 89% [10]. The different regions also carry complementary information, with different parts coevolving and predictive of the activity of others [2].…”
Section: Gene Regulatory Structure Jointly Controls Expression Patterns (mentioning)
confidence: 99%
“…Deep learning is currently making a great impact across all these related fields. Its applications have already been reviewed for general omics [126, 127, 128, 129], gene function prediction [130, 131], disease prediction [132], predicting the impact of genetic variation in genomics [129, 133], predicting gene regulatory networks [133, 134], regulatory genomics [135], sequence motifs of transcription factors and enhancers [133, 134, 136, 137, 138, 139, 140, 141, 142], variant calling and pathogenicity scores [143], precision medicine [144, 145], pharmacogenomics [128], and even the prediction of CRISPR targets [146].…”
Section: The Virtual Gene Concept Can Define a Practical Research Pro... (mentioning)
confidence: 99%