2023
DOI: 10.1038/s41586-023-06661-w

Hold out the genome: a roadmap to solving the cis-regulatory code

Carl G. de Boer,
Jussi Taipale

Abstract: Gene expression is regulated by transcription factors (TFs) that work together to read cis-regulatory DNA sequences. The "cis-regulatory code" (how cells interpret DNA sequences to determine when, where, and how much genes should be expressed) has proven to be exceedingly complex [1,2]. Recently, advances in the scale and resolution of functional genomics assays and Machine Learning (ML) have enabled significant progress towards deciphering this code [3-6]. However, the cis-regulatory code will likely never be s…

citations
Cited by 28 publications
(9 citation statements)
references
References 173 publications
“…The further growth of libraries above 10^8 elements requires the application of synthetic DNA fragments not related to known DNA genome sequences [46]. However, the analysis of such large libraries usually employs ML or DL to identify relations between DNA sequence properties and promoter activity [46,50,241-243,251,262,263]. To improve these ML and DL approaches, they are trained on synthetic, random DNA fragments to test a larger sequence space; models trained on such synthetic data can predict genomic activity better than those solely trained on genome DNA [46,251].…”
Section: Discussion
confidence: 99%
“…However, the analysis of such large libraries usually employs ML or DL to identify relations between DNA sequence properties and promoter activity [46,50,241-243,251,262,263]. To improve these ML and DL approaches, they are trained on synthetic, random DNA fragments to test a larger sequence space; models trained on such synthetic data can predict genomic activity better than those solely trained on genome DNA [46,251]. More complex relationships between synthetic promoter structure, activity and DNA methylation or histone acetylation status are efficiently addressed by multilayered DL algorithms, while the ML models could not grasp these labyrinthine qualities precisely [242,247-250].…”
Section: Discussion
confidence: 99%
“…In terms of distal CREs or enhancers, the regulatory role of DNAm is proposed to be highly context dependent [99]. Deciphering this cis-regulatory code is now being aided by ML and large-scale functional assays [100].…”
Section: The Epigenome Is a Co-ordinated Multi-layered Apparatus
confidence: 99%
“…A major challenge in the field is determining how to train more complex deep learning models for applications outside of the most data-rich systems. A proposed solution is to substantially increase data volume by performing assays on randomly generated synthetic sequences, and then evaluating models trained on these sequences using true genomic sequences (De Boer et al 2020, de Boer and Taipale 2024). The reasoning behind this approach is that the genome does not contain sufficient variation to learn all aspects of the cis-regulatory code.…”
Section: Introduction
confidence: 99%