2024
DOI: 10.1101/2024.03.13.583868
Preprint

Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation

Adam Y. He,
Charles G. Danko

Abstract: Our understanding of how the DNA sequences of cis-regulatory elements encode transcription initiation patterns remains limited. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that accurately predicts the position and quantity of transcription initiation with single nucleotide resolution from DNA sequence. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between -200 and +50 bp relative to the…

Cited by 4 publications (2 citation statements)
References 105 publications
“…For each gene, we fetched an individual's two 49-kilobase (kb) consensus sequences centered on the gene's TSS (GENCODE [30] v26). We one-hot encoded each sequence, and used the average of two one-hot encoded matrices as our input [24]. While other approaches are possible [7], we found this representation to be reasonable because:…”
Section: Inputs
mentioning
confidence: 99%
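The input representation described in this statement (one-hot encode each of an individual's two consensus sequences, then average the two matrices) can be illustrated with a minimal sketch. This is not the citing authors' code: the function names, the A, C, G, T channel order, the all-zero encoding of unknown bases, and the requirement that both sequences have equal length are all assumptions made for illustration.

```python
# Minimal sketch, standard library only.
# Assumptions (not from the source): channel order A, C, G, T;
# unknown bases such as N map to an all-zero row; both consensus
# sequences are already the same length.
ALPHABET = "ACGT"


def one_hot(seq):
    """Return a len(seq) x 4 matrix with one row per base."""
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in seq.upper()
    ]


def average_haplotypes(hap1, hap2):
    """Average the one-hot matrices of two equal-length consensus sequences."""
    if len(hap1) != len(hap2):
        raise ValueError("consensus sequences must have the same length")
    m1, m2 = one_hot(hap1), one_hot(hap2)
    return [
        [(a + b) / 2 for a, b in zip(row1, row2)]
        for row1, row2 in zip(m1, m2)
    ]


# Toy example: the two haplotypes differ at the third position (G vs T).
encoded = average_haplotypes("ACGT", "ACTT")
```

Under these assumptions, a homozygous position keeps a single 1.0 in its row, while a heterozygous position carries 0.5 in each of the two alleles' channels, so the averaged matrix encodes an unphased genotype.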
“…Previous work used far fewer individuals and did not evaluate across them [7,24]. To address this we developed Performer, a fine-tuning strategy that implements cross-individual training and evaluation of sequence-to-expression neural network models. Briefly, we modified the Enformer architecture [15] by replacing the output head with one that predicts tissue-specific gene expression as a scalar value rather than a genomic track and implemented fine-tuning with Enformer's weights as starting values for the parameters in the model trunk and a custom loss function (Methods).…”
mentioning
confidence: 99%