Transfer learning identifies sequence determinants of regulatory element accessibility

Preprint, 2022
DOI: 10.1101/2022.08.05.502903

Abstract: Dysfunction of regulatory elements through genetic variants is a central mechanism in the pathogenesis of disease. To better understand disease etiology, there is consequently a need to understand how DNA encodes regulatory activity. Deep learning methods show great promise for modeling biomolecular data from DNA sequence, but are limited by the need for large training datasets. Here, we develop ChromTransfer, a transfer learning method that uses a pre-trained, cell-type agnostic model of open chromatin regions as…
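The abstract describes a two-stage transfer learning scheme: a sequence model is first pre-trained on cell-type-agnostic open chromatin regions, then fine-tuned for more specific tasks. Below is a minimal Keras sketch of that pre-train/fine-tune pattern; the architecture, layer sizes, input length, and all variable names are illustrative assumptions, not the published ChromTransfer model.

```python
# A minimal sketch of a two-stage transfer-learning scheme, written with
# Keras. Architecture and data shapes are illustrative assumptions only.
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 600  # assumed one-hot input length (A, C, G, T channels)

def build_backbone() -> keras.Model:
    """CNN over one-hot DNA, pre-trained on cell-type-agnostic open chromatin."""
    inputs = keras.Input(shape=(SEQ_LEN, 4))
    x = layers.Conv1D(128, 19, activation="relu")(inputs)
    x = layers.MaxPooling1D(4)(x)
    x = layers.Conv1D(64, 11, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # open vs. closed
    return keras.Model(inputs, outputs)

# Stage 1: pre-train on pooled (cell-type-agnostic) regions.
backbone = build_backbone()
backbone.compile(optimizer="adam", loss="binary_crossentropy")
# backbone.fit(pooled_seqs, pooled_labels, ...)  # large pre-training set

# Stage 2: fine-tune on a smaller cell-type-specific task by reusing the
# pre-trained feature layers and attaching a fresh output head.
features = keras.Model(backbone.input, backbone.layers[-2].output)
head = layers.Dense(1, activation="sigmoid", name="cell_type_head")(features.output)
fine_tuned = keras.Model(features.input, head)
fine_tuned.compile(optimizer=keras.optimizers.Adam(1e-4),
                   loss="binary_crossentropy")
# fine_tuned.fit(cell_type_seqs, cell_type_labels, ...)
```

A lower learning rate is used in stage 2 so the pre-trained weights shift only gently toward the cell-type-specific task rather than being overwritten by the small fine-tuning set.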


Cited by 4 publications (6 citation statements: 0 supporting, 6 mentioning, 0 contrasting).
Citing publications by year: 2022 (2), 2024 (2).
References: 59 publications.

Citation statements, ordered by relevance:
“…CNN-based enhancer modeling has recently gained traction due to its capacity for the interpretation of enhancer grammar 12, 19, 20, 74 . A key limitation of CNN models is that they require large input data sets for training. While training on small data sets may lead to overfitting, transfer learning from sequence models trained with large data sets has recently been shown to be a robust alternative 75 . Here, we propose several transfer learning applications, whereby the first (topic-based) model is fine-tuned either to learn cell state (in our case, hepatocyte zonation) or enhancer activity (based on MPRA data). The topic-CNN could recapitulate the core hepatocyte code, with sequence features associated with the same TFs as identified by SCENIC+.…”
Section: Discussion
confidence: 99%
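The passage above describes fine-tuning one pre-trained (topic-based) CNN toward two different readouts: cell state and MPRA enhancer activity. The sketch below illustrates that head-swapping idea with an assumed placeholder trunk; nothing here reproduces the cited study's actual model or data.

```python
# A hedged sketch of head-swapping: one pre-trained CNN trunk fine-tuned
# toward two different tasks. `trunk` stands in for any pre-trained Keras
# feature extractor; names and shapes are assumptions for illustration only.
from tensorflow import keras
from tensorflow.keras import layers

trunk = keras.Sequential([
    keras.Input(shape=(500, 4)),
    layers.Conv1D(64, 15, activation="relu"),
    layers.GlobalMaxPooling1D(),
])  # pretend this trunk was already pre-trained (e.g. on topic labels)

# Task A: cell state (e.g. zonation) as a classification head.
state_head = layers.Dense(3, activation="softmax")(trunk.output)
state_model = keras.Model(trunk.input, state_head)
state_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Task B: enhancer activity (e.g. an MPRA readout) as a regression head.
activity_head = layers.Dense(1)(trunk.output)
activity_model = keras.Model(trunk.input, activity_head)
activity_model.compile(optimizer="adam", loss="mse")
```

Because both heads share the same trunk weights, sequence features learned on the large pre-training task are reused by each downstream model.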
“…For each of the 27 genomics tasks, we compared the performance of ChatNT with the state-of-the-art method for the respective dataset. These included the convolutional neural networks DeepSTARR [11], ChromTransfer [56], APARENT2 [21] and Saluki [19]; and the fine-tuned foundation models based on Nucleotide Transformer [26], agroNT [55], DNABERT [27] and ESM2 [48]. We used different performance metrics per task to follow the same metric used in the respective studies.…”
Section: Baselines for the genomics tasks
confidence: 99%
“…This collection includes a non-redundant subset of tasks from the Nucleotide Transformer [26] and the BEND [54] benchmarks, complemented with relevant tasks from the plant AgroNT benchmark [55] and human ChromTransfer [56]. These benchmarks have been extensively used in the literature, come from different research groups, and represent diverse DNA processes and species.…”
Section: A new curated genomics instructions dataset of biologically ...
confidence: 99%
“…While steady-state models can use all detected REs (~100s of thousands), only a subset of these REs are expected to respond to perturbation of a single TF (<10s of thousands), limiting the number of available training examples. Transfer learning, in which deep learning models are "pre-trained" on a larger set of related examples and then fine-tuned to predict the desired task, has recently emerged as an attractive solution to this type of problem, enabling the use of deep learning models in data-limited settings [15][16][17][18][19] .…”
Section: Introduction
confidence: 99%
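The statement above motivates transfer learning when only a small number of perturbation-responsive REs are available for training. One common data-limited variant, sketched below with an assumed placeholder backbone rather than any model from the cited papers, freezes the pre-trained layers so that only a small task head is trained:

```python
# A minimal sketch of frozen-backbone fine-tuning for data-limited settings:
# the pre-trained layers are frozen, so only the new head receives gradient
# updates, reducing overfitting when labeled examples are scarce.
# `pretrained` is a placeholder, not a model from the cited papers.
from tensorflow import keras
from tensorflow.keras import layers

pretrained = keras.Sequential([
    keras.Input(shape=(500, 4)),
    layers.Conv1D(64, 15, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
])

pretrained.trainable = False  # freeze all pre-trained weights

# Attach a small task head; only its weights are updated during training.
outputs = layers.Dense(1, activation="sigmoid")(pretrained.output)
model = keras.Model(pretrained.input, outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy")
# model.fit(small_task_seqs, small_task_labels, ...)  # few labeled REs
```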