2020
DOI: 10.1101/2020.01.23.915405
Preprint

Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Abstract: Predicting quantitative effects of gene regulatory elements (GREs) on gene expression is a longstanding challenge in biology. Machine learning models for gene expression prediction may be able to address this challenge, but they require experimental datasets that link large numbers of GREs to their quantitative effect. However, current methods to generate such datasets experimentally are either restricted to specific applications or limited by their technical complexity and error-proneness. Here …

Cited by 15 publications (39 citation statements)
References 61 publications

“…One significant challenge of using deep learning to predict biological function is the inherent difficulty in understanding learned patterns in a way that helps researchers to elucidate biological mechanisms underlying model predictions. Recent work has been developed to visualize sequence features by mapping learned convolutional filters to biologically relevant sequence motifs [45,46]. Additional methods have been established to address how models link biological theory, including alternative network architectures [47], and the use of saliency maps [48,49], which reveal the regions of input that deep-learning models weigh most heavily and therefore pay the most attention to when making predictions.…”
Section: Results
mentioning
confidence: 99%
“…A recent, innovative study that used machine learning focused on predicting the influence of different 5′ UTR sequences in E. coli (Höllerer et al., 2020). The study developed an innovative reporter system, based on a recombinase protein, to quantify the expression from a large library of randomized 5′ UTR sequences (Figure 4).…”
Section: Randomization Smart Selection and Machine Learning
mentioning
confidence: 99%
“…The latter approach was demonstrated for the expression of a randomized 5′ UTR library mediating expression of a recombinase that flips a nearby DNA modification site. In the same single-sequencing read, the 5′ UTR variant can be identified and whether the site was flipped or not, allowing high-quality, large-scale data on expression levels (Höllerer et al., 2020). Analysis of generated large-scale data is typically performed by multiple regression analysis and, recently, by machine-learning algorithms.…”
Section: Overview Of a Typical Workflow Randomizing Gene Regulatory
mentioning
confidence: 99%
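
The readout described in the statement above (each sequencing read reveals both the 5′ UTR variant and whether the recombinase site was flipped) can be illustrated with a minimal sketch. It assumes reads have already been parsed into hypothetical `(variant, flipped)` pairs; the function name, the input format, and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict

def flipped_fractions(reads):
    """Map each variant to (fraction of flipped reads, number of reads).

    `reads` is an iterable of (variant, flipped) pairs; this input
    format is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # variant -> [flipped reads, total reads]
    for variant, flipped in reads:
        counts[variant][0] += int(flipped)
        counts[variant][1] += 1
    return {v: (f / n, n) for v, (f, n) in counts.items()}

# Toy reads with made-up variant sequences.
reads = [
    ("AGGAGG", True),
    ("AGGAGG", True),
    ("AGGAGG", False),
    ("TTTTTT", False),
]
result = flipped_fractions(reads)
```

The fraction of flipped reads per variant serves here as a simple proxy for expression strength; the read count is kept alongside it because fractions based on few reads are less reliable.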
“…sequencing a genomic region multiple times, is an NGS approach that can be applied to track the genetic heterogeneity in a library of cell isolates. Höllerer et al. introduced a widely applicable DNA-based phenotypic recording approach to generate huge datasets linking regulators to quantitative functional readouts of high precision, only relying on sequencing short tag DNA elements [77]. The technique implements a site-specific recombinase, a regulator that controls recombinase expression, and a DNA substrate modifiable by the recombinase.…”
mentioning
confidence: 99%
“…Overall, results achieved from combinatorial optimization and integration of these data into available computational predictive models can provide better understanding of the whole cellular system [95]. Höllerer et al. implemented next-generation sequencing to assess the quantitative expression effect of extremely large sets of RBSs [77]. They expanded from these large-scale datasets using a novel deep learning approach that combines ensembling and uncertainty modelling to predict the function of untested RBSs with high accuracy.…”
mentioning
confidence: 99%
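
As a rough illustration of the idea of combining ensembling with an uncertainty estimate, mentioned in the statement above, the sketch below averages the predictions of several models and reports their spread. This is a generic technique shown under stated assumptions, not the architecture used in the cited work; `ensemble_predict` and the toy models are hypothetical.

```python
from statistics import mean, pstdev

def ensemble_predict(models, x):
    """Return (mean prediction, spread across ensemble members) for input x.

    The population standard deviation across members is used as a simple
    uncertainty proxy; this is an illustrative choice.
    """
    preds = [model(x) for model in models]
    return mean(preds), pstdev(preds)

# Hypothetical stand-ins for trained models: each maps an input to a score.
models = [
    lambda x: 0.5 * x,
    lambda x: 0.5 * x + 0.1,
    lambda x: 0.5 * x - 0.1,
]
mu, sigma = ensemble_predict(models, 1.0)
```

A larger spread indicates that the ensemble members disagree on an input, which can be used to flag predictions for untested sequences that deserve less trust.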