2019
DOI: 10.1021/acssynbio.9b00061

Rapid, Heuristic Discovery and Design of Promoter Collections in Non-Model Microbes for Industrial Applications

Abstract: Well-characterized promoter collections for synthetic biology applications are not always available in industrially relevant hosts. We developed a broadly applicable method for promoter identification in atypical microbial hosts that requires no a priori understanding of cis-regulatory element structure. This novel approach combines bioinformatic filtering with rapid empirical characterization to expand the promoter toolkit and uses machine learning to improve the understanding of the relationship between DNA …

Cited by 16 publications (15 citation statements)
References 77 publications
“…With higher nucleotide differences of 35%, 200 samples provided reasonable prediction qualities (g). Libraries including multiple sigma factors (e, f) (Gilman et al, 2019) resulted in lower prediction qualities, which parallels studies on heterogenous data in E. coli (Cambray et al, 2018) and yeast (Liya et al, 2021). More libraries are necessary to narrow the required sample size over the whole sequence diversity spectrum.…”
Section: Rf
mentioning
confidence: 90%
“…A comprehensive table with numerical values of Figure 4 in the Supplementary Data. Multiple transcription factors are responsible for gene expression in the data set of Gilman et al (2019), hence is not applicable (N.A.).…”
mentioning
confidence: 99%
“…One disadvantage of this and related models is that the majority, if not all, of the data used to train the computational models comes from a model organism. This scarcity of host-specific information leads to poor model performance in non-model species (Umarov and Solovyev, 2017; Bharanikumar et al, 2018; Gilman et al, 2019). However, even in model organisms, computational models can give erroneous predictions.…”
Section: Host-specific Context
mentioning
confidence: 99%
“…Currently, a commonly used method for extracting sequence features is position weight matrix [34], but the approach may not be transferable to different species [33]. Another problem with promoter strength prediction is the relative lack of data, particularly in cases where machine learning is applied to experimentally characterized promoters [35,36]. But, use of genome-wide RNA-seq data may provide sufficient data that significantly improves machine learning based predictions of promoter strength.…”
Section: Optimization Of Gene Expression Regulatory Elements
mentioning
confidence: 99%