Model-driven promoter strength prediction based on a fine-tuned synthetic promoter library in Escherichia coli

Zhao, Mei; Zhou, Shenghu; Wu, Longtao; Deng, Yu

doi:10.1101/2020.06.25.170365

Cited by 8 publications

(19 citation statements)

References 56 publications


“…The sequence extraction reduces the sample size because more redundant sequences are generated, and sequence diversity increased because nucleotide diversity covers a shorter sequence. Figure 3 shows that the partitioned dataset 'k' (from Zhao et al (2020)) maintained the quality of prediction from the original dataset: the F1-CoV and top three feature contribution increased proportionally to the original data (B, C and D). This proportional movement was not displayed by the extracted sequences ('h', 'i') from Meng et al (2013).…”

Section: Liebal Et Al 2020: Exp2ipynb (mentioning)

confidence: 99%

“…without detailed processing. For quality assessment, we generated three additionally data sets based on partitioned sequence regions from the Meng et al (2013) and the Zhao et al (2020) data sets. The original sequence from Meng et al (2013) is 224 nucleotides (nt) long and we extracted the starting 40 nt ('h') and last 40 nt ('i'), assuming that few predictive positions are contained in the new sequences.…”

Section: Liebal Et Al 2020: Exp2ipynb (mentioning)

confidence: 99%

“…The original sequence from Meng et al (2013) is 224 nucleotides (nt) long and we extracted the starting 40 nt ('h') and last 40 nt ('i'), assuming that few predictive positions are contained in the new sequences. Zhao et al (2020) tested 113 nt ranging from upstream regulating elements down to the coding sequence. Again the first 40 nt ('k') were extracted, corresponding to the upstream regulating element.…”

Section: Liebal Et Al 2020: Exp2ipynb (mentioning)

confidence: 99%
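The subsequence extraction described in the statements above (taking the first or last 40 nt of each promoter and noting that the sample size shrinks because redundant sequences appear) can be sketched as follows. This is a minimal illustration, not code from Exp2Ipynb: the function names `extract_window` and `partition_library` and the toy sequences are hypothetical.

```python
def extract_window(seq, length=40, from_end=False):
    """Return the first (or last) `length` nucleotides of a sequence."""
    if len(seq) < length:
        raise ValueError("sequence is shorter than the requested window")
    return seq[-length:] if from_end else seq[:length]


def partition_library(seqs, length=40, from_end=False):
    """Extract one window per sequence and drop duplicates, keeping order.

    Shortening sequences can make previously distinct entries identical,
    which is why the partitioned library may be smaller than the original.
    """
    windows = (extract_window(s, length, from_end) for s in seqs)
    return list(dict.fromkeys(windows))


# Toy library (hypothetical data): three 50-nt sequences.
library = [
    "A" * 40 + "C" * 10,
    "A" * 40 + "G" * 10,
    "T" * 40 + "C" * 10,
]

head = partition_library(library)                  # first 40 nt
tail = partition_library(library, from_end=True)   # last 40 nt
print(len(library), len(head), len(tail))
```

In this toy case the first-40-nt partition collapses two entries into one, while the last-40-nt partition keeps all three distinct.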

“…We were interested to examine how the performance is affected when subsequences were extracted from libraries. We chose the libraries from Meng et al (2013) and Zhao et al (2020) because they report high feature number with low sequence diversity. The starting ('h') and ending ('i') 40 nt were extracted from the promoter sequence in Meng et al (2013), which had low features with importance.…”

Section: Liebal Et Al 2020: Exp2ipynb (mentioning)

confidence: 99%

“…For example, Meng et al (2013, 2017) analyzed 98 σ70 promoter sequences and fine-tuned heterologous expression with predicted synthetic promoters. The same system was analyzed by Zhao et al (2020) with over 3500 promoter sequences. In Bacillus subtilis Liu et al (2018) employed a synthetic promoter library with 214 sequences to fine-tune pathway activity for metabolite overproduction.…”

Section: Introduction (mentioning)

confidence: 99%


Exp2Ipynb: A general machine-learning workflow for the analysis of promoter libraries

Liebal, Koebbing, Blank. 2020. Preprint.


Strain engineering in biotechnology modifies metabolic pathways in microorganisms to overproduce target metabolites. To modify metabolic pathway activity in bacteria, gene expression is an effective and easy manipulated process, specifically the promoter sequence recognized by sigma factors. Promoter libraries are generated to scan the expression activity of different promoter sequences and to identify sequence positions that predict activity. To maximize information retrieval, a well-designed experimental setup is required. We present a computational workflow to analyse promoter libraries; by applying this workflow to seven libraries, we aim to identify critical design principles. The workflow is based on a Python Jupyter Notebook and covers the following steps: (i) statistical sequence analysis, (ii) sequence-input to expression-output predictions, (iii) estimator performance evaluation, and (iv) new sequence prediction with defined activity. The workflow can process multiple promoter libraries, across species or reporter proteins, and classify or regress expression activity. The strongest predictions in the sample libraries were achieved when the promoters in the library were recognized by a single sigma factor and a unique reporter system. A trade-off between sample size and sequence diversity reduces prediction quality, and we present a relationship to estimate the minimum sample size. The workflow guides the user through analysis and machine-learning training, is open source and easily adaptable to include alternative machine-learning strategies and to process sequence libraries from other expression-related problems. The workflow is a contribution to increase insight to the growing application of high-throughput experiments and provides support for efficient strain engineering.
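Steps (i) and (ii) of the workflow named in the abstract, statistical sequence analysis and turning sequences into model input, can be illustrated with a small standard-library sketch. It is an assumption-laden example rather than the Exp2Ipynb implementation: per-position Shannon entropy stands in for "sequence diversity", one-hot encoding stands in for the feature representation, and the names `one_hot` and `positional_entropy` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def one_hot(seq):
    """Flatten a DNA sequence into a binary vector, four entries per position."""
    return [1 if base == letter else 0 for base in seq.upper() for letter in ALPHABET]


def positional_entropy(seqs):
    """Shannon entropy in bits at each position of equal-length sequences.

    A value of 0 means the position is identical across the library and
    therefore cannot help predict expression differences.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    entropies = []
    for column in zip(*seqs):
        total = len(column)
        counts = Counter(column)
        entropies.append(
            sum((n / total) * math.log2(total / n) for n in counts.values())
        )
    return entropies


# Toy input (hypothetical data): position 0 is constant, position 1 varies.
print(one_hot("AC"))
print(positional_entropy(["AA", "AC"]))
```

The one-hot vectors could then be passed to any regressor or classifier; which estimator to use is left open here.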


Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants

Choi, Kim, Koo. 2023. Biotechnol Bioproc E.


Codon-Restrained Method for Both Eliminating and Creating Intragenic Bacterial Promoters

2022


Future applications of synthetic biology will require refactored genetic sequences devoid of internal regulatory elements within coding sequences. These regulatory elements include cryptic and intragenic promoters, which may constitute up to a third of the predicted Escherichia coli promoters. The promoter activity is dependent on the structural interaction of core bases with a σ factor. Rational engineering can be used to alter key promoter element nucleotides interacting with σ factors and eliminate downstream transcriptional activity. In this paper, we present codon-restrained promoter silencing (CORPSE), a system for removing intragenic promoters. CORPSE exploits the DNA-σ factor structural relationship to disrupt σ70 promoters embedded within gene coding sequences with a minimum of synonymous codon changes. Additionally, we present an inverted CORPSE system, iCORPSE, which can create highly active promoters within a gene sequence while not perturbing the function of the modified gene.
