We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ70 factor of RNA polymerase. σ70 promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both time and money. We parsed a comprehensive database of bacterial promoters for the −35 and −10 hexamer regions of σ70-binding promoters and used these sequences to construct the respective position weight matrices (PWMs). Next, we used a well-characterized set of promoters to train a multivariate linear regression model and learn the mapping between the PWM scores of the −35 and −10 hexamers and the promoter strength. We found that the log of the promoter strength is significantly linearly associated with a weighted sum of the −10 and −35 sequence profile scores. We applied our model to 100 sets of 100 randomly generated promoter sequences to generate a sampling distribution of mean strengths of random promoter sequences and obtained a mean of 6E-4 ± 1E-7. Our model was further validated by cross-validation and on independent datasets of characterized promoters. PromoterPredict accepts −10 and −35 hexamer sequences and returns the predicted promoter strength. It is capable of dynamic learning from user-supplied data to refine the model construction and yield more robust estimates of promoter strength. PromoterPredict is available as both a web service and a standalone tool. Our work presents an intuitive generalization applicable to modelling the strength of other promoter classes.
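The PWM scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the PromoterPredict implementation: the four toy hexamers per region, the pseudocount, the uniform background, and the function names `build_pwm`, `pwm_score` and `predict_log_strength` are all assumptions made for the example, and any regression coefficients passed to `predict_log_strength` would have to come from a fitted model.

```python
import math

BASES = "ACGT"

def build_pwm(hexamers, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sequences.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(hexamers[0])
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for seq in hexamers:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed base frequency against the background
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def pwm_score(pwm, hexamer):
    """Sum the per-position log-odds for one hexamer."""
    return sum(column[base] for column, base in zip(pwm, hexamer))

def predict_log_strength(score35, score10, b0, b35, b10):
    """Log-linear model: log(strength) = b0 + b35*score35 + b10*score10.

    The coefficients are hypothetical inputs, not values from the paper.
    """
    return b0 + b35 * score35 + b10 * score10

# Toy sites for illustration only (not the promoter database used in the paper)
minus35_sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
minus10_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]

pwm35 = build_pwm(minus35_sites)
pwm10 = build_pwm(minus10_sites)

# A consensus-like hexamer scores higher than an unrelated one
print(pwm_score(pwm35, "TTGACA"), pwm_score(pwm35, "CCCCCC"))
```

The two scores produced this way would be the predictors of the regression; the log transform is applied to the measured strength, not to the scores.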
Riboswitches are cis-regulatory genetic elements that use an aptamer to control gene expression. Specificity to cognate ligand and diversity of such ligands have expanded the functional repertoire of riboswitches to mediate apt responses to sudden metabolic demands and signal changes in environmental conditions. Given their critical role in microbial life, riboswitch characterisation remains a challenging computational problem. Here we have addressed the issue with advanced deep learning frameworks, namely convolutional neural networks (CNN) and bidirectional recurrent neural networks (RNN) with Long Short-Term Memory (LSTM). Using a comprehensive dataset of 32 ligand classes and a stratified train-validate-test approach, we demonstrated the accurate performance of both deep learning models (CNN and RNN) relative to conventional hyperparameter-optimized machine learning classifiers on all key performance metrics, including the ROC curve analysis. In particular, the bidirectional LSTM RNN emerged as the best-performing learning method for identifying the ligand-specificity of riboswitches, with an accuracy > 0.99 and a macro-averaged F-score of 0.96. An additional attraction is that the deep learning models do not require prior feature engineering. A dynamic update functionality is built into the models to account for the constant discovery of new riboswitches and to extend the predictive modeling to new classes. Our work would enable the design of genetic circuits with custom-tuned riboswitch aptamers that would effect precise translational control in synthetic biology. The associated software is available as an open-source Python package and standalone resource for use in genome annotation, synthetic biology, and biotechnology workflows.
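As a small aside on the two headline metrics, the sketch below shows how a macro-averaged F-score can sit below the accuracy when classes are imbalanced. The two class labels and the toy predictions are invented for illustration and have nothing to do with the 32-class dataset; `macro_f1` and `accuracy` are hypothetical helper names.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy, imbalanced example: 8 samples of one class, 2 of another
y_true = ["class_A"] * 8 + ["class_B"] * 2
y_pred = ["class_A"] * 8 + ["class_A", "class_B"]

print(accuracy(y_true, y_pred))   # one error in ten predictions
print(macro_f1(y_true, y_pred))   # the rare class pulls the average down
```

Because every class counts equally in the macro average, a single error on a rare class costs more F-score than accuracy.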
Riboswitches are cis-regulatory genetic elements that use an aptamer to control gene expression. Specificity to cognate ligand and diversity of such ligands have expanded the functional repertoire of riboswitches to mediate apt responses to sudden metabolic demands and signal changes in environmental conditions. Given their critical role in microbial life, and novel uses in synthetic biotechnology, riboswitch characterisation remains a challenging computational problem hitherto tackled with probabilistic frameworks and state-of-the-art machine learning. Here we have addressed the issue with advanced deep learning frameworks, namely convolutional neural networks (CNN) and bidirectional recurrent neural networks (RNN) with Long Short-Term Memory (LSTM). Using a comprehensive dataset of 32 ligand classes and a stratified train-validate-test approach, we demonstrated the superior performance of both deep models (CNN and RNN) relative to other conventional machine learning classifiers on all key performance metrics, including the ROC curve analysis. In particular, the bidirectional LSTM RNN emerged as the best-performing learning method for identifying the ligand-specificity of riboswitches, with an accuracy > 0.99 and an F-score of 0.96. A dynamic update functionality is built in to account for the discovery of new riboswitches and to extend the predictive modelling to any number of additional classes. Our work would be valuable in the design and assembly of genetic circuits and the development of the next generation of antibiotics. The software is freely available as a Python package and standalone resource for wide use in genome annotation and biotechnology workflows.
Availability:
PyPI package: riboflow @ https://pypi.org/project/riboflow
Repository with standalone suite of tools: https://github.com/RiboswitchClassifier
Language: Python 3.6 with numpy, keras, and tensorflow libraries.
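A stratified train-validate-test split of the kind mentioned in the abstract could look roughly like the sketch below. The 80/10/10 fractions, the seed, the toy labels and the function name `stratified_split` are assumptions for illustration; the abstract does not state the proportions actually used.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices so each class keeps roughly the same proportions.

    The default fractions are an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = int(round(fractions[0] * len(idxs)))
        n_val = int(round(fractions[1] * len(idxs)))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy labels: a large class and a small class
labels = ["class_A"] * 50 + ["class_B"] * 10
train_idx, val_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class keeps rare ligand classes represented in all three partitions, which matters when per-class metrics are reported.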
Introduction: Riboswitches are ubiquitous and critical metabolite-sensing gene expression regulators in bacteria that are capable of folding into at least two alternative conformations of 5' UTR mRNA secondary structure, which functionally switch gene expression between on and off states [1][2][3]. The selection of conformation is dictated by the presence and binding of the ligand cognate to the aptamer domain of a given riboswitch [4][5][6]. Cognate ligands are key metabolites that mediate responses to internal metabolic or external stimuli. Consequent to conformational changes, riboswitches ultimately weaken transcriptional termination or occlude the ribosome binding site, thereby inhibiting translation initiation of associated genes [7][8]. Riboswitches provide an intriguing window into the 'RNA world' biology [9][10][11][12] and there is evidence of their wider distribution in complex genomes [13][14][15][16]. The modular properties of riboswitches have engendered the possibility of synthetic control of gene expression [17], and combined with the ability to engineer binding to an ad hoc ligand, riboswitches have turned out to ...
We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ70 factor of RNA polymerase. σ70 promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both time and money. Using a well-characterized set of promoters, we trained a multivariate linear regression model and found that the log of the promoter strength is significantly linearly associated with a weighted sum of the −10 and −35 sequence profile scores. The two regions contributed almost equally to the promoter strength. PromoterPredict accepts −10 and −35 hexamer sequences and returns the predicted promoter strength. It is capable of dynamic learning from user-supplied data to refine the model construction and yield more confident estimates of promoter strength.
Availability: Open source code and a standalone executable with both dynamic model-building and prediction are available (under GNU General Public License 3.0) at https://github.com/PromoterPredict, and require Python 2.7 or greater. PromoterPredict is also available as a web service at https://promoterpredict.com.
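One way to picture the regression and its dynamic refitting is the sketch below, which fits log(strength) on two sequence profile scores by ordinary least squares and then refits after pooling extra measurements. It is an assumption-laden illustration rather than the tool's code: the data are synthetic and generated from made-up coefficients, and `fit_ols` and `predict_strength` are hypothetical names.

```python
import math

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    Assumes the predictors are not collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict_strength(coefs, score35, score10):
    """Back-transform the log-linear prediction to a strength."""
    return math.exp(coefs[0] + coefs[1] * score35 + coefs[2] * score10)

# Synthetic (score35, score10) pairs; the generating coefficients are made up
TRUE_COEFS = (-2.0, 0.5, 0.4)
scores = [(1.0, 2.0), (3.0, 1.0), (0.0, 4.0), (5.0, 5.0), (2.0, 0.0)]
log_strengths = [TRUE_COEFS[0] + TRUE_COEFS[1] * a + TRUE_COEFS[2] * b
                 for a, b in scores]

base_coefs = fit_ols(scores, log_strengths)

# "Dynamic learning": pool hypothetical user-supplied measurements and refit
user_scores = [(4.0, 2.0), (1.0, 6.0)]
user_logs = [TRUE_COEFS[0] + TRUE_COEFS[1] * a + TRUE_COEFS[2] * b
             for a, b in user_scores]
refit_coefs = fit_ols(scores + user_scores, log_strengths + user_logs)

print(base_coefs, refit_coefs)
```

With real measurements the pooled fit would generally shift the coefficients; here the synthetic data are noise-free, so both fits recover the generating values.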