Riboswitches are cis-regulatory genetic elements that use an aptamer to control gene expression. Specificity for the cognate ligand, together with the diversity of such ligands, has expanded the functional repertoire of riboswitches, allowing them to mount apt responses to sudden metabolic demands and to signal changes in environmental conditions. Given the critical role of riboswitches in microbial life and their novel uses in synthetic biotechnology, their characterisation remains a challenging computational problem, hitherto tackled with probabilistic frameworks and state-of-the-art machine learning. Here we address the problem with deep learning frameworks, namely convolutional neural networks (CNN) and bidirectional recurrent neural networks (RNN) with Long Short-Term Memory (LSTM). Using a comprehensive dataset of 32 ligand classes and a stratified train-validate-test approach, we demonstrate the superior performance of both deep models (CNN and RNN) relative to conventional machine learning classifiers on all key performance metrics, including ROC curve analysis. In particular, the bidirectional LSTM RNN emerged as the best-performing method for identifying the ligand specificity of riboswitches, with an accuracy > 0.99 and an F-score of 0.96. A dynamic update functionality is built in to accommodate the discovery of new riboswitches and to extend the predictive modelling to any number of additional classes. Our work should be valuable in the design and assembly of genetic circuits and in the development of the next generation of antibiotics. The software is freely available as a Python package and as a standalone resource for wide use in genome annotation and biotechnology workflows.
Availability: PyPI package: riboflow @ https://pypi.org/project/riboflow ; repository with standalone suite of tools: https://github.com/RiboswitchClassifier ; language: Python 3.6 with the numpy, keras, and tensorflow libraries.
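The stratified train-validate-test approach mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not the implementation used in riboflow: the function name `stratified_split`, the 80/10/10 fractions, the fixed seed, and the toy class labels are all assumptions chosen for illustration. The point is only that each ligand class is shuffled and partitioned separately, so that every subset preserves the class proportions of the full dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into (train, validate, test) lists, class by class.

    `fractions` and `seed` are illustrative defaults, not values from the paper.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, validate, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # shuffle within the class only
        n_train = int(round(fractions[0] * len(idxs)))
        n_val = int(round(fractions[1] * len(idxs)))
        train.extend(idxs[:n_train])
        validate.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])  # remainder goes to the test set
    return train, validate, test


# Toy labels standing in for two ligand classes (hypothetical counts).
labels = ["TPP"] * 10 + ["SAM"] * 20
train, validate, test = stratified_split(labels)
```

With these toy labels, each class contributes 80% of its members to the training subset and 10% to each of the other two, so a rare class is not accidentally left out of validation or testing. For very small classes the rounding can leave a subset empty, which a real pipeline would need to handle explicitly.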
Introduction: Riboswitches are ubiquitous and critical metabolite-sensing regulators of gene expression in bacteria. They are capable of folding into at least two alternative conformations of 5' UTR mRNA secondary structure, which functionally switch gene expression between on and off states [1][2][3]. The selection of conformation is dictated by the presence and binding of the ligand cognate to the aptamer domain of a given riboswitch [4][5][6]. Cognate ligands are key metabolites that mediate responses to internal metabolic or external stimuli. As a consequence of these conformational changes, riboswitches ultimately weaken transcriptional termination or occlude the ribosome binding site, thereby inhibiting translation initiation of the associated genes [7][8]. Riboswitches provide an intriguing window into 'RNA world' biology [9][10][11][12], and there is evidence of their wider distribution in complex genomes [13][14][15][16]. The modular properties of riboswitches have engendered the possibility of synthetic control of gene expression [17], and combined with the ability to engineer binding to an ad hoc ligand, riboswitches have turned out to ...