2022
DOI: 10.1371/journal.pcbi.1010240
Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators

Abstract: It is well-established that neural networks can predict or identify structural motifs of non-coding RNAs (ncRNAs). Yet, the neural network based identification of RNA structural motifs is limited by the availability of training data that are often insufficient for learning features of specific ncRNA families or structural motifs. Aiming to reliably identify intrinsic transcription terminators in bacteria, we introduce a novel pre-training approach that uses inverse folding to generate training data for predict…

Cited by 5 publications (4 citation statements)
References 67 publications (93 reference statements)

“…This was revealed by the validation study as well as our observation that TSS-Captur's predicted transcripts often aligned only with subsequences within the RFAM entries, indicating an overestimation of the predicted length. In the future, we may consider integrating more recent tools or other data sources for the prediction of the 3'-end, such as TermNN [4] or Term-seq data [7]. For the classification step of uncharacterized transcript regions, we used two complementary methods.…”
Section: Discussion and Outlook (mentioning)
confidence: 99%
“…We compared BacTermFinder's performance with that of TermNN [8], iTerm-PseKNC [19], RhoTermPredict [16] and TransTermHP [36] on terminators of five bacterial species (Table 3) not used for generating our model. We decided to include TermNN, RhoTermPredict and iTerm-PseKNC in the comparative assessment because they are the most recently developed tools available for predicting intrinsic, factor-dependent and both types of terminators, respectively (Table 1).…”
Section: Comparative Assessment For Genome-wide Terminator Prediction (mentioning)
confidence: 99%
“…The availability of genome-wide transcription termination sites (TTSs) identified by RNA-seq technologies such as Term-Seq [14], Send-seq [33], SMRT-cappable [69], RendSeq [38], RNATag-seq [60], and dRNA-seq [59] in several bacterial species opens the door to generate a species-agnostic machine learning-based model using a large number (i.e., thousands) of terminator sequences of a wide range of bacterial species. Here, we generated such a method by 1) gathering a large collection of TTSs from published studies, 2) exploring thousands of features to represent (encode) terminator sequences, 3) generating and assessing eleven different machine learning models to identify bacterial terminators, and 4) comparatively assessing the performance of our best model (BacTermFinder) with the performance of four other bacterial terminator prediction methods (namely, TermNN [8], iTerm-PseKNC [19], RhoTermPredict [16] and TransTermHP [36]). Our results show that BacTermFinder can detect intrinsic and factor-dependent terminators and even archaeal terminators at a higher recall rate than current tools.…”
Section: Introduction (mentioning)
confidence: 99%
“…Therefore, there is an urgent need for a comprehensive computational platform to thoroughly explore the sequence space and efficiently design functional RNAs. With the advancement of deep neural networks in computational biology, they have demonstrated immense potential in prediction and de novo RNA generation tasks, such as riboswitches, terminators, tRNAs, ribozymes and others [22, 24-26]. For example, Sumi et al. designed the RNA Family Sequence Generator (RfamGen), a novel approach for generating functional RNA family sequences, which involves sampling points from a semantically rich and continuous representation, enabling the design of artificial sequences [27].…”
Section: Introduction (mentioning)
confidence: 99%