Circular RNAs (circRNAs), which are produced post-splicing of pre-mRNAs, are strongly linked to the emergence of several tumor types. The initial stage in conducting follow-up studies involves identifying circRNAs. Currently, animals are the primary target of most established circRNA recognition technologies. However, the sequence features of plant circRNAs differ from those of animal circRNAs, making it impossible to detect plant circRNAs. For example, there are non-GT/AG splicing signals at circRNA junction sites and few reverse complementary sequences and repetitive elements in the flanking intron sequences of plant circRNAs. In addition, there have been few studies on circRNAs in plants, and thus it is urgent to create a plant-specific method for identifying circRNAs. In this study, we propose CircPCBL, a deep-learning approach that only uses raw sequences to distinguish between circRNAs found in plants and other lncRNAs. CircPCBL comprises two separate detectors: a CNN-BiGRU detector and a GLT detector. The CNN-BiGRU detector takes in the one-hot encoding of the RNA sequence as the input, while the GLT detector uses k-mer (k = 1 − 4) features. The output matrices of the two submodels are then concatenated and ultimately pass through a fully connected layer to produce the final output. To verify the generalization performance of the model, we evaluated CircPCBL using several datasets, and the results revealed that it had an F1 of 85.40% on the validation dataset composed of six different plants species and 85.88%, 75.87%, and 86.83% on the three cross-species independent test sets composed of Cucumis sativus, Populus trichocarpa, and Gossypium raimondii, respectively. With an accuracy of 90.9% and 90%, respectively, CircPCBL successfully predicted ten of the eleven circRNAs of experimentally reported Poncirus trifoliata and nine of the ten lncRNAs of rice on the real set. CircPCBL could potentially contribute to the identification of circRNAs in plants. In addition, it is remarkable that CircPCBL also achieved an average accuracy of 94.08% on the human datasets, which is also an excellent result, implying its potential application in animal datasets. Ultimately, CircPCBL is available as a web server, from which the data and source code can also be downloaded free of charge.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.