Summary
Standard bioinformatics pipelines for the analysis of bacterial transcriptomic data commonly ignore non-coding but functional elements, e.g. small RNAs, long antisense RNAs or untranslated regions (UTRs) of mRNA transcripts. The root of this problem is the use of incomplete genome annotation files. Here, we present baerhunter, a coverage-based method implemented in R that automates the discovery of expressed non-coding RNAs and UTRs from RNA-seq reads mapped to a reference genome. The core algorithm is part of a pipeline that facilitates downstream analysis of both coding and non-coding features. The method is simple and easy to extend and customize. In limited tests with simulated and real data, it compares favourably against the currently most popular alternative.
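The summary calls the method "coverage-based" without giving details. As a rough illustration of the general idea only, the sketch below finds runs of per-base read coverage above a cutoff and labels each run against annotated gene intervals: a run that overlaps no gene becomes a putative sRNA, and the parts of a run that extend past a gene become UTRs. This is not baerhunter's implementation. The function names, the single cutoff, the minimum lengths, the single (plus) strand and the toy data are all hypothetical simplifications.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def expressed_runs(coverage: List[int], cutoff: int, min_len: int) -> List[Interval]:
    """Maximal runs where per-base coverage >= cutoff, kept if at least min_len long."""
    runs: List[Interval] = []
    start: Optional[int] = None
    for pos, depth in enumerate(coverage):
        if depth >= cutoff and start is None:
            start = pos  # a run of expressed bases begins
        elif depth < cutoff and start is not None:
            if pos - start >= min_len:
                runs.append((start, pos))
            start = None  # the run has ended
    # Close a run that reaches the end of the sequence.
    if start is not None and len(coverage) - start >= min_len:
        runs.append((start, len(coverage)))
    return runs


def classify_runs(runs: List[Interval], genes: List[Interval],
                  min_utr: int = 1) -> List[Tuple[str, int, int]]:
    """Label runs against plus-strand gene intervals (5'/3' swap on the minus strand).

    A run overlapping no annotated gene is a putative sRNA; the parts of a run
    that extend beyond the genes it overlaps are reported as 5' or 3' UTRs.
    """
    features: List[Tuple[str, int, int]] = []
    for r_start, r_end in runs:
        hits = [g for g in genes if g[0] < r_end and r_start < g[1]]
        if not hits:
            features.append(("putative_sRNA", r_start, r_end))
            continue
        first_start = min(g[0] for g in hits)
        last_end = max(g[1] for g in hits)
        if first_start - r_start >= min_utr:
            features.append(("5'UTR", r_start, first_start))
        if r_end - last_end >= min_utr:
            features.append(("3'UTR", last_end, r_end))
    return features


coverage = [0, 0, 5, 5, 5, 5, 5, 5, 0, 0, 9, 9, 9, 0]
runs = expressed_runs(coverage, cutoff=3, min_len=2)
print(classify_runs(runs, genes=[(4, 7)]))
# → [("5'UTR", 2, 4), ("3'UTR", 7, 8), ('putative_sRNA', 10, 13)]
```

A practical tool would also need to handle each strand separately, tolerate short dips in coverage and choose thresholds suited to sequencing depth. These refinements are left out here to keep the core idea visible.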
Availability and implementation
The baerhunter R package is available from: https://github.com/irilenia/baerhunter
Supplementary information
Supplementary data are available at Bioinformatics online.
Contact
i.nobeli@bbk.ac.uk
A whole-genome co-expression network was created using Mycobacterium tuberculosis transcriptomic data from publicly available RNA-sequencing experiments covering a wide variety of experimental conditions. In addition to the protein-coding genes, the network includes expressed regions with no formal annotation, such as putative short RNAs and untranslated regions of expressed transcripts. These unannotated expressed transcripts were among the best-connected members of the module sub-networks. They make up more than half of the "hub" elements in modules that include protein-coding genes known to belong to regulatory systems involved in stress response and host adaptation. This dataset provides a valuable resource for investigating the role of non-coding RNAs and conserved hypothetical proteins in transcriptomic remodelling. Predicted expressed transcripts can be screened as candidates for further experimental validation, based on their connections to genes with known functional groupings and their correlations with replicated host conditions.
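The abstract does not say how the network or its hubs were computed. Weighted schemes such as WGCNA's soft thresholding are common for module-based co-expression analysis, but which method was used is not stated here. To illustrate what a "hub" means, the sketch below builds a simplified, unweighted network. It links two features when the absolute Pearson correlation of their expression profiles reaches a threshold, then ranks features by connectivity (degree), for example within one module. The function names, the threshold and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def connectivity(expr: Dict[str, List[float]], threshold: float) -> Dict[str, int]:
    """Degree of each feature in a network linking pairs with |r| >= threshold."""
    degree = {name: 0 for name in expr}
    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            degree[a] += 1
            degree[b] += 1
    return degree


def hubs(degree: Dict[str, int], top: int) -> List[str]:
    """Highest-degree features; ties are broken by name for reproducibility."""
    return sorted(degree, key=lambda k: (-degree[k], k))[:top]


# Toy profiles across four conditions; "sRNA1" stands in for an unannotated transcript.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "sRNA1": [1.0, 2.0, 3.0, 5.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
deg = connectivity(expr, threshold=0.9)
print(hubs(deg, 3))
```

A hard threshold keeps the example short. Weighted networks instead keep every correlation as an edge strength, which avoids an arbitrary cutoff at the cost of a dense graph.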
Non-coding RNA prediction and quantification
Each dataset was run through the R package baerhunter (Ozuna et al., 2019), using the 'feature_file_editor' function with parameters optimised for the sequencing depth of that dataset (https://doi.org/10.5281/zenodo.7709329). The 'Count_features' and 'tpm_norm_flagging' functions were used for transcript quantification and to identify low