A whole-genome co-expression network was created using Mycobacterium tuberculosis transcriptomic data from publicly available RNA-sequencing experiments covering a wide variety of experimental conditions. Along with the protein-coding genes, the network includes expressed regions with no formal annotation, such as putative short RNAs and untranslated regions of expressed transcripts. These unannotated expressed transcripts were among the best-connected members of the module sub-networks, making up more than half of the 'hub' elements in modules that include protein-coding genes known to be part of regulatory systems involved in stress response and host adaptation. This dataset provides a valuable resource for investigating the role of non-coding RNA and conserved hypothetical proteins in transcriptomic remodelling. Predicted expressed transcripts can be screened as candidates for further experimental validation based on their connections to genes with known functional groupings and their correlations with replicated host conditions.
Non-coding RNA prediction and quantification
Each dataset was run through the R package baerhunter (Ozuna et al., 2019), using the 'feature_file_editor' function with parameters optimised for the sequencing depth of that dataset (https://doi.org/10.5281/zenodo.7709329). The 'count_features' and 'tpm_norm_flagging' functions were used for transcript quantification and to identify low