Most bacterial ORFs are identified by automated prediction algorithms. However, these algorithms often fail to identify ORFs lacking canonical features such as a length of >50 codons or the presence of an upstream Shine-Dalgarno sequence. Here, we use ribosome profiling approaches to identify actively translated ORFs in Mycobacterium tuberculosis. Most of the ORFs we identify have not been previously described, indicating that the M. tuberculosis transcriptome is pervasively translated. The newly described ORFs are predominantly short, with many encoding proteins of ≤50 amino acids. Codon usage of the newly discovered ORFs suggests that most have not been subject to purifying selection, and hence are unlikely to contribute to cell fitness. Nevertheless, we identify 90 new ORFs (median length of 52 codons) that bear the hallmarks of purifying selection. Thus, our data suggest that pervasive translation of short ORFs in Mycobacterium tuberculosis serves as a rich source for the evolution of new functional proteins.
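To make the length-cutoff problem concrete, here is a minimal sketch of a naive ORF scanner in Python. It is not the gene-prediction software or the ribosome-profiling analysis described above. The function `find_orfs`, its `min_codons` parameter, and the toy sequence are hypothetical. The scanner also makes simplifying assumptions: it reads the forward strand only, accepts only ATG starts, and uses the standard TAA/TAG/TGA stops. Real bacterial genes also start at GTG and TTG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 0) -> list[tuple[int, int]]:
    """Scan the three forward-strand frames for ATG-initiated ORFs.

    Returns 0-based, half-open (start, end) coordinates that include the stop
    codon. Only the first ATG before each stop is used, so nested in-frame
    starts are not reported separately. ORFs with fewer than `min_codons`
    sense codons (start included, stop excluded) are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical toy sequence: a 20-codon sORF followed by a 60-codon ORF.
short_orf = "ATG" + "GCT" * 19 + "TAA"
long_orf = "ATG" + "GCT" * 59 + "TGA"
demo_seq = short_orf + "CCC" + long_orf

print(find_orfs(demo_seq))                 # [(0, 63), (66, 249)]
print(find_orfs(demo_seq, min_codons=51))  # [(66, 249)]: the sORF is lost
```

Setting `min_codons=51` mimics a ">50 codons" requirement. Under that setting, the 20-codon ORF is silently dropped, even though it is a complete ATG-to-stop reading frame.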
Functional characterization of bacterial proteins lags far behind the identification of new protein families. This is especially true for bacterial species that are more difficult to grow and genetically manipulate than model systems such as Escherichia coli and Bacillus subtilis. To facilitate functional characterization of mycobacterial proteins, we have established a Mycobacterial Systems Resource (MSR) using the model organism Mycobacterium smegmatis. This resource focuses specifically on 1,153 highly conserved core genes that are common to many mycobacterial species, including Mycobacterium tuberculosis, in order to provide the most relevant information and resources for the mycobacterial research community. The MSR includes both biological and bioinformatic resources. The biological resource includes (i) an expression plasmid library of 1,116 genes fused to a fluorescent protein for determining protein localization; (ii) a library of 569 precise deletions of nonessential genes; and (iii) a set of 843 CRISPR-interference (CRISPRi) plasmids specifically targeted to silence expression of essential core genes and genes for which a precise deletion was not obtained. The bioinformatic resource includes information about individual genes and a detailed assessment of protein localization. We anticipate that integration of these initial functional analyses and the availability of the biological resource will facilitate studies of these core proteins in many Mycobacterium species, including the less experimentally tractable pathogens M. abscessus, M. avium, M. kansasii, M. leprae, M. marinum, M. tuberculosis, and M. ulcerans.
IMPORTANCE Diseases caused by mycobacterial species result in millions of deaths per year globally and present a substantial health and economic burden, especially in immunocompromised patients. Difficulties inherent in working with mycobacterial pathogens have hampered the development and application of high-throughput genetics that can inform genome annotations and subsequent functional assays. To facilitate mycobacterial research, we have created a biological and bioinformatic resource (https://msrdb.org/) using Mycobacterium smegmatis as a model organism. The resource focuses specifically on 1,153 proteins that are highly conserved across the mycobacterial genus and, therefore, likely perform conserved mycobacterial core functions. Thus, functional insights from the MSR will apply to all mycobacterial species. We believe that the availability of this mycobacterial systems resource will accelerate research throughout the mycobacterial research community.
ORF boundaries in bacterial genomes have largely been drawn by gene prediction algorithms. These algorithms often fail to predict ORFs with non-canonical features. Recent developments in genome-scale mapping of translation have facilitated the empirical identification of ORFs. Here, we use ribosome profiling approaches to map initiating and elongating ribosomes in Mycobacterium tuberculosis. In doing so, we identify over 1,000 novel ORFs, revealing that much of the genome encodes proteins in overlapping reading frames and/or on both strands. Most of the novel ORFs are short (sORFs), which has impeded their identification by traditional methods. The strong codon bias that characterizes annotated mycobacterial ORFs is not evident in the novel sORFs taken in aggregate; hence, most are unlikely to encode functional proteins. Our data suggest that bacterial transcriptomes are subject to pervasive translation. We speculate that the inefficiency of expressing spurious sORFs may be offset by positive contributions to M. tuberculosis biology through the activities of a small subset.
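A simple log-odds score can illustrate the codon-bias argument. The Python sketch below makes two explicit assumptions. First, the reference codon frequencies come from a set of annotated ORFs; here that set is a hypothetical GC-rich toy sequence, not M. tuberculosis data. Second, the background is uniform usage of the 61 sense codons. This is a generic, swapped-in technique, not the analysis performed in the study. The names `codon_frequencies` and `codon_bias_score` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# All 61 sense codons of the standard genetic code.
SENSE_CODONS = [
    "".join(p) for p in product("ACGT", repeat=3) if "".join(p) not in STOP_CODONS
]
SENSE_SET = set(SENSE_CODONS)


def sense_codons(orf: str) -> list[str]:
    """Split an in-frame ORF into codons, keeping only the 61 sense codons
    (stop codons and codons with non-ACGT characters are skipped)."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3) if orf[i:i + 3] in SENSE_SET]


def codon_frequencies(orfs: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Relative sense-codon frequencies across a reference ORF set, with a
    pseudocount so that unseen codons get a small nonzero frequency."""
    counts = Counter(c for orf in orfs for c in sense_codons(orf))
    total = sum(counts.values()) + pseudocount * len(SENSE_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in SENSE_CODONS}


def codon_bias_score(orf: str, ref_freqs: dict[str, float]) -> float:
    """Mean per-codon log2 odds of reference codon usage versus a uniform
    background (1/61 per sense codon). Returns 0.0 if there are no sense codons."""
    cods = sense_codons(orf)
    if not cods:
        return 0.0
    uniform = 1.0 / len(SENSE_CODONS)
    return sum(math.log2(ref_freqs[c] / uniform) for c in cods) / len(cods)


# Hypothetical toy data: a GC-rich "annotated" reference ORF.
reference_orf = "ATG" + "GCC" * 30 + "CGC" * 30 + "TGA"
ref = codon_frequencies([reference_orf])

score_gc = codon_bias_score("ATG" + "GCC" * 10 + "TGA", ref)  # positive: matches reference usage
score_at = codon_bias_score("ATG" + "GCA" * 10 + "TAA", ref)  # negative: codons rare in reference
print(round(score_gc, 2), round(score_at, 2))
```

A positive mean score means an ORF's codons resemble the reference usage. A score near or below zero means they do not. A uniform background ignores genome-wide GC content, so a more careful comparison would use a composition-matched background.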