Background
Tuberculosis (TB) is the leading cause of infectious disease mortality worldwide. Numerous blood-based gene expression signatures have been proposed in the literature as alternative tools for diagnosing TB infection, and ongoing efforts are focused on developing additional signatures for other TB-related contexts. However, the generalizability of these signatures to different patient contexts is not well characterized. There is a pressing need for a well-curated database of TB gene expression studies to support the systematic assessment of existing and newly developed TB gene signatures.

Results
We built curatedTBData, a manually curated database of 49 TB transcriptomic studies. This data resource is freely available through GitHub and as an R Bioconductor package that allows users to validate new and existing biomarkers without the challenges of harmonizing heterogeneous studies. We also demonstrate the use of this data resource with cross-study comparisons of 72 TB gene signatures. For distinguishing subjects with active TB from healthy controls, 19 gene signatures had a weighted mean AUC of 0.90 or greater, with a maximum of 0.94. For distinguishing active TB disease from latent TB infection, 7 gene signatures had a weighted mean AUC of 0.90 or greater, with a maximum of 0.93. We also explore ensembling methods that average predictions from multiple gene signatures to significantly improve diagnostic ability beyond that of any single signature.

Conclusions
The curatedTBData data package offers a comprehensive resource of curated gene expression data with clinical annotation. It can be used to identify robust new TB gene signatures, to perform comparative analyses of existing TB gene signatures, and to develop alternative gene set scoring or ensembling methods, among other applications. This resource will also facilitate the development of new signatures that generalize across cohorts or that are more applicable to specific subsets of patients (e.g., those with rare comorbid conditions). We demonstrated that these blood-based gene signatures can distinguish patients with distinct TB outcomes; moreover, combining multiple gene signatures can improve the overall predictive accuracy in differentiating these subtypes, which highlights an important consideration for translating genomics into clinical implementation.
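
The two summary quantities used above, a weighted mean AUC across studies and an ensemble that averages predictions from several signatures, can be illustrated with a minimal Python sketch. It is not the curatedTBData implementation: the function names are hypothetical, the weights are assumed to be per-study sample sizes, and the ensemble is assumed to be a plain average of z-scored signature scores. The abstract specifies neither choice.

```python
from statistics import mean, pstdev


def auc(scores, labels):
    """AUC as the probability that a random positive sample outscores
    a random negative sample (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(per_study):
    """Weighted mean of per-study AUCs.

    per_study: list of (auc, n_samples) pairs. Weighting by sample size
    is an assumption made for this sketch.
    """
    total = sum(n for _, n in per_study)
    return sum(a * n for a, n in per_study) / total


def zscore(values):
    """Standardize scores so signatures on different scales are comparable."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def ensemble_scores(signature_scores):
    """Average z-scored predictions across signatures, sample by sample.

    signature_scores: list of score lists, one per signature, all
    covering the same samples in the same order.
    """
    standardized = [zscore(s) for s in signature_scores]
    return [mean(col) for col in zip(*standardized)]


# Toy data (made up for illustration): 1 = active TB, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
sig_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]   # hypothetical signature A scores
sig_b = [2.0, 3.5, 1.0, 1.2, 0.5, 2.5]   # hypothetical signature B scores

combined = ensemble_scores([sig_a, sig_b])
print("AUC A:", auc(sig_a, labels))
print("AUC B:", auc(sig_b, labels))
print("AUC ensemble:", auc(combined, labels))
print("Weighted mean AUC:", weighted_mean_auc([(0.9, 100), (0.8, 300)]))
```

Standardizing each signature before averaging keeps a signature with a wide numeric range from dominating the ensemble. Whether an ensemble actually outperforms its best member depends on the data and has to be checked on held-out studies.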