Tuberculosis (TB) control programs use whole-genome sequencing (WGS) of Mycobacterium tuberculosis (Mtb) to detect and investigate clusters of TB cases. Few genomic differences between Mtb isolates might indicate that TB cases result from recent transmission. However, the variable and sometimes long duration of latent infection, combined with uncertainty about the Mtb mutation rate during latency, can complicate interpretation of WGS results. To estimate the association between infection duration and single nucleotide polymorphism (SNP) accumulation in the Mtb genome, we first analyzed pairwise SNP differences among TB cases from Los Angeles County, California, that had strong epidemiologic links. We found that SNP distance alone was insufficient to conclude that cases are linked through recent transmission. Second, we describe a well-characterized cluster of TB cases in California to illustrate the role of genomic data in conclusions regarding recent transmission. Longer presumed latent periods were inconsistently associated with larger SNP differences. Our analyses suggest that WGS alone cannot definitively determine that a case is attributable to recent transmission. Methods that integrate clinical, epidemiologic, and genomic data can guide conclusions about the likelihood of recent transmission, providing local public health practitioners with better tools for monitoring and investigating TB transmission.
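As a minimal sketch of the quantities involved in this kind of analysis, the Python below computes pairwise SNP distances between aligned consensus sequences. It also computes, under a simplifying constant-rate Poisson assumption that is not taken from the study, the probability that two isolates separated by a given number of years of evolution differ by at most k SNPs. The function names, the treatment of ambiguous calls ('N' and '-') as non-informative, and the `rate_per_year` parameter are illustrative assumptions; no particular Mtb mutation rate is implied. Because the study found that longer presumed latent periods were only inconsistently associated with larger SNP differences, a model like this should not be used on its own to rule recent transmission in or out.

```python
import itertools
import math


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Assumption: 'N' (ambiguous) and '-' (gap) calls are skipped as non-informative.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in "N-" and b not in "N-"
    )


def pairwise_distances(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates, keyed by sorted IDs."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in itertools.combinations(sorted(isolates), 2)
    }


def prob_at_most_k_snps(k: int, years: float, rate_per_year: float) -> float:
    """P(X <= k) for X ~ Poisson(rate_per_year * years).

    Hypothetical constant-rate model: `years` is the total evolutionary time
    separating the two isolates and `rate_per_year` is an assumed SNP rate.
    """
    lam = rate_per_year * years
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


# Toy sequences for illustration only.
isolates = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACNTACGAAC"}
print(pairwise_distances(isolates))  # {('A', 'B'): 1, ('A', 'C'): 1, ('B', 'C'): 2}

# With an assumed rate of 0.5 SNPs/year, a small distance is far less likely
# after 20 years of separation than after 2 years.
print(round(prob_at_most_k_snps(5, 2.0, 0.5), 4))   # 0.9994
print(round(prob_at_most_k_snps(5, 20.0, 0.5), 4))  # 0.0671
```

If mutation during latency is slower or more variable than the assumed rate, the second probability would be underestimated, which is one reason a fixed SNP threshold can mislead.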
Understanding tuberculosis (TB) transmission chains can help public health staff target their resources to prevent further transmission, but few tools currently exist to automate this process. We have developed the Logically Inferred Tuberculosis Transmission (LITT) algorithm to systematize the integration and analysis of whole-genome sequencing, clinical, and epidemiological data. Modeled on the work typically performed by hand during a cluster investigation, LITT identifies and ranks potential source cases for each case in a TB cluster. We evaluated LITT using a diverse dataset of 534 cases in 56 clusters (size range: 2–69 cases), which were investigated locally in three different U.S. jurisdictions. Investigators and LITT agreed on the most likely source case for 145 (80%) of 181 cases. Reviewing the discrepancies showed that many of the remaining differences resulted from errors in the dataset supplied to the LITT algorithm. In addition, we developed a graphical user interface, a user's manual, and training resources to make LITT more accessible to frontline staff. While LITT cannot replace thorough field investigation, the algorithm can help investigators systematically analyze and interpret complex data over the course of a TB cluster investigation. Code available at: https://github.com/CDCgov/TB_molecular_epidemiology/tree/1.0; https://zenodo.org/badge/latestdoi/166261171.
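The Python below is a hypothetical sketch of the general idea of filtering and ranking potential source cases for one case in a cluster. It is a deliberate simplification and not LITT's actual logic (LITT's own source code is in the repository linked above). The `Case` fields, the `max_snps` cutoff, and the ranking order (fewest SNPs, then infectious cases, then the shortest interval before the target's diagnosis) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    # Hypothetical, simplified case record; not LITT's data model.
    case_id: str
    diagnosis_date: date
    infectious: bool  # assumed flag derived from clinical indicators


def rank_candidate_sources(
    target: Case,
    candidates: list[Case],
    snp_dist: dict[frozenset[str], int],
    max_snps: int = 5,  # illustrative cutoff, not a recommended threshold
) -> list[Case]:
    """Return plausible source cases for `target`, best first (hypothetical rules)."""

    def dist(c: Case) -> int:
        # Distances are symmetric, so each pair is keyed by an unordered frozenset.
        return snp_dist[frozenset((c.case_id, target.case_id))]

    eligible = [
        c
        for c in candidates
        if c.case_id != target.case_id
        and c.diagnosis_date < target.diagnosis_date  # source diagnosed earlier
        and dist(c) <= max_snps  # genomically close enough
    ]
    # Fewest SNPs first, then infectious before non-infectious, then the
    # shortest gap between the candidate's and the target's diagnosis dates.
    return sorted(
        eligible,
        key=lambda c: (
            dist(c),
            not c.infectious,
            (target.diagnosis_date - c.diagnosis_date).days,
        ),
    )


# Toy cluster for illustration only.
target = Case("T", date(2021, 6, 1), True)
pool = [
    Case("A", date(2020, 1, 15), True),
    Case("B", date(2021, 3, 1), False),
    Case("C", date(2021, 5, 1), True),
    Case("D", date(2022, 1, 1), True),  # diagnosed after T, so excluded
]
snps = {
    frozenset(("A", "T")): 1,
    frozenset(("B", "T")): 1,
    frozenset(("C", "T")): 3,
    frozenset(("D", "T")): 0,
}
print([c.case_id for c in rank_candidate_sources(target, pool, snps)])  # ['A', 'B', 'C']
```

Returning an ordered list rather than a single answer mirrors the abstract's description of ranking potential sources, leaving the final judgment to investigators who can weigh field information a simple rule set does not capture.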