Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet there is currently no consensus on how to measure genomic variation. We investigated the effects of variant identification approaches on transmission inferences for M. tuberculosis by comparing variants identified by five different groups in the same sequence data from a clonal outbreak. We then measured the performance of commonly used variant calling approaches in recovering variation in a simulated tuberculosis outbreak and tested the effect of applying increasingly stringent filters on transmission inferences and phylogenies. We found that the variant calling approaches used by different groups do not recover consistent sets of variants, often leading to conflicting transmission inferences. Further, performance in recovering true outbreak variation varied widely across approaches. Finally, stringent filters rapidly eroded the accuracy of transmission inferences and the quality of phylogenies reconstructed from outbreak variation. We conclude that measurements of genetic distance and phylogenetic structure depend on the variant calling approach. Variant calling algorithms trained on true sequence data outperform other approaches and enable the inclusion of repetitive regions typically excluded from genomic epidemiology studies, maximizing the information gleaned from outbreak genomes.

settings, it is unknown where and between whom the majority of transmission occurs [2-4] and therefore where to focus interventions. Patterns of M. tuberculosis genetic and genomic variation are frequently used to identify potential recent transmission events. M. tuberculosis isolates that share a genotype (RFLP, spoligotype, or MIRU-VNTR) [5-7], or whose whole genome sequences are within a given genetic distance [8-11], are considered clustered and potentially epidemiologically linked. Phylogenies inferred from outbreak variation may reveal patterns of relatedness within and between clusters [11-13]. Finally, transmission trees integrate epidemiological and phylogenetic information to capture probable transmission histories, that is, chains of who infected whom [14,15]. Predicted transmission links have been used to infer the likely location and/or timing of transmission [16,17], to identify risk factors for transmission and high-risk populations [18], to distinguish between acquired (primary) and transmitted drug resistance [19], and to declare an outbreak over [20].

Transmission inferences in molecular epidemiology for M. tuberculosis and other pathogens rely on the high-quality measurement of genetic variation from sequence data. However, there is no consensus on how to measure pathogen genomic variation, and studies frequently employ different sequence quality control measures, mapping algorithms, variant callers, and variant filters [21]. Further, the performance of variant calling...