Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.

Author Summary:

Machine learning-based prediction of antibiotic resistance from bacterial genome sequences represents a promising tool to rapidly determine the antibiotic susceptibility profile of clinical isolates and reduce the morbidity and mortality resulting from inappropriate and ineffective treatment.
However, while there has been much focus on demonstrating the diagnostic potential of these modeling approaches, there has been little assessment of potential caveats and prerequisites associated with implementing predictive models of drug resistance in the clinical setting. Our results highlight significant biological and technical challenges facing the application of machine learning-based prediction of antibiotic resistance as a diagnostic tool. By outlining specific factors affecting model performance, our findings provide a framework for future work on modeling drug resistance and underscore the necessity of continued comprehensive sampling and reporting of treatment outcome data for building reliable and sustainable diagnostics.

Introduction: