Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time.
Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of antibiotics commonly used in treating tuberculosis and find that its accuracy is comparable to that of standard machine learning models, and that it points to several genes with previously identified associations to drug resistance.
Availability: https://github.com/WGS-TB/DrugResistance/tree/RB_learning
Contact: hooman_zabeti@sfu.ca
2012 ACM Subject Classification: Applied computing → Life and medical sciences → Computational biology → Molecular sequence analysis
1 Introduction
Drug resistance is the phenomenon by which an infectious organism (also known as a pathogen) develops resistance to one or more drugs that are commonly used in treatment [36]. In this paper we focus our attention on Mycobacterium tuberculosis, the etiological agent of tuberculosis, which is the largest infectious killer in the world today, responsible for over 10 million new cases and 2 million deaths every year [37].
The development of resistance to common drugs used in treatment is a serious public health threat, not only in low- and middle-income countries, but also in high-income countries, where it is particularly problematic in hospital settings [39]. It is estimated that, without the urgent development of novel antimicrobial drugs, the total mortality due to drug resistance will exceed 10 million people a year by 2050, a number exceeding the annual mortality due to cancer today [35].
Existing models for predicting drug resistance from whole-genome sequence (WGS) data broadly fall into two classes. The first, which we refer to as "catalogue methods," involves testing the WGS data of an isolate for the presence of point mutations (typically single-nucleotide polymorphisms, or SNPs) associated with known drug resistance. If one or more such mutations is identified,...