Interpretable genotype-to-phenotype classifiers with performance guarantees

Drouin, Alexandre; Letarte, Gaël; Raymond, Frédéric; Marchand, Mario; Corbeil, Jacques; Laviolette, François

doi:10.1038/s41598-019-40561-2

Cited by 82 publications

(120 citation statements)

References 56 publications

Supporting

Mentioning

113

Contrasting

“…First, our reference-based approach had high performance (AUROC, 0.885 to 0.933) and outperformed the rule-based approach that we used as a benchmark. Despite PR being more biologically complex than other forms of antimicrobial resistance in K. pneumoniae, it was encouraging that our results were similar to those obtained for other antimicrobials with less complex mechanisms of resistance (e.g., carbapenem resistance) tested with ML approaches (14, 15, 34-36). The performance of this approach falls below the FDA cutoffs for AST tests (37).…”

Section: Discussion (supporting)

confidence: 77%

“…Impact of feature engineering with GWAS filtering and polymyxin exposure data on performance of ML-based prediction. A key challenge in AST genotype-phenotype prediction is the sparsity of the input genomic data sets due to relatively few genomes in data sets compared to the number of genomic features (14). Appropriate feature selection prior to training ML algorithms is a potential solution to this…”

Section: Results (mentioning)

confidence: 99%

“…Second, these rule-based models struggle to account for complex interactions between variants in multiple loci. In order to move beyond these limitations, machine learning (ML) methods have been used to predict antimicrobial susceptibility (14-17). Given the incomplete identification of contributing PR mutations and the possible polygenic nature of PR, we hypothesize that ML approaches may be well suited to AST genotype-phenotype prediction in this setting (18) and may ultimately be used to help identify isolates for confirmatory phenotypic testing.…”

mentioning

confidence: 99%

Predicting Phenotypic Polymyxin Resistance in Klebsiella pneumoniae through Machine Learning Analysis of Genomic Data

et al. 2020

Polymyxins are used as treatments of last resort for Gram-negative bacterial infections. Their increased use has led to concerns about emerging polymyxin resistance (PR). Phenotypic polymyxin susceptibility testing is resource intensive and difficult to perform accurately. The complex polygenic nature of PR and our incomplete understanding of its genetic basis make it difficult to predict PR using detection of resistance determinants. We therefore applied machine learning (ML) to whole-genome sequencing data from >600 Klebsiella pneumoniae clonal group 258 (CG258) genomes to predict phenotypic PR. Using a reference-based representation of genomic data with ML outperformed a rule-based approach that detected variants in known PR genes (area under receiver-operator curve [AUROC], 0.894 versus 0.791, P = 0.006). We noted modest increases in performance by using a bacterial genome-wide association study to filter relevant genomic features and by integrating clinical data in the form of prior polymyxin exposure. Conversely, reference-free representation of genomic data as k-mers was associated with decreased performance (AUROC, 0.692 versus 0.894, P = 0.015). When ML models were interpreted to extract genomic features, six of seven known PR genes were correctly identified by models without prior programming and several genes involved in stress responses and maintenance of the cell membrane were identified as potential novel determinants of PR. These findings are a proof of concept that whole-genome sequencing data can accurately predict PR in K. pneumoniae CG258 and may be applicable to other forms of complex antimicrobial resistance.

IMPORTANCE Polymyxins are last-resort antibiotics used to treat highly resistant Gram-negative bacteria. There are increasing reports of polymyxin resistance emerging, raising concerns of a postantibiotic era. Polymyxin resistance is therefore a significant public health threat, but current phenotypic methods for detection are difficult and time-consuming to perform. There have been increasing efforts to use whole-genome sequencing for detection of antibiotic resistance, but this has been difficult to apply to polymyxin resistance because of its complex polygenic nature. The significance of our research is that we successfully applied machine learning methods to predict polymyxin resistance in Klebsiella pneumoniae clonal group 258, a common health care-associated and multidrug-resistant pathogen. Our findings highlight that machine learning can be successfully applied even in complex forms of antibiotic resistance and represent a significant contribution to the literature that could be used to predict resistance in other bacteria and to other antibiotics.

“…They are potentially broadly applicable to any problem with vast amounts of data, though they perform best when the number of data points exceeds the number of dimensions. Unsurprisingly their uptake in sequence analysis (19,20) and bacterial genomics specifically has been rapid (4,5,21). In general, the predictive variants in the identified sets may not fully overlap those significant by themselves, and the mapping does not necessarily lend itself readily to the same interpretation as P values in a GWAS.…”

mentioning

confidence: 99%

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions

Lees, Mai, Galardini, et al. 2020. mBio

ABSTRACT Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations creates challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially.

IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.

“…Such additional insight may help one to better understand the AMR causality of alleles in future studies. The latest and most comprehensive work within this approach was published by Drouin and colleagues and covers 12 bacterial species and 56 drugs [13]. In addition to generating the collections of k-mers associated with susceptible or resistant strains, they applied rule-based ML algorithms: Classification and Regression Trees (CART) and Set Covering Machines (SCM).…”

Section: Approach 3: ARG Agnostic Identification of AMR Mechanisms VI (mentioning)

confidence: 99%

Bioinformatics Approaches to the Understanding of Molecular Mechanisms in Antimicrobial Resistance

Camp, Haslam, Porollo. 2020. IJMS

Antimicrobial resistance (AMR) is a major health concern worldwide. A better understanding of the underlying molecular mechanisms is needed. Advances in whole genome sequencing and other high-throughput unbiased instrumental technologies to study the molecular pathogenicity of infectious diseases enable the accumulation of large amounts of data that are amenable to bioinformatic analysis and the discovery of new signatures of AMR. In this work, we review representative methods published in the past five years to define major approaches developed to-date in the understanding of AMR mechanisms. Advantages and limitations for applications of these methods in clinical laboratory testing and basic research are discussed.
