2019
DOI: 10.1038/s41598-019-40561-2

Interpretable genotype-to-phenotype classifiers with performance guarantees

Abstract: Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations b…

citations
Cited by 82 publications
(120 citation statements)
references
References 56 publications
“…First, our reference-based approach had high performance (AUROC, 0.885 to 0.933) and outperformed the rule-based approach that we used as a benchmark. Despite PR being more biologically complex than other forms of antimicrobial resistance in K. pneumoniae, it was encouraging that our results were similar to those obtained for other antimicrobials with less complex mechanisms of resistance (e.g., carbapenem resistance) tested with ML approaches (14, 15, 34–36). The performance of this approach falls below the FDA cutoffs for AST tests (37).…”
Section: Discussion
supporting
confidence: 77%
“…Impact of feature engineering with GWAS filtering and polymyxin exposure data on performance of ML-based prediction. A key challenge in AST genotype-phenotype prediction is the sparsity of the input genomic data sets due to relatively few genomes in data sets compared to the number of genomic features (14). Appropriate feature selection prior to training ML algorithms is a potential solution to this…”
Section: Results
mentioning
confidence: 99%
“…They are potentially broadly applicable to any problem with vast amounts of data, though they perform best when the number of data points exceeds the number of dimensions. Unsurprisingly their uptake in sequence analysis (19,20) and bacterial genomics specifically has been rapid (4,5,21). In general, the predictive variants in the identified sets may not fully overlap those significant by themselves, and the mapping does not necessarily lend itself readily to the same interpretation as P values in a GWAS.…”
mentioning
confidence: 99%
“…Such additional insight may help one to better understand the AMR causality of alleles in future studies. The latest and most comprehensive work within this approach was published by Drouin and colleagues and covers 12 bacterial species and 56 drugs [13]. In addition to generating the collections of k-mers associated with susceptible or resistant strains, they applied rule-based ML algorithms: Classification and Regression Trees (CART) and Set Covering Machines (SCM).…”
Section: Approach 3: ARG Agnostic Identification Of AMR Mechanisms VI
mentioning
confidence: 99%
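
The statement above describes k-mer collections combined with rule-based learners. As a rough illustration only, the sketch below builds k-mer presence/absence profiles and selects the single presence or absence rule with the fewest training errors. This is a deliberately simplified stand-in, not the CART or SCM algorithm from the cited paper; the function names `kmers` and `best_single_rule`, the toy sequences, and the labels are all invented for this example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_single_rule(genomes, labels, k):
    """Find the k-mer presence/absence rule with the fewest training errors.

    A genome is predicted resistant (1) when the rule holds.
    Returns (kmer, kind, errors) with kind in {"presence", "absence"}.
    Ties are broken by alphabetical k-mer order, then presence before absence.
    """
    profiles = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*profiles))
    best = None
    for km in vocabulary:
        for kind in ("presence", "absence"):
            want_present = kind == "presence"
            preds = [int((km in p) == want_present) for p in profiles]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[2]:
                best = (km, kind, errors)
    return best


# Toy data, invented for illustration: 0 = susceptible, 1 = resistant.
genomes = ["ACGTACGT", "ACGTTCGT", "ACGAACGA", "TCGAACGT"]
labels = [0, 0, 1, 1]

rule = best_single_rule(genomes, labels, k=3)
print(rule)
```

A rule of this form stays readable because it names one concrete sequence feature. Real data sets have far more k-mers than genomes, which is the sparsity concern raised in the statements above, so a practical learner would need regularization or feature selection that this sketch omits.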