2016 · Preprint
DOI: 10.1101/045153
Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons

Abstract: Background: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. Results: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa…


Cited by 29 publications (73 citation statements) · References 55 publications
“…The average very major error rate (VME), defined as the rate at which resistant genomes are erroneously predicted to be susceptible, and the average major error rate (ME), defined as the rate at which susceptible genomes are erroneously predicted to be resistant, tend to go down as gene set size increases. Although the core gene set models described in Figure 1 have lower F1 scores and higher error rates than previously published full-genome models [21-24,27,29], their accuracies are striking given the small sizes of the input data sets and the removal of well-annotated AMR genes.…”
Section: AMR Models Based On Core Genes Have Predictive Power
confidence: 84%
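The VME and ME definitions quoted above amount to conditional error rates over the two phenotype classes. A minimal sketch, assuming a binary label encoding (1 = resistant, 0 = susceptible; the function name is illustrative, not from the cited work):

```python
def error_rates(y_true, y_pred):
    """Compute (VME, ME) for binary AMR phenotype predictions.

    VME: fraction of truly resistant genomes predicted susceptible.
    ME:  fraction of truly susceptible genomes predicted resistant.
    Assumes encoding 1 = resistant, 0 = susceptible.
    """
    resistant = [p for t, p in zip(y_true, y_pred) if t == 1]
    susceptible = [p for t, p in zip(y_true, y_pred) if t == 0]
    vme = sum(1 for p in resistant if p == 0) / len(resistant)
    me = sum(1 for p in susceptible if p == 1) / len(susceptible)
    return vme, me

# Example: 4 resistant and 4 susceptible genomes, one error in each class
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
print(error_rates(y_true, y_pred))  # (0.25, 0.25)
```

Note that VME and ME are computed per class, so they can move independently of aggregate metrics such as the F1 score discussed in the excerpt.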
“…These predictions are typically made using either rules-based or machine learning models [16-20]. Several studies have also built machine learning models for predicting AMR phenotypes by using assembled genomes or pan genomes as training sets [21-27]. In these cases, the machine learning algorithm detects the most discriminating features (typically short nucleotide k-mers) from a training set with laboratory-derived AMR phenotypes.…”
Section: Introduction
confidence: 99%
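The k-mer feature representation mentioned in this excerpt can be sketched as follows: each genome is reduced to the set of short nucleotide substrings it contains, and the machine learning algorithm selects the most discriminating ones. A minimal sketch (function names and the toy sequences are illustrative; real studies typically use larger k, e.g. k = 31):

```python
from collections import Counter

def kmer_profile(sequence, k):
    """Count all overlapping k-mers in a nucleotide sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def presence_matrix(sequences, k):
    """Binary feature matrix: one row per genome, one column per k-mer
    observed anywhere in the collection (presence/absence encoding)."""
    profiles = [kmer_profile(s, k) for s in sequences]
    vocab = sorted(set().union(*profiles))
    X = [[1 if km in p else 0 for km in vocab] for p in profiles]
    return vocab, X

# Two toy "genomes"; the resulting rows would be the ML training features
genomes = ["ACGTAC", "CGTACG"]
vocab, X = presence_matrix(genomes, k=4)
print(vocab)  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
print(X)      # [[1, 1, 1, 0], [0, 1, 1, 1]]
```

This presence/absence encoding is what makes the approach reference-free: no alignment to a reference genome is needed to build the feature space.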
“…Several thousand bacterial genomes and their susceptibility to antimicrobial agents are publicly available 10 and make for an ideal study set. The use of machine learning to predict AMR phenotypes has previously been investigated using two approaches: 1) considering only known resistance genes and mutations [11-14], 2) considering whole genomes with no prior knowledge of resistance mechanisms [15-20]. [Table caption: Accuracies of CARTb and SCMb on the validation data of each dataset, grouped by species.]…”
Section: Introduction
confidence: 99%
“…Three properties are illustrated for each rule in the models: 1) the locus at which the corresponding k-mer can be found, 2) a measure of rule importance, and 3) the number of equivalent rules. The first is the region of the genome in which the k-mer is located and was determined using the Basic Local Alignment Search Tool 27 […] 17 for SCMb, and were normalized to sum to one. The third results from k-mers that are equally predictive of the phenotype.…”
Section: Introduction
confidence: 99%