2020
DOI: 10.1128/mbio.01344-20

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions

Abstract: Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations creates challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes,…

Cited by 88 publications (138 citation statements)
References 88 publications
“…The elastic-net penalty combines the lasso and the ridge penalties, which leads to sparse models with a grouping mechanism: correlated features tend to be selected together [46]. This approach was recently shown to be efficient in the context of bacterial genome-wide association studies (GWAS), providing increased statistical power for the identification of genotype-phenotype associations and accurate prediction rules [47]. As we demonstrate in Supplementary Section S9, however, it remains limited in its ability to provide interpretable predictive signatures, for several reasons.…”
Section: Discussion (mentioning)
confidence: 99%
“…The tuning parameter λ > 0 controls the overall strength of the penalty and we use 10-fold crossvalidation to choose suitable value for λ. Elastic net approach is implemented in the software 'pyseer' [36,37] focusing on GWAS for bacterial data.…”
Section: Plug-in Lasso Type Estimators For Heritability (mentioning)
confidence: 99%
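
The citation statements above describe the elastic-net penalty as a combination of the lasso and ridge penalties, scaled by an overall strength parameter. As a minimal sketch only, the Python below computes that penalty for a coefficient vector. The glmnet-style parametrization, the function name `elastic_net_penalty`, and the names `lam` and `alpha` are assumptions made for illustration; the cited papers and pyseer may define the penalty differently.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic-net penalty in an assumed glmnet-style form:

        lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)

    lam   : overall penalty strength (hypothetical name), must be >= 0
    alpha : mixing weight in [0, 1]; 1 gives the lasso, 0 gives ridge
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)    # lasso part: sum of absolute values
    l2_sq = sum(b * b for b in beta)  # ridge part: sum of squares
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


# Illustrative coefficients only, not taken from any of the cited studies.
beta = [0.5, -2.0, 0.0]
lasso_only = elastic_net_penalty(beta, lam=1.0, alpha=1.0)
ridge_only = elastic_net_penalty(beta, lam=1.0, alpha=0.0)
mixed = elastic_net_penalty(beta, lam=1.0, alpha=0.5)
```

Under this assumed form, setting `alpha` to 1 or 0 recovers the pure lasso or pure ridge penalty, and intermediate values blend the two. In practice the strength `lam` would be chosen by cross-validation, as the statement above notes.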
“…We selected three machine learning algorithms for prediction of antimicrobial resistance from WGS data represented as nucleotide k-mer profiles: extreme gradient boosting (XGB) (Chen and Guestrin, 2016), elastic net regularized logistic regression (ENLR) (Friedman et al., 2010), and set covering machine (SCM) (Marchand and Shawe-Taylor, 2000). All selected algorithms were recently reported to perform well on the WGS-AST task (Aun et al., 2018; Nguyen et al., 2018a; Drouin et al., 2019; Ferreira et al., 2020; Lees et al., 2020).…”
Section: Results (mentioning)
confidence: 99%