Introduction. The resistance rate of
Klebsiella pneumoniae
(
K. pneumoniae
) to imipenem is increasing year by year, and the imipenem resistance mechanism of
K. pneumoniae
is complex. Therefore, new strategies are urgently needed to elucidate the mechanism of imipenem resistance so that the drug can be used effectively and accurately in clinical practice.
Hypothesis/Gap Statement. Machine learning could identify resistance features and biological processes that influence microbial resistance from whole-genome sequencing (WGS) data.
Aims. This work aimed to predict imipenem resistance genetic features in
K. pneumoniae
from whole-genome k-mer features, and to analyse their functions in order to understand the resistance mechanism.
Methods. This study analysed WGS data of
K. pneumoniae
combined with imipenem resistance phenotypes, and established a genotype-phenotype model of
K. pneumoniae
imipenem resistance to predict resistance features using the chi-squared test and random forest. An external clinical dataset was used to verify the predictive power of the resistance features. Potential genes were identified by aligning the resistance features with the
K. pneumoniae
reference genome using blastn. The functions of the potential genes were then analysed with GO and KEGG to explore resistance-related signalling pathways, and resistance sequence patterns were screened using STREME. Finally, the resistance features were combined and modelled with four machine-learning algorithms (logistic regression, SVM, GBDT and XGBoost) to evaluate their ability to predict phenotype.
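The k-mer screening step described in the Methods can be illustrated with a minimal sketch. It assumes that each k-mer is scored by presence or absence per genome and tested against the binary resistance phenotype with a 2x2 Pearson chi-squared statistic. The function names, the toy sequences and the choice of k are hypothetical and are not taken from the study, and the random forest step is not shown.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    a: resistant genomes with the k-mer,   b: resistant genomes without it
    c: susceptible genomes with the k-mer, d: susceptible genomes without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no information
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_kmers(genomes, labels, k):
    """Rank k-mers by association between presence and phenotype.

    genomes: list of DNA strings (hypothetical toy input)
    labels:  list of 1 (resistant) or 0 (susceptible), same order as genomes
    """
    present = [kmers(g, k) for g in genomes]
    n_res = sum(labels)
    n_sus = len(labels) - n_res
    scores = {}
    for kmer in set().union(*present):
        a = sum(1 for p, y in zip(present, labels) if y == 1 and kmer in p)
        c = sum(1 for p, y in zip(present, labels) if y == 0 and kmer in p)
        scores[kmer] = chi2_2x2(a, n_res - a, c, n_sus - c)
    # Highest statistic first; ties broken alphabetically for reproducibility
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the statistic is symmetric, k-mers enriched in susceptible genomes score as highly as those enriched in resistant genomes, so the direction of each association would need to be checked separately.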
Results. A total of 16 670 imipenem resistance features were predicted from the genotype-phenotype model. Thirty potential genes were identified by annotating the resistance features and corresponded to known antibiotic-related genes (mdtM, dedA, rne, etc.). GO and KEGG pathway analyses indicated a possible association of imipenem resistance with metabolic processes and the cell membrane. The sequence patterns CRYCAGCDN and CGRDAAAN were found among the imipenem resistance features and were widely present in reported β-lactam resistance genes (bla
SHV, bla
CTX-M, bla
TEM, etc.). The pattern YCYAGCMCAST, associated with metabolic functions (organic substance metabolic process, nitrogen compound metabolic process and cellular metabolic process), was identified from the top 50 resistance features. The 25 resistance genes in the training dataset included 19 genes found in the external dataset, supporting the accuracy of the prediction. The area under the curve (AUC) values of logistic regression, SVM, GBDT and XGBoost were 0.965, 0.966, 0.969 and 0.969, respectively, indicating that the imipenem resistance features have strong predictive power.
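The sequence patterns reported above are written in the IUPAC degenerate nucleotide alphabet (for example R = A or G, Y = C or T, D = A, G or T, N = any base). As a minimal sketch, and not the procedure used in the study, such a pattern can be expanded into a regular expression and searched in a sequence. The function names and the example sequences are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression fragments
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif such as 'CRYCAGCDN' into a regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead makes each match zero-width, so overlapping hits are kept
    pattern = re.compile("(?=" + motif_to_regex(motif).pattern + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For instance, `find_motif("CRYCAGCDN", "TTCACCAGCATGG")` reports a single hit starting at position 2 of this made-up sequence. Scanning the reverse complement as well would be needed to cover both strands.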
Conclusion. Machine-learning methods could effectively predict imipenem resistance features in
K. pneumoniae
, and provide resistance sequence profiles for predicting resistance phenotype and exploring potential resistance mechanisms. This offers important insight into potential therapeutic strategies for
K. pneumoniae
resistance to imipenem, and may speed up the application of machine learning in routine diagnosis.