Introduction: To apply machine learning models to identify new biomarkers associated with the early diagnosis and prognosis of SARS-CoV-2 infection, with the aim of preventing long COVID.
Material and methods: Plasma and serum samples from COVID-19 patients (mild, moderate, and severe), from patients with other pneumonias (negative COVID-19 RT-PCR), and from healthy volunteers (controls), obtained from hospitals in four countries (China, Spain, France, and Italy), were analyzed by GC-MS, LC-MS, and NMR. Machine learning models (PCA and PLS-DA) were developed to predict the diagnosis and prognosis of COVID-19 and to identify biomarkers associated with these outcomes.
Results: A total of 1410 patient samples were analyzed. Across all analyzed data, the PLS-DA model achieved a diagnostic and prognostic accuracy of around 95%. A total of 23 biomarkers (e.g., spermidine, taurine, L-aspartic acid, L-glutamic acid, L-phenylalanine, xanthine, ornithine, and ribothymidine) were identified as being associated with the diagnosis and prognosis of COVID-19. Additionally, we identified for the first time six new biomarkers (N-Acetyl-4-O-acetylneuraminic acid, N-Acetyl-L-Alanine, N-Acetyltryptophan, palmitoylcarnitine, and glycerol 1-myristate) that are also associated with the severity and diagnosis of COVID-19. These six new biomarkers were elevated in patients with severe COVID-19 compared with patients with mild disease or healthy volunteers.
Conclusion: The PLS-DA model predicted the diagnosis and prognosis of COVID-19 with an accuracy of around 95%. We also identified six new biomarkers that were increased in the plasma and serum of COVID-19 patients (N-Acetyl-4-O-acetylneuraminic acid, N-Acetyl-L-Alanine, N-Acetyltryptophan, palmitoylcarnitine, and glycerol 1-myristate) and that should be evaluated further as prognostic and diagnostic indicators of COVID-19.