Background: Acute myeloid leukemia (AML) is a heterogeneous disease that can occur at any age, yet current AML classifications do not use age as a classification factor. However, the incidence of AML increases with age, and younger and older patients carry different genetic alterations. We sought to investigate how risk stratification differs between these age groups using a k-mer-based machine learning analysis of RNA-seq data.
Methods: We analyzed 423 samples collected at initial AML diagnosis to highlight the differences between younger and older patients in risk stratification. Our methodology applied the Extreme Gradient Boosting (XGB) algorithm to transcriptome data, taking into account differences between patients derived from clinical information. SHapley Additive exPlanations (SHAP) values were used to interpret our models and identify the main differences between younger and older patients.
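As an illustration of the k-mer-based representation mentioned above, the following is a minimal sketch of how per-sample k-mer counts could be assembled into a feature matrix for a classifier such as XGB. It is not the authors' pipeline: the abstract does not state the value of k or the counting tool used, so the function names `count_kmers` and `kmer_matrix`, the choice k = 3, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(read, k):
    """Count every overlapping k-mer in one read, skipping k-mers with ambiguous bases."""
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_matrix(samples, k):
    """Return (columns, rows): a sorted k-mer list and one count vector per sample."""
    per_sample = {}
    for name, reads in samples.items():
        total = Counter()
        for read in reads:
            total.update(count_kmers(read, k))
        per_sample[name] = total
    # Shared column order so every sample's vector is aligned on the same k-mers.
    columns = sorted(set().union(*per_sample.values()))
    rows = {name: [c[kmer] for kmer in columns] for name, c in per_sample.items()}
    return columns, rows


# Toy reads (hypothetical, not real AML data); k = 3 is chosen only to keep the output small.
toy_samples = {
    "younger_1": ["ACGTAC", "CGTACG"],
    "older_1": ["ACGTTT"],
}
columns, rows = kmer_matrix(toy_samples, k=3)
print(columns)
print(rows)
```

In practice the rows would be paired with each patient's risk label and age group before model training; real RNA-seq data would also call for a dedicated k-mer counter and normalization, which this sketch omits.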
Results: On a test set, our XGB models achieved an area under the curve (AUC) of 0.88 in younger patients and 0.89 in older patients for risk stratification. Furthermore, we identified a list of differentially expressed genes for each age group.
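For readers less familiar with the metric, the sketch below shows one way an AUC can be computed separately for each age group from test-set predictions. It assumes a binary simplification of the risk label (1 versus 0), which the abstract does not specify; the helper names `roc_auc` and `auc_by_group` and the scores are hypothetical and do not reproduce the reported values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, label, score) tuples; returns {group: AUC}."""
    grouped = {}
    for group, label, score in records:
        labels, scores = grouped.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {g: roc_auc(lab, sc) for g, (lab, sc) in grouped.items()}


# Made-up predictions for illustration only.
toy_records = [
    ("younger", 1, 0.90), ("younger", 0, 0.40),
    ("younger", 1, 0.35), ("younger", 0, 0.10),
    ("older", 1, 0.80), ("older", 0, 0.20),
    ("older", 1, 0.70), ("older", 0, 0.30),
]
group_auc = auc_by_group(toy_records)
print(group_auc)
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a few hundred patients; a rank-based implementation would scale better.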
Conclusion: This study highlighted important differences between younger and older patients in risk stratification and identified potential age-group-specific biomarkers for further investigation.