Introduction: Individuals with type 2 diabetes (T2DM) or the APOL1 high-risk genotype (APOL1) are at increased risk of rapid kidney function decline (RKFD) compared with the general population. Plasma biomarkers representing inflammatory and kidney injury pathways have been validated as predictors of kidney disease progression in several studies. Routine clinical data in the electronic health record (EHR) can also be used for prediction. Applying machine learning to integrate biomarkers with clinical data may improve the identification of RKFD.
Methods: From the Mount Sinai BioMe Biobank, we selected two subpopulations of high-risk individuals with a baseline eGFR ≥ 45 ml/min/1.73 m²: individuals with T2DM (n=871) and individuals of African ancestry with the APOL1 high-risk genotype (n=498). Plasma levels of tumor necrosis factor receptors 1 and 2 (TNFR1/2) and kidney injury molecule-1 (KIM-1) were measured, and a series of supervised machine learning approaches, including random forest (RF), were used to combine the biomarker data with longitudinal clinical variables. The primary objective was to predict RKFD (eGFR decline ≥ 5 ml/min/1.73 m²/year) using an algorithm-produced score and probability cutoffs, and to compare the results with standard of care.
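The RKFD outcome is defined as an eGFR decline of at least 5 ml/min/1.73 m² per year. The abstract does not state how the annual decline was computed, so the sketch below assumes an ordinary least-squares slope of each patient's eGFR measurements over time; the function names `egfr_slope` and `is_rkfd` are hypothetical and illustrate only the labeling step, not the authors' pipeline.

```python
# Threshold taken from the abstract's RKFD definition (ml/min/1.73 m^2 per year).
RKFD_THRESHOLD = 5.0


def egfr_slope(times_years, egfr_values):
    """Ordinary least-squares slope of eGFR versus time, in units per year.

    Assumption: the slope-based definition is illustrative; the abstract does
    not specify the method used to estimate annual eGFR decline.
    """
    n = len(times_years)
    if n < 2 or n != len(egfr_values):
        raise ValueError("need at least two paired (time, eGFR) measurements")
    mean_t = sum(times_years) / n
    mean_e = sum(egfr_values) / n
    # Sum of squared time deviations; zero means all measurements share one time point.
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_years, egfr_values))
    return sxy / sxx


def is_rkfd(times_years, egfr_values, threshold=RKFD_THRESHOLD):
    """Label a patient as RKFD if eGFR falls by at least `threshold` per year."""
    return egfr_slope(times_years, egfr_values) <= -threshold


# Hypothetical example: annual eGFR values over three years of follow-up.
print(egfr_slope([0, 1, 2, 3], [80, 74, 69, 63]))  # slope of about -5.6 per year
print(is_rkfd([0, 1, 2, 3], [80, 78, 77, 75]))     # slope of about -1.6, so False
```

A regression slope uses every available measurement rather than only the first and last, which makes the label less sensitive to a single noisy eGFR value; other definitions (e.g., mixed-effects slopes) are equally plausible.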
Results: Among the 871 participants with T2DM, the mean age was 61 years, baseline estimated glomerular filtration rate (eGFR) was 74 ml/min/1.73 m², and median UACR was 13 mg/g. The median follow-up was 4.7 years from baseline specimen collection, with additional retrospective data available for a median of 2.3 years before plasma collection. Among the 498 patients of African ancestry with the APOL1 high-risk genotype, the median age was 56 years, median baseline eGFR was 83 ml/min/1.73 m², and median UACR was 11 mg/g. The median follow-up was 4.7 years, with additional retrospective data available for 3.1 years before plasma collection. Overall, 19% of participants with T2DM and 9% of those with the APOL1 high-risk genotype experienced RKFD. After evaluating three supervised algorithms (random forest [RF], support vector machine [SVM], and a Cox survival model), we selected the RF model. In the training and test sets, respectively, the RF model had AUCs of 0.82 (95% CI, 0.81-0.83) and 0.80 (95% CI, 0.78-0.82) in T2DM, and 0.85 (95% CI, 0.84-0.87) and 0.80 (95% CI, 0.73-0.86) in the APOL1 high-risk group. The combined RF model outperformed standard clinical variables in both patient populations. Discrimination was comparable in two sensitivity analyses: (1) using only data from ≤ 1 year before the baseline biomarker measurement and (2) restricting to individuals with eGFR ≤ 60 ml/min/1.73 m² and/or albuminuria at baseline. The distribution of RKFD probability differed between the two populations. In patients with T2DM, the RKFD score stratified 18%, 49%, and 33% of patients into high-, intermediate-, and low-probability strata, respectively, with a PPV of 53% in the high-probability group and an NPV of 97% in the low-probability group. By comparison, in those with the APOL1 high-risk genotype, the RKFD score stratified 7%, 23%,...
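The stratification results combine three quantities: the share of patients in each probability stratum, the PPV among high-probability patients, and the NPV among low-probability patients, alongside AUC for overall discrimination. The sketch below shows how these could be computed from predicted probabilities and observed RKFD outcomes. The cutoffs `LOW_CUTOFF` and `HIGH_CUTOFF` are hypothetical placeholders, since the abstract does not report the thresholds used, and the AUC here is the rank-based (Mann-Whitney) estimate rather than any specific software routine from the study.

```python
from dataclasses import dataclass

# Hypothetical probability cutoffs; the abstract does not report the actual values.
LOW_CUTOFF = 0.10
HIGH_CUTOFF = 0.40


def assign_stratum(prob, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Map a predicted RKFD probability to a high/intermediate/low stratum."""
    if prob >= high:
        return "high"
    if prob < low:
        return "low"
    return "intermediate"


@dataclass
class StrataSummary:
    fractions: dict   # share of patients in each stratum
    ppv_high: float   # observed RKFD rate among high-probability patients
    npv_low: float    # observed non-RKFD rate among low-probability patients


def summarize_strata(probs, outcomes, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Stratify patients and compute PPV (high stratum) and NPV (low stratum)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    strata = [assign_stratum(p, low, high) for p in probs]
    n = len(probs)
    fractions = {s: strata.count(s) / n for s in ("high", "intermediate", "low")}
    high_y = [y for s, y in zip(strata, outcomes) if s == "high"]
    low_y = [y for s, y in zip(strata, outcomes) if s == "low"]
    ppv = sum(high_y) / len(high_y) if high_y else float("nan")
    npv = (len(low_y) - sum(low_y)) / len(low_y) if low_y else float("nan")
    return StrataSummary(fractions, ppv, npv)


def roc_auc(probs, outcomes):
    """AUC as the probability that a random case outranks a random non-case.

    Ties count as one half (Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [p for p, y in zip(probs, outcomes) if y]
    neg = [p for p, y in zip(probs, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients, 1 = experienced RKFD.
probs = [0.9, 0.6, 0.3, 0.2, 0.05, 0.02]
outcomes = [1, 0, 1, 0, 0, 0]
summary = summarize_strata(probs, outcomes)
print(summary.ppv_high, summary.npv_low)  # 0.5 1.0
print(roc_auc(probs, outcomes))           # 0.875
```

Reporting PPV only in the high stratum and NPV only in the low stratum mirrors how the abstract presents the T2DM results; patients in the intermediate stratum contribute to neither metric.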