Alpha-1 antitrypsin deficiency-associated liver disease (AATD-LD) is a rare and under-recognized genetic disorder. Predicting the clinical outcomes of AATD-LD and identifying patients more likely to progress to advanced liver disease are crucial for understanding AATD-LD progression and enabling timely medical intervention. We aimed to develop a tailored machine learning (ML) model to predict the progression of AATD-LD. The analysis used a stacking ensemble learning model that combined five different ML algorithms with 58 predictor variables and was evaluated with repeated nested five-fold cross-validation on UK Biobank data. Model performance was assessed by prediction accuracy, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The contribution of each predictor was evaluated with permutation feature importance. The proposed stacking ensemble ML model showed clinically meaningful accuracy and appeared superior to each individual ML algorithm in the ensemble. In patients with AATD-LD, for example, the AUROC was 68.1%, 75.9%, 91.2%, and 67.7% for all-cause mortality, liver-related death, liver transplant, and all-cause mortality or liver transplant, respectively. This work supports the use of ML to address unanswered clinical questions with clinically meaningful accuracy using real-world data.
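To make the two reported discrimination metrics concrete, the following is a minimal pure-Python sketch of how AUROC and AUPRC can be computed from predicted risks. The function names `auroc` and `average_precision` are illustrative, and AUPRC is estimated here by average precision, one common estimator; the abstract does not state which software or estimator the authors used. An AUROC of 68.1% corresponds to a value of 0.681 on this scale.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half).
    O(n_pos * n_neg), which is fine for illustration but not for large cohorts."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision (an AUPRC estimator): sum over decreasing score
    thresholds of (recall gain) * (precision at that threshold).
    Samples with tied scores are treated as a single threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before computing metrics.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    outcome = [0, 0, 1, 1]           # toy labels, e.g. 1 = liver transplant
    risk = [0.1, 0.4, 0.35, 0.8]     # toy predicted risks
    print(auroc(outcome, risk))                        # 0.75
    print(round(average_precision(outcome, risk), 4))  # 0.8333
```

A design note: the chance level of AUROC is 0.5 regardless of outcome frequency, whereas the chance level of AUPRC equals the outcome prevalence, which is why AUPRC is a useful complement for rare endpoints such as liver transplant.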
With many unknowns in alpha-1 antitrypsin deficiency-associated liver disease (AATD-LD), we aimed to develop a tailored stacking ensemble supervised machine learning (ML) model to predict the disease progression and clinical outcomes of AATD-LD, enabling data-driven decisions on the selection of clinical outcome endpoints and on clinical development strategy. The analysis used a stacking ensemble learning model that, via meta-learning, combined five different supervised ML algorithms with 58 potential predictor variables and was evaluated with repeated nested 5-fold cross-validation on UK Biobank data. Model performance was assessed by prediction accuracy, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The contribution of each predictor was evaluated with permutation feature importance. The model achieved clinically meaningful accuracy; for example, the AUROC in patients with AATD-LD was 68.1%, 75.9%, 91.2%, and 67.7% for all-cause mortality, liver-related death, liver transplant, and all-cause mortality or liver transplant, respectively. These generalizable predictive patterns support the use of ML to address unanswered clinical questions with clinically meaningful accuracy using real-world data. The method can be readily applied to other clinical outcomes and/or diseases.
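The repeated nested cross-validation design described above can be sketched as an index generator in plain Python. This is a hypothetical illustration, not the authors' pipeline: `repeated_nested_cv` and its parameters (`outer_k`, `inner_k`, `n_repeats`, `seed`) are our own names, the folds are not stratified by outcome, and the comment on how inner folds could feed a stacking meta-learner describes one common arrangement rather than the study's confirmed procedure.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, held-out indices)


def _kfold(indices: List[int], k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle a copy of `indices` and deal it round-robin into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [sorted(shuffled[i::k]) for i in range(k)]


def _complement(indices: List[int], held_out: List[int]) -> List[int]:
    """Return the members of `indices` that are not in `held_out`, in original order."""
    held = set(held_out)
    return [i for i in indices if i not in held]


def repeated_nested_cv(
    n_samples: int, outer_k: int = 5, inner_k: int = 5, n_repeats: int = 3, seed: int = 0
) -> Iterator[Tuple[int, int, Split, List[Split]]]:
    """Yield (repeat, outer_fold, outer_split, inner_splits).

    The inner splits partition only the outer training set, so the outer test
    fold is never used while fitting or tuning anything inside that fold.
    Each repeat reshuffles the data to produce a different outer partition.
    """
    rng = random.Random(seed)
    all_idx = list(range(n_samples))
    for rep in range(n_repeats):
        for fold, test_idx in enumerate(_kfold(all_idx, outer_k, rng)):
            train_idx = _complement(all_idx, test_idx)
            inner_splits = [
                (_complement(train_idx, val_idx), val_idx)
                for val_idx in _kfold(train_idx, inner_k, rng)
            ]
            yield rep, fold, (train_idx, test_idx), inner_splits


if __name__ == "__main__":
    for rep, fold, (train, test), inner in repeated_nested_cv(20, n_repeats=2):
        # Hypothetical stacking step (not implemented here): fit each base learner
        # on every inner (train, val) split, collect its out-of-fold predictions on
        # `val`, train the meta-learner on those predictions, refit the base learners
        # on `train`, and score the stacked model on `test` with AUROC/AUPRC.
        print(rep, fold, len(train), len(test), len(inner))  # e.g. "0 0 16 4 5"
```

Averaging the outer-fold scores over all repeats gives a performance estimate that is less sensitive to any single random partition of the cohort.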