Background
Preventing and treating liver fibrosis at an early stage is of great prognostic importance, yet changes in liver stiffness are often overlooked before patients develop obvious clinical symptoms. Early recognition of liver fibrosis is therefore essential.
Objective
To construct an XGBoost machine learning model that predicts participants' liver stiffness measurement (LSM) from general characteristics, blood test indicators and insulin resistance-related indexes, and to compare how well models built on different datasets fit LSM.
Methods
All data were obtained from the National Health and Nutrition Examination Survey (NHANES) for January 2017 to March 2020. We collected participants' general characteristics, Liver Ultrasound Transient Elastography (LUTE) results, blood test indicators and two insulin resistance-related indexes: the homeostasis model assessment of insulin resistance (HOMA-IR) and the metabolic score for insulin resistance (METS-IR). Three datasets were generated from this information: dataset A (no insulin resistance-related index as a predictor variable), dataset B (METS-IR as a predictor variable) and dataset C (HOMA-IR as a predictor variable). An XGBoost regression model was built on each dataset to predict participants' LSM. All included participants were randomly split into training and validation cohorts in a 2:1 ratio; models were developed in the training cohort and validated in the validation cohort.
Results
A total of 3,564 participants were included, 2,376 in the training cohort and 1,188 in the validation cohort, and no variable differed significantly between the two cohorts (p > 0.05).
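The abstract names HOMA-IR and METS-IR but does not give their formulas or the exact splitting procedure. The sketch below uses the commonly published definitions: HOMA-IR = fasting glucose (mmol/L) × fasting insulin (µU/mL) / 22.5, and METS-IR = ln(2 × fasting glucose + triglycerides) × BMI / ln(HDL-C), with glucose and lipids in mg/dL. The units, the function names and the simple seeded shuffle are assumptions for illustration, not the authors' code. Note that the reported cohort sizes (2,376 and 1,188 of 3,564) correspond to a 2:1 split, which is what the sketch reproduces. Model fitting itself (for example with the `xgboost` package's `XGBRegressor`) is omitted.

```python
import math
import random


def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_uU_ml: float) -> float:
    """HOMA-IR = fasting glucose (mmol/L) * fasting insulin (uU/mL) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


def mets_ir(glucose_mg_dl: float, tg_mg_dl: float, bmi: float, hdl_mg_dl: float) -> float:
    """METS-IR = ln(2*glucose + triglycerides) * BMI / ln(HDL-C); all lab values in mg/dL (assumed units)."""
    return math.log(2 * glucose_mg_dl + tg_mg_dl) * bmi / math.log(hdl_mg_dl)


def split_2_to_1(participant_ids, seed=0):
    """Hypothetical helper: shuffle IDs, then put 2/3 in training and 1/3 in validation."""
    shuffled = list(participant_ids)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary illustrative choice
    n_train = len(shuffled) * 2 // 3
    return shuffled[:n_train], shuffled[n_train:]


# Reproduce the cohort sizes reported for 3,564 participants.
train, valid = split_2_to_1(range(3564))
print(len(train), len(valid))
```

With integer division, 3,564 × 2 // 3 gives exactly 2,376 training participants, leaving 1,188 for validation.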
In the training cohort, the models built on datasets A and B both predicted participants' LSM better than the model built on dataset C, and dataset B gave the best fit [±1.96 standard deviation (SD): (-1.49, 1.48) kPa]; this was confirmed in the validation cohort [±1.96 SD: (-1.56, 1.56) kPa].
Conclusions
XGBoost machine learning models built from general characteristics and clinically accessible blood test indicators are practicable for predicting participants' LSM, and including METS-IR as a predictor variable improved the accuracy and stability of the models.
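The abstract reports each model's fit as an interval of ±1.96 SD in kPa but does not say how it was computed. One common reading, assumed here, is Bland-Altman-style limits of agreement: the mean of the differences between predicted and measured LSM, plus or minus 1.96 times the sample SD of those differences. The function name and the predicted-minus-measured sign convention are illustrative assumptions.

```python
import statistics


def limits_of_agreement(measured, predicted, z=1.96):
    """Return (lower, upper) = mean difference -/+ z * sample SD of the differences.

    Differences are taken as predicted - measured (an assumed convention).
    """
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)   # systematic over- or under-prediction, in kPa
    sd = statistics.stdev(diffs)    # sample standard deviation (n - 1 denominator)
    return bias - z * sd, bias + z * sd


# Toy LSM values in kPa (made up for illustration, not study data).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
lower, upper = limits_of_agreement(measured, predicted)
print(round(lower, 3), round(upper, 3))
```

Under this reading, a narrower interval, such as dataset B's compared with datasets A and C, means the predicted LSM agrees more closely with the measured LSM.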