Objectives: The present study aimed to develop a random forest (RF) based prediction model for hyperuricemia (HUA) and to estimate the associated risk factors. Methods: This cross-sectional study recruited 91,690 participants (52,607 males, 39,083 females). The prediction models were derived from training sets using the RF machine learning algorithm. The performance of the prediction models was evaluated in validation datasets. Significant indicators were identified by comparing the true-positive set with the true-negative set. Odds ratios were calculated using binary logistic regression models. Results: The area under the receiver operating characteristic curve of the RF prediction models was 0.732 in males and 0.837 in females. The sensitivity, specificity and negative predictive value of the models were 0.686, 0.656 and 0.882 in males and 0.786, 0.738 and 0.978 in females, respectively. Based on the feature importance of each index in the RF, a total of 10 explanatory variables were selected for each gender. Triglyceride, creatinine, body mass index, waist circumference, alanine transaminase, age, weight and total cholesterol were high-risk factors for HUA in both genders. Conclusion: RF demonstrated good stability and strong predictive power in predicting HUA in a Chinese population. People with these risk factors should be encouraged to actively control them to reduce the probability of developing HUA.
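The evaluation measures reported above (sensitivity, specificity, negative predictive value and area under the ROC curve) can be illustrated with a minimal sketch. The labels, scores and the 0.5 cut-off below are made-up toy values, and the function names are hypothetical; this is not the study's data or code, only the standard definitions of the measures.

```python
# Toy example only: labels (1 = HUA, 0 = no HUA) and model scores are invented.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and NPV from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": safe_div(tp, tp + fn),  # true positives among all positives
        "specificity": safe_div(tn, tn + fp),  # true negatives among all negatives
        "npv": safe_div(tn, tn + fn),          # true negatives among predicted negatives
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


# Assumed cut-off of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = screening_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

A high negative predictive value alongside moderate sensitivity, as in the reported results, is typical when the condition is relatively uncommon in the sample, because most predicted negatives are then true negatives.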
Attribute-oriented induction (AOI) is a data analysis technique based on induction. The traditional AOI algorithm requires a user-specified threshold to determine the number of output tuples. However, an appropriate tuple threshold is hard to set, and datasets usually contain noise. The traditional AOI algorithm can only generate a summary of a fixed size and cannot guarantee that every generalized tuple is sufficiently specific and representative. In this article, a new AOI method is proposed to address these shortcomings. We introduce the concept of cost to measure the loss of accuracy caused by attribute ascension, and we propose two algorithms based on hierarchical clustering. By setting a cost constraint on each generalized tuple, our method generates accurate output while eliminating noise, helping users obtain clearer and more informative results.
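As background, the traditional threshold-driven AOI described above can be sketched as follows. This is a simplified illustration, not the proposed cost-based method: the concept hierarchies, the toy rows, the function name `aoi_generalize` and the rule for choosing which attribute to ascend (the one with the most distinct values) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical concept hierarchies: for each attribute, a list of
# level-to-level mappings (index 0 maps raw values to their parents).
HIERARCHIES = [
    [  # attribute 0: city -> country -> ANY
        {"Boston": "USA", "Chicago": "USA", "Paris": "France", "Lyon": "France"},
        {"USA": "ANY", "France": "ANY"},
    ],
    [  # attribute 1: major -> field -> ANY
        {"Physics": "Science", "Chemistry": "Science", "History": "Arts"},
        {"Science": "ANY", "Arts": "ANY"},
    ],
]

ROWS = [
    ("Boston", "Physics"),
    ("Chicago", "Chemistry"),
    ("Paris", "History"),
    ("Lyon", "Physics"),
    ("Boston", "History"),
]


def aoi_generalize(rows, hierarchies, threshold):
    """Ascend attributes until at most `threshold` distinct tuples remain."""
    rows = [tuple(r) for r in rows]
    level = [0] * len(hierarchies)  # current hierarchy level per attribute
    while len(set(rows)) > threshold:
        # Attributes that still have a higher level to climb to.
        candidates = [i for i in range(len(hierarchies)) if level[i] < len(hierarchies[i])]
        if not candidates:
            break  # everything is already at the root
        # Assumed heuristic: ascend the attribute with the most distinct values.
        i = max(candidates, key=lambda a: len({r[a] for r in rows}))
        mapping = hierarchies[i][level[i]]
        rows = [r[:i] + (mapping[r[i]],) + r[i + 1:] for r in rows]
        level[i] += 1
    # Identical generalized tuples are merged, keeping a count for each.
    return Counter(rows)


summary = aoi_generalize(ROWS, HIERARCHIES, threshold=3)
for generalized_tuple, count in summary.items():
    print(generalized_tuple, count)
```

With a threshold of 3, the toy data has to be generalized until the city attribute reaches its root, while a threshold of 4 stops at the country level. This sensitivity to the chosen tuple threshold is the difficulty the abstract points to.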