Objectives: We aimed to identify and validate a minimum panel of important features for allergic diseases in school children aged 6-14 years, using machine-learning and deep-learning algorithms.
Methods: We performed a cross-sectional survey in 8 primary schools and 16 middle schools using a cluster sampling strategy. Features were collected by questionnaire. Machine-learning and deep-learning algorithms were implemented in Python (v3.7.6) using the PyCharm IDE.
Results: Of 11308 eligible children, 4375 had allergic diseases. The prevalence of asthma, allergic rhinitis and eczema was 6.31% (713/11308), 25.36% (2868/11308) and 21.38% (2418/11308), respectively. Among 12 machine-learning algorithms, Gaussian naive Bayes (NB) performed best for asthma, Bernoulli NB for rhinitis and multinomial NB for eczema. By comparison, minimum panels of six, five and five important features were ascertained for asthma (episodes of upper respiratory infection, episodes of lower respiratory infection, age, gender, family history of diabetes and dental caries), rhinitis (episodes of upper respiratory infection, age, gender, maternal education and family history of diabetes) and eczema (episodes of upper respiratory infection, age, maternal education, outdoor activities and dental caries), respectively. The prediction performance of these features was further validated by a deep-learning sequential model, with accuracy reaching 94.01%, 75.51% and 78.29% for asthma, rhinitis and eczema, respectively.
Conclusions: We identified three minimum panels of important features that can capture the majority of information in the whole feature set and accurately predict the risk of asthma, rhinitis and eczema in children aged 6-14 years.
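
For readers unfamiliar with the naive Bayes classifiers named in the Results, the following is a minimal from-scratch sketch of the Bernoulli variant, using only the Python standard library. It is not the study's implementation: the abstract does not state which libraries or settings were used. The function names, the smoothing constant `alpha`, the three binary feature columns and the eight toy records are all hypothetical and serve only to illustrate how such a classifier turns a handful of binary features into a prediction.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class log-priors and per-feature Bernoulli parameters.

    X: list of equal-length lists of 0/1 features; y: list of class labels.
    alpha: Laplace smoothing constant (an assumed value, not from the study).
    """
    n_features = len(X[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_counts[label][j] += value
    model = {}
    for label, n_c in class_counts.items():
        prior = n_c / len(y)
        # Smoothed estimate of P(feature_j = 1 | class)
        probs = [(feature_counts[label][j] + alpha) / (n_c + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (math.log(prior), probs)
    return model


def predict_bernoulli_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(row, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data (made up for illustration, not study data).
# Columns: [frequent upper respiratory infection, higher maternal education, male]
X_toy = [
    [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = rhinitis, 0 = no rhinitis

model = fit_bernoulli_nb(X_toy, y_toy)
predictions = [predict_bernoulli_nb(model, row) for row in X_toy]
accuracy = sum(p == t for p, t in zip(predictions, y_toy)) / len(y_toy)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The Gaussian and multinomial variants differ only in the per-feature likelihood (a normal density for continuous features, count-based frequencies for count features). In practice, accuracy would be estimated on held-out data rather than on the training records used here.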