Background
Systemic juvenile idiopathic arthritis (SJIA) is a form of childhood arthritis with clinical features such as fever, lymphadenopathy, arthritis, rash, and serositis. It seriously affects the growth and development of children and has a high rate of disability and mortality. SJIA may result from genetic, infectious, or autoimmune factors since the precise source of the disease is unknown. Our study aims to develop a genetic-based diagnostic model to explore the identification of SJIA at the genetic level.
Methods
The gene expression dataset of peripheral blood mononuclear cell (PBMC) samples from SJIA was collected from the Gene Expression Omnibus (GEO) database. Then, three GEO datasets (GSE11907-GPL96, GSE8650-GPL96 and GSE13501) were merged and used as a training dataset, which included 125 SJIA samples and 92 health samples. GSE7753 was used as a validation dataset. The limma method was used to screen differentially expressed genes (DEGs). Feature selection was performed using Lasso, random forest (RF)-recursive feature elimination (RFE) and RF classifier.
Results
We finally identified 4 key genes (ALDH1A1, CEACAM1, YBX3 and SLC6A8) that were essential to distinguish SJIA from healthy samples. And we combined the 4 key genes and performed a grid search as well as 10-fold cross-validation with 5 repetitions to finally identify the RF model with optimal mtry. The mean area under the curve (AUC) value for 5-fold cross-validation was greater than 0.95. The model’s performance was then assessed once more using the validation dataset, and an AUC value of 0.990 was obtained. All of the above AUC values demonstrated the strong robustness of the SJIA diagnostic model.
Conclusions
We successfully developed a new SJIA diagnostic model that can be used for a novel aid in the identification of SJIA. In addition, the identification of 4 key genes that may serve as potential biomarkers for SJIA provides new insights to further understand the mechanisms of SJIA.