2022
DOI: 10.1038/s41598-022-14048-6

Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods

Abstract: Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis for UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed that there is an association between gene expression signature and disease severity, we thereby aim to assess whether genes can help to diagnose UC and predict its correlation with immune regulation. A total of ten eligible microarrays (including 387 UC patients…

Cited by 12 publications (11 citation statements)
References 54 publications
“…In this study, we set the random seed at 234 with the set.seed function to separate the training and testing sets, and the microarrays were randomly divided into the training and testing sets in a ratio of 8:2. In fact, we found that the segmentation of the training and testing data has different division standards in the different articles, such as 90:10 [ 48 ], 85:15 [ 49 ], 80:20, 70:30 [ 50 ], 60:40 [ 51 ], and so on. We choose 8:2 for three reasons: first, it is supported by the reported literature; second, considering that the number of samples is not large enough, we wanted to make the training data for developing the model as large as possible; third, the ratio of the number of DILI samples and control samples is about 80:20.…”
Section: Discussion (mentioning)
confidence: 99%
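The seeded 8:2 split described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: they used R's set.seed, whereas this sketch uses Python's standard library, so the seed 234 makes the split repeatable but does not reproduce their exact partition. The function name `split_train_test` and the sample identifiers are hypothetical.

```python
import random


def split_train_test(sample_ids, train_fraction=0.8, seed=234):
    """Shuffle sample IDs with a fixed seed and split them train/test.

    Assumption: a plain random split; the citing paper does not say
    whether the split was stratified by class.
    """
    rng = random.Random(seed)          # private generator, fixed seed
    shuffled = list(sample_ids)        # copy so the input is not mutated
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical sample identifiers, for illustration only.
samples = [f"sample_{i}" for i in range(50)]
train, test = split_train_test(samples)
print(len(train), len(test))  # 40 10
```

Calling the function again with the same seed returns the same partition, which is the purpose of fixing the seed in the first place.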
“…Many researchers (16-18) only focus on single or two MLs which might ignore their potential shortcomings. In our previous research (19), five MLs show different weights even with the same genes. So just intersecting the top N genes may unconsciously delete some dominant genes (20-23).…”
Section: Introduction (mentioning)
confidence: 86%
“…So just intersecting the top N genes may unconsciously delete some dominant genes (20-23). And ignoring the weights of genes may result in an imbalance of filtration (19,24).…”
Section: Introduction (mentioning)
confidence: 99%
“…For the same genes, limma (version 3.54.0) was employed to identify the DEGs with the average gene expression. According to the Benjamini and Hochberg method, two thresholds were established: an absolute value of fold change (|logFC|) >0.7 (previous studies were 0.5 [29]-1 [23]) and a false discovery rate [30] <0.05.…”
Section: Methods (mentioning)
confidence: 99%
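The two thresholds in the statement above (|logFC| > 0.7 and a Benjamini-Hochberg false discovery rate < 0.05) can be illustrated with a short sketch. It is not limma, which fits linear models and uses moderated statistics; it only shows the Benjamini-Hochberg adjustment and the filtering step, assuming per-gene log fold changes and raw p-values are already available. The names `benjamini_hochberg` and `select_degs` and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, lfc_cutoff=0.7, fdr_cutoff=0.05):
    """Keep genes passing both the |logFC| and the FDR threshold."""
    fdr = benjamini_hochberg(p_values)
    return [
        gene
        for gene, lfc, q in zip(genes, log_fc, fdr)
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff
    ]


# Made-up values, for illustration only.
genes = ["A", "B", "C", "D"]
log_fc = [1.2, -0.9, 0.3, 0.8]
p_values = [0.001, 0.01, 0.02, 0.5]
print(select_degs(genes, log_fc, p_values))  # ['A', 'B']
```

Gene C fails the fold-change cutoff and gene D fails the FDR cutoff, so only A and B remain.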