2022
DOI: 10.37990/medr.1077024

Artificial Intelligence-based Colon Cancer Prediction by Identifying Genomic Biomarkers

Abstract: Colon cancer is the third most common type of cancer worldwide. Because of the poor prognosis and unclear preoperative staging, genetic biomarkers have become more important in the diagnosis and treatment of the disease. In this study, we aimed to determine the biomarker candidate genes for colon cancer and to develop a model that can predict colon cancer based on these genes. Material and Methods: In the study, a dataset containing the expression levels of 2000 genes from 62 different samples (22 healthy and …

citations
Cited by 14 publications
(8 citation statements)
references
References 32 publications
“…Therefore, we considered only metabolites overlapping the two different statistical approaches for further analysis (FDR-corrected p -value < 0.05 and AUC > 0.70). Multivariate analyses were performed using the ROC curve method with biomarker candidate metabolites based on linear support vector machine (SVM) [ 17 ], partial least squares discrimination analysis (PLS-DA) [ 18 ], and random forest (RF) [ 19 ] algorithms. These methods have proved to be robust for high-dimensional data and are widely used for other types of ‘omics’ data analysis.…”
Section: Methods (mentioning)
confidence: 99%
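The two-criterion filter described in the statement above (FDR-corrected p-value < 0.05 and AUC > 0.70) can be sketched as follows. This is a minimal illustration, not the citing authors' code: it assumes the FDR correction is Benjamini-Hochberg, computes AUC as the probability that a case value exceeds a control value, and uses hypothetical names (`benjamini_hochberg`, `auc`, `select_candidates`) and thresholds passed as parameters.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC as P(case > control), counting ties as one half.

    Assumes higher values indicate the case group.
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def select_candidates(
    p_values: Dict[str, float],
    aucs: Dict[str, float],
    alpha: float = 0.05,
    min_auc: float = 0.70,
) -> List[str]:
    """Keep only features that pass both the FDR and the AUC threshold."""
    names = list(p_values)
    q_values = benjamini_hochberg([p_values[n] for n in names])
    return [n for n, q in zip(names, q_values) if q < alpha and aucs[n] > min_auc]
```

Requiring both criteria is stricter than either alone: a feature with a small adjusted p-value but weak discrimination, or the reverse, is dropped.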
“…Synthetic Minority Over-sampling Technique for Nominal and Continuous (SMOTE-NC) Gök and Olgun (2021) was used to eliminate the class imbalance problem. Class imbalance problem, when working with real-life data, this problem is highly prevalent and can be defined as a state of imbalance when there are significantly more cases belonging to the majority class than those belonging to the minority class (Paksoy and Yağin, 2022). Because ML techniques like logistic regression might be biased toward the majority class, which causes issues with under or overfitting, balanced data is crucial.…”
Section: Data Preprocessing and Machine Learning Approach (mentioning)
confidence: 99%
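The oversampling idea behind the statement above can be illustrated with a simplified sketch. It is not SMOTE-NC itself and not the citing authors' code: it covers only continuous features, interpolating between a minority sample and one of its nearest minority neighbours, whereas SMOTE-NC additionally handles nominal features. The function name `oversample_minority` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def oversample_minority(
    minority: Sequence[Sequence[float]],
    n_new: int,
    k: int = 3,
    seed: int = 0,
) -> List[List[float]]:
    """Create n_new synthetic minority samples by neighbour interpolation.

    Requires at least two minority samples with continuous features only.
    """
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        mate = rng.choice(others[:k])
        # Pick a random point on the segment between base and its neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, mate)])
    return synthetic
```

Generating enough synthetic samples to match the majority-class count gives a balanced training set; in practice, oversampling is usually applied to the training split only, so that test data remain untouched.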
“…An algorithm based on decision-tree (DT) and gradient-boosting (GB), XGBoost is a faster running algorithm compared to GB algorithms, with different regularization penalties to avoid overfitting [18]. NB is an algorithm based on conditional probability, which is assumed to be equal and independent from each other in the classification of all attributes based on conditional probability [19].…”
Section: Machine Learning Approach (mentioning)
confidence: 99%