Gene reduction and machine learning algorithms for cancer classification based on microarray gene expression data: A comprehensive review

Osama, Sarah; Shaban, Hassan; Ali, Abdelmgeid A.

doi:10.1016/j.eswa.2022.118946

Cited by 38 publications

(22 citation statements)

References 145 publications


“…15 In recent years, researchers have been addressing the task of cancer prediction from high-dimensional microarray datasets using optimization algorithms. 16,17 Rabia Musheer Aziz adopted an approach that focuses on solving optimization problems related to extracting features (genes) using the independent component analysis (ICA) method. These optimized features were then utilized in conjunction with the NB classifier for cancer prediction.…”

Section: Related Work (mentioning)

confidence: 99%

Improving cancer prediction using feature selection in spark environment

Longkumer, Hussain Mazumder

2023

Concurrency and Computation


Cancer prediction from microarray‐based gene expression data has been the subject of much research in recent years. Because such data have a vast number of features and relatively small sample sizes, feature selection becomes necessary for improving classification performance. Additionally, the characteristics of this malignant condition often vary, producing a significant amount of data that requires additional time and resources to process. This research work proposes an Apache Spark‐based feature selection approach for microarray cancer classification. The first aim is to select only the optimal and necessary features obtained by the feature selectors (information gain [IG] and correlation‐based feature selection [CFS]). The second is to employ a distributed framework and observe the efficiency of the different feature selectors for classification. Finally, we evaluated our approach in terms of accuracy, precision, recall and ROC (AUC) using three classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). The results reveal that the NB classifier outperformed the others in all cases with IG as the feature selector.
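
As a rough illustration of the information gain (IG) criterion named in the abstract above, the sketch below ranks features by how much they reduce class-label entropy. It is a minimal single-machine example, not the Spark-based implementation of the cited work; the function names, the toy gene names, and the assumption that expression values are already discretized into "low"/"high" are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature), for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy of the labels within each feature value, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    return sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )


# Toy data (hypothetical): expression levels already discretized.
labels = ["tumor", "tumor", "normal", "normal"]
columns = {
    "gene_a": ["high", "high", "low", "low"],  # separates the two classes
    "gene_b": ["high", "low", "high", "low"],  # carries no class information
}
ranking = rank_features(columns, labels)
```

In a pipeline of the kind the abstract describes, only the top-ranked features would then be passed to a classifier such as NB; continuous expression values would need a discretization step first, which this sketch leaves out.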


“…Therefore, in recent years, feature selection has been discussed as a tool for uncovering potential tumoral biomarkers, allowing reliable diagnosis and prognosis of different cancer types (Grisci et al, 2018, 2019). Numerous works provide complete reviews of feature selection algorithms and their application to gene expression data (Ang et al, 2016; Bolón‐Canedo et al, 2014; Boulesteix et al, 2008; Feltes et al, 2018; Lazar et al, 2012; Osama et al, 2022; Saeys et al, 2007). According to a survey by Osama et al (2022), between 2010 and 2021, the number of publications on gene selection increased by 1.8‐fold, and the citations by 135.5‐fold.…”

Section: Introduction (mentioning)

confidence: 99%

The use of gene expression datasets in feature selection research: 20 years of inherent bias?

Grisci, Feltes, de Faria Poloni et al.

2023

WIREs Data Min & Knowl


Feature selection algorithms are frequently employed in preprocessing machine learning pipelines applied to biological data to identify relevant features. The use of feature selection in gene expression studies began at the end of the 1990s with the analysis of human cancer microarray datasets. Since then, gene expression technology has been perfected, the Human Genome Project has been completed, new microarray platforms have been created and discontinued, and RNA‐seq has gradually replaced microarrays. However, most feature selection methods in the last two decades were designed, evaluated, and validated on the same datasets from the microarray technology's infancy. In this review of over 1200 publications regarding feature selection and gene expression, published between 2010 and 2020, we found that 57% of the publications used at least one outdated dataset, 23% used only outdated data, and 32% did not cite data sources. Other issues include referencing databases that are no longer available, the slow adoption of RNA‐seq datasets, and bias toward human cancer data, even for methods designed for a broader scope. In the most popular datasets, some being 23 years old, mislabeled samples, experimental biases, distribution shifts, and the absence of classification challenges are common. These problems are more predominant in publications with computer science backgrounds compared to publications from biology and can lead to inaccurate and misleading biological results. This article is categorized under: Algorithmic Development > Biological Data Mining; Technologies > Machine Learning


“…[1][2][3][4][5][6][7][8][9] A widely used approach in this field involves leveraging the chemical reactivity of DNA to produce modified or synthetic nucleotides, which can be used to power an array of cutting-edge technologies such as functional nanostructure assembly, molecular motors, logic gates, CRISPR-Cas9 and biosensors. [10][11][12][13][14][15] On a different note, boronate ester chemistry, arising due to the specific reactivity of boronic acid with 1,2- and 1,3-diols, has garnered considerable interest in the field of chemical biology due to its biocompatibility and distinctive reactivity. This has been demonstrated by studies that utilize its oxidative instability, 16 as well as those that develop novel approaches to enhance its stability against oxidation.…”

mentioning

confidence: 99%

“…To derive insights into the thermodynamic parameters of the boronate ester formation with the DNA duplexes, isothermal titration calorimetric (ITC) experiments were carried out employing probe-40 (10) hybridized with its target as a representative sample (Fig. 3).…”

mentioning

confidence: 99%