Identifying genes with the largest expression changes (gene selection) to characterize a given condition is a popular first step to drive exploration into molecular mechanisms and is, therefore, paramount for therapeutic development. Reproducibility in the sciences makes it necessary to emphasize objectivity and systematic repeatability in biological and informatics analyses, including gene selection. With these two characteristics in mind, our research team previously proposed using multiple criteria optimization (MCO) in gene selection to analyze microarray datasets. The result of this effort is the MCO algorithm, which selects genes with the largest expression changes without requiring the user to adjust either informatics or statistical parameters. Furthermore, the user is not required to choose a priori either a preference structure among multiple measures or a predetermined quantity of genes to be deemed significant. This implies that, given the same datasets and performance measures (PMs), the method will converge to the same set of selected differentially expressed genes (repeatability) regardless of who carries out the analysis (objectivity). The present work describes the development of an open-source tool in RStudio that enables both (1) individual analysis of single datasets with two or three PMs and (2) meta-analysis of up to five microarray datasets, using one PM from each dataset. The capabilities afforded by the code include license-free portability and the possibility of carrying out analyses on modest computer hardware, such as personal laptops. The code provides affordable, repeatable, and objective detection of differentially expressed genes from microarrays. It can also be used to analyze other experiments with similar comparative layouts, such as microRNA arrays and protein arrays, among others.
As a demonstration of the capabilities of the code, the analysis of four publicly available microarray datasets related to Parkinson's disease (PD) is presented here, treating each dataset individually or as a four-way meta-analysis. These MCO-supported analyses made it possible to identify MMP9 and TUBB2A as potential PD genetic biomarkers based on their persistent appearance across each of the case studies. A literature search confirmed the importance of these genes in PD, and indeed as PD biomarkers, which supports the code's potential.
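The abstract does not spell out how MCO combines several PMs without a user-chosen weighting or gene count. One common formulation of multiple criteria optimization keeps only the non-dominated (Pareto-efficient) candidates, and the following is a minimal sketch under that assumption, not the authors' R implementation. The function names, the gene labels, and the toy PM values are all hypothetical, and larger PM values are assumed to indicate larger expression changes.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if `a` is at least as large as `b` on every PM
    and strictly larger on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the genes whose PM vectors are not dominated by any other gene.

    No weights, thresholds, or target gene count are supplied, so the same
    input always yields the same selection.
    """
    return sorted(
        gene
        for gene, pm in scores.items()
        if not any(dominates(other, pm) for g, other in scores.items() if g != gene)
    )


# Hypothetical toy data: two PMs per gene (illustrative values only).
toy_scores = {
    "gene_a": (2.0, 1.0),
    "gene_b": (1.0, 2.0),
    "gene_c": (1.5, 1.5),
    "gene_d": (1.0, 1.0),
    "gene_e": (0.5, 0.2),
}

selected = pareto_front(toy_scores)
print(selected)
```

In this toy example, `gene_d` and `gene_e` are dropped because `gene_a` is at least as large on both measures, while the remaining three genes each win on some trade-off between the two PMs. Because the selection depends only on the data and the choice of PMs, it illustrates the repeatability and objectivity properties described above.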
Gene interactions play a fundamental role in the proneness to cancer. However, detecting and ranking these interactions is a complex problem due to the high dimensionality of genomic data. Hence, we aim to find patterns composed of multiple features to molecularly characterize breast cancer subtypes by integrating different omics datasets using a data mining approach. To retrieve biological understanding from these computational results, we developed IBIF-RF (Importance Between Interactive Features using Random Forest), a new metric capable of assessing and holistically ranking the importance of genomic interactions without any prior knowledge of key feature combinations. A set of 247 top-performing features from transcriptomic, proteomic, methylation, and clinical data was used to investigate interactive patterns for classifying breast cancer subtypes using over 1150 samples. The IBIF-RF metric allowed the extraction of 154312, 190481, and 463917 combinations of variables for the TCGA, GSE20685, and GSE21653 datasets, respectively. The single genes MLPH and FOXA1 were the most frequently identified variables across all datasets, followed by some two-gene interactions such as CEP55-FOXA1 and FOXC1-THSD4. Moreover, the IBIF-RF metric allowed the definition of two sets of genes frequently found together (1: FOXA1, MLPH, and SIDT1; and 2: CEP55, ASPM, CENPL, AURKA, ESPL1, TTK, UBE2T, NCAPG, GMPS, NDC80, MYBL2, KIF18B, and EXO1).