The identification of biomarker signatures in omics molecular profiling is usually performed to predict outcomes in a precision medicine context, such as patient disease susceptibility, diagnosis, prognosis, and treatment response. To identify these signatures, we have developed a biomarker discovery tool called BioDiscML. From a collection of samples and their associated characteristics, i.e., the biomarkers (e.g., gene expression, protein levels, clinico-pathological data), BioDiscML exploits various feature selection procedures to produce signatures associated with machine learning models that efficiently predict a specified outcome. To this end, BioDiscML uses a large variety of machine learning algorithms to select the best combination of biomarkers for predicting categorical or continuous outcomes from highly unbalanced datasets. The software has been implemented to automate all machine learning steps, including data pre-processing, feature selection, model selection, and performance evaluation. BioDiscML is delivered as a stand-alone program and is available for download at https://github.com/mickaelleclercq/BioDiscML.
Dormancy and germination vigor are complex traits of primary importance for adaptation and agriculture. Intraspecific variation in cytoplasmic genomes and cytonuclear interactions were previously reported to affect germination in Arabidopsis using novel cytonuclear combinations that disrupt co-adaptation between natural variants of nuclear and cytoplasmic genomes. However, specific aspects of dormancy and germination vigor were not thoroughly explored, nor were the parental contributions to the genetic effects. Here, we specifically assessed dormancy, germination performance and longevity of seeds from Arabidopsis plants with natural and new genomic compositions. All three traits were modified by cytonuclear reshuffling. Both the depth and the release rate of dormancy could be modified by a change of cytoplasm. Significant changes in dormancy and germination performance due to specific cytonuclear interacting combinations mainly occurred in opposite directions, consistent with the idea that a single physiological consequence of the new genetic combination affected both traits oppositely. However, this was not always the case. Interestingly, the ability of parental accessions to contribute to significant cytonuclear interactions modifying the germination phenotype differed depending on whether they provided the nuclear or the cytoplasmic genetic compartment. The observed deleterious effects of novel cytonuclear combinations (in comparison with the nuclear parent) were consistent with a contribution of cytonuclear interactions to germination adaptive phenotypes. More surprisingly, we also observed favorable effects of novel cytonuclear combinations, suggesting that suboptimal genetic combinations exist in natural populations for these traits. Reduced sensitivity to exogenous ABA and faster endogenous ABA decay during germination were observed in a novel cytonuclear combination that also exhibited enhanced longevity and better germination performance compared to its natural nuclear parent. Taken together, our results strongly support that cytoplasmic genomes represent an additional resource of natural variation for breeding seed vigor traits.
Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, clinical risk stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage and prostate specific antigen (PSA) levels. Transcriptomic data have the potential to enable more precise approaches to predicting the evolution of the disease. However, high-quality RNA sequencing (RNA-seq) datasets with clinical data and follow-up long enough to allow discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effects and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-seq datasets totaling 171 PCa patients. Data were re-analyzed using a single pipeline to ensure uniformity. Using a machine learning approach, a total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we have identified a signature composed of only three genes (JUN, HES4, PPDPF) predicting BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently in use to predict PCa evolution. This score is in the range reported by studies that predicted BCR in single cohorts with a higher number of patients. We showed that it is possible to merge different small and heterogeneous datasets and analyze them together to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility of pooling different small datasets into one larger dataset to identify a predictive genomic signature that would benefit PCa patients.
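The abstract reports both accuracy and balanced error rate (BER). As an illustration only, the sketch below computes both under the common definition of BER as the mean of the per-class error rates (one minus recall for each class); the abstract does not spell out its formula, so this definition is an assumption. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_error_rate(y_true, y_pred):
    """Mean of per-class error rates, so every class weighs the same.

    Assumed definition: BER = average over classes of (1 - recall).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals = defaultdict(int)   # number of samples per true class
    errors = defaultdict(int)   # misclassified samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t != p:
            errors[t] += 1
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: 2 recurrences and 8 non-recurrences.
y_true = ["BCR", "BCR"] + ["no_BCR"] * 8
# One of the two recurrences is missed; all non-recurrences are correct.
y_pred = ["BCR", "no_BCR"] + ["no_BCR"] * 8

acc = accuracy(y_true, y_pred)
ber = balanced_error_rate(y_true, y_pred)
print(f"accuracy = {acc:.2f}, BER = {ber:.2f}")
```

On unbalanced data such as this toy example, accuracy can look high while half of the minority class is misclassified, which is why reporting BER alongside accuracy is informative.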