High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in the life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent, and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report, we highlight recently added features that enable biomedical analyses on a large scale.
This study aimed to establish a novel panel of cancer-specific methylated genes for cancer detection and prognostic stratification of early-stage non-small cell lung cancer (NSCLC). Differentially methylated regions (DMRs) were identified with bumphunter on The Cancer Genome Atlas (TCGA) dataset, and clinical utility was assessed using a quantitative methylation-specific PCR assay in multiple sets of primary NSCLC and body fluids that included serum, pleural effusion, and ascites samples. A methylation panel of 6 genes was selected from the TCGA dataset. Promoter methylation of the gene panel was detected in 92.2% (83/90) of the training cohort with a specificity of 72.0% (18/25) and in 93.0% (40/43) of an independent cohort of stage IA primary NSCLC. In serum samples from the latter 43 stage IA subjects and 42 population-matched control subjects, the gene panel yielded a sensitivity of 72.1% (31/41) and specificity of 71.4% (30/42). Similar diagnostic accuracy was observed in pleural effusion and ascites samples. A prognostic risk category based on methylation status refined the risk stratification for outcomes as an independent prognostic factor for early-stage disease. Moreover, the paralog group for HOXA9, predominantly overexpressed in subjects with methylation, showed poor outcomes. Promoter methylation of a panel of 6 genes has potential for use as a biomarker for early cancer detection and to predict prognosis at the time of diagnosis.
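The detection rates and specificities quoted above are simple proportions of counts. As a minimal sketch (the helper name `percent` and the dictionary labels are illustrative, not from the study), the following recomputes four of the reported figures from the counts given in the abstract:

```python
def percent(hits: int, total: int) -> float:
    """Return hits/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= hits <= total:
        raise ValueError("hits must lie between 0 and total")
    return round(100 * hits / total, 1)


# Counts copied from the abstract; the labels are illustrative only.
reported_counts = {
    "training cohort, methylation detected": (83, 90),
    "training cohort, specificity": (18, 25),
    "independent stage IA cohort, methylation detected": (40, 43),
    "serum controls, specificity": (30, 42),
}

for label, (hits, total) in reported_counts.items():
    print(f"{label}: {percent(hits, total)}% ({hits}/{total})")
```

Sensitivity is the fraction of true cases that test positive, and specificity is the fraction of controls that test negative, so each figure needs only the two counts shown in parentheses.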
Motivation: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. Results: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. Availability: http://bioconductor.org/packages/GSEABenchmarkeR Contact: ludwig.geistlinger@sph.cuny.edu
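To make the notion of a gene set test concrete, here is a minimal sketch of one widely used enrichment approach, an over-representation test based on the hypergeometric distribution. It is offered as a generic illustration only: the abstract does not say which 10 methods were assessed, the function `ora_pvalue` and its parameter names are hypothetical, and the numbers in the usage example are made up.

```python
from math import comb


def ora_pvalue(universe_size: int, set_size: int, n_selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value for over-representation.

    universe_size: number of genes measured in the experiment
    set_size:      number of measured genes belonging to the gene set
    n_selected:    number of genes called differentially expressed
    overlap:       number of selected genes that are in the gene set

    Returns P(X >= overlap) when n_selected genes are drawn at random
    without replacement from the universe.
    """
    if not (0 <= set_size <= universe_size and 0 <= n_selected <= universe_size):
        raise ValueError("set_size and n_selected must lie within the universe")
    max_overlap = min(set_size, n_selected)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap is not achievable with these sizes")

    outside = universe_size - set_size
    # Exact integer arithmetic; convert to float only at the end.
    favourable = sum(
        comb(set_size, k) * comb(outside, n_selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(universe_size, n_selected)


# Hypothetical numbers: 10 genes measured, 5 in the set, 5 selected.
print(ora_pvalue(10, 5, 5, 5))  # all selected genes fall in the set
print(ora_pvalue(10, 5, 5, 0))  # any overlap counts, so this is 1.0
```

This test asks whether the genes in a set are selected more often than genes outside it, which is one example of how the null hypothesis behind a method shapes the fraction of gene sets it reports as enriched.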