Estimating maturity in pome fruits is a critical task that directs virtually all postharvest supply chain decisions. This is especially important for European pear (Pyrus communis) cultivars because losses due to spoilage and senescence must be minimized while ensuring that proper ripening capacity is achieved (in part by satisfying a fruit chilling requirement). Reliable methods for accurately estimating pear fruit maturity are lacking, and because ripening is maturity dependent, predicting ripening capacity is a challenge. In this study of the European pear cultivar ‘d’Anjou’, we sorted fruit at harvest based upon on-tree fruit position to build contrasts of maturity. Our sorting scheme showed clear contrasts of maturity between canopy positions, yet there was substantial overlap in the distribution of values for the index of absorbance difference (IAD), a non-destructive spectroscopic measurement that has been used as a proxy for pome fruit maturity. This presented an opportunity to explore a contrast of maturity that was more subtle than IAD could differentiate, and thus guided our subsequent transcriptome analysis of tissue samples taken at harvest and during storage. Using a novel approach that tests for condition-specific differences of co-expressed genes, we discovered genes with a phased character that mirrored our sorting scheme. The expression patterns of these genes are associated with fruit quality and ripening differences across the experiment. Functional profiles of these co-expressed genes are concordant with previous findings and also offer new clues, and thus hypotheses, about genes involved in pear fruit quality, maturity, and ripening. This work may lead to new tools for enhanced postharvest management based on the activity of gene co-expression modules rather than individual genes. Further, our results indicate that modules may have utility within specific windows of time during postharvest management of ‘d’Anjou’ pear.
Gene co-expression networks (GCNs) provide multiple benefits to molecular research, including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Experiments with larger numbers of variables confound discovery of true network edges, exclude edges, and inhibit discovery of context-specific (or condition-specific) network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring that their assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize these challenges in GCN construction and create csGCNs, we provide a review of the workflow.
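The confounding problem described in this abstract can be illustrated with a small simulation. The sketch below is not the toolkit's method and uses no real expression data: the two "conditions", the gene values, and the function names `pearson` and `simulate` are all hypothetical, chosen only to show how a correlation computed over pooled samples can differ from the correlation within each condition.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=200, seed=0):
    """Simulate two gene pairs across two hypothetical conditions, A and B."""
    rng = random.Random(seed)

    # Case 1: the two genes are independent within each condition, but both
    # are expressed at a higher level in condition B than in condition A.
    a_x = [rng.gauss(2, 0.5) for _ in range(n)]
    a_y = [rng.gauss(2, 0.5) for _ in range(n)]
    b_x = [rng.gauss(8, 0.5) for _ in range(n)]
    b_y = [rng.gauss(8, 0.5) for _ in range(n)]
    spurious = {
        "pooled": pearson(a_x + b_x, a_y + b_y),
        "A": pearson(a_x, a_y),
        "B": pearson(b_x, b_y),
    }

    # Case 2: the genes are positively related in condition A and negatively
    # related in condition B, so the pooled relationship largely cancels out.
    c_x = [rng.gauss(5, 1) for _ in range(n)]
    c_y = [x + rng.gauss(0, 0.2) for x in c_x]
    d_x = [rng.gauss(5, 1) for _ in range(n)]
    d_y = [10 - x + rng.gauss(0, 0.2) for x in d_x]
    hidden = {
        "pooled": pearson(c_x + d_x, c_y + d_y),
        "A": pearson(c_x, c_y),
        "B": pearson(d_x, d_y),
    }
    return spurious, hidden


spurious, hidden = simulate()
print("Case 1 (edge appears only when pooled):", spurious)
print("Case 2 (edge hidden when pooled):", hidden)
```

In the first case a pooled analysis would report a strong edge that exists in neither condition; in the second it would miss an edge that is strong within each condition. Both are toy versions of the situation in which the expression of a gene depends on an experimental variable as well as on its partner gene.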
Background: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. Results: GEMmaker is an nf-core compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned, containerized software that can be executed on a single workstation, institutional compute cluster, Kubernetes platform, or the cloud. GEMmaker supports popular alignment and quantification tools, providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. Conclusions: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite limited data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.