Background RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report that a model-based clustering algorithm implemented in an R package, MBCluster.Seq, can also be used for DE analysis. Results The input data originally used by MBCluster.Seq is DEGs, and the proposed method (called MBCdeg) uses all genes for the analysis. The method uses posterior probabilities of genes assigned to a cluster displaying non-DEG pattern for overall gene ranking. We compared the performance of MBCdeg with conventional R packages such as edgeR, DESeq2, and TCC that are specialized for DE analysis using simulated and real data. Our results showed that MBCdeg outperformed other methods when the proportion of DEG (PDEG) was less than 50%. However, the DEG identification using MBCdeg was less consistent than with conventional methods. We compared the effects of different normalization algorithms using MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that was not implemented in MBCluster.Seq. The new analysis method showed greater stability than using the original MBCdeg with the default normalization algorithm. Conclusions MBCdeg with DEGES normalization can be used in the identification of DEGs when the PDEG is relatively low. As the method is based on gene clustering, the DE result includes information on which expression pattern the gene belongs to. The new method may be useful for the analysis of time-course and multi-group data, where the classification of expression patterns is often required.
Spinach (Spinacia oleracea L.) is a dioecious plant with male heterogametic sex determination and homomorphic sex chromosomes (XY). The dioecism is utilized for producing commercial hybrid seeds, and hence understanding the molecular-genetic basis of the species' sex determining locus is an important issue for spinach breeding. In this study, seven dominant DNA markers were shown to completely co-segregate with the male-determining gene in segregating spinach populations comprising > 1500 plants. In addition, these seven dominant DNA markers were completely associated with the male-determining gene in over 100 spinach germplasm accessions and cultivars. These observations suggest that, in spinach, a Y-chromosomal region around the male-determining locus does not (or almost not) recombine with a counterpart region on the X chromosome. Using five of the seven DNA markers, five bacterial artificial chromosome (BAC) clone contigs with a total length of approximately 690 kbp were constructed. Full sequencing of six representative BAC clones (total insert length 504 kbp) from the five contigs and a transcriptome analysis by RNA-seq revealed that the Y-chromosomal region around the male-determining locus contains large amounts of repetitive elements, suggesting that the region might be poor in gene content. Most of the repeats found in this region are novel Ty1-copia-like and its derivative elements that accumulate predominantly in heterochromatic regions. Our findings may provide valuable insight into spinach genome structure and clues for future research into the evolution of the sex determining locus.
Empirical Bayes is a choice framework for differential expression (DE) analysis for multi-group RNA-seq count data. Its characteristic ability to compute posterior probabilities for predefined expression patterns allows users to assign the pattern with the highest value to the gene under consideration. However, current Bayesian methods such as baySeq and EBSeq can be improved, especially with respect to normalization. Two R packages (baySeq and EBSeq) with their default normalization settings and with other normalization methods (MRN and TCC) were compared using three-group simulation data and real count data. Our findings were as follows: (1) the Bayesian methods coupled with TCC normalization performed comparably or better than those with the default normalization settings under various simulation scenarios, (2) default DE pipelines provided in TCC that implements a generalized linear model framework was still superior to the Bayesian methods with TCC normalization when overall degree of DE was evaluated, and (3) baySeq with TCC was robust against different choices of possible expression patterns. In practice, we recommend using the default DE pipeline provided in TCC for obtaining overall gene ranking and then using the baySeq with TCC normalization for assigning the most plausible expression patterns to individual genes.
Background RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report a model-based clustering algorithm, MBCluster.Seq, that can be implemented using an R package for DE analysis.Results The input data originally used by MBCluster.Seq is DEGs, and the proposed method (called MBCdeg) uses all genes for the analysis. The method uses posterior probabilities of genes assigned to a cluster displaying non-DEG pattern for overall gene ranking. We compared the performance of MBCdeg with conventional R packages such as edgeR, DESeq2, and TCC that are specialized for DE analysis using simulated and real data. Our results showed that MBCdeg outperformed other methods when the proportion of DEG was less than 50%. However, the DEG identification using MBCdeg was less consistent than with conventional methods. We compared the effects of different normalization algorithms using MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that was not implemented in MBCluster.Seq. The new analysis method showed greater stability than using the original MBCdeg with the default normalization algorithm.Conclusions MBCdeg with DEGES normalization can be used in the identification of DEGs when the PDEG is relatively low. As the method is based on gene clustering, the DE result includes information on which expression pattern the gene belongs to. The new method may be useful for the analysis of time-course and multi-group data, where the classification of expression patterns is often required.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.