Background
RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering has been widely used to group DEGs with similar expression patterns, but it has rarely been used to identify the DEGs themselves. We recently reported that a clustering-based method for identifying DEGs, called MBCdeg, has great potential. However, its feasibility has not yet been investigated thoroughly.
Results
We compared six competing methods: three conventional R packages (edgeR, DESeq2, and TCC) and three versions of MBCdeg (denoted MBCdeg1, 2, and 3), which correspond to three different normalization algorithms. We evaluated the methods on simulated data generated under various scenarios with three R packages (TCC, compcodeR, and PROPER), using mainly the area under the receiver operating characteristic curve (AUC) as a measure of both sensitivity and specificity. We found that the modified version of MBCdeg2 performed well in many scenarios on simulated data. On real data, however, MBCdeg gave very unstable results across repeated trials on identical data.
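As an illustration of the evaluation metric only, the following minimal Python sketch computes the AUC for a ranking of genes when the true DEG status is known, as it is for simulated data. It is not the code used in this study: the function name `auc_from_scores` and the example values are hypothetical, and the sketch assumes that each method yields one score per gene in which a higher value indicates stronger evidence of differential expression (for example, a negated p-value).

```python
def auc_from_scores(scores, is_deg):
    """Return the AUC: the probability that a randomly chosen true DEG
    receives a higher score than a randomly chosen non-DEG (ties count 0.5).

    scores : list of floats, higher = stronger evidence of being a DEG
             (an assumption of this sketch)
    is_deg : list of bools, the known truth from the simulation
    """
    pairs = sorted(zip(scores, is_deg), key=lambda p: p[0])
    n = len(pairs)

    # Assign 1-based ranks in ascending score order, averaging over ties.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, deg in pairs if deg)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one DEG and one non-DEG")

    # Mann-Whitney U statistic of the DEG group, scaled to [0, 1].
    rank_sum_pos = sum(r for r, (_, deg) in zip(ranks, pairs) if deg)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical toy example: four genes, three of which are true DEGs.
    toy_scores = [0.9, 0.4, 0.6, 0.2]
    toy_truth = [True, False, True, True]
    print(auc_from_scores(toy_scores, toy_truth))
```

An AUC of 1 means that every true DEG is ranked above every non-DEG, whereas 0.5 corresponds to a random ranking. Because the AUC is computed over the entire ranking, it summarizes sensitivity and specificity across all possible thresholds rather than at a single cutoff.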
Conclusions
The current MBCdeg2 shows excellent performance in simulation analysis but does not yet perform at a practical level on real data. Because MBCdeg needs further improvement before it reaches a practical level, we cannot recommend the use of the current version. Our report highlights the need for method developers to describe the shortcomings of their methods carefully, and for readers to examine critically the conditions under which reported conclusions were reached.