The development of single cell sequencing allows the transcriptome of individual oocytes to be detailed. Here, we compare different RNA-seq datasets from single and pooled mouse oocytes and show higher reproducibility using single oocyte RNA-seq. We further demonstrate that unique molecular identifier (UMI)-based and other deduplication methods are limited in their ability to improve the precision of these datasets. Finally, for normalization of sample differences in cross-stage comparisons, we propose that normalization with external spike-in molecules is comparable to normalization with endogenous genes that are stably expressed during oocyte maturation. The ability to normalize data among single cells provides insight into the heterogeneity of mouse oocytes.
Keywords: Mouse single oocyte RNA-seq; cross-stage normalization; oocyte heterogeneity
(GV) stage. ConstGenes or ERCC normalization should become standards for cross-stage comparison in future single oocyte sequencing studies.
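To make the idea of reference-based normalization concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that each sample's counts are scaled so that a chosen reference set (ERCC spike-ins or a set of stably expressed "ConstGenes") sums to the same value in every sample. The function names, the toy count matrix and the choice of a simple sum-based scaling factor are all illustrative assumptions.

```python
import numpy as np

def reference_size_factors(counts, reference_rows):
    """Per-sample scaling factors from a reference gene set.

    counts: genes x samples array of raw counts.
    reference_rows: row indices of the reference set (hypothetical here;
    e.g. ERCC spike-ins or stably expressed endogenous genes).
    """
    ref_totals = counts[reference_rows, :].sum(axis=0).astype(float)
    if np.any(ref_totals == 0):
        raise ValueError("a sample has no reads in the reference set")
    # Divide by the mean so the factors are centred on 1.
    return ref_totals / ref_totals.mean()

def normalize_by_reference(counts, reference_rows):
    """Divide every sample (column) by its reference-based size factor."""
    return counts / reference_size_factors(counts, reference_rows)

# Toy matrix (assumed values): rows 0-1 are the reference set,
# rows 2-3 are endogenous genes; columns are two samples.
toy_counts = np.array([
    [10, 20],
    [30, 60],
    [100, 100],
    [50, 200],
])
toy_normalized = normalize_by_reference(toy_counts, reference_rows=[0, 1])
```

In this toy case the second sample has twice as many reference reads as the first, so its counts are scaled down relative to the first. After scaling, the reference rows are equal across samples, and the remaining differences are attributed to the endogenous genes.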
Results and discussion
Single oocyte RNA-seq has high reproducibility
To estimate the reproducibility of mouse oocyte RNA-seq, we obtained several published datasets from single or pooled oocytes at the GV (germinal vesicle) and MII (meiosis II) stages, which mark the beginning and end of oocyte maturation, respectively [6][7][8][9][10]. We processed all raw sequencing files in parallel. Because the distribution of transcripts differed among the samples, we used only the coding reads for comparison (Fig. 1a; Table S1). Interestingly, the single oocyte RNA-seq datasets (GSE141190, GSE96638, GSE44183) correlated more highly within their group than with pooled oocytes of the same stage (Fig. 1b). In addition, MII oocytes exhibited greater deviation between poly(A) and RiboMinus sequencing results, possibly due to the existence of dormant RNAs and of hyperpolyadenylated mRNAs that are actively translated (Fig. 1b) [11]. In principal component clustering, the single oocyte groups showed higher similarity, although experiments/methods dominated the differences (Fig. 1c-d). Thus, we concluded that single oocyte sequencing can generate highly reproducible and consistent results.
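The within-group versus between-group comparison described above can be illustrated with a short sketch. It assumes reproducibility is summarized as the mean pairwise Pearson correlation of log-transformed gene counts. The source does not state the exact correlation measure or transformation used, so both are assumptions here, as are the function name and the synthetic data.

```python
import numpy as np

def mean_pairwise_correlation(log_counts, group_a, group_b=None):
    """Mean Pearson r between samples (columns of a genes x samples array).

    With only group_a, averages over all distinct pairs within that group.
    With group_b, averages over all pairs between the two groups.
    """
    corr = np.corrcoef(log_counts.T)  # samples x samples correlation matrix
    if group_b is None:
        pairs = [(i, j) for i in group_a for j in group_a if i < j]
    else:
        pairs = [(i, j) for i in group_a for j in group_b]
    return float(np.mean([corr[i, j] for i, j in pairs]))

# Synthetic counts for illustration only (not data from the study):
# three samples drawn from a shared profile, and three samples with
# additional gene-level noise added on top of that profile.
rng = np.random.default_rng(0)
profile = rng.gamma(shape=2.0, scale=50.0, size=500)
group_one = np.column_stack([rng.poisson(profile) for _ in range(3)])
group_two = np.column_stack(
    [rng.poisson(profile * rng.lognormal(0.0, 0.5, size=500)) for _ in range(3)]
)
log_counts = np.log2(np.hstack([group_one, group_two]) + 1)

within = mean_pairwise_correlation(log_counts, [0, 1, 2])
between = mean_pairwise_correlation(log_counts, [0, 1, 2], [3, 4, 5])
```

A higher `within` than `between` value corresponds to the pattern reported for Fig. 1b, where datasets correlate better inside their own group than across groups.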
Deduplication improved single oocyte RNA-seq only to a limited extent
Single cell RNA-seq is susceptible to a range of biases, including those arising from gene capture, reverse transcription and cDNA amplification [12]. The incorporation of UMIs (unique molecular identifiers) significantly improves single cell sequencing reproducibility by quantifying reads with more precision [3]. To test whether UMIs also benefit single oocyte RNA-seq, we re-analyzed the GSE141190 RiboMinus RNA-seq results using UMI quantification (Fig. 2a-b), which identifies duplicates by both the UMI and the insert read. The N8 UMI, capable of distinguishing up to 65,536 (4^8) molecules, is sufficient to distinguish the ~20,000 different RNAs expressed in mouse oocytes [3,7]. On average, the number of reads in the Dedup (UMI-based) samples was 43%±15% of that in their Original (without deduplication) samples (Fig. 2c-d; Table S2). After filtering out low-count genes, the linear regression of gene counts in Original a...