High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material, as well as the desired application (e.g., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation), are addressed in the context of preparing high-quality sequencing libraries. Current methods for preparing NGS libraries from single cells are also discussed.
Background: Early application of second-generation sequencing technologies to transcript quantitation (RNA-seq) has hinted at a vast mammalian transcriptome, including transcripts from nearly all known genes, which might be fully measured only by ultradeep sequencing. Subsequent studies suggested that low-abundance transcripts might be the result of technical or biological noise rather than active transcripts; moreover, most RNA-seq experiments did not provide enough read depth to generate high-confidence estimates of gene expression for low-abundance transcripts. As a result, the community adopted several heuristics for RNA-seq analysis, most notably an arbitrary expression threshold of 0.3 to 1 FPKM for downstream analysis. However, advances in RNA-seq library preparation, sequencing technology, and informatic analysis have addressed many of the systemic sources of uncertainty and undermined the assumptions that drove the adoption of these heuristics. We provide an updated view of the accuracy and efficiency of RNA-seq experiments, using genomic data from large-scale studies like the ENCODE project to provide orthogonal information against which to validate our conclusions.
Results: We show that a human cell’s transcriptome can be divided into active genes carrying out the work of the cell and other genes that are likely the by-products of biological or experimental noise. We use ENCODE data on chromatin state to show that ultralow-expression genes are predominantly associated with repressed chromatin; we provide a novel normalization metric, zFPKM, that identifies the threshold between active and background gene expression; and we show that this threshold is robust to experimental and analytical variations.
Conclusions: The zFPKM normalization method accurately separates the biologically relevant genes in a cell, which are associated with active promoters, from the ultralow-expression noisy genes that have repressed promoters. A read depth of twenty to thirty million mapped reads allows high-confidence quantitation of genes expressed at this threshold, providing important guidance for the design of RNA-seq studies of gene expression. Moreover, we offer an example for using extensive ENCODE chromatin state information to validate RNA-seq analysis pipelines.
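For readers unfamiliar with the fixed-cutoff heuristic that this abstract argues against, the following is a minimal sketch of the standard FPKM calculation (fragments per kilobase of transcript per million mapped fragments) followed by a fixed threshold. It is not the zFPKM method, whose details the abstract does not give. The function names, gene labels, counts, and the 1.0 cutoff are hypothetical values chosen for illustration.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # (fragments / (length / 1e3)) / (total / 1e6) simplifies to the line below.
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def passes_fixed_threshold(value, cutoff=1.0):
    """Apply an arbitrary fixed FPKM cutoff (assumed here to be 1.0)."""
    return value >= cutoff


# Hypothetical example: (fragment count, gene length in bp) per gene.
genes = {"GENE_A": (500, 2000), "GENE_B": (3, 1500)}
total_mapped = 25_000_000  # illustrative library size

for name, (count, length) in genes.items():
    value = fpkm(count, length, total_mapped)
    print(name, round(value, 3), passes_fixed_threshold(value))
```

A fixed cutoff like this treats every sample identically regardless of its expression distribution, which is the kind of assumption the abstract says a data-driven threshold is meant to replace.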
Memory T cells are primed for rapid responses to antigen; however, the molecular mechanisms responsible for priming remain incompletely defined. CpG methylation in promoters is an epigenetic modification that regulates gene transcription. Using targeted bisulfite sequencing, we examined methylation of 2100 genes (56,000 CpGs) mapped by deep sequencing of T cell activation in human naïve and memory CD4 T cells. A total of 466 CpGs (132 genes) displayed differential methylation between naïve and memory cells. Twenty-one genes exhibited both differential methylation and differential gene expression before activation, linking promoter DNA methylation states to gene regulation; 6 of these 21 genes encode proteins closely studied in T cells, while 15 genes represent novel targets for further study. Eighty-four genes demonstrated differential methylation between memory and naïve cells that correlated with differential gene expression following activation, of which 39 exhibited reduced methylation in memory cells coupled with increased gene expression upon activation compared to naïve cells. These results reveal a class of primed genes that are more rapidly expressed in memory than in naïve cells and are putatively regulated by DNA methylation. These findings define a DNA methylation signature unique to memory CD4 T cells that correlates with activation-induced gene expression.
Cytosine methylation of DNA CpG dinucleotides in gene promoters is an epigenetic modification that regulates gene transcription. While many methods exist to interrogate methylation states, few current methods offer large-scale, targeted, single CpG resolution. We report an approach combining bisulfite treatment followed by microdroplet PCR with next-generation sequencing to assay the methylation state of 50 genes in the regions 1 kb upstream of and downstream from their transcription start sites. This method yielded 96% coverage of the targeted CpGs and demonstrated high correlation between CpG island (CGI) DNA methylation and transcriptional regulation. The method was scaled to interrogate the methylation status of 77,674 CpGs in the promoter regions of 2100 genes in primary CD4 T cells. The 2100 gene library yielded 97% coverage of all targeted CpGs and 99% of the target amplicons.
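As a rough illustration of how single-CpG methylation and target coverage are commonly summarized from bisulfite sequencing reads, the sketch below relies on the fact that bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines still read as C. The function names, the read counts, and the minimum depth of 10 reads are assumptions for illustration and are not taken from the study.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one CpG (C / (C + T)).

    Returns None when the site has no reads at all.
    """
    depth = c_reads + t_reads
    if depth == 0:
        return None
    return c_reads / depth


def coverage_fraction(counts, min_depth=10):
    """Fraction of targeted CpGs with at least min_depth reads (assumed cutoff)."""
    if not counts:
        return 0.0
    covered = sum(1 for c, t in counts.values() if c + t >= min_depth)
    return covered / len(counts)


# Hypothetical per-CpG (C reads, T reads) at four targeted sites.
counts = {"cpg_1": (30, 10), "cpg_2": (2, 1), "cpg_3": (0, 20), "cpg_4": (0, 0)}

for site, (c, t) in counts.items():
    print(site, methylation_level(c, t))
print("coverage:", coverage_fraction(counts))
```

The depth cutoff matters: a lower `min_depth` raises the reported coverage but makes each per-site methylation estimate noisier.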