We introduce Salmon, a method for quantifying transcript abundance from RNA-seq reads that is accurate and fast. Salmon is the first transcriptome-wide quantifier to correct for fragment GC content bias, which we demonstrate substantially improves the accuracy of abundance estimates and the reliability of subsequent differential expression analysis. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure.
Chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally-relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The ensemble of domains we identify allows us to quantify the degree to which the domain structure is hierarchical as opposed to overlapping, and our analysis reveals a pronounced hierarchical structure in which larger stable domains tend to completely contain smaller domains. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone marks at the boundaries.
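To make the "hierarchical as opposed to overlapping" distinction concrete, the following is a minimal sketch, not the authors' measure. It assumes domains are given as half-open genomic intervals `(start, end)` and simply counts, among intersecting pairs, how many are fully nested rather than partially overlapping. The function names `classify_pair` and `nesting_fraction` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def classify_pair(a, b):
    """Classify two half-open intervals (start, end).

    Returns 'disjoint' if they share no position, 'nested' if one fully
    contains the other, and 'crossing' if they only partially overlap.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 <= s2 or e2 <= s1:
        return "disjoint"
    if (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2):
        return "nested"
    return "crossing"


def nesting_fraction(domains):
    """Return (fraction of intersecting pairs that are nested, pair counts).

    The fraction is None when no two domains intersect. A value near 1
    suggests a hierarchical (tree-like) structure; a value near 0 suggests
    mostly partially overlapping domains.
    """
    counts = {"nested": 0, "disjoint": 0, "crossing": 0}
    for a, b in combinations(domains, 2):
        counts[classify_pair(a, b)] += 1
    intersecting = counts["nested"] + counts["crossing"]
    fraction = counts["nested"] / intersecting if intersecting else None
    return fraction, counts


if __name__ == "__main__":
    # Toy domains pooled across several resolutions (made-up coordinates).
    toy_domains = [(0, 100), (0, 40), (40, 100), (30, 60)]
    print(nesting_fraction(toy_domains))
```

A pairwise count like this is quadratic in the number of domains, which is acceptable for a sketch; a production analysis would more likely sort the intervals or use an interval tree.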
Existing methods for quantifying transcript abundance require a fundamental compromise: either use high quality read alignments and experiment-specific models or sacrifice them for speed. We introduce Salmon, a quantification method that overcomes this restriction by combining a novel 'lightweight' alignment procedure with a streaming parallel inference algorithm and a feature-rich bias model. These innovations yield both exceptional accuracy and order-of-magnitude speed benefits over traditional alignment-based methods.
Estimating transcript abundance across cell types, species, and conditions is a fundamental task in genomics. For example, these estimates are used for the classification of diseases and their subtypes [1], for understanding expression changes during development [2], and tracking the progression of cancer [3]. Efficient quantification of transcript abundance from RNA-seq data is an especially pressing problem due to the exponentially increasing number of experiments and the growing adoption of expression data for medical diagnosis [4]. However, various methods that address this problem achieve accurate results at the cost of requiring significant computational resources and do not scale well with the rate at which data is produced [5]. The recently developed quantification tool Sailfish [6] achieves an order of magnitude speed improvement over previous approaches, but Sailfish can sometimes produce slightly less accurate estimates for paired-end data or for stranded protocols and does not take advantage of high quality alignment information and experiment-specific models.
We introduce a quantification procedure, called Salmon (Supplementary Fig. 1), that achieves best-in-class accuracy, takes advantage of high quality alignment information and experiment-specific models and provides the same order-of-magnitude speed benefits as Sailfish.
Using synthetic data from both the RSEM simulator [7] and the Flux Simulator [8] as well as experimental quantitative PCR data [9], we show that Salmon generally outperforms Sailfish and eXpress [10] with respect to accuracy (Fig. 1a-b,e; Supplementary Tables 1&2) and is also faster than Sailfish (Fig. 1c). The transcript abundance estimation problem is particularly difficult for genes with many isoforms since reads derived from these genes can map to many more transcripts, and we find that Salmon is also generally more accurate in this case (Fig. 1d). Salmon is designed to run in parallel so that the procedure scales better with the number of reads in an experiment. Salmon can quantify abundance either via a lightweight alignment procedure (Online methods, Lightweight alignment and Supplementary Fig. 2), or using pre-computed alignments provided in SAM or BAM format; we find that the quantification accuracy is robust to this choice of input (Supplementary Fig. 3). Salmon is also typically more accurate than a recent unpublished procedure Kallisto (Supplementary Figs. 4&5, Supplementary Table 1).
An innovation contributing to Salmon's speed and accuracy is ...
Distal expression quantitative trait loci (distal eQTLs) are genetic mutations that affect the expression of genes genomically far away. However, the mechanisms that cause a distal eQTL to modulate gene expression are not yet clear. Recent high-resolution chromosome conformation capture experiments along with a growing database of eQTLs provide an opportunity to understand the spatial mechanisms influencing distal eQTL associations on a genome-wide scale. We test the hypothesis that spatial proximity contributes to eQTL-gene regulation in the context of the higher-order domain structure of chromatin as determined from recent Hi-C chromosome conformation experiments. This analysis suggests that the large-scale topology of chromatin is coupled with eQTL associations by providing evidence that eQTLs are in general spatially close to their target genes, often occur around topological domain boundaries and preferentially associate with genes across domains. We also find that within-domain eQTLs that overlap with regulatory elements such as promoters and enhancers are spatially closer to their target genes than the overall set of within-domain eQTLs, suggesting that spatial proximity derived from the domain structure in chromatin plays an important role in the regulation of gene expression.