Background: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. As read lengths have steadily increased, people have assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR.
Results: We found that, with the exception of 25 bp reads, read length makes little difference to the detection of differential expression. Once single-end reads reach a length of 50 bp, the results do not change substantially for any read length up to, and including, 100 bp paired-end. However, splice junction detection improves significantly as the read length increases, with 100 bp paired-end reads showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results, confirming that our conclusions have broad application.
Conclusions: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be chosen based on the final goal of the study.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0697-y) contains supplementary material, which is available to authorized users.
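The read-simulation step described in this abstract (trimming paired-end reads to shorter lengths and splitting pairs into single-end reads) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which end of the read was trimmed or which mate was kept, so keeping the leading bases and the first mate are assumptions, and the names `trim_read` and `simulate_reads` are hypothetical.

```python
from typing import List, Tuple, Union

# A read is (sequence, quality string); a pair is (mate 1, mate 2).
Read = Tuple[str, str]
ReadPair = Tuple[Read, Read]


def trim_read(read: Read, length: int) -> Read:
    """Keep the first `length` bases and their quality scores.

    Assumption: trimming removes bases from the end of the read.
    """
    seq, qual = read
    return seq[:length], qual[:length]


def simulate_reads(
    pairs: List[ReadPair], length: int, paired: bool
) -> Union[List[ReadPair], List[Read]]:
    """Simulate a shorter-read library from longer paired-end reads."""
    trimmed = [(trim_read(m1, length), trim_read(m2, length)) for m1, m2 in pairs]
    if paired:
        return trimmed
    # Assumption: the single-end library keeps only the first mate.
    return [m1 for m1, _ in trimmed]
```

Calling `simulate_reads(pairs, 50, paired=False)` would then give a 50 bp single-end version of the same library, so the different read lengths and pairing modes can be compared on identical underlying data.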
This article presents a novel compiler framework for CUDA code generation. The compiler structure is designed to support autotuning, which employs empirical techniques to evaluate a set of alternative mappings of computation kernels and select the mapping that obtains the best performance. This article introduces a Transformation Strategy Generator, a meta-optimizer that generates a set of transformation recipes, which are descriptions of the mapping of the sequential code to parallel CUDA code. These recipes comprise a search space of possible implementations. This system achieves performance comparable to, and sometimes better than, that of manually tuned libraries, and exceeds the performance of a state-of-the-art GPU compiler.
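The autotuning idea described above (empirically evaluating a set of alternative mappings and keeping the fastest) can be sketched as a simple search loop. This is a generic illustration, not the article's Transformation Strategy Generator: the recipe dictionaries, the `measure` callback, and the `autotune` function are all hypothetical names introduced here, and a real system would compile and time CUDA kernels rather than call a Python function.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical representation: a recipe is a dict of mapping parameters.
Recipe = Dict[str, Any]


def autotune(
    recipes: Dict[str, Recipe],
    measure: Callable[[Recipe], float],
    repeats: int = 3,
) -> Tuple[str, float]:
    """Return the name and cost of the best-performing recipe.

    `measure` runs one variant and returns its cost (for example, run time
    in seconds). Each recipe is measured `repeats` times and the minimum is
    kept to reduce the effect of timing noise.
    """
    if not recipes:
        raise ValueError("the search space is empty")
    best_name, best_cost = "", float("inf")
    for name, recipe in recipes.items():
        cost = min(measure(recipe) for _ in range(repeats))
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

In practice `measure` might wrap `time.perf_counter()` around a kernel launch. An exhaustive loop like this is only feasible for small search spaces, which is one reason to generate a limited set of promising recipes rather than enumerate every possible mapping.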
Copy number variations (CNVs) have been linked to numerous genetic diseases including cancer, Parkinson's disease, pancreatitis, and lupus. While current best practices for CNV detection often require using microarrays for detecting large CNVs or multiplex ligation-dependent probe amplification (MLPA) for gene-sized CNVs, new methods have been developed with the goal of replacing both of these specialized assays with bioinformatic analysis applied to next-generation sequencing (NGS) data. Because NGS is already used by clinical labs to detect small coding variants, this approach reduces associated costs, resources, and analysis time. This chapter provides an overview of the various approaches to CNV detection from NGS data and examines VS-CNV, a commercial tool developed by Golden Helix, which provides robust CNV calling capabilities for both gene panel and exome data.