2015
DOI: 10.1186/s12859-015-0736-4

Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies

Abstract: Motivation: Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate ac…

Citations: cited by 12 publications (15 citation statements)
References: 16 publications
“…A summary of patients used in this study is given in Table 1. Transcriptome data (Affymetrix microarray) from whole blood were compared to matching genotypes generated from whole-genome sequencing [17]. In brief, eQTL mapping was performed by linear regression on adjusted data and false discovery rate (FDR) was estimated with a permutation method separately for local (cis defined as less than 1 Mb distance from SNP to gene) and distant associations (trans, greater than 1 Mb from SNP to gene including across chromosomes; see “Methods”).…”
Section: Results (mentioning)
confidence: 99%
“…Genotypes were generated from whole-genome sequencing as described in Standish et al. [17]. DNA isolated from whole blood was sequenced and DNA variants called using a modified GATK pipeline.…”
Section: Methods (mentioning)
confidence: 99%
“…While group-calling is the approach recommended by GATK, with the exception of one paper from our lab exploring the impact of genetic background on group calling, to our knowledge there has not been a published systematic comparison of group calling vs. single sample calling on a gold standard dataset to quantify the advantages of group calling [30]. Group-calling is not without problems.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several uses of BBs have been shown in order to mitigate the I/O bottlenecks of data-intensive workloads [6,32,33,34]. Most studies surrounding the design and use of BBs have so far focused on the I/O characteristics of individual applications [35] or small components within workflows [4]. However, research into optimizing scientific workflows with diverse I/O and storage requirements for BBs is still in its infancy, and a limited body of work presently exists [3,36].…”
Section: Related Work (mentioning)
confidence: 99%
“…Thus, while providers of supercomputing resources must continue to support the extreme bandwidth requirements of traditional supercomputing applications, centers must now also deploy resources that are capable of supporting the requirements of these emerging data-intensive workflows. In sharp contrast to the highly coherent, sequential, large-transaction reads and writes that are characteristic of traditional HPC checkpoint-restart workloads [2], data-intensive workflows have been shown to often utilize non-sequential, metadata-intensive, and small-transaction reads and writes [3,4]. However, parallel file systems in today's supercomputers have been optimized for more traditional HPC workloads [5].…”
Section: Introduction (mentioning)
confidence: 99%