2015 | DOI: 10.3389/fgene.2015.00045

Identification of copy number variants in whole-genome data using Reference Coverage Profiles

Abstract: The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing var…
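The abstract's 150–1000× compression figure is plausible for coverage tracks because depth of coverage is locally constant along the genome. A minimal sketch of the idea, using plain run-length encoding; the paper's actual encoding is not specified in this excerpt, so this is an illustrative assumption:

```python
def rle_compress(coverage):
    """Run-length encode a per-base depth-of-coverage vector.

    Returns a list of (depth, run_length) pairs. Long stretches of
    identical depth collapse to a single pair, which is why coverage
    tracks compress so well.
    """
    runs = []
    for depth in coverage:
        if runs and runs[-1][0] == depth:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([depth, 1])  # start a new run
    return [tuple(r) for r in runs]

coverage = [30, 30, 30, 31, 31, 0, 0, 0, 0, 30]
print(rle_compress(coverage))  # [(30, 3), (31, 2), (0, 4), (30, 1)]
```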

Cited by 22 publications (22 citation statements)
References 62 publications

“…16 Successfully normalized autosomal coverage follows a narrow distribution, centered on 100% of the expected diploid coverage; the width of this distribution serves as a metric of uniformity of genome coverage.…”
Section: Results
confidence: 99%
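The width metric described in this statement can be sketched as follows. The choice of median for the center and sample standard deviation for the width is an illustrative assumption, not necessarily the exact definition used in the cited work:

```python
import statistics

def coverage_uniformity(normalized_depths):
    """Summarize normalized autosomal coverage (1.0 == expected diploid).

    Returns (center, width): the center should sit near 1.0 for a
    successfully normalized genome, and the width (standard deviation)
    serves as a simple uniformity metric; narrower means more uniform.
    """
    center = statistics.median(normalized_depths)
    width = statistics.stdev(normalized_depths)
    return center, width

depths = [0.97, 1.01, 1.00, 0.99, 1.03, 1.00, 0.98, 1.02]
center, width = coverage_uniformity(depths)
print(round(center, 2), round(width, 3))  # 1.0 0.02
```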
“…SV detection methods that use read-pair and split-read information [8] can detect deletions and duplications, but most CNV-focused approaches look for increased or decreased read coverage, the expected consequence of a duplication or a deletion. Coverage-based methods exist to analyze single samples [9], pairs of samples [10], or multiple samples [11–13], but the presence of technical bias in WGS remains an important challenge. Indeed, various features of sequencing experiments, such as mappability [14,15], GC content [16], replication timing [17], DNA quality, and library preparation [18], have a negative impact on the uniformity of the read coverage [19].…”
Section: Introduction
confidence: 99%
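A minimal sketch of the coverage-based approach described above, flagging windows whose normalized depth departs from the diploid expectation. The thresholds are chosen purely for illustration, not taken from any cited tool, and real methods first correct for GC content, mappability, and the other biases listed:

```python
def classify_windows(window_depths, expected_depth, low=0.75, high=1.25):
    """Naive coverage-based CNV classification (illustrative thresholds).

    A heterozygous deletion roughly halves expected coverage and a
    duplication raises it by ~50%, so windows whose depth ratio falls
    below `low` or above `high` get flagged.
    """
    calls = []
    for depth in window_depths:
        ratio = depth / expected_depth
        if ratio < low:
            calls.append("deletion")
        elif ratio > high:
            calls.append("duplication")
        else:
            calls.append("diploid")
    return calls

print(classify_windows([30, 15, 31, 46], expected_depth=30))
# ['diploid', 'deletion', 'diploid', 'duplication']
```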
“…We studied intermediate files in our previously published coverage analysis pipeline for WGS data [5]. Briefly, the pipeline condenses depth-of-coverage information into a compact format, computes summary statistics, computes a Reference Coverage Profile (RCP) from multiple such files, normalizes each genome's coverage relative to the RCP, uses a hidden Markov model (HMM) to segment the normalized coverage into regions of uniform ploidy, and filters these results to identify regions of unusual ploidy relative to a reference population (Figure 3).…”
Section: QC of an Analysis Pipeline
confidence: 99%
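The HMM segmentation step in the pipeline above can be sketched as a toy Viterbi pass over normalized coverage. The states, transition penalty, and Gaussian-style emission model below are illustrative assumptions, not the published RCP pipeline parameters:

```python
import math

# Map ploidy states to expected normalized coverage (1.0 == diploid),
# with a strong penalty for switching states, so segments stay uniform.
STATES = {1: 0.5, 2: 1.0, 3: 1.5}
SIGMA = 0.1
STAY, SWITCH = math.log(0.98), math.log(0.01)

def log_emission(obs, mean):
    # Gaussian log-density up to a constant (enough for Viterbi argmax).
    return -((obs - mean) ** 2) / (2 * SIGMA ** 2)

def viterbi_ploidy(coverage):
    """Segment normalized coverage into a most-likely ploidy path."""
    ploidies = list(STATES)
    score = {p: log_emission(coverage[0], STATES[p]) for p in ploidies}
    back = []
    for obs in coverage[1:]:
        new_score, choices = {}, {}
        for p in ploidies:
            best_prev = max(
                ploidies,
                key=lambda q: score[q] + (STAY if q == p else SWITCH),
            )
            new_score[p] = (
                score[best_prev]
                + (STAY if best_prev == p else SWITCH)
                + log_emission(obs, STATES[p])
            )
            choices[p] = best_prev
        score, back = new_score, back + [choices]
    path = [max(ploidies, key=score.get)]
    for choices in reversed(back):
        path.append(choices[path[-1]])
    return path[::-1]

print(viterbi_ploidy([1.0, 1.02, 0.51, 0.48, 0.5, 1.01]))
# [2, 2, 1, 1, 1, 2]  — a heterozygous-deletion-like dip in the middle
```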
“…Here, we used BDQC to evaluate the output files of four steps in this pipeline (green arrows in Figure 3). We studied a large collection of 4461 genome assemblies produced by Complete Genomics, Inc. using a wide variety of software versions and flavors of the technology, as described [5], and two reference versions (hg18 and hg19). We visualized the results (Figure 3) and observed two types of failure: missing files and BDQC-flagged outliers.…”
Section: QC of an Analysis Pipeline
confidence: 99%
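The outlier flagging described here can be sketched with a robust z-score over per-file summary statistics. This is an illustrative stand-in, not BDQC's actual heuristics; the function name, cutoff, and example values are all assumptions:

```python
import statistics

def flag_outliers(values, z_cutoff=4.0):
    """Flag values whose robust z-score exceeds a cutoff.

    An illustrative stand-in for BDQC-style outlier detection applied to
    per-file summary statistics (e.g. file size or mean coverage); the
    real BDQC heuristics are not reproduced here.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9
    # 1.4826 scales MAD to match the standard deviation for normal data.
    return [abs(v - med) / (1.4826 * mad) > z_cutoff for v in values]

sizes = [100, 101, 99, 102, 100, 250]  # hypothetical per-file sizes (MB)
print(flag_outliers(sizes))  # [False, False, False, False, False, True]
```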