2018
DOI: 10.1038/nbt.4227

Variation graph toolkit improves read mapping by representing genetic variation in the reference

Abstract: Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implem…

citations
Cited by 552 publications
(758 citation statements)
references
References 34 publications
“…We aligned trimmed reads to the human linear reference genome (hs37d5) using bwa aln [25] with parameters -l1024 -n 0.02 [31], keeping bases with quality above or equal to 15. We constructed the index file for vg [14] with hs37d5 and variants from the 1000 Genomes Project phase 3 dataset [13] above 0.1% MAF. In total, the graph contained 27,485,419 SNPs, 2,662,263 indels and 4,753 other small complex variants.…”
Section: Datasets and Sequence Data Processing
mentioning
confidence: 99%
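
The variant selection described in this statement (keeping variants above 0.1% MAF and tallying SNPs, indels and other variants) can be illustrated with a minimal Python sketch. It is not the cited study's pipeline: the `(ref, alt, alt_freq)` tuple layout, the function names and the length-based classification rules are all assumptions made for illustration, and a real workflow would read the allele frequencies from the VCF itself.

```python
from collections import Counter

# 0.1% minor allele frequency, the cutoff quoted in the statement above.
MAF_THRESHOLD = 0.001


def minor_allele_frequency(alt_freq):
    """MAF of a biallelic site, given the alternate-allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def classify_variant(ref, alt):
    """Rough length-based classes (an assumption, not the study's definitions)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt) and (ref.startswith(alt) or alt.startswith(ref)):
        return "indel"
    return "other"


def filter_and_count(records, threshold=MAF_THRESHOLD):
    """Keep records strictly above the MAF threshold and count them by class.

    `records` is a hypothetical list of (ref, alt, alt_freq) tuples.
    """
    kept = [r for r in records if minor_allele_frequency(r[2]) > threshold]
    counts = Counter(classify_variant(ref, alt) for ref, alt, _ in kept)
    return kept, counts


if __name__ == "__main__":
    toy_records = [
        ("A", "G", 0.25),       # common SNP, kept
        ("AT", "A", 0.0005),    # rare deletion, dropped
        ("C", "CTT", 0.9995),   # alt nearly fixed, so MAF is about 0.0005, dropped
        ("ACG", "TGA", 0.02),   # same-length substitution, counted as "other"
        ("G", "GA", 0.5),       # common insertion, kept
    ]
    kept, counts = filter_and_count(toy_records)
    print(len(kept), dict(counts))
```

The strict `>` comparison follows the word "above" in the statement; whether the original analysis included sites at exactly 0.1% is not stated.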
“…However, a limitation of this approach is that it has only considered biallelic single nucleotide polymorphisms (SNPs). Therefore, non-reference alleles at insertion and deletion (indel) loci are not accounted for, despite there being hundreds of thousands of non-reference indels in a typical human genome [13], and these having a greater effect on read mapping than SNPs [14].…”
Section: Introduction
mentioning
confidence: 99%
“…Paragraph works by aligning and genotyping reads on a local sequence graph constructed for each targeted SV. This approach is different from other proposed and most existing graph methods that create a single whole-genome graph and align all reads to this large graph [18,39]. A whole-genome graph may be able to rescue reads from novel insertions that are misaligned to other parts of the genome in the original linear reference, however, the computational cost of building such a graph and performing alignment against this graph is very high.…”
Section: Discussion
mentioning
confidence: 91%
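
The idea of a local sequence graph built around one targeted SV, as opposed to a single whole-genome graph, can be sketched as a toy Python example. This is not Paragraph's implementation: the three-node deletion graph, the function names and the exact-substring "alignment" are simplifying assumptions chosen only to show how reads can support one allele path or the other.

```python
def build_deletion_graph(left, deleted, right):
    """Local graph for one deletion: two flanks plus the deleted segment.

    The reference allele walks L -> D -> R; the deletion allele skips D.
    """
    nodes = {"L": left, "D": deleted, "R": right}
    edges = {("L", "D"), ("D", "R"), ("L", "R")}
    paths = {"ref": ["L", "D", "R"], "del": ["L", "R"]}
    return nodes, edges, paths


def path_sequence(nodes, path):
    """Spell out the sequence of a path through the graph."""
    return "".join(nodes[name] for name in path)


def count_allele_support(reads, nodes, paths):
    """Count reads that match exactly one allele path.

    Exact substring matching stands in for real read-to-graph alignment.
    Reads matching both alleles (or neither) are uninformative and skipped.
    """
    sequences = {allele: path_sequence(nodes, p) for allele, p in paths.items()}
    support = {allele: 0 for allele in paths}
    for read in reads:
        hits = [allele for allele, seq in sequences.items() if read in seq]
        if len(hits) == 1:
            support[hits[0]] += 1
    return support


if __name__ == "__main__":
    nodes, edges, paths = build_deletion_graph("ACGTAC", "GGGTTT", "CATGCA")
    reads = ["TACGGG", "TTTCAT", "TACCAT", "ACGTAC", "CATGCA"]
    print(count_allele_support(reads, nodes, paths))
```

Only reads that cross a breakpoint are informative here; reads lying entirely within a flank match both paths and are ignored, which is why the graph only needs to cover the neighborhood of the SV.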
“…Most targeted methods for genotyping are integrated with particular discovery algorithms and require the input SVs to be originally discovered by the designated SV caller [15][16][17], require a complete genome-wide realignment [18,19] or need to be optimized on a set of training samples [12,20].…”
mentioning
confidence: 99%