2020
DOI: 10.1186/s13059-020-02168-z

The design and construction of reference pangenome graphs with minigraph

Abstract: The recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here, we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinates of the linear reference genome. We implement our ideas in the minigraph toolkit and demonstrate that we can effi…
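To make "preserving the coordinates of the linear reference genome" concrete, here is a minimal sketch. It assumes rGFA-style segment tags as described in the rGFA specification distributed with gfatools: `SN` (name of the linear sequence a segment came from), `SO` (0-based offset on that sequence) and `SR` (rank, 0 for the reference backbone). The functions `parse_segments` and `stable_coordinate`, and the example graph `EXAMPLE_RGFA` (including the hypothetical sequence name `sample1#chr1`), are illustrative only and are not the minigraph or gfatools API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    seq: str
    stable_name: str    # SN:Z tag: linear sequence the segment was taken from
    stable_offset: int  # SO:i tag: 0-based start of the segment on that sequence
    rank: int           # SR:i tag: 0 = reference backbone, >0 = added later


def parse_segments(gfa_text):
    """Collect S lines and their SN/SO/SR tags; other record types are skipped."""
    segs = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        tags = {}
        for tag in fields[3:]:
            key, typ, val = tag.split(":", 2)  # e.g. "SO:i:4" -> ("SO", "i", "4")
            tags[key] = int(val) if typ == "i" else val
        segs[fields[1]] = Segment(fields[1], fields[2],
                                  tags["SN"], tags["SO"], tags["SR"])
    return segs


def stable_coordinate(segs, seg_name, offset):
    """Map (segment, offset) to (stable sequence, position, rank)."""
    s = segs[seg_name]
    if not 0 <= offset < len(s.seq):
        raise ValueError("offset outside segment")
    return s.stable_name, s.stable_offset + offset, s.rank


# Hypothetical toy graph: s1 and s2 lie on the reference (rank 0),
# s3 was contributed by another assembly (rank 1).
EXAMPLE_RGFA = "\n".join([
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tTTA\tSN:Z:chr1\tSO:i:4\tSR:i:0",
    "S\ts3\tGGC\tSN:Z:sample1#chr1\tSO:i:50\tSR:i:1",
    "L\ts1\t+\ts2\t+\t0M",
])
```

With this toy graph, the second base of `s2` maps back to `chr1` position 5, so any position on a rank-0 segment is still addressable in linear reference coordinates.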

Cited by 297 publications (380 citation statements). References: 73 publications.
“…This can help distinguish stabilizing versus fragile motifs (Braida et al 2010), or resolve some of the problem of missing heritability by discovering new associations between motif and disease (Song, Lowe, and Kingsley 2018). Finally, this work is a part of ongoing pangenome graph analysis (Paten et al 2017; Li, Feng, and Chu 2020), and represents an approach to generating pangenome graphs in loci that have difficult multiple sequence alignments or degenerate graph topologies. Additional methods may be developed to harmonize danbing-tk RPGGs with genome-wide pangenome graphs constructed from other methods.…”
Section: Discussion
“…Using the bubble popping algorithm of gfatools [21], we identified 68,328 structural variations present in the multi-assembly graph. To reveal true alleles within these structural variations, we traversed all possible paths through the bubbles (i.e., alleles) and retained only those that were supported by at least one assembly (Supplementary Fig.…”
Section: Structural Variation Discovery from the Multi-assembly Graph
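The allele-filtering step quoted above (enumerate every path through a bubble, keep only paths that some assembly actually follows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the gfatools implementation: the bubble is given as a hypothetical acyclic adjacency dict from its source to its sink segment, segment orientation is ignored, and each assembly is represented as an ordered list of segment names. The names `enumerate_paths`, `contains_walk` and `supported_alleles` are hypothetical.

```python
def enumerate_paths(adj, source, sink):
    """All source->sink paths in an acyclic bubble, via iterative DFS.

    Assumes the bubble has no cycles; a cycle would make this loop forever.
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for nxt in adj.get(node, []):
            stack.append((nxt, path + [nxt]))
    return paths


def contains_walk(assembly_path, walk):
    """True if `walk` occurs as a contiguous run of segments in `assembly_path`."""
    n = len(walk)
    return any(assembly_path[i:i + n] == walk
               for i in range(len(assembly_path) - n + 1))


def supported_alleles(adj, source, sink, assembly_paths):
    """Keep only bubble paths followed by at least one assembly."""
    return [p for p in enumerate_paths(adj, source, sink)
            if any(contains_walk(a, p) for a in assembly_paths.values())]
```

For a toy bubble `{"s1": ["s2", "s3"], "s2": ["s4", "s5"], "s3": ["s5"], "s4": ["s5"]}` from `s1` to `s5`, three paths exist; if one hypothetical assembly follows `s1 s2 s4 s5` and another follows `s1 s3 s5`, the path `s1 s2 s5` is discarded as unsupported. Exhaustive enumeration is simple but grows exponentially with nested bubbles, which is one reason to restrict attention to paths actually observed in assemblies.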
“…We used minigraph [21] … The genetic distance among the six assemblies was estimated using Mash (version 2.2) [22].…”
Section: Construction of the Multi-assembly Graph
“…The incorporation of thousands of individuals into a single reference will avoid “reference bias”, and mapping reads to such a pan-genome will improve variant calling, especially in regions with a high density of complex variants [107]. While many of the proposed pan-genome implementations represent genomes as graphs with shared and private variants, some of the new approaches have proposed elegant ways of creating pan-genome graphs while preserving linear coordinates [108]. In the future, ultra-long accurate reads, coupled with complete reference pan-genomes, will enable the full understanding of the underlying functional variation hidden in the repetitive parts of the genome.…”
Section: Future