2023
DOI: 10.1038/s41592-023-01914-y

Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes

Abstract: Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR…

Cited by 22 publications (10 citation statements); references 66 publications.
“…The use of pangenomes for association testing has largely relied on presence/absence variation between groups (Brynildsrud et al, 2016; Leonard et al, 2022), decomposed graph genome variation (Chin et al, 2023), or genotypes obtained by mapping of short reads to pangenomes (Cochetel et al, 2023; Sirén et al, 2021). We have developed and applied an approach that compares path similarities and differences between assemblies of samples with divergent phenotypes.…”
Section: Discussion (mentioning; confidence: 99%)
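The path-comparison idea in the Discussion statement can be illustrated with a minimal sketch. It assumes each assembly is already represented as a path (an ordered list of vertex IDs) through a shared pangenome graph, uses Jaccard similarity of traversed edges as the similarity measure, and reports edges used by every path in one phenotype group and by none in the other. These choices and the names `path_edges`, `jaccard`, and `group_specific_edges` are hypothetical illustrations, not the cited authors' method.

```python
# Hypothetical sketch: compare haplotype paths through a shared pangenome graph.
# A path is an ordered list of vertex IDs; an edge is a pair of consecutive vertices.
Path = list[str]


def path_edges(path: Path) -> set[tuple[str, str]]:
    """Return the set of directed edges (u, v) traversed by a path."""
    return set(zip(path, path[1:]))


def jaccard(a: Path, b: Path) -> float:
    """Jaccard similarity of the edge sets of two paths (1.0 if both are empty)."""
    ea, eb = path_edges(a), path_edges(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0


def group_specific_edges(group_a: list[Path], group_b: list[Path]) -> set[tuple[str, str]]:
    """Edges traversed by every path in group_a and by no path in group_b."""
    if not group_a:
        return set()
    in_all_a = set.intersection(*(path_edges(p) for p in group_a))
    in_any_b = set().union(*(path_edges(p) for p in group_b))
    return in_all_a - in_any_b


if __name__ == "__main__":
    a = ["v1", "v2", "v3"]
    b = ["v1", "v2", "v4"]
    print(jaccard(a, b))                   # shared edge (v1, v2) out of three distinct edges
    print(group_specific_edges([a], [b]))  # edge (v2, v3) appears only in group A
```

Edge-level Jaccard is only one possible choice; vertex sets or alignment-based path distances would weigh structural differences differently.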
“…To characterize the structural diversity of the amylase locus, we first constructed a minimizer-anchored pangenome graph (MAP-graph) [22] from 94 amylase haplotypes derived from 54 long-read, haplotype-resolved genome assemblies recently sequenced by the Human Pangenome Reference Consortium (HPRC) [23], alongside GRCh38 and the newly sequenced T2T-CHM13 reference [24] (Fig. 2B; see Methods). The MAP-graph captures large-scale sequence structures, with vertices representing sets of homologous or paralogous sequences; thus, input haplotypes can be represented as paths through the graph.…”
Section: Main (mentioning; confidence: 99%)
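The MAP-graph idea (shared sequence anchors as vertices, haplotypes as paths) can be sketched in a few lines. This toy version assumes vertices are plain (w, k)-minimizer k-mers chosen by lexicographic order on the forward strand only, and that an edge joins consecutive minimizers along a haplotype. It does not reproduce PGR-TK's own construction, which is considerably more involved; `minimizers` and `build_map_graph` are hypothetical names.

```python
from collections import defaultdict


def minimizers(seq: str, k: int, w: int) -> list[str]:
    """Return the (w, k)-minimizers of seq in order, collapsing consecutive repeats.

    Toy assumption: the smallest k-mer in each window of w consecutive k-mers is
    chosen lexicographically (real tools use a hash order and handle strands).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[str] = []
    for start in range(len(kmers) - w + 1):
        m = min(kmers[start:start + w])
        if not picked or picked[-1] != m:
            picked.append(m)
    return picked


def build_map_graph(haplotypes: dict[str, str], k: int = 5, w: int = 4):
    """Build a toy minimizer-anchored graph.

    Vertices are minimizer k-mers shared across haplotypes; each edge records
    which haplotypes traverse it; each haplotype becomes a path of vertices.
    """
    paths: dict[str, list[str]] = {}
    edges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for name, seq in haplotypes.items():
        path = minimizers(seq, k, w)
        paths[name] = path
        for u, v in zip(path, path[1:]):
            edges[(u, v)].add(name)
    vertices = {v for p in paths.values() for v in p}
    return vertices, dict(edges), paths


if __name__ == "__main__":
    haps = {"h1": "ACGTAC", "h2": "ACGTAC"}
    vertices, edges, paths = build_map_graph(haps, k=2, w=2)
    print(paths["h1"])  # the vertex AC is revisited, forming a cycle
    print(edges)
```

In this example the repeated k-mer `AC` makes the path return to a vertex it has already visited, so repeated sequence appears as a cycle in the graph rather than as duplicated linear coordinates.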
“…Genomic analysis has mainly been based on using a linear reference genome since the release of its initial draft in 2001 [1]. However, this approach is inadequate to represent the genetic diversity of a species [2–5]. Given that the human reference genome (e.g., GRCh38) harbors only a single representative scaffold for each chromosome as the primary sequence, using it as a reference genome comes with severe limitations, including population-specific read-mapping biases.…”
Section: Introduction (mentioning; confidence: 99%)