2024
DOI: 10.1101/2024.03.05.24303792
Preprint

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation

Jonas A Gustafson,
Sophia B Gibson,
Nikhita Damaraju
et al.

Abstract: Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data f…

Cited by 20 publications (13 citation statements)
References 91 publications
“…Meanwhile, we observed more inversions, duplications, and large deletions >10kb in SRS than in LRS, likely due both to higher false positives in the SRS callset and limitations in alignment-based SV calling from LRS. From our study, we also demonstrated that LRS-based reference panels (ADRC) significantly improved variant allele frequency estimates, stressing the need for high-quality, diverse ancestry population reference such as the ongoing LRS sequencing of the 1000 Genome Consortium or All of Us [52,53]. In addition, we also examined variation at TR loci, genotyping TR copy number from LRS at 2 times more loci than profiled routinely in SRS, and identifying longer TREs even up to 10kb.…”
Section: Discussion (citation type: mentioning)
confidence: 96%
“…Such an approach will allow full completion of the sequence of more common structural haplotypes, while at the same time provide insights into rare structural haplotypes including those associated with diseases, enabling consideration and inclusion of large-scale clinical cohorts from a wide diversity of geographical locations. During early stages of our study, we thus coordinated sample sets with efforts piloting genome sequence assembly in smaller 1kGP sample subsets 5,20,93, and provided open access to our data and callsets. This approach to open data sharing is guided by the principles of the 1kGP and is inspired by the potential to combine intermediate- and high-coverage techniques to advance the completion of the catalogue of human genomic variation encompassing the entire 1kGP cohort 3,29,32,33 in the near future.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…In the ALS/FTD samples we analyzed, we identified several expanded and non-reference alleles at the panel targets including in STARD7, BEAN1, RFC1, and DAB1. Recently whole genome, long-read sequencing projects have uncovered significant heterogeneity in both motif composition and size at these pentanucleotide repeat loci, as well as other disease-associated tandem repeats, across populations (20, 54). Using this targeted panel we are able to efficiently and accurately genotype these loci as well as rare and complex alleles with HMMSTR. This enables an increased power to investigate and characterize a wide range of variation at disease-associated tandem repeats in both cases and controls.…”
Section: Discussion (citation type: mentioning)
confidence: 99%