2022
DOI: 10.1038/s41586-022-05325-5

Semi-automated assembly of high-quality diploid human reference genomes

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations…

Cited by 133 publications (133 citation statements). References: 102 publications.

“…On the other hand, HiFi-FALCON, HiFi-HiCanu and HiFi-Hifiasm nuclear scaffolds contained very similar amounts of 5S rDNA arrays, 1.64–1.68 Mb, and of centromeres, 13.63–13.69 Mb. To investigate the reliability of our assemblies in these repetitive regions, we analyzed potentially collapsed and expandable sequences in the scaffolded assemblies (67, 73). According to annotations of repeat features in the assemblies, 5S rDNAs and centromeres did not appear to contribute substantially to the collapsed sequences in the HiFi-FALCON, HiFi-HiCanu and HiFi-Hifiasm nuclear scaffolds (Supplementary Figure S6).…”
Section: Results (mentioning; confidence: 99%)
“…It is anticipated that the T2T Consortium will generate more complete genome assemblies from a diversity of human samples and non-human primates. These will help us to fully understand the extent of complex/discrepant regions in humans [4–11, 25, 26] and their biological impact using reference-free approaches.…”
Section: Discussion (mentioning; confidence: 99%)
“…2c). We examined the length of the CR1 gene in the 94 long-read human genome assemblies from the Human Pangenome Reference Consortium (HPRC) [24–26], and the length of CR1 in 79 assemblies coincides with that of T2T-CHM13. This suggests that T2T-CHM13 carries the major allele of CR1 (allele frequency: 0.84) (Fig.…”
Section: Gene Model and Structure Differences In The Cn Polymorphic D... (mentioning; confidence: 99%)
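The allele frequency quoted above is consistent with the fraction of HPRC assemblies whose CR1 length matches T2T-CHM13 (79 of 94). A minimal sketch of that arithmetic follows, assuming each of the 94 assemblies is counted as one haplotype; the function name `allele_frequency` is hypothetical and not taken from the citing paper.

```python
def allele_frequency(matching: int, total: int) -> float:
    """Return the fraction of haplotype assemblies carrying an allele.

    Assumption (illustrative only): each assembly represents one haplotype,
    so the frequency is simply matching / total.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= matching <= total:
        raise ValueError("matching must be between 0 and total")
    return matching / total


# 79 of 94 HPRC assemblies have a CR1 length matching T2T-CHM13.
freq = allele_frequency(79, 94)
print(f"{freq:.2f}")  # prints 0.84
```

Since 79/94 ≈ 0.8404, rounding to two decimals gives the 0.84 reported in the statement, and a value above 0.5 is what marks the CHM13 form as the major allele.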
“…In addition to biparental crosses, this strategy can be utilized for linkage mapping studies on other groups of related individuals, such as family studies in humans. As the cost and analytical constraints of generating long-read genome assemblies continue to decline (Jarvis et al 2022), it will become possible to generate high-confidence de novo whole genome sequences of every individual in a mapping study, including genome-wide association studies of unrelated humans. We expect that assembly of highly complete genomes using long-read sequencing will be vital for us to understand the genetic sources of trait variation.…”
Section: Discussion (mentioning; confidence: 99%)