2022
DOI: 10.1126/science.abj6965

Segmental duplications and their variation in a complete human genome

Abstract: Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previo…

Cited by 213 publications (186 citation statements)
References 118 publications
“…Interestingly, in human SD regions, we observe a paucity of CpG transition mutations, characteristically associated with spontaneous deamination of CpG dinucleotides and concomitant transitions (Duncan and Miller 1980). The basis for the latter is unclear but it may be partially explained by the recent observation that duplicated genes show greater degree of hypomethylation when compared to their unique counterparts (Vollger et al 2022). We propose that excess of guanosine and cytosine transversions is a direct consequence of GC-biased gene conversion (Duret and Galtier 2009) driven by an excess of double-stranded breaks that result from a high rate of NAHR events among paralogous sequences.…”
Section: Discussion (mentioning)
confidence: 86%
“…This way, a total of 33 different lcr16a copies (ppy_1–33) were identified (supplementary table S1, Supplementary Material online; note that with the similarity between copies approaching that of allelic variation, in some cases it cannot be decided whether they represent independent copies or alleles of the same copy). Once it became available, the copies were mapped against the new orang-utan high-fidelity (HiFi) assembly (Susie_PAB_pri, Vollger et al 2022). Out of the 33 lcr16a-containing duplications 21 could be mapped (fig.…”
Section: Results (mentioning)
confidence: 99%
“…Distribution of lcr16a duplications in the orang-utan genome and characteristics of their breakpoints. (A) The positions of the lcr16a duplicates on Susie_PAB_pri (Vollger et al 2022) are mapped against the ponAbe3 ideogram available at the NCBI genome decoration page. (B) The number of repetitive elements overlapping the independent breakpoints (BP) at the lcr16a 5′ and 3′ end, respectively.…”
Section: Results (mentioning)
confidence: 99%
“…Human genome contains 22 autosomes and 2 sex chromosomes, and the reported human reference genome is the most researched human genome, whose extensively researched version is GRCh38 released from the Genome Reference Consortium in 2013 (patch GRCh38.p13 in 2019), with 3,099,734,149 bp in total sequence length and 2,948,611,470 bp in total ungapped length, including overall 473 scaffolds and 999 contigs [23,24]. Recently, the Telomere-to-Telomere Consortium has firstly finished the sequencing of a complete human (female) genome without any gap called T2T-CHM13 (CHM13), with 3,054,815,472 bp in total sequence length, which contains all centromeric satellite arrays and the short arms of all 5 acrocentric chromosomes (chr13, 14, 15, 21 and 22), and this complete genome can provide a more comprehensive perspective to analyze microsatellites in human genome [25-28]. Previously, we investigated STRs landscape maps in the full human reference chromosome Y (GRCh38: NC_000024.10) at 1 kilo base pairs (Kbp) resolution by Differential Calculator of Microsatellite (DCM) method [29], revealing an exact distributional feature of STRs in every 1-Kbp locational bins of the chromosome Y [30].…”
Section: Introduction (mentioning)
confidence: 99%