2021
DOI: 10.1101/2021.04.09.438906
Preprint

False gene and chromosome losses affected by assembly and sequence errors

Abstract: Many genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project (VGP) has been producing assemblies with an emphasis on being as complete and error-free as possible, utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. Here we evaluate these new vertebrate genome assemblies relative to the previous references for the same species, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish …

Cited by 20 publications (20 citation statements)
References 75 publications
“…Furthermore, we identified seven additional chromosomes (chromosomes 30–36) in the zebra finch, and eight (chromosomes 8, 9, 14, 15, 17, 19, 21, and X4; Extended Data Fig. 8a, b) in the platypus 26,27. Relative to the VGP assembly, the earlier short-read Anna's hummingbird assembly was highly fragmented (Extended Data Fig.…”
Section: Curation Is Needed For a High-quality Reference
Citation type: mentioning
confidence: 95%
“…Thousands of such false gains and losses in previous reference assemblies have been corrected in our VGP assemblies (more details in refs. 27,44), demonstrating that assembly quality has a critical effect on subsequent annotations and functional genomics.…”
Section: Article
Citation type: mentioning
confidence: 99%
“…The sequencing enzymes used often have difficulty reading through regions with complex structures, such as GC-rich regions often found in promoters that regulate gene expression 9, 10. It is also now clear that mixing diverse haplotypes in a single assembly, even from the same individual, can introduce many errors with standard assembly tools 8, 10, 11. These errors include: switch errors where variants from each haplotype are assembled into the same pseudo-haplotype; false duplications and associated gaps where more divergent haplotype homologs are assembled as separate false paralogs; and consensus errors due to collapses between haplotypes.…”
Section: Main
Citation type: mentioning
confidence: 99%
“…Module F, a component in HVC, LMAN, and Area X specializations, was enriched on chromosome 2. In addition to module D, both modules J and K were enriched on "other", a category that includes newly identified zebra finch micro-chromosomes 36. These findings indicate a remarkable association between the structure of song nuclei gene expression networks, measured in an unbiased way with WGCNA, and the chromosomal structure of the genome.…”
Section: Gene Modules Enriched On Specific Chromosomes
Citation type: mentioning
confidence: 99%