2020
DOI: 10.1101/2020.05.22.110833
Preprint

Towards complete and error-free genome assemblies of all vertebrate species

Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species 1-4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a …

Cited by 250 publications (490 citation statements)
References 100 publications
“…We further advocate for the use of the hill climbing feature in HapSolo, because the computational cost is relatively small but the gains - while not always substantive - can be large (Figure 3). We believe HapSolo can be used to further improve assemblies that use heterozygous samples and employ HiC scaffolding, such as the data being created by the Vertebrate Genome Project [32].…”
Section: Results (mentioning)
confidence: 99%
“…The data were all based on the Pacific Biosciences (PacBio) sequencing platform and either provided by authors or downloaded from public databases. The chromosome number for each species were found in various sources [20,32,33]. Contig Alignments: For each genome, the pre-processing step prior to application of HapSolo was to perform pairwise contig alignments, as described above. For this study, we used Blat v35 [27], although in principle other aligners, like minimap2 [34] could be used.…”
Section: Methods (mentioning)
confidence: 99%
“…Despite the advances in sequencing and mapping technologies and the ever-increasing number of sophisticated algorithms and pipelines available, generating error-free eukaryotic genome assemblies in a purely automated fashion is currently not possible [1,2]. Assembly software designed to generate continuous sequence from raw reads is confused by heterozygous or repeat-rich regions, introducing erroneous duplications, collapses and misjoins.…”
Section: Assembly Curation Adds Significant Value (mentioning)
confidence: 99%
“…We therefore also provide detailed recommendations on how to create similar analyses that do not use gEVAL to promote the core, proven design concepts in gEVAL. This is especially timely in the context of emerging projects that aim to assemble the genomes of very large numbers of species to highest quality possible, including the Vertebrate Genomes Project (VGP), the Darwin Tree of Life Project (DToL, darwintreeoflife.org) and the overarching Earth Biogenome Project (EBP) [1,11].…”
Section: Assembly Curation Adds Significant Value (mentioning)
confidence: 99%