2014
DOI: 10.1371/journal.pcbi.1003998

Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies

Abstract: Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes…

citations
Cited by 266 publications
(239 citation statements)
references
References 72 publications
“…Gene annotation of low-quality draft genome assemblies is known to be problematic (51). We therefore verified that our mealybug assemblies were sufficient for our purpose of establishing gene presence or absence by comparing our gene sets with databases containing core eukaryotic [Core Eukaryotic Genes Mapping Approach (CEGMA)] and Arthropod [Benchmarking Universal Single-Copy Orthologs (BUSCO)] gene sets.…”
Section: Phylogenetic Analyses Confirm the Intra-tremblaya γ-Proteoba… (mentioning)
confidence: 77%
“…Briefly, the order and orientation of contigs of such draft assemblies remains unresolved and the differentiation between traits, which are verified to be chromosomally-encoded versus plasmid-encoded, is not possible particularly when one considers plasmid integration events. Most notable, however, is the finite nature of a finished genome which facilitates the comparison of the full genetic content of a strain rather than most of the genetic content, whereas in the case of a draft genome the likelihood of error from missing genes or incorrect copy number is significantly higher [23, 24]. …”
Section: Results (mentioning)
confidence: 99%
“…), as draft assemblies are often incorrect in annotating multigene family copy number (Denton et al.) and whole‐genome shotgun assemblies are typically poor at adequately resolving repeat structures (She et al.).…”
Section: Discussion (mentioning)
confidence: 99%