2022
DOI: 10.1016/j.cub.2022.04.085

Mixing genome annotation methods in a comparative analysis inflates the apparent number of lineage-specific genes

Cited by 60 publications (61 citation statements)
References 28 publications
“…3 To edit and visualize the genomic information we used the Apollo genome browser v.2.0.8 (Lee et al, 2009). The manual curation of candidate gene models allowed to reduce biases produced by annotation heterogeneity (Weisman et al, 2022). Also, to validate each OBP candidate, we identified OBP functional domains with Interproscan v.5.52 (Jones et al, 2014) by comparing candidate OBPs against the Pfam database v.33.1 (Mistry et al, 2021) and Conserved Domains Database (CDD) v. 3.18 (Marchler-Bauer et al, 2005).…”
Section: Identification Of Odorant-binding Protein Gene Candidates In... (mentioning)
confidence: 99%
“…S2). Another issue raised for genomic phylostratigraphy is that spurious genome annotations or comparison between annotations with different levels of quality and accuracy can overestimate the proportion of recently-evolved proteins in the analysis (23,31). To address this short-coming, GenEra is able to include an additional protein-against-nucleotide search through MMseqs2 (36) with its most sensitive parameters (s = 7.5) to reconfirm gene age assignments with an annotation-free approach solely based on six-frame alignments.…”
Section: Methods (mentioning)
confidence: 99%
“…While conceptually powerful, several studies have questioned the detection sensitivity of the phylostratigraphic approach (13)(14)(15)22). Gene ages may appear younger than they actually are due to gene prediction errors in the target database (23). Previous approaches have also overlooked contamination or horizontal gene transfer across lineages, which can overestimate a gene's age in a given organism (24).…”
Section: Introduction (mentioning)
confidence: 99%
“…Examples of genome annotation errors include gene models that may be fragmented or multiple genes incorrectly merged, genes broken by technical errors such as fragmented assemblies or sequencing errors that produce stop codons or frameshifts, or missing genes due to a lack of evidence or divergence from the training dataset. Such errors complicate analyses of orthology typically used to define evolutionary and putative functional relationships of genes within and between species, and likely confounds estimates of shared and species-specific annotations [21]. Many examples in the literature describe the cloning and validation of a gene structure in response to identifying an error in a gene model from a genomic resource; unfortunately, these updated experimentally validated models are rarely used to correct the original resource.…”
Section: Glossary (mentioning)
confidence: 99%