2022
DOI: 10.1002/ece3.8625

Metagenomic clustering reveals microbial contamination as an essential consideration in ultraconserved element design for phylogenomics with insect museum specimens

Abstract: Phylogenomics via ultraconserved elements (UCEs) has led to improved phylogenetic reconstructions across the tree of life. However, inadvertently incorporating non‐targeted DNA into the UCE marker design will lead to misinformation being incorporated into subsequent analyses. To date, the effectiveness of basic metagenomic filtering strategies has not been assessed in arthropods. Designing markers from museum specimens requires careful consideration of methods due to the high levels of microbial contamination …

Citations: cited by 7 publications (5 citation statements)
References: 98 publications
“…We identified >50,000 contaminating proteins in this set, while Conterminator identified 327−14,148 depending on the database configuration used. These figures agree with previous reports of contamination in reference sequence databases 59 and (meta)genomes 10 , 26 , 60–62 , however, our inventory highlighted a range of novel patterns. First, the number of contaminating proteins covered three orders of magnitudes: it ranged from a handful of proteins up to >12,000, in extreme cases allowing the subtraction of presumably complete protein repertoires of the contaminating organism 14 , 15 from the contaminated genome.…”
Section: Discussion (supporting)
confidence: 92%
“…Due to various biological or technical issues, genomes may contain sequences that do not belong to the targeted organism 8 , 9 with projects relying on preserved museum- or metagenomic samples are particularly vulnerable to contamination 10–12 . If not carefully addressed, contaminated reference genomes poison public databases with inaccurately labeled sequence data, as demonstrated by a recent study that identified over 2 million records corresponding to contamination in GenBank alone 13 .…”
Section: Introduction (mentioning)
confidence: 99%
“…We also found that genome assembly metrics, such as completeness measures, do not correlate well with base genome performance, and alone are not a good justification for selecting a taxon to serve as the base genome. However, use of an annotated genome as the base will allow identification of UCE loci targeted by the probe design, which can in turn be used to merge cogenic loci for analysis, and prevent development of off-target probes, both of which are important aspects to consider (Van Dam et al 2021, 2022a). Finally, we found further evidence that combining generalized probes with tailored probes improves phylogenetic performance of the probe set.…”
Section: Discussion (mentioning)
confidence: 99%
“…Additionally, we ultimately wanted the scarab probe design to utilize the available NCBI genome assembly of Onthophagus taurus . There were several reasons for this: (1) the assembly is highly complete (Table 1); (2) its annotation will allow all loci targeted by the resultant probe design to be identifiable, thus preventing inclusion of ‘off-target’ probes (see Van Dam et al 2022a); and (3) it performed relatively well as the base genome.…”
Section: Methods (mentioning)
confidence: 99%
“…The other species belong to four Lamiinae tribes spanning the phylogenetic diversity of the subfamily Lamiinae (Apomecynini, Lamiini, Tetraopini, and Pteropliini) 28 . Soft masked files were used following guidelines of probe design 50 , 54 .…”
Section: Methods (mentioning)
confidence: 99%