2023
DOI: 10.1101/2023.06.12.544582
Preprint

Semi-automated curation and manual addition of sequences to build reliable and extensive reference databases for ITS2 vascular plant DNA (meta-)barcoding

Abstract: With the breakthrough of DNA (meta-)barcoding, it soon became clear that one of the most critical steps for accurate taxonomic identification is to have an accurate DNA reference database for the DNA barcode marker of choice. Therefore, developing such a database has been a long-term ambition, especially in the Viridiplantae kingdom. Typically, reference databases are constructed from marker sequences downloaded from general public databases, which can carry taxonomic and other relevant errors. Herein, we const…

citations
Cited by 2 publications
(1 citation statement)
references
References 63 publications
“…Reads were first directly mapped iteratively with global alignments using VSEARCH against five flowering plant ITS2 reference databases (see below for construction details) for the study region and an identity cut-off threshold of at least 97% (higher percentages are prioritized). These reference databases were created with BCdatabaser, then automatically curated [35] from GenBank entries with default parameters (length between 200 and 2000 bp, maximum nine sequences per species), from the following species lists: 1) all plant species recorded from IBGE. This database was then manually curated to remove voucher-less entries for greater trustworthiness.…”
Section: Methods (mentioning)
confidence: 99%
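
The default parameters quoted in the citation statement above (length between 200 and 2000 bp, at most nine sequences per species) can be illustrated with a minimal Python sketch. This is not BCdatabaser's code: the function name `filter_records`, the `(accession, species, sequence)` tuple layout, and the rule of keeping the first nine records encountered for each species are all assumptions made for illustration only.

```python
from collections import defaultdict


def filter_records(records, min_len=200, max_len=2000, max_per_species=9):
    """Filter (accession, species, sequence) tuples.

    Hypothetical helper, not part of any real tool: keeps a record only if
    its sequence length lies in [min_len, max_len] and fewer than
    max_per_species records have already been kept for that species.
    Which records a real pipeline keeps per species is an assumption here
    (first encountered wins).
    """
    kept = []
    per_species = defaultdict(int)  # species name -> number of records kept
    for accession, species, sequence in records:
        # Length filter: discard sequences outside the allowed range.
        if not (min_len <= len(sequence) <= max_len):
            continue
        # Per-species cap: skip once the species has reached its quota.
        if per_species[species] >= max_per_species:
            continue
        per_species[species] += 1
        kept.append((accession, species, sequence))
    return kept
```

Applied to a list parsed from GenBank entries, a call such as `filter_records(records)` would return only the records that pass both criteria; the later steps described in the statement (manual removal of voucher-less entries, mapping with VSEARCH at 97% identity) are outside the scope of this sketch.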