2019
DOI: 10.1101/2019.12.12.865014
Preprint

coil: an R package for cytochrome c oxidase I (COI) DNA barcode data cleaning, translation, and error evaluation

Abstract: Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods utilized for data generation and curation, leading to varying levels of success in the separation of biological variation from experimental error. The five-prime region of cytochrome c oxidase subunit I (COI-5P) is the most common barcode gene for animals, with conserved structure and function that allows for biologically informed error identification. Here…

Cited by 5 publications (5 citation statements)
References 33 publications

“…NUMTdumper provides a method to screen for NuMTs based on read counts while acknowledging the trade-offs between removing all possible NuMTs while erroneously removing genuine reads. An R package called ‘coil’ has also recently been developed that will place COI barcode and metabarcode sequences in frame using profile HMM analysis [58]. MetaWorks aims to extend the COI metabarcode toolkit that provides a harmonized environment where data from other organismal markers in multi-marker, multi-trophic studies can also be analyzed.…”
Section: Results (mentioning)
confidence: 99%
“…The dataset was obtained on 22 November 2022 from the BOLD Systems website [4] after searching for "Tetrigidae" in the public data portal and downloading the combined TSV file of sequences and specimen information on all public records. The subsequent curation of the dataset was performed in R [27] using the "janitor" [28] and "coil" [12,13] packages.…”
Section: Methods (mentioning)
confidence: 99%
“…The coil package detected 96 sequences (4% of the dataset) that likely contain indels causing shifts in the reading frame and none that contain stop codons. Those sequences were generally shorter, so it is possible that some were wrongly labeled, as coil has an error rate of up to 25% for short sequences [13]. Most of the non-indel sequences are close to the expected length, meaning that BOLD's own error-seeking system is adequate and the low amount of indels detected with coil could be due to differences in the algorithms used.…”
Section: Strengths and Shortcomings of the BOLD Database (mentioning)
confidence: 99%
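
The link between indels, reading-frame shifts, and stop codons in the statement above can be illustrated with a minimal Python sketch. This is not coil's method (the citing papers describe coil as placing sequences in frame with profile HMM analysis); it only shows why a single-base indel tends to surface as an in-frame stop codon. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), whose stop codons are TAA and TAG. The function names and the toy sequences are hypothetical.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI translation table 5).
# Assumption: the sequences come from invertebrates; other taxa use other tables.
STOP_CODONS = {"TAA", "TAG"}


def stop_codon_positions(seq, frame=0):
    """Return 0-based nucleotide offsets of stop codons read in the given frame."""
    seq = seq.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def frames_without_stops(seq):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_codon_positions(seq, f)]


# Toy example: deleting the first base shifts the frame and exposes a stop codon.
intact = "GGATTAGGA"       # read as GGA TTA GGA
shifted = intact[1:]       # read as GAT TAG GA
print(stop_codon_positions(intact))   # []
print(stop_codon_positions(shifted))  # [3]
```

A check of this kind only flags frame problems that happen to create a stop codon; it says nothing about indels that leave the translation open, which is one reason a profile-based approach can report different results.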
“…The Barcode of Life Data System (BOLD) [59] was used as the source for mitochondrial sequence data as it contains thousands of published cytochrome c oxidase subunit I (COI) partial gene sequence records (16,676 sequences for over 2000 Squamata species as of July 16th, 2021). COI barcode sequence data [60] for Squamata were obtained from a published dataset [61], originally downloaded into R on March 12th, 2020. Data were filtered for records that have been identified to the species level, as this information was necessary for trait matching purposes.…”
Section: Trait Data (mentioning)
confidence: 99%