2020
DOI: 10.1093/bioinformatics/btaa586

Detecting and correcting misclassified sequences in the large-scale public databases

Abstract: Motivation: As the cost of sequencing decreases, the amount of data being deposited into public repositories is increasing rapidly. Public databases rely on the user to provide metadata for each submission that is prone to user error. Unfortunately, most public databases, such as non-redundant (NR), rely on user input and do not have methods for identifying errors in the provided metadata, leading to the potential for error propagation. Previous research on a small subset of the non-redundant …

Cited by 36 publications (46 citation statements)
References 26 publications
“…Determining taxa-function robustness would provide further insight into the ecological resilience of the bacterial community (Eng & Borenstein, 2018). However, functional redundancy is not uniform across different bacterial groups (Griffiths & Philippot, 2013) and the level of functional redundancy within environmental systems remains unclear as the functional capacities of many bacterial groups are yet to be described, or even incorrectly annotated (Allison & Martiny, 2008; Bagheri et al, 2020). Moreover, describing bacterial functional capacities does not elucidate functional expression and individual microbes may modulate their metabolic performance in response to environmental stress (Eng & Borenstein, 2018).…”
Section: Discussion (mentioning)
confidence: 99%
“…As shown in the previous work [12], the provenance information could be utilized to clean the NR database by assigning more weight to the manually reviewed annotations.…”
Section: Provenance Of Annotations (mentioning)
confidence: 99%
“…As it can be seen in Table 2, some proteins have thousands of taxonomic assignments. We previously explored the NR database for the taxonomic misclassified sequences [12]. The non-redundant version of annotations in the NR database improves the usage and querying of the NR database.…”
Section: Redundancy and Ambiguity Of Annotations (mentioning)
confidence: 99%