2020
DOI: 10.1186/s12859-020-03744-7

Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life

Abstract: Background: It is a computational challenge for current metagenomic classifiers to keep up with the pace of training data generated from genome sequencing projects, such as the exponentially-growing NCBI RefSeq bacterial genome database. When new reference sequences are added to training data, statically trained classifiers must be rerun on all data, resulting in a highly inefficient process. The rich literature of “incremental learning” addresses the need to update an existing classifier to accommodate new dat…

citations
Cited by 19 publications
(22 citation statements)
references
References 49 publications
“…Bracken 47 and NBC++ (ref. 48) had completeness above 80% at either rank, and CCMetagen 49, DUDes v.0.08 (ref. 50), LSHVec v.gsa 51, Metalign 52, MetaPalette 53 and MetaPhlAn v.cami1 more than 80% purity.…”
Section: Results (mentioning)
confidence: 99%
“…The exponentially increasing size of available sequence data creates computational challenges and this has led to the development of more efficient methods of comparing genomes 22, 23. Our comparison of third position bias in codon pairs adds a new tool for analysis of all types of genomes.…”
Section: Discussion (mentioning)
confidence: 99%
“…Methods are in development to periodically update these that could prove important. 43 The second limitation with respect to the use of sequencing methods for quantification is that absolute abundances are needed for QMRA, rather than relative abundances. There are several potential routes to solve this problem, though at best they can be regarded as under development.…”
Section: Information Needed for QMRA (mentioning)
confidence: 99%
“…With all sequencing-based methods, the use of a robust and updated reference database is needed. Methods are in development to periodically update these that could prove important …”
Section: Information Needed for QMRA (mentioning)
confidence: 99%