2021
DOI: 10.2174/1574893616666210621101150

NLP-MeTaxa: A Natural Language Processing Approach for Metagenomic Taxonomic Binning Based on Deep Learning

Abstract: Background: Metagenomics is the study of genomic content in mass from an environment of interest such as the human gut or soil. Taxonomy is one of the most important fields of metagenomics, which is the science of defining and naming groups of microbial organisms that share the same characteristics. The problem of taxonomy classification is the identification and quantification of microbial species or higher-level taxa sampled by high throughput sequencing. Objective: Although many methods exist to deal wit…

Citations: cited by 6 publications (4 citation statements)
References: 0 publications
“…Given the analogy between NLP and DNA analyses, it is not surprising to see adaptations of word embedding algorithms to DNA sequence data. The word2vec method [122] has been adapted to generate k-mer and sequence embeddings by both NLP-MeTaxa [123] and FastDNA [124]. FastDNA was reused within the Metagenome2Vec method [125] to combine word embeddings with taxonomy and create a metagenome embedding.…”
Section: Results (mentioning)
confidence: 99%
“…Given the analogy between NLP and DNA analyses, it is not surprising to see adaptations of word embedding algorithms to DNA sequence data. The word2vec method ([97]) has been adapted to generate k-mer and sequence embeddings by both NLP-MeTaxa ([98]) and FastDNA ([99]).…”
Section: Methods Inspired By Natural Language Processing (mentioning)
confidence: 99%
“…NLP-based methods appear to be better suited for metagenomics since nucleotide sequences can be processed as words and are less amenable to representation as images (16). For example, NLP-based methods that use filters to slide off DNA “sentence” matrices have been applied for genomic binning (17), viral sequence identification, and phenotypic classification for cancer reads (18). DL applications for taxonomic classification have been explored to some degree for NLP (19) and CNN architectures (20) but have not yet been widely applied (7).…”
Section: Introduction (mentioning)
confidence: 99%
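
The citation statements above describe treating nucleotide sequences as "words" and adapting word2vec to k-mers. As a rough illustration only, the Python sketch below splits a read into overlapping k-mers and builds the (center, context) pairs that a skip-gram style model is typically trained on. It is not the NLP-MeTaxa or FastDNA implementation: the function names `kmer_tokens` and `skipgram_pairs`, the choice of k = 3, and the window size are all assumptions made for the example.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (the "words").

    k and stride are illustrative defaults, not values taken from the paper.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs from a token list.

    Each token is paired with every neighbour at most `window` positions
    away, which is the usual training input for a skip-gram model.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    read = "ACGTAC"  # hypothetical read, for illustration only
    tokens = kmer_tokens(read, k=3)
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
```

In a full pipeline the token lists would be passed to an embedding trainer, and a read-level vector could be formed by combining its k-mer vectors; those steps are omitted here because the source does not specify them.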