2022
DOI: 10.1093/bioinformatics/btac557

Metagenomic binning with assembly graph embeddings

Abstract: Motivation: Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with…

Cited by 31 publications (30 citation statements)
References 38 publications
“…For the long-read datasets, we compared to MetaBAT2 16 , MetaDecoder 18 , VAMB 19 , and SemiBin1 20 . We did not include GraphMB 25 and LRBinner 24 in this comparison because we used gold standard assemblies for binning in simulated datasets, and we could not obtain the assembly graph GraphMB requires as input and LRBinner cannot be run with co-assembly binning.…”
Section: Results (mentioning)
confidence: 99%
“…SemiBin2 uses the implementation of DBSCAN in scikit-learn 34 and runs DBSCAN with ε value equals to 0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, and 0.55. Then SemiBin2 integrates the results of these runs based on the single-copy genes that have been used in other tools 18,25 . In particular, SemiBin2 uses 107 single-copy genes 32 to estimate the completeness, contamination, and F1-score of every bin.…”
Section: Methods (mentioning)
confidence: 99%
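
The scoring step described in the statement above can be illustrated with a minimal Python sketch. It is not SemiBin2's implementation: the function name `score_bin`, the gene identifiers, and the exact formulas are assumptions made for illustration (completeness as the fraction of marker genes seen at least once, contamination as the number of extra copies divided by the number of marker genes, and F1 computed from recall = completeness and precision = 1 - contamination). The DBSCAN runs themselves and the integration of their results are not shown; only the ε grid is taken from the quoted text.

```python
from collections import Counter

# The twelve eps values listed in the quoted citation statement.
EPS_GRID = [0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55]


def score_bin(marker_hits, n_markers=107):
    """Score one bin from the single-copy marker genes found in it.

    marker_hits: list of marker-gene identifiers, one entry per copy found.
    n_markers:   size of the marker set (107 in the quoted statement).

    The formulas below are illustrative assumptions, not SemiBin2's code.
    """
    counts = Counter(marker_hits)
    # Fraction of the marker set observed at least once.
    completeness = len(counts) / n_markers
    # Extra copies of single-copy genes, relative to the marker set size.
    contamination = sum(c - 1 for c in counts.values()) / n_markers
    recall = completeness
    precision = max(0.0, 1.0 - contamination)
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return completeness, contamination, f1


if __name__ == "__main__":
    # Toy bin: two distinct markers, one of them duplicated, out of 4 markers.
    print(score_bin(["g1", "g2", "g2"], n_markers=4))
```

A caller could apply `score_bin` to every bin produced at each value in `EPS_GRID` and compare the scores across runs; how SemiBin2 actually combines the runs is not specified in the quoted text.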
“…Woloszynek et al explore using word and sentence embedding approaches for nucleotide sequences, which may be a suitable numerical representation for downstream machine learning applications (especially deep learning). The results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream applications, such as binning [31][32][33]. Choi et al present all-to-all comparison of metagenomes using k-mer content and Hadoop for precise clustering [34].…”
Section: Quince Et Al Introduce Strain (mentioning)
confidence: 99%