2018
DOI: 10.1101/490078
Preprint

Binning microbial genomes using deep learning

Abstract: Identification and reconstruction of microbial species from metagenomics wide genome sequencing data is an important and challenging task. Existing approaches rely on gene or contig co-abundance information across multiple samples and k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to inte…
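
The "k-mer composition information" mentioned in the abstract can be illustrated with a minimal sketch. This is not the preprint's implementation: the function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for a DNA sequence.

    Hypothetical helper for illustration only; frequencies are listed in
    lexicographic k-mer order over the alphabet A, C, G, T.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with N or other ambiguous bases are skipped.
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    if total == 0:
        # No countable window (sequence too short or fully ambiguous).
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

A vector like this, computed per contig, is one plausible form of the compositional input that is combined with co-abundance before encoding and clustering.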

Cited by 21 publications (21 citation statements)
References 37 publications
“…However, we show that the sequences as short as 100 bp formed separable clusters using high-level features extracted by DNNs. The feature type generated by supervised learning depends on training targets, which is more focused and task-relevant than auto-encoder methods 27 . Future researches might investigate how to utilize these features, and also incorporate them with co-occurrence or coverage information to build a powerful metagenome binning tool.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Anvi'o v5.1.0 [23] was also applied to bin contigs >5 kb using CONCOCT v1.0.0 proceeded by manual refinement with redundancy cut-offs of 2.5% for MAGs with 50-75% completeness, 5% for MAGs with 75-90 % completeness, and 10% for MAGs with >90 % completeness. In addition, Vamb v1.0.1 [24] and BinSanity v0.2.8 [25] were used to bin contigs with a minimum length cutoff of 4 kb. All bins generated by the six binners and the MetaWRAP refined bins were further refined using DAS_Tool v1.1.1 [26] with custom penalty parameters (--duplicate_penalty 0.4, --megabin_penalty 0.4) and a score threshold of 0.3.…”
Section: DNA Extraction Sequencing Assembly and Metagenome-assembl…
Citation type: mentioning
confidence: 99%
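
The tiered redundancy cut-offs quoted in the statement above (2.5% for 50-75% completeness, 5% for 75-90%, 10% for >90%) can be sketched as a small filter. The function names are hypothetical, and several details are assumptions because the quoted text does not settle them: a completeness of exactly 75% is assigned to the 75-90% tier, exactly 90% stays in the 75-90% tier, bins below 50% completeness are rejected, and a bin passes when its redundancy is at or below the cut-off.

```python
def max_redundancy(completeness):
    """Return the redundancy cut-off (%) for a given completeness (%).

    Hypothetical helper; tier boundaries are assumptions (see lead-in).
    Returns None when completeness is below 50%, where no cut-off is quoted.
    """
    if completeness > 90:
        return 10.0
    if completeness >= 75:
        return 5.0
    if completeness >= 50:
        return 2.5
    return None


def passes_refinement(completeness, redundancy):
    """True if a bin's redundancy is within the cut-off for its tier."""
    limit = max_redundancy(completeness)
    # Assumption: "<=" comparison; bins with no applicable tier are rejected.
    return limit is not None and redundancy <= limit
```

For example, under these assumptions a bin at 80% completeness with 4% redundancy passes, while a bin at 60% completeness with 3% redundancy does not.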
“…For all samples, short genomic assemblies (< 1000 bp), which could have biased the subsequent analysis, were first excluded. Genomes were then binned based on their tetranucleotide frequency, differential coverage, GC content, as well as codon usage, by 6 different binning tools: MetaBAT2, MaxBin2, CONCOCT, VAMB, BMC3C, and BinSanity (34)(35)(36)(37)(38). The binning results were refined using the MetaWRAP (v 1.2.1) package based on the bin quality assessment (completeness > 70 and contamination < 20) of different binners from CheckM (39,40).…”
Section: Genome Assembly and Functional Annotation
Citation type: mentioning
confidence: 99%