Deconvolute individual genomes from metagenome sequences through short read clustering

Li, Kexue; Lu, Yakang; Deng, Li; Wang, Lili; Shi, Lizhen; Wang, Zhong

doi:10.7717/peerj.8966

Cited by 7 publications

(2 citation statements)

References 32 publications

“…LPA is capable of resolving genomes with shared reads and has near linear computational performance. SpaRC can be run at two different modes: "local mode" only cluster reads based on their overlap, while "global mode" further clusters the results from local mode based on multiple sample statistics (Li et al, 2020).…”

Section: The Hybrid-lpa Algorithm

Citation type: mentioning

confidence: 99%

Hybrid Clustering of Long and Short-read for Improved Metagenome Assembly

Shi et al. 2021

Preprint

Self Cite

Next-generation sequencing has enabled metagenomics, the study of the genomes of microorganisms sampled directly from the environment without cultivation. We previously developed a proof-of-concept, scalable metagenome clustering algorithm based on Apache Spark to cluster sequence reads according to their species of origin. To overcome its under-clustering problem on short-read sequences, in this study we developed a new, two-step Label Propagation Algorithm (LPA) that first forms clusters of long reads and then recruits short reads to these clusters. Compared to alternative label propagation strategies, this hybrid clustering algorithm (hybrid-LPA) yields significantly larger read clusters without compromising cluster purity. We show that adding an extra clustering step before assembly leads to improved metagenome assemblies, predicting more complete genomes or gene clusters from a synthetic metagenome dataset and a real-world metagenome dataset, respectively. These results suggest that hybrid-LPA is a good alternative to current metagenome assembly practice by providing benefits in both scalability and accuracy on large metagenome datasets.

Availability and implementation: https://bitbucket.org/zhong_wang/hybridlpa/src/master/

Contact: zhongwang@lbl.gov

“…We previously developed a scalable metagenome clustering tool called SpaRC (Shi et al, 2018;Li et al, 2020) based on Apache Spark. SpaRC can form pure and complete clusters with long-read sequencing technologies.…”

Section: Introduction

Citation type: mentioning

confidence: 99%