2011
DOI: 10.1007/978-3-642-23038-7_25

Separating Metagenomic Short Reads into Genomes via Clustering

Abstract: Background: The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in high complexity datasets, where in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve the sequencing efficiency and cost. On the other hand, they result in shorter reads, which makes the separation of reads from different species harder. Among…

Cited by 6 publications (3 citation statements)
References 34 publications

“…AbundanceBin (Wu and Ye, 2011) groups reads based on Observation (A) but fails when the species in the sample have similar abundance. TOSS (Tanaseichuk et al., 2011) bins reads based on Observations (A) and (B), and since TOSS relies on AbundanceBin to handle genomes with different abundances, it carries all the shortcomings of AbundanceBin. MetaCluster 4.0 (Wang et al., 2012) has three phases: Phase 1 groups reads together based on Observation (B); Phase 2 derives the q-mer distribution of each group and Phase 3 merges the groups of reads based on Observation (C) by the well-known K-means clustering approach.…”
Section: Introduction (mentioning)
confidence: 99%
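
The last two phases described in the statement above (deriving a q-mer distribution per group, then merging groups by K-means) can be illustrated with a small sketch. This is not MetaCluster's implementation: each "group" is assumed to be a single concatenated string, the helper names `qmer_distribution` and `kmeans` are hypothetical, and the farthest-first seeding is a choice made here only to keep the example deterministic.

```python
from itertools import product


def qmer_distribution(seq, q):
    """Normalized counts of all 4**q q-mers of seq, in lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=q))}
    counts = [0.0] * len(index)
    for i in range(len(seq) - q + 1):
        pos = index.get(seq[i:i + q])  # windows with non-ACGT symbols are skipped
        if pos is not None:
            counts[pos] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-first seeding instead of random starts.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy input: two AT-rich and two GC-rich groups (made-up sequences).
groups = ["ATATATATATAT", "AATTAATTAATT", "GCGCGCGCGCGC", "GGCCGGCCGGCC"]
labels = kmeans([qmer_distribution(g, 2) for g in groups], k=2)
```

On this toy input the two AT-rich groups end up in one cluster and the two GC-rich groups in the other; real reads are far noisier, so q and k would need to be chosen with care.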
“…Sequence similarities are typically identified by comparing occurrence patterns of relatively short DNA substrings of length l between the sequences [50,55]. Two broad scenarii can be used to assess l-mer-based similarities: abundance-based methods make use of relatively large l values (l ≥ 20) in order to ensure the uniqueness of most l-mers [50], while composition-based methods rely on smaller l values. Since DNA is a combination of four different types of nucleotides (A,T,G,C), there are at most 4^l l-mer combinations forming the feature vector.…”
Section: L-mer Frequency Calculation (mentioning)
confidence: 99%
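
The feature vector described in the statement above has one entry per possible l-mer, so its length is 4^l. A minimal sketch of how such a vector might be computed follows; the function name `lmer_frequency_vector`, the lexicographic ordering of l-mers, and the normalization to relative frequencies are assumptions made for illustration, not details taken from the citing paper.

```python
from itertools import product


def lmer_frequency_vector(sequence, l):
    """Relative frequency of every possible l-mer over the alphabet A, C, G, T.

    The result has 4**l entries, ordered lexicographically (AA, AC, AG, ...).
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=l))}
    counts = [0] * (4 ** l)
    seq = sequence.upper()
    # Slide a window of width l across the sequence.
    for start in range(len(seq) - l + 1):
        pos = index.get(seq[start:start + l])
        if pos is not None:  # windows containing N or other symbols are skipped
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


vec = lmer_frequency_vector("ACGTACGT", 2)
```

Here `vec` has 16 entries for l = 2. The exponential growth in l is why composition-based methods stay with small l: at l = 20 the dense vector would need 4^20 (about 10^12) entries.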
“…Composite genomes can be amassed from metagenomic contigs by classifying (or 'binning') reads according to the abundance of related reads and lineage-specific signatures such as nucleotide content signatures (Tyson et al., 2004; Woyke et al., 2006; Dick et al., 2009; Hess et al., 2011; Luo et al., 2011; Tanaseichuk et al., 2011; Wang et al., 2012b; Fig. 1a).…”
Section: Introduction (mentioning)
confidence: 99%