2008
DOI: 10.1007/978-3-540-78839-3_3

CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads

Abstract: A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a D…

Cited by 117 publications (107 citation statements)
References 39 publications
“…As most of the extant databases are highly biased in their representation of true diversity, such methods fail to find homologs for reads derived from novel species. On the other hand, composition-based methods rely on the intrinsic features of the reads such as oligomer/word distributions [15,3,5,13,12], codon usage preference [1] and GC composition [2] to ascertain the origin of the reads. The underlying basis is that the distribution of words in a DNA is specific to each species and undergoes only slight variations along the genome.…”
Section: Related Work
mentioning
confidence: 99%
“…Most of the existing clustering methods are supervised and depend on the availability of reference data for training [15,3,19,5]. A metagenome may, however, contain reads from unexplored phyla which cannot be labeled into one of the existing classes.…”
Section: Related Work
mentioning
confidence: 99%
“…One category is the taxonomy-dependent methods (Brady and Salzberg, 2009; Haque et al, 2009; Huson et al, 2007; Krause et al, 2008; Matsen et al, 2010; McHardy et al, 2007; Meyer et al, 2008; Mohammed et al, 2011; Stark et al, 2010; Wood and Salzberg, 2014; Wu and Eisen, 2008), which compare reads with sequences in public databases or models inferred from public databases to group reads and determine which known species are present. The other category is the taxonomy-independent methods (Chatterji et al, 2008; Diaz et al, 2009; Wang et al, 2012, 2015; Wu and Ye, 2011), which employ the difference of GC content (guanine-cytosine content), k-mer (a k base pairs long DNA segment) frequencies, etc., of different microbes in the same samples to bin reads.…”
mentioning
confidence: 99%
“…frequency of occurrence for kmers found in broad classes of organisms (Teeling et al 2004, Chatterji et al 2008, and McHardy et al 2007). This approach is more scalable than sequence alignment but lacks the ability to provide detailed discrimination of the sample contents.…”
mentioning
confidence: 99%
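
The citation statements above name GC content and k-mer (word) frequencies as the composition features used to bin reads. As a minimal illustrative sketch, assuming reads are plain strings over the ACGT alphabet, those two features could be computed as shown below. The function names `gc_content` and `kmer_frequencies` and the default `k = 4` are hypothetical choices made for this example; this is not the implementation of CompostBin or of any tool cited above.

```python
from collections import Counter
from itertools import product


def gc_content(read: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of a read."""
    read = read.upper()
    acgt = sum(read.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (read.count("G") + read.count("C")) / acgt


def kmer_frequencies(read: str, k: int = 4) -> dict[str, float]:
    """Normalized frequency of every length-k word over the ACGT alphabet.

    Windows containing any other character (for example N) are skipped.
    """
    read = read.upper()
    alphabet = set("ACGT")
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    # Emit all 4**k words so that every read maps to a vector of equal length.
    words = ("".join(letters) for letters in product("ACGT", repeat=k))
    return {word: (counts[word] / total if total else 0.0) for word in words}
```

Each read then maps to a fixed-length vector (4^k entries, optionally with GC content appended), which is the kind of representation a composition-based method could compare or cluster without consulting a reference database.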