2012
DOI: 10.1093/bib/bbs035

Ultrafast clustering algorithms for metagenomic sequence analysis

Abstract: The rapid advances of high-throughput sequencing technologies dramatically prompted metagenomic studies of microbial communities that exist at various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing compu…

Cited by 428 publications (276 citation statements)
References 81 publications (106 reference statements)
“…Sequencing data were denoised, low quality sequences were removed and the sequences were clustered into operational taxonomic units (OTUs) at 97% identity using the CD-HIT-OTU pipeline (12). Subsequently, taxonomy classification was analyzed using the Quantitative Insights into Microbial Ecology pipeline, as previously reported (13).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The filtered reads were denoised and clustered at 100% identity using the CD-HIT-OTU clustering program [20]. The remaining representative reads after removing the identified chimeric reads were clustered into operational taxonomic units (OTUs) using a greedy algorithm with a cutoff of >97% identity at the species level.…”
Section: Sequence Process and Analysis
Citation type: mentioning
confidence: 99%
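
The greedy OTU clustering mentioned in the statement above can be illustrated with a minimal sketch. This is not the CD-HIT-OTU implementation: it assumes identity is defined as 1 minus the Levenshtein distance divided by the longer read length, it treats the cutoff as inclusive, and the function names (`edit_distance`, `identity`, `greedy_cluster`) are hypothetical. It only shows the general idea of greedy incremental clustering, where reads are processed longest first and each read either joins the first representative it matches or becomes a new representative.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity measure: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(reads: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Greedy incremental clustering: representative read -> member reads."""
    clusters: Dict[str, List[str]] = {}
    # Longest reads first, so each representative is the longest in its cluster.
    for read in sorted(reads, key=len, reverse=True):
        for rep in clusters:
            if identity(read, rep) >= threshold:
                clusters[rep].append(read)  # join the first matching cluster
                break
        else:
            clusters[read] = [read]         # no match: start a new cluster
    return clusters
```

Calling `greedy_cluster(reads, threshold=0.97)` returns a dictionary keyed by representative read. The all-pairs dynamic programming used here is far slower than the filtering heuristics of production tools, so the sketch is suitable only for small inputs.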
“…PANDA (Masella et al, 2012), which allows zero mismatches in the overlapping region, served to pair raw Illumina reads; thereafter, we removed any chimeric reads and assigned operational taxonomic units (OTUs) by pairwise similarity at a threshold of 97% using cd-hit-otu (Li et al, 2012). We used SINA 1.2 (Pruesse et al, 2012) against the SILVA database (v. 115) to classify unique OTUs and estimated the abundance of each OTU using in-house Python script.…”
Section: DNA Extraction Sequencing and Classification of OTUs
Citation type: mentioning
confidence: 99%