2006
DOI: 10.1016/j.mimet.2005.06.012

An ecoinformatics tool for microbial community studies: Supervised classification of Amplicon Length Heterogeneity (ALH) profiles of 16S rRNA

Abstract: Support vector machines (SVM) and K-nearest neighbors (KNN) are two computational machine learning tools that perform supervised classification. This paper presents a novel application of such supervised analytical tools for microbial community profiling and to distinguish patterning among ecosystems. Amplicon length heterogeneity (ALH) profiles from several hypervariable regions of 16S rRNA gene of eubacterial communities from Idaho agricultural soil samples and from Chesapeake Bay marsh sediments were separa…

Cited by 25 publications (33 citation statements)
References 37 publications
“…Due to the size and variability of the OTU table, we identified predictive OTUs using sparse supervised learning, which has been successfully applied to classification of microarray data (19) and 16S amplicon length heterogeneity profiles (20). We chose location of the facility as the class attribute to be predicted, because the community phylogenetic structure was observed to cluster most closely on the basis of facility location, and because facilities were also unique and consistent in terms of performance data (Fig.…”
Section: Results | mentioning
confidence: 99%
“…In two recent studies by our group, LH-PCR was used to query which hypervariable domain or combination of 16S rRNA gene domains was the best molecular marker (Yang et al., 2006). In the first study, data from Idaho natural sagebrush and irrigated moldboard plowed sites were used to compare univariate and multivariate analyses.…”
Section: Analysis Of Length Heterogeneity-polymerase Chain Reaction P… | mentioning
confidence: 99%
“…The "pruned" data was then pasted into rows combining the data from V1, V3 and V1 + V2 regions (113). All data in each row represented PCRs from a single soil extraction.…”
Section: Sews-m (Salt/ethanol… | mentioning
confidence: 99%
“…All data in each row was normalized (individual peak fluorescence divided by total fluorescence of all peaks in row) so that each fragment length now had a relative abundance unit attached. At this point another "pruning" was performed removing any peak representing less than 1% (0.01) of the total profile (113). The resulting microbial community profile is our final data output including fragment length in bp and relative abundance of each peak in the profile from each soil replicate.…”
Section: Sews-m (Salt/ethanol… | mentioning
confidence: 99%
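
The normalization and pruning steps described in the citation statement above can be illustrated with a minimal Python sketch. It is not the citing authors' code: the function name `normalize_and_prune`, the dictionary layout (fragment length in bp mapped to peak fluorescence), and the example values are all assumptions made for illustration. The sketch also assumes that relative abundances are not renormalized after pruning, since the quoted text does not say either way.

```python
def normalize_and_prune(peaks, min_fraction=0.01):
    """Convert raw peak fluorescence to relative abundance, then prune.

    peaks: dict mapping fragment length (bp) -> peak fluorescence for
           one row (hypothetical layout, one soil extraction per row).
    min_fraction: peaks below this share of the total profile are
                  removed (0.01 corresponds to the 1% cutoff).
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("total fluorescence must be positive")

    # Relative abundance = individual peak fluorescence / total fluorescence.
    profile = {length: value / total for length, value in peaks.items()}

    # Remove any peak representing less than min_fraction of the profile.
    # Assumption: remaining abundances are left as-is (no renormalization).
    return {
        length: abundance
        for length, abundance in profile.items()
        if abundance >= min_fraction
    }


# Illustrative values only, not data from any of the cited studies.
example_row = {312: 500.0, 316: 495.0, 320: 5.0}
pruned_profile = normalize_and_prune(example_row)
```

In this made-up row the 320 bp peak accounts for 0.5% of the total fluorescence, so it falls below the cutoff and is dropped, while the other two peaks keep their relative abundances.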