Naturally occurring populations of bacteria and archaea are vital to life on the earth and are of enormous practical significance in medicine, engineering and agriculture. However, the rules governing the formation of such communities are still poorly understood, and there is a need for a usable mathematical description of this process. Typically, microbial community structure is thought to be shaped mainly by deterministic factors such as competition and niche differentiation. Here we show, for a wide range of prokaryotic communities, that the relative abundance and frequency with which different taxa are observed in samples can be explained by a neutral community model (NCM). The NCM, which is a stochastic, birth-death immigration process, does not explicitly represent the deterministic factors and therefore cannot be a complete or literal description of community assembly. However, its success suggests that chance and immigration are important forces in shaping the patterns seen in prokaryotic communities.
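The stochastic birth-death-immigration process described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' NCM implementation. It assumes a fixed local community size, one death per step, and replacement either by an immigrant drawn from a source pool (with probability `m`) or by the offspring of a randomly chosen local individual. The function names, parameter names, and example source frequencies are all hypothetical.

```python
import random

def simulate_neutral(source_freqs, n_total=500, m=0.1, steps=20000, seed=0):
    """Simulate one local community under a simple neutral model.

    source_freqs: relative abundances of taxa in the source pool (assumed).
    n_total:      fixed number of individuals in the local community.
    m:            probability that a death is replaced by an immigrant.
    Returns a list with the final count of each taxon.
    """
    rng = random.Random(seed)
    taxa = list(range(len(source_freqs)))
    # Seed the local community with a random draw from the source pool.
    community = rng.choices(taxa, weights=source_freqs, k=n_total)
    for _ in range(steps):
        dead = rng.randrange(n_total)  # every individual is equally likely to die
        if rng.random() < m:
            # Immigration: the replacement comes from the source pool.
            community[dead] = rng.choices(taxa, weights=source_freqs, k=1)[0]
        else:
            # Local birth: copy a random local individual. The dying individual
            # may replace itself by chance, a harmless simplification here.
            community[dead] = community[rng.randrange(n_total)]
    counts = [0] * len(taxa)
    for taxon in community:
        counts[taxon] += 1
    return counts

def detection_frequency(source_freqs, n_samples=20, n_total=500, m=0.1, steps=20000):
    """Fraction of independently simulated communities in which each taxon occurs."""
    hits = [0] * len(source_freqs)
    for sample in range(n_samples):
        counts = simulate_neutral(source_freqs, n_total, m, steps, seed=sample)
        for taxon, count in enumerate(counts):
            if count > 0:
                hits[taxon] += 1
    return [h / n_samples for h in hits]

if __name__ == "__main__":
    source = [0.5, 0.3, 0.15, 0.04, 0.01]  # hypothetical source pool
    print(detection_frequency(source, n_samples=10, n_total=200, steps=5000))
```

No taxon is given a competitive advantage in this sketch, so any differences in abundance or detection frequency arise only from chance and immigration, which is the sense in which the model is neutral.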
We present an algorithm, PyroNoise, that clusters the flowgrams of 454 pyrosequencing reads using a distance measure that models sequencing noise, thereby inferring the true sequences in a collection of amplicons. We pyrosequenced a known mixture of microbial 16S rDNA sequences extracted from a lake and found that, without noise reduction, the number of operational taxonomic units is overestimated, whereas with PyroNoise it can be calculated accurately.
The absolute diversity of prokaryotes is widely held to be unknown and unknowable at any scale in any environment. However, it is not necessary to count every species in a community to estimate the number of different taxa therein. It is sufficient to estimate the area under the species abundance curve for that environment. Log-normal species abundance curves are thought to characterize communities, such as bacteria, which exhibit highly dynamic and random growth. Thus, we are able to show that the diversity of prokaryotic communities may be related to the ratio of two measurable variables: the total number of individuals in the community and the abundance of the most abundant members of that community. We assume that either the least abundant species has an abundance of 1 or Preston's canonical hypothesis is valid. Consequently, we can estimate the bacterial diversity on a small scale (oceans 160 per ml; soil 6,400–38,000 per g; sewage works 70 per ml). We are also able to speculate about diversity at a larger scale, thus the entire bacterial diversity of the sea may be unlikely to exceed 2 × 10^6, while a ton of soil could contain 4 × 10^6 different taxa. These are preliminary estimates that may change as we gain a greater understanding of the nature of prokaryotic species abundance curves. Nevertheless, it is evident that local and global prokaryotic diversity can be understood through species abundance curves and purely experimental approaches to solving this conundrum will be fruitless.
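The idea of estimating richness as the area under a log-normal species abundance curve, given only the total number of individuals and the abundance of the most abundant taxon, can be sketched numerically. The code below is a simplified illustration under stated assumptions, not the authors' exact formulation. It assumes a Preston-style curve in octaves, S(R) = S0 · exp(−a²R²) with R = log2(N/N0), a symmetric truncation so that the least abundant taxon has abundance 1, and exactly one taxon per octave at the most abundant end. The function names and the example inputs are hypothetical.

```python
import math

def lognormal_totals(a, n_max, n_min=1.0, steps=2000):
    """Return (S_T, N_T) for a truncated Preston log-normal with spread a.

    S_T is the area under the species curve (number of taxa);
    N_T is the total number of individuals implied by that curve.
    """
    n0 = math.sqrt(n_min * n_max)    # modal abundance under symmetric truncation
    r_max = math.log2(n_max / n0)    # octaves between the mode and n_max
    s0 = math.exp((a * r_max) ** 2)  # chosen so that S(r_max) = 1 taxon per octave
    h = 2 * r_max / steps
    s_total = n_total = 0.0
    for i in range(steps + 1):       # trapezoid rule over [-r_max, r_max]
        r = -r_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        s = s0 * math.exp(-(a * r) ** 2)
        s_total += weight * s * h
        n_total += weight * s * n0 * 2.0 ** r * h
    return s_total, n_total

def estimate_richness(n_total, n_max, n_min=1.0):
    """Find the spread a whose curve holds n_total individuals; return (S_T, a)."""
    floor = lognormal_totals(0.0, n_max, n_min)[1]
    if n_total <= floor:
        raise ValueError("n_total is too small relative to n_max for this model")
    lo, hi = 0.0, 0.01
    # N_T grows monotonically with a, so widen the bracket, then bisect.
    while lognormal_totals(hi, n_max, n_min)[1] < n_total:
        lo, hi = hi, hi * 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if lognormal_totals(mid, n_max, n_min)[1] < n_total:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return lognormal_totals(a, n_max, n_min)[0], a

if __name__ == "__main__":
    # Hypothetical inputs: 1e6 individuals, most abundant taxon has 1e5.
    richness, spread = estimate_richness(1e6, 1e5)
    print(f"estimated taxa: {richness:.0f} (a = {spread:.3f})")
```

With the most abundant taxon held fixed, a larger total number of individuals forces a broader curve and therefore a higher richness estimate, which reflects the dependence on the ratio of the two measurable variables described above.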
With read lengths of currently up to 2 × 300 bp, high throughput, and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%.