A new method for detecting chimeras and other anomalies within 16S rRNA sequence records is presented. Using this method, we screened 1,399 sequences from 19 phyla, as defined by the Ribosomal Database Project, release 9, update 22, and found 5.0% to harbor substantial errors. Of these, 64.3% were obvious chimeras, 14.3% were unidentified sequencing errors, and 21.4% were highly degenerate. In all, 11 phyla contained obvious chimeras, accounting for 0.8 to 11% of the records for these phyla. Many chimeras (43.1%) were formed from parental sequences belonging to different phyla. While most comprised two fragments, 13.7% were composed of at least three fragments, often from three different sources. A separate analysis of the Bacteroidetes phylum (2,739 sequences) also revealed 5.8% of records to be anomalous, of which 65.4% were apparently chimeric. Overall, we conclude that, as a conservative estimate, 1 in every 20 public database records is likely to be corrupt. Our results support concerns recently expressed over the quality of the public repositories. With 16S rRNA sequence data increasingly playing a dominant role in bacterial systematics and environmental biodiversity studies, it is vital that steps be taken to improve screening of sequences prior to submission. To this end, we have implemented our method as a program with a simple-to-use graphical user interface that is capable of running on a range of computer platforms. The program is called Pintail, is released under the terms of the open source GNU General Public License, and is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.
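As a rough sanity check on the percentages quoted above, the following sketch back-calculates approximate record counts. Only the total of 1,399 sequences and the percentages come from the abstract; the derived counts are rounded estimates under that assumption and are not figures reported by the authors.

```python
# Back-of-envelope check of the abstract's percentages.
# Only `total_screened` and the percentages are taken from the text;
# every derived count is a rounded estimate, not a reported value.

total_screened = 1399        # sequences screened (from the abstract)
anomalous_fraction = 0.050   # 5.0% found to harbor substantial errors

# Estimated number of anomalous records, rounded to whole sequences.
anomalous = round(total_screened * anomalous_fraction)

# Reported breakdown of the anomalous records.
breakdown = {
    "obvious chimeras": 0.643,
    "unidentified sequencing errors": 0.143,
    "highly degenerate": 0.214,
}

# Estimated count per category, rounded to whole sequences.
estimated_counts = {
    label: round(anomalous * share) for label, share in breakdown.items()
}

print(f"anomalous records: ~{anomalous}")
for label, count in estimated_counts.items():
    print(f"  {label}: ~{count}")
```

Under these assumptions the three category estimates sum back to the estimated anomalous total, which suggests the quoted percentages are internally consistent.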
Volume 64, no. 2, p. 795: The article byline should read as given above. p. 797 and 798, Tables 2 and 3, respectively: columns 3 and 4 (the sequence data) in each table should read as shown below.
The sub-seafloor biosphere is the largest prokaryotic habitat on Earth but also a habitat with the lowest metabolic rates. Modelled activity rates are very low, indicating that most prokaryotes may be inactive or have extraordinarily slow metabolism. Here we present results from two Pacific Ocean sites, margin and open ocean, both of which have deep, subsurface stimulation of prokaryotic processes associated with geochemical and/or sedimentary interfaces. At 90 m depth in the margin site, stimulation was such that prokaryote numbers were higher (about 13-fold) and activity rates higher than or similar to near-surface values. Analysis of high-molecular-mass DNA confirmed the presence of viable prokaryotes and showed changes in biodiversity with depth that were coupled to geochemistry, including a marked community change at the 90-m interface. At the open ocean site, increases in numbers of prokaryotes at depth were more restricted but also corresponded to increased activity; however, this time they were associated with repeating layers of diatom-rich sediments (about 9 Myr old). These results show that deep sedimentary prokaryotes can have high activity, have changing diversity associated with interfaces and are active over geological timescales.
A new computer program, called Mallard, is presented for screening entire 16S rRNA gene libraries of up to 1,000 sequences for chimeras and other artifacts. Written in the Java computer language and capable of running on all major operating systems, the program provides a novel graphical approach for visualizing phylogenetic relationships among 16S rRNA gene sequences. To illustrate its use, we analyzed most of the large libraries of cloned bacterial 16S rRNA gene sequences submitted to the public repository during 2005. Defining a large library as one containing 100 or more sequences of 1,200 bases or greater, we screened 25 of the 28 libraries and found that all but three contained substantial anomalies. Overall, 543 anomalous sequences were found. The average anomaly content per clone library was 9.0%, 4% higher than that previously estimated for the public repository overall. In addition, 90.8% of anomalies had characteristic chimeric patterns, a rise of 25.4% over that found previously. One library alone was found to contain 54 chimeras, representing 45.8% of its content. These figures far exceed previous estimates of artifacts within public repositories and further highlight the urgent need for all researchers to adequately screen their libraries prior to submission. Mallard is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.
Recent papers (2, 14) have reported numerous corrupt 16S rRNA gene sequences within the public repositories (3, 16, 19), and it has been estimated that overall, 5% of records are likely to have substantial anomalies (2). While poor sequencing and errors during assembly have led to some of these reported errors, most anomalies have been chimeras: artificial sequences generated from two or more phylogenetically different DNA templates during PCR amplification (17, 22-24, 30, 31).
Our previous study showed that chimeras and other anomalies are continuing to be generated and submitted without comment to the public repositories (2). The presence of such high numbers of substantial anomalies in the public domain has serious implications for future efforts to accurately estimate bacterial diversity, elucidate likely phylogenetic relationships, and form correct taxonomic identifications. Consequently, there is a requirement for effective computer programs to simplify the screening process.
A number of useful, complementary approaches already exist, with Bellerophon (13) and CHIMERA_CHECK (19) being two noteworthy examples, and in our previous paper we described a new computer program, called Pintail, for screening individual sequences for errors (2). Now, we describe another program, Mallard, which develops the Pintail algorithm further so that whole libraries of 16S rRNA gene sequences can be screened simultaneously and quickly.
We demonstrate the new program's ability to screen libraries of a range of sizes from different sources. Through a detailed analysis of submissions made to public repositories during 2005, we show that the problem of unrecognized anomal...