MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.
The vast majority of microbes are unculturable and thus cannot be sequenced by means of traditional methods. High-throughput sequencing techniques like 454 or Solexa-Illumina make it possible to explore those microbes by studying whole natural microbial communities and analysing their biological diversity as well as the underlying metabolic pathways. Over the past few years, different methods have been developed for the taxonomic and functional characterization of metagenomic shotgun sequences. However, the taxonomic classification of metagenomic sequences from novel species without close homologues in the biological sequence databases poses a challenge due to the high number of wrong taxonomic predictions at lower taxonomic ranks. Here we present CARMA3, a new method for the taxonomic classification of assembled and unassembled metagenomic sequences that has been adapted to work with both BLAST and HMMER3 homology searches. We show that our method makes fewer wrong taxonomic predictions (at the same sensitivity) than other BLAST-based methods. CARMA3 is freely accessible via the web application WebCARMA from http://webcarma.cebitec.uni-bielefeld.de.
The U.S. Department of Energy Systems Biology Knowledgebase (KBase) is an open-source software and data platform designed to meet the grand challenge of systems biology: predicting and designing biological function from the biomolecular (small scale) to the ecological (large scale). KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large-scale analyses on scalable computing infrastructure; and combine experimental evidence and conclusions that lead to accurate models of plant and microbial physiology and community dynamics. The KBase platform has (1) extensible analytical capabilities that currently include genome assembly, annotation, ontology assignment, comparative genomics, transcriptomics, and metabolic modeling; (2) a web-browser-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; (3) access to extensive computational resources; and (4) a software development kit allowing the community to add functionality to the system.