Ray Meta: scalable de novo metagenome assembly and profiling

Boisvert, Sébastien; Raymond, Frédéric; Godzaridis, Élénie; Laviolette, François; Corbeil, Jacques

doi:10.1186/gb-2012-13-12-r122

Cited by 541 publications

(417 citation statements)

References 51 publications

Supporting

Mentioning

416

Contrasting


“…Metagenomes from rumen, human gut, and permafrost soil sequencing could be assembled only by discarding low-abundance sequences before assembly (2,4,5). Although many metagenome-specific assemblers have been developed recently for the assembly of low-complexity communities, they cannot work with the volume of reads necessary to achieve high coverage for extremely diverse environmental metagenomes (10)(11)(12).…”

mentioning

confidence: 99%

Tackling soil diversity with the assembly of large, complex metagenomes

Howe

Jansson

Malfatti

et al. 2014

Proc. Natl. Acad. Sci. U.S.A.


Significance: Investigations of complex environments rely on large volumes of sequence data to adequately sample the genetic diversity of a microbial community. The assembly of short-read data into longer, more interpretable sequence currently is not possible for much of the research community because it requires specialized computational facilities. We present approaches that make de novo assembly of complex metagenomes more accessible. These approaches scale data size with community richness and subdivide the data into tractable subsets representing individual species. We applied these methods toward the assembly of two large soil metagenomes to identify important metagenomic references and show that considerably more data are needed to study the terrestrial microbiome comprehensively.


“…The adapters were trimmed from the reads using Trimmomatic-0.30 (Bolger et al, 2014) quality checked with Sickle (Joshi and Fass, 2011), and assembled using Ray with default parameters, and 23 as the k-value (Boisvert et al, 2012). The sequence data have been submitted to the GenBank databases under accession No.…”

Section: Structural Proteins

mentioning

confidence: 99%

Polar freshwater cyanophage S-EIV1 represents a new widespread evolutionary lineage of phages

et al. 2015


Cyanobacteria are often the dominant phototrophs in polar freshwater communities; yet, the phages that infect them remain unknown. Here, we present a genomic and morphological characterization of cyanophage S-EIV1 that was isolated from freshwaters on Ellesmere Island (Nunavut, High Arctic Canada), and which infects the polar Synechococcus sp., strain PCCC-A2c. S-EIV1 represents a newly discovered evolutionary lineage of bacteriophages whose representatives are widespread in aquatic systems. Among the 130 predicted open reading frames (ORFs) there is no recognizable similarity to genes that encode structural proteins other than the large terminase subunit and a distant viral morphogenesis protein, indicating that the genes encoding the structural proteins of S-EIV1 are distinct from other viruses. As well, only 19 predicted coding sequences on the 79 178 bp circularly permuted genome have homology with genes encoding proteins of known function. Although S-EIV1 is divergent from other sequenced phage isolates, it shares synteny with phage genes captured on a fosmid from the deep-chlorophyll maximum in the Mediterranean Sea, as well as with an incision element in the genome of Anabaena variabilis (ATCC 29413). Sequence recruitment of metagenomic data indicates that S-EIV1-like viruses are cosmopolitan and abundant in a wide range of aquatic systems, suggesting they have an important ecological role.


“…After constructing an A-Bruijn graph, one faces the problem of finding a path in this graph that corresponds to traversing the genome and then correcting errors in the sequence spelled by this path (this genomic path does not have to traverse all edges of the graph). Because the long reads are merely paths in the A-Bruijn graph, one can use the path extension paradigm (37)(38)(39) to derive the genomic path from these (shorter) read-paths. exSPAnder (38) is a module of the SPAdes assembler (24) that finds a genomic… [Figure caption: The histograms of the number of 15-mers with given frequencies for the ECOLI dataset from Escherichia coli.]…”

Section: For Details)

mentioning

confidence: 99%

“…Hence, the A-Bruijn graph can function as an oracle, from which one can efficiently identify the overlaps of a given read with all other reads by considering all possible overlaps at once. The genome is assembled by repeatedly applying this procedure and borrowing the path extension paradigm from short read assemblers (37)(38)(39).…”

Section: For Details)

mentioning

confidence: 99%

Assembly of long error-prone reads using de Bruijn graphs

Lin

Yuan

Kolmogorov

et al. 2016

Proc. Natl. Acad. Sci. U.S.A.


The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.

de Bruijn graph | genome assembly | single-molecule sequencing

The key challenge to the success of single-molecule sequencing (SMS) technologies lies in the development of algorithms for assembling genomes from long but inaccurate reads. The pioneer in long reads technologies, Pacific Biosciences, now produces accurate assemblies from long error-prone reads (1, 2). Goodwin et al. (3) and Loman et al. (4) demonstrated that high-quality assemblies can be obtained from even less-accurate Oxford Nanopore reads. Advances in assembly of long error-prone reads recently resulted in the accurate reconstructions of various genomes (5-10). However, as illustrated in Booher et al. (11), the problem of assembling long error-prone reads is far from being resolved even in the case of relatively small bacterial genomes.

Previous studies of SMS assemblies were based on the overlap-layout-consensus (OLC) approach (12) or a similar string graph approach (13), which require an all-against-all comparison of reads (14) and remain computationally challenging (see refs. 15-17 for a discussion of the pros and cons of this approach). Moreover, there is an assumption that the de Bruijn graph approach, which has dominated genome assembly for the last decade, is inapplicable to long reads. This is a misunderstanding, because the de Bruijn graph approach, as well as its variation called the A-Bruijn graph approach, was developed to assemble rather long Sanger reads (18). There is also a misunderstanding that the de Bruijn graph approach can only assemble highly accurate reads and fails when assembling long error-prone reads. Although this is true for the original de Bruijn graph approach to assembly (15-17), the A-Bruijn graph approach was originally designed to assemble inaccurate reads as long as any similarities between reads can be reliably identified. Moreover, A-Bruijn graphs have proven to be useful even for assembling mass spectra, which represent highly inaccurate fingerprints of amino acid sequences of peptides (19,20). However, although A-Bruijn graphs have proven to be useful in assembling Sanger reads and mass spectra, the question of how to apply A-Bruijn graphs for assembling long error-prone reads remains open.

de Bruijn graphs are a key algorithmic technique in genome assembly (15, 21-24). In addition, de Bruijn graphs have been used for sequencing by hybridization (...
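
As a rough illustration of the classic de Bruijn graph idea that the abstract above refers to, the following Python sketch builds a graph from toy, error-free reads and extends a path through unambiguous edges. It is not the ABruijn, exSPAnder, or Ray Meta algorithm: the function names `build_de_bruijn` and `extend_path`, the reads, and the choice of k = 4 are all assumptions made for this example, and real assemblers additionally handle sequencing errors, reverse complements, coverage, and in-degree checks.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration; every k-mer in a read contributes
    one edge from its prefix (k-1)-mer to its suffix (k-1)-mer.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_path(graph, start):
    """Follow edges from `start` while exactly one successor exists.

    Stops at a branch, a dead end, or when a node would be revisited.
    Simplification: in-degree is not checked, unlike a full unitig walk.
    """
    path = [start]
    seen = {start}
    node = start
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    # Spell the sequence: first node, then the last base of each later node.
    return path[0] + "".join(n[-1] for n in path[1:])


# Toy overlapping reads drawn from the made-up sequence ATGGCGTGCA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = extend_path(graph, "ATG")
print(contig)
```

With these toy reads every (k-1)-mer has at most one successor, so the walk recovers the whole made-up sequence. Repeats or read errors would create branches where this simple walk stops, which is the situation the path extension strategies quoted above are meant to resolve.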

