Benjamin Albrecht scite author profile

²

,

Bağcı

³

et al. 2018

Background: There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high-quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long-read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbiome sequencing, and there is a need to adapt short-read tools to long-read datasets. Methods: We describe a new LCA-based algorithm for taxonomic binning, and an interval-tree-based algorithm for functional binning, that are explicitly designed for long reads and assembled contigs. We provide a new interactive tool for investigating the alignment of long reads against reference sequences. For taxonomic and functional binning, we propose to use LAST to compare long reads against the NCBI-nr protein reference database so as to obtain frame-shift-aware alignments, and then to process the results using our new methods. Results: All presented methods are implemented in the open source edition of MEGAN, and we refer to this new extension as MEGAN-LR (MEGAN long read). We evaluate the LAST+MEGAN-LR approach in a simulation study, and on a number of mock community datasets consisting of Nanopore reads, PacBio reads and assembled PacBio reads. We also illustrate the practical application on a Nanopore dataset that we sequenced from an anammox bioreactor community. Reviewers: This article was reviewed by Nicola Segata together with Moreno Zolfo, Pete James Lockhart and Serghei Mangul. Conclusion: This work extends the applicability of the widely used metagenomic analysis software MEGAN to long reads. Our study suggests that the presented LAST+MEGAN-LR pipeline is sufficiently fast and accurate.

Fast computation of minimum hybridization networks

¹

,

Scornavacca

²

,

Cenci

³

et al. 2011

MEGAN-LR: New algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs

Huson

¹

,

²

,

Bağcı

³

et al. 2017

Preprint

Background: There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high-quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long-read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbiome sequencing, and there is a need to adapt short-read tools to long-read datasets. Methods: We describe a new LCA-based algorithm for taxonomic binning, and an interval-tree-based algorithm for functional binning, that are explicitly designed for long reads and assembled contigs. We provide a new interactive tool for investigating the alignment of long reads against reference sequences. For taxonomic and functional binning, we propose to use LAST to compare long reads against the NCBI-nr protein reference database so as to obtain frame-shift-aware alignments, and then to process the results using our new methods. Results: All presented methods are implemented in the open source edition of MEGAN, and we refer to this new extension as MEGAN-LR (MEGAN long read). We evaluate the LAST+MEGAN-LR approach in a simulation study, and on a number of mock community datasets consisting of Nanopore reads, PacBio reads and assembled PacBio reads. We also illustrate the practical application on a Nanopore dataset that we sequenced from an anammox bioreactor community.
Background: There are numerous computational tools for taxonomic or functional binning or profiling of microbiome samples, optimized to run on hundreds of millions of short, high-quality sequencing reads [1,2,3,4]. Alignment-based taxonomic binning is often performed using the naïve LCA algorithm [5], because it is fast, easy to interpret and easy to implement. Functional binning usually involves a best-hit strategy to assign reads to functional classes. Software or websites for analyzing microbiome shotgun sequencing samples, such as MG-RAST [2], usually provide some level of interactivity. The interactive microbiome analysis tool MEGAN, which was first used in 2006 [6], is one of the most feature-rich tools of this type. MEGAN is highly optimized to enable users to interactively explore large numbers of microbiome samples containing hundreds of millions of short reads. Illumina HiSeq and MiSeq sequencers allow researchers to generate sequencing data on a huge scale, so as to analyze many samples at a great sequencing depth [7,8,9]. A wide range of questions, in particular involving the presence or absence of particular organisms or genes in a sample, can be answered using such data. However, there are interesting problems that are not easily resolved using short reads. For example, the question of whether two genes that are both detected in the same microbiome sample also occur together on the same genome can often not be ...
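The naïve LCA idea mentioned in this excerpt can be illustrated with a minimal sketch: a read is placed on the lowest taxonomic node that is an ancestor of every taxon it aligns to. The code below is not taken from MEGAN or from the paper; the function names (`lineage`, `naive_lca`) and the toy `PARENT` table are hypothetical, and the taxonomy is deliberately simplified.

```python
def lineage(taxon, parent):
    """Return the path from the root down to `taxon` (inclusive)."""
    path = [taxon]
    while taxon in parent:          # the root has no parent entry
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def naive_lca(taxa, parent):
    """Lowest common ancestor of all taxa hit by one read's alignments.

    Returns None when `taxa` is empty (the read has no alignments).
    """
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk down from the root while all lineages agree.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


# Toy taxonomy (child -> parent); an assumption for illustration only.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}

if __name__ == "__main__":
    print(naive_lca(["Escherichia", "Salmonella"], PARENT))
    print(naive_lca(["Escherichia", "Bacillus"], PARENT))
```

In this sketch, a read hitting only closely related taxa is binned at a specific node, while a read hitting distant taxa moves up toward the root, which is why the approach is easy to interpret.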

A First Step Toward Computing All Hybridization Networks For Two Rooted Binary Phylogenetic Trees

Scornavacca

¹

,

Linz

²

,

Journal of Computational Biology

³

2012

Recently, considerable effort has been put into developing fast algorithms to reconstruct a rooted phylogenetic network that explains two rooted phylogenetic trees and has a minimum number of hybridization vertices. With the standard approach to tackle this problem being combinatorial, the reconstructed network is rarely unique. From a biological point of view, it is therefore important to compute not only one network, but all possible networks. In this article, we make a first step toward approaching this goal by presenting the first algorithm, called ALLMAAFs, that calculates all maximum-acyclic-agreement forests for two rooted binary phylogenetic trees on the same set of taxa.

Computing all hybridization networks for multiple binary phylogenetic input trees

¹

2015

Background: The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior is interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. Results: This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available and runs on all three major operating systems. Conclusion: Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0660-7) contains supplementary material, which is available to authorized users.