Steffen Rechner scite author profile

2016

A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory resource consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored on a working disk. In a second step, the temporary files are read again, split into k-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools for large genome data sets.
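The k-mer histogram and the two-step idea described above can be illustrated with a minimal Python sketch. This is not Gerbil's implementation: the function names `count_kmers` and `count_kmers_two_step` and the parameter `num_buckets` are hypothetical, in-memory lists stand in for the temporary files, and the sketch distributes individual k-mers rather than reads. It also ignores reverse complements and any disk or GPU handling.

```python
from collections import Counter, defaultdict


def count_kmers(sequence, k):
    """Build a histogram of all length-k substrings of one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_two_step(reads, k, num_buckets=4):
    """Count k-mers by first partitioning them, then counting each partition.

    Assumption: buckets are plain in-memory lists standing in for the
    temporary files mentioned in the abstract.
    """
    # Step 1: distribute every k-mer to a bucket chosen by its hash value,
    # so that equal k-mers always end up in the same bucket.
    buckets = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            buckets[hash(kmer) % num_buckets].append(kmer)

    # Step 2: count each bucket independently with a hash table and
    # merge the per-bucket histograms.
    total = Counter()
    for kmers in buckets.values():
        total.update(Counter(kmers))
    return total
```

Because equal k-mers always land in the same bucket, each bucket can be counted on its own, which is what keeps the memory needed at any one time small in a partition-then-count design.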

Gerbil: a fast and memory-efficient k-mer counter with GPU-support

Erbert

2017

Algorithms Mol Biol

Background: A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. Results: We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the k-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, Gerbil can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that Gerbil is able to efficiently support both small and large k. Conclusions: While Gerbil’s performance is comparable to existing state-of-the-art open source k-mer counting tools for small k < 32, it vastly outperforms its competitors for large k, thereby enabling new applications which require large values of k. Electronic supplementary material: The online version of this article (doi:10.1186/s13015-017-0097-9) contains supplementary material, which is available to authorized users.

Uniform sampling of bipartite graphs with degrees in prescribed intervals

Strowick

2017

We consider the problem of constructing a bipartite graph whose degrees lie in prescribed intervals. Necessary and sufficient conditions for the existence of such graphs are well-known. However, existing realization algorithms suffer from large running times. In this paper, we present a realization algorithm that constructs an appropriate bipartite graph G = (U, V, E) in O(|U| + |V| + |E|) time, which is asymptotically optimal. In addition, we show that our algorithm produces edge-minimal bipartite graphs and that it can easily be modified to construct edge-maximal graphs.
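To make the problem statement concrete, the following Python sketch checks whether a given bipartite graph has all degrees inside prescribed intervals. It is only a verifier, not the realization algorithm from the paper. The function name `degrees_within_intervals` and the input layout (an edge list plus dictionaries mapping each vertex to a `(low, high)` pair) are assumptions made for illustration.

```python
def degrees_within_intervals(edges, u_bounds, v_bounds):
    """Return True if every vertex degree lies in its [low, high] interval.

    edges:    iterable of (u, v) pairs with u in U and v in V
    u_bounds: dict mapping each vertex of U to a (low, high) pair
    v_bounds: dict mapping each vertex of V to a (low, high) pair
    (This input layout is a hypothetical choice, not taken from the paper.)
    """
    # Start every degree at zero so isolated vertices are checked as well.
    deg_u = {u: 0 for u in u_bounds}
    deg_v = {v: 0 for v in v_bounds}

    # One pass over the edges accumulates the degrees on both sides.
    for u, v in edges:
        deg_u[u] += 1
        deg_v[v] += 1

    ok_u = all(low <= deg_u[u] <= high for u, (low, high) in u_bounds.items())
    ok_v = all(low <= deg_v[v] <= high for v, (low, high) in v_bounds.items())
    return ok_u and ok_v
```

The check makes one pass over the vertices and one over the edges, so it runs in time proportional to |U| + |V| + |E|, assuming constant-time dictionary access.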

PANDA: a software tool for improved train dispatching with focus on passenger flows

Rückert

Lemnian

Blendinger

et al. 2016

Public Transp

We introduce the decision support tool PANDA (Passenger Aware Novel Dispatching Assistance). Our web-based tool is designed to provide train dispatchers with detailed real-time information about the current passenger flow and the multidimensional impact of waiting decisions in case of train delays. After presenting the algorithmic background and PANDA's main features, we show how it can be utilized in a typical use case scenario for train dispatchers. Besides its practical value for train dispatchers, the framework can be used to systematically study scientific questions. As an example, we use our software to experimentally analyse the influence of waiting decisions on realistic passenger flows of Deutsche Bahn. In a first experiment, we evaluate PANDA's potential benefit for passengers. Our findings indicate that a remarkable reduction in total delay might be possible in comparison to current practice. In two additional experiments, we investigate the timing aspect of waiting decisions. Our observations suggest that the timing of waiting decisions is of crucial importance and that a carefully implemented early rerouting strategy has a significant potential to reduce resulting delays of passengers.

Efficient Computation of Time-Dependent Centralities in Air Transportation Networks

Berger

et al. 2011