Kanak Mahadik scite author profile

et al. 2018

High-throughput next generation sequencers (NGS) can rapidly read billions of short DNA fragments, called reads, at low cost. Moreover, their throughput is increasing and cost is decreasing at rates much faster than the Moore's law. This demands commensurate acceleration for NGS secondary analysis that process the reads to identify variations between genomes. Conventional architectural improvements can at best improve performance at the rate of Moore's law even if the software tools efficiently utilize the underlying architecture. Unfortunately, most of the dozens of software products developed for this purpose fail to exploit the underlying architecture well. Therefore, to match the pace of development of the sequencers, we will need architecture that is more tailored for the computational requirements of NGS secondary analysis as well as software that uses the architecture optimally. To this end, in this work, we study the performance characteristics of NGS secondary analysis and investigate the suitability of modern Intel Xeon and Xeon Phi processors for the same. To keep the study manageable, we rely on recent studies that attribute a majority of the run-time to a few key kernels. We present detailed optimization efforts to accelerate these kernels on the latest Intel Xeon and Xeon Phi processors with the goal of extracting maximum performance. A comparison of our optimized implementations, along with published results on GPGPU implementations, shows that our * Kanak Mahadik was a research intern at Intel when she worked on this project.

Flexible resource allocation for reliable virtual cluster computing systems

Hacker

2011

Virtualization and cloud computing technologies now make it possible to create scalable and reliable virtual high performance computing clusters. Integrating these technologies, however, is complicated by fundamental and inherent differences in the way in which these systems allocate resources to computational tasks. Cloud computing systems immediately allocate available resources or deny requests. In contrast, parallel computing systems route all requests through a queue for future resource allocation. This divergence of allocation policies hinders efforts to implement efficient, responsive, and reliable virtual clusters.In this paper, we present a continuum of four scheduling polices along with an analytical resource prediction model for each policy to estimate the level of resources needed to operate an efficient, responsive, and reliable virtual cluster system. We show that it is possible to estimate the size of the virtual cluster system needed to provide a predictable grade of service for a realistic high performance computing workload and estimate the queue wait time for a partial or full resource allocation. Moreover, we show that it is possible to provide a reliable virtual cluster system using a limited pool of spare resources. The models and results we present are useful for cloud computing providers seeking to operate efficient and cost-effective virtual cluster systems.

Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization

Chaterji

Zhou

et al. 2014

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers

Wright

Kulkarni

et al. 2019

Sci Rep

Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable de novo assembly algorithms have hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBG). However, this process of sequentially iterating from small to large k-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to “patch” a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genome in our dataset).

Fast distributed bandits for online recommendation systems

et al. 2020

Contextual bandit algorithms are commonly used in recommender systems, where content popularity can change rapidly. These algorithms continuously learn latent mappings between users and items, based on contexts associated with them both. Recent recommendation algorithms that learn clustering or social structures between users have exhibited higher recommendation accuracy. However, as the number of users and items in the environment increases, the time required to generate recommendations deteriorates significantly. As a result, these cannot be deployed in practice. The state-of-the-art distributed bandit algorithm-DCCB-relies on a peer-to-peer network to share information among distributed workers. However, this approach does not scale well with the increasing number of users. Furthermore, it suffers from slow discovery of clusters, resulting in accuracy degradation. To address the above issues, this paper proposes a novel distributed bandit-based algorithm called DistCLUB. This algorithm lazily creates clusters in a distributed manner, and dramatically reduces the network data sharing requirement, achieving high scalability. Additionally, DistCLUB finds clusters much faster, achieving better accuracy than the state-of-the-art algorithm. Evaluation over both real-world benchmarks and synthetic datasets shows that Dist-CLUB is on average 8.87x faster than DCCB, and achieves 14.5% higher normalized prediction performance.