Accelerating string matching for bio-computing applications on multi-core CPUs

Herath, Damayanthi; Lakmali, C.; Ragel, Roshan

doi:10.1109/iciinfs.2012.6304784

Cited by 21 publications

(25 citation statements)

References 15 publications


“…For performance comparison, we use the absolute difference and the percent difference,

$$\text{absolute\_difference} = |t_{EM} - t_{SA}|, \qquad \text{percent\_difference} = 100 \cdot \frac{\text{absolute\_difference}}{t_{EM}} \tag{7}$$

where $t_{EM}$ indicates the best execution time determined using EM and $t_{SA}$ indicates the execution time of our algorithm with a system configuration suggested by the SA approach. Figure 8 depicts the execution time and the standard deviation of our DNA sequence analysis implementation using the system configuration suggested by the SA for various types of DNA sequences.…”

Section: Comparison Of Our Optimization Approach With the EM (mentioning)

confidence: 99%
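The two metrics in the quoted statement can be sketched in a few lines. This is only an illustration of Equation (7) as quoted; the function names and the sample timings are assumptions, not values from the cited paper.

```python
def absolute_difference(t_em: float, t_sa: float) -> float:
    """|t_EM - t_SA|: gap between the best EM time and the SA-suggested time."""
    return abs(t_em - t_sa)


def percent_difference(t_em: float, t_sa: float) -> float:
    """Absolute difference expressed as a percentage of t_EM."""
    if t_em <= 0:
        raise ValueError("t_em must be a positive execution time")
    return 100.0 * absolute_difference(t_em, t_sa) / t_em


# Hypothetical timings in seconds, chosen only to exercise the formula.
print(percent_difference(10.0, 10.5))
```

A small percent difference means the configuration suggested by SA runs almost as fast as the best configuration found by EM.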


Combinatorial optimization of DNA sequence analysis on heterogeneous systems

Memeti, Pllana (2016). Concurrency and Computation.

Summary: Analysis of DNA sequences is a data and computational intensive problem, and therefore, it requires suitable parallel computing resources and algorithms. In this paper, we describe our parallel algorithm for DNA sequence analysis that determines how many times a pattern appears in the DNA sequence. The algorithm is engineered for heterogeneous platforms that comprise a host with multi‐core processors and one or more many‐core devices. For combinatorial optimization, we use the simulated annealing algorithm. The optimization goal is to determine the number of threads, thread affinities, and DNA sequence fractions for host and device, such that the overall execution time of DNA sequence analysis is minimized. We evaluate our approach experimentally using real‐world DNA sequences of various organisms on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co‐processing device. By running only about 5% of possible experiments, our optimization method finds a near‐optimal system configuration for DNA sequence analysis that yields an average speedup of 1.6× and 2× compared with the host‐only and device‐only execution, respectively. Copyright © 2016 John Wiley & Sons, Ltd.
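The idea of giving the host and the device separate fractions of the sequence can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: it uses plain substring search, runs both parts sequentially, and the names `count_overlapping`, `count_in_fractions`, and `host_fraction` are assumptions. The point it shows is that the first part must be extended by `len(pattern) - 1` characters so that matches straddling the split are counted exactly once.

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count all (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_in_fractions(sequence: str, pattern: str, host_fraction: float) -> int:
    """Split the sequence into a host part and a device part and sum the counts."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    split = int(len(sequence) * host_fraction)
    overlap = len(pattern) - 1
    # Host part: every match that starts before the split point.
    host_part = sequence[:split + overlap]
    # Device part: every match that starts at or after the split point.
    device_part = sequence[split:]
    return count_overlapping(host_part, pattern) + count_overlapping(device_part, pattern)


print(count_in_fractions("AAAAAA", "AA", host_fraction=0.5))
```

In a real heterogeneous setup the two parts would be processed concurrently, and the fraction would be one of the parameters being tuned.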



“…Herath et al. presented an implementation of the Aho–Corasick (AC) algorithm based on pattern partitioning. A prefix‐based input partitioning approach is presented by Drews et al.…”

Section: Introduction (mentioning)

confidence: 99%

Combinatorial optimization of DNA sequence analysis on heterogeneous systems

Memeti, Pllana (2016). Concurrency and Computation.

“…Herath et al presented in [16] an implementation of the Aho-Corasick string matching algorithm using POSIX threads, which is based on the pattern partitioning approach. A replication of the Herath's study with the intention to improve the software implementation of the Aho-Corasick algorithm was conducted by Arudchutha et al [6].…”

Section: Related Work (mentioning)

confidence: 99%
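The pattern partitioning approach mentioned in the statement above can be sketched as follows: the pattern set is split across workers, each worker builds its own Aho-Corasick automaton and scans the whole text, and the per-pattern counts are merged. This is a minimal sketch under stated assumptions, not the implementation by Herath et al.: it uses Python's `ThreadPoolExecutor` as a stand-in for POSIX threads (so it shows the structure, not a speedup, because of the interpreter lock), it assumes non-empty and distinct patterns, and all function names are hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def count_matches(text, patterns):
    """Count occurrences of every pattern in text with one automaton."""
    goto, fail, out = build_automaton(patterns)
    counts = {p: 0 for p in patterns}
    s = 0
    for ch in text:
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            counts[p] += 1
    return counts


def partitioned_count(text, patterns, workers=2):
    """Pattern partitioning: each worker scans the full text for its own subset."""
    chunks = [c for c in (patterns[i::workers] for i in range(workers)) if c]
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        for partial in pool.map(lambda c: count_matches(text, c), chunks):
            merged.update(partial)
    return merged


print(partitioned_count("ACGTACGA", ["ACG", "CG", "GA", "TT"], workers=2))
```

Because every worker reads the entire text, the approach trades repeated scanning for smaller automata per worker, in contrast to input partitioning, where the text is split instead.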

“…The hardware based approaches (such as [7], [27]) are faster, but less flexible and more expensive, whereas software based acceleration techniques are flexible in terms of updating or adding new patterns [30]. Recently different software based DNA analysis techniques designed for multi-core systems have been proposed [6], [11], [14], [16], [19], [23].…”

Section: Introduction (mentioning)

confidence: 99%

Analyzing Large-Scale DNA Sequences on Multi-core Architectures

Memeti, Pllana (2015). 2015 IEEE 18th International Conference on Computational Science and Engineering.

Abstract: Rapid analysis of DNA sequences is important in preventing the evolution of different viruses and bacteria during an early phase, early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and in DNA forensics. However, real-world DNA sequences may comprise several Gigabytes and the process of DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on Finite Automata, and which is suitable for analyzing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog (2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speedups of up to 17.6×. Our approach is up to 3× faster than a pattern-based parallel approach that uses the RE2 library.


“…Many researches have been done utilizing both hardware and software to accelerate string matching in several areas: hardware supported approaches use FPGA [7], [12], GPU [3], [8], [9] and Cell/B.E. processor [5] and software based approaches use multiple processors [2], [4]. Among them, the software based acceleration techniques need only some modification in the software code or the architecture.…”

Section: Introduction (mentioning)

confidence: 99%

String matching with multicore CPUs: Performing better with the Aho-Corasick algorithm

Arudchutha, Nishanthy, Ragel (2013). 2013 IEEE 8th International Conference on Industrial and Information Systems.

Abstract: Multiple string matching is known as locating all the occurrences of a given number of patterns in an arbitrary string. It is used in bio-computing applications where the algorithms are commonly used for retrieval of information such as sequence analysis and gene/protein identification. Extremely large amounts of data in the form of strings have to be processed in such bio-computing applications. Therefore, improving the performance of multiple string matching algorithms is always desirable. Multicore architectures are capable of providing better performance by parallelizing the multiple string matching algorithms. The Aho-Corasick algorithm is the one that is commonly used in exact multiple string matching algorithms. The focus of this paper is the acceleration of the Aho-Corasick algorithm through a multicore CPU based software implementation. Through our implementation and evaluation of results, we prove that our method performs better compared to the state of the art.

