Tile optimization for area in FPGA based hardware acceleration of peptide identification

7th International Conference on Information and Automation for Sustainability

et al. 2014

Self Cite

The problem of inferring proteins from complex peptide cocktails (digestion products of biological samples) in a shotgun proteomic workflow places extreme demands on computational resources in terms of the required very high processing throughput, rapid processing rates and reliability of results. This is exacerbated by the fact that, in general, a given protein cannot be defined by a fixed sequence of amino acids because of the existence of splice variants and isoforms of that protein. Therefore, the problem of protein inference can be considered one of identifying sequences of amino acids with some limited tolerance. In the current paper, a model-based hardware acceleration of a structured and practical inference approach is developed and validated on a mass spectrometry experiment of realistic size. We have achieved a maximum speed-up of 10 times in the co-designed workflow compared to a similar software-only workflow run on the processor used for co-design.

Section: Methods (mentioning)

confidence: 99%

A structured hardware software architecture for peptide based diagnosis of Baylisascaris Procyonis infection

Vidanagamachchi

Dewasurendra

7th International Conference on Information and Automation for Sustainability

et al. 2014

Self Cite

“…For this in silico digestion was performed first, with the developed tool allowing mentioned digestion rules. Then the peptides were arranged into a new order and a new categorisation was made according to our optimisation algorithm presented in [13]. Later, protein-peptide mapping was performed offline.…”

Section: Experimental Set Up (mentioning)

confidence: 99%

A structured hardware software architecture for peptide based diagnosis — Sub-string matching problem with limited tolerance

Vidanagamachchi

Dewasurendra

7th International Conference on Information and Automation for Sustainability

et al. 2014

The problem of inferring proteins from complex peptide samples in a shotgun proteomic workflow places extreme demands on computational resources in terms of the required very high processing throughput, rapid processing rates and reliability of results. This is exacerbated by the fact that, in general, a given protein cannot be defined by a fixed sequence of amino acids because of the existence of splice variants and isoforms of that protein. Therefore, the problem of protein inference can be considered one of identifying sequences of amino acids with some limited tolerance. Two problems arise from this: a) because of these (permitted) variations, the applicability of exact string matching methodologies is questionable, and b) it is difficult to define a reference (peptide/amino acid) sequence for a particular set of proteins that are functionally indistinguishable but vary in some features. This paper presents a model-based hardware acceleration of a structured and practical inference approach that is developed and validated to solve the inference problem. Our approach starts from an examination of the known set of splice variants and isoforms of a target protein to identify the Greatest Common Stable Substring (GCSS) of amino acids and the Substrings Subjects to Limited Variation (SSLV) and their respective locations on the GCSS. Then we define and solve the Sub-string Matching Problem with Limited Tolerance (SMPLT) using the Bit-Split Aho-Corasick Algorithm with Limited Tolerance (BSACLT), which we define and automate. This approach is validated on identified peptides in a labelled and clustered data set from UNIPROT. A model-based hardware-software co-design strategy is used to accelerate the computational workflow of the above-described protein inference problem. Identification of Baylisascaris Procyonis infection was used as an application instance that achieved up to 70 times speed-up compared to a software-only system.
This workflow can be generalised to any inexact multiple pattern matching application by replacing the patterns in a clustered and distributed environment that permits a distance between member strings to account for permitted deviations such as substitutions, insertions and deletions.
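
As a rough illustration of what matching "with limited tolerance" means, the sketch below scans a sequence for several patterns and accepts any window that differs from a pattern in at most a given number of positions. It is a naive scan that handles substitutions only; it is not the BSACLT algorithm from the abstract, and the function name, parameters and sample sequences are hypothetical.

```python
from typing import List, Tuple


def find_with_tolerance(
    text: str, patterns: List[str], max_mismatches: int
) -> List[Tuple[int, str, int]]:
    """Return (start, pattern, mismatches) for each window within tolerance.

    Assumption: tolerance is counted as substitutions only (Hamming
    distance); insertions and deletions are not handled in this sketch.
    """
    hits = []
    for pattern in patterns:
        m = len(pattern)
        if m == 0 or m > len(text):
            continue
        for start in range(len(text) - m + 1):
            mismatches = 0
            for a, b in zip(text[start:start + m], pattern):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # window exceeds the tolerance, give up early
            else:
                # Loop finished without break: window is within tolerance.
                hits.append((start, pattern, mismatches))
    hits.sort()
    return hits


if __name__ == "__main__":
    # Hypothetical sequence with one substitution relative to "TAYI".
    print(find_with_tolerance("MKTSYIAKQR", ["TAYI"], max_mismatches=1))
```

With `max_mismatches=0` the function reduces to exact multi-pattern matching, which is the case an unmodified Aho-Corasick automaton handles in a single pass over the text.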

“…Studies have been performed considering implementations of Aho-Corasick algorithm on FPGAs as well [6,18]. It is shown that GPUs achieve comparable or higher speedups than CBE-based platforms for computation-intensive applications.…”

Section: Related Work (mentioning)

confidence: 99%

Accelerating string matching for bio-computing applications on multi-core CPUs

Herath

Lakmali

2012 IEEE 7th International Conference on Industrial and Information Systems (ICIIS)

2012

Huge amounts of data in the form of strings are handled in bio-computing applications, and searching algorithms are used in them quite frequently. Many methods utilizing both software and hardware have been proposed to accelerate the processing of such data. Typical hardware-based acceleration techniques either require special hardware such as general-purpose graphics processing units (GPGPUs) or require building new hardware such as an FPGA-based design. On the other hand, software-based acceleration techniques are easier to adopt since they only require some changes in the software code or the software architecture. Typical software-based techniques make use of computers connected over a network, also known as a network grid, to accelerate the processing. In this paper, we test the hypothesis that multi-core architectures should provide better performance in this kind of computation, although the result would still depend on the algorithm selected as well as the programming model used. We present the acceleration of a string-searching algorithm on a multi-core CPU via a POSIX thread based implementation. Our implementation on an 8-core processor (that supports 16 threads) resulted in a 9x throughput improvement compared to a single-thread implementation.
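
The thread-based approach in this abstract can be pictured as splitting the input into chunks and searching each chunk in its own thread. The sketch below shows only that partitioning idea, using Python's `concurrent.futures` rather than POSIX threads; it is not the authors' implementation, and the function names and the chunking scheme are assumptions. In CPython the global interpreter lock is likely to limit any real speed-up, so it should be read as a correctness sketch rather than a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_chunk(text: str, pattern: str, start: int, end: int) -> int:
    """Count occurrences of pattern whose start index lies in [start, end)."""
    # Extend the search window by len(pattern) - 1 characters so that a
    # match straddling the chunk boundary is still seen, but only once.
    limit = min(len(text), end + len(pattern) - 1)
    count = 0
    pos = text.find(pattern, start, limit)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1, limit)  # allow overlapping matches
    return count


def parallel_count(text: str, pattern: str, workers: int = 4) -> int:
    """Count all (possibly overlapping) occurrences using a pool of threads."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    chunk = max(1, -(-len(text) // workers))  # ceiling division
    bounds = [(s, min(s + chunk, len(text))) for s in range(0, len(text), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: count_in_chunk(text, pattern, *b), bounds))


if __name__ == "__main__":
    print(parallel_count("ACGTACGTACGT", "ACGT", workers=3))
```

Assigning each match to the chunk that contains its start index is one simple way to avoid both missed and double-counted matches at chunk boundaries.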