Next-Generation Sequencing technologies generate a vast and exponentially increasing amount of sequence data. The Interleaved Bloom Filter (IBF) is a novel indexing data structure that is state-of-the-art for distributing approximate queries using an in-memory data structure. With it, one of the main tasks of sequence analysis pipelines, namely (approximately) searching large reference data sets for sequencing reads or short sequence patterns such as genes, can be significantly accelerated. To meet performance and energy-efficiency requirements, we chose a co-design approach for the IBF data structure on an FPGA platform. Furthermore, our OpenCL-based implementation allows seamless integration into the widely used SeqAn C++ library for biological sequence analysis. Our algorithmic design and optimization strategy takes advantage of FPGA-specific features such as shift registers and of the parallelization potential of the many bitwise operations involved. We designed a scheme for partitioning data across the different memory domains of the FPGA platform using the Shared Virtual Memory concept. We demonstrate improvements of up to 19× in energy efficiency and up to 5.6× in performance compared to a well-tuned, multithreaded CPU reference.
CCS CONCEPTS • Computer systems organization → Reconfigurable computing; • Applied computing → Bioinformatics; Computational genomics.
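To illustrate why the IBF lends itself to bitwise parallelism, the following is a minimal CPU-side sketch in C++, assuming the common IBF layout: b bins, each a Bloom filter of m bits with h hash functions, where bit position p of all b bins is stored contiguously in one block. A query then fetches one block per hash function and combines them with bitwise AND, so that every bin is tested at once. The class name `InterleavedBloomFilter`, its methods, the MurmurHash3-style mixing function, and all parameters are hypothetical choices for illustration; this is neither the SeqAn API nor the FPGA/OpenCL design described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical IBF sketch: bit p of every bin's Bloom filter lives in one
// contiguous block of ceil(bins / 64) words, so one block read covers all bins.
class InterleavedBloomFilter {
public:
    InterleavedBloomFilter(std::size_t bins, std::size_t bits_per_bin, std::size_t hash_count)
        : bins_(bins),
          words_per_block_((bins + 63) / 64),
          bits_per_bin_(bits_per_bin),
          hash_count_(hash_count),
          data_(bits_per_bin * ((bins + 63) / 64), 0) {}

    // Insert a (hashed) k-mer value into one bin: set bit `bin` in h blocks.
    void insert(std::uint64_t value, std::size_t bin) {
        assert(bin < bins_);
        for (std::size_t i = 0; i < hash_count_; ++i) {
            std::size_t block = position(value, i) * words_per_block_;
            data_[block + bin / 64] |= std::uint64_t{1} << (bin % 64);
        }
    }

    // Packed result: bit j is set iff `value` may be contained in bin j.
    // The h blocks are combined with bitwise AND, 64 bins per word operation.
    std::vector<std::uint64_t> query(std::uint64_t value) const {
        std::vector<std::uint64_t> result(words_per_block_, ~std::uint64_t{0});
        for (std::size_t i = 0; i < hash_count_; ++i) {
            std::size_t block = position(value, i) * words_per_block_;
            for (std::size_t w = 0; w < words_per_block_; ++w)
                result[w] &= data_[block + w];
        }
        return result;
    }

    // Per-bin hit counts over all k-mers of a read; comparing the counts
    // against a threshold yields approximate matches.
    std::vector<std::size_t> count(const std::vector<std::uint64_t>& kmers) const {
        std::vector<std::size_t> counts(bins_, 0);
        for (std::uint64_t kmer : kmers) {
            std::vector<std::uint64_t> hits = query(kmer);
            for (std::size_t j = 0; j < bins_; ++j)
                counts[j] += (hits[j / 64] >> (j % 64)) & 1u;
        }
        return counts;
    }

private:
    // Seeded 64-bit mixer (MurmurHash3 finalizer); an illustrative choice only.
    std::size_t position(std::uint64_t value, std::size_t seed) const {
        std::uint64_t x = value + (seed + 1) * 0x9E3779B97F4A7C15ULL;
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return static_cast<std::size_t>(x % bits_per_bin_);
    }

    std::size_t bins_, words_per_block_, bits_per_bin_, hash_count_;
    std::vector<std::uint64_t> data_;
};
```

Because a Bloom filter has no false negatives, a k-mer inserted into bin j is always reported for bin j; other bins may occasionally report false positives. The AND over wide, contiguous blocks is the kind of regular bitwise work that maps well onto wide hardware datapaths.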