This paper proposes a new class of efficient and flexible hardware accelerators for DNA local sequence alignment, based on the widely used Smith-Waterman algorithm. These accelerating structures exploit an innovative technique that tracks the origin coordinates of the best alignment. This significantly reduces the size of the dynamic programming matrix that must be recomputed during the subsequent traceback phase, and therefore the resulting time and memory requirements. The performance of the enhanced class of accelerators is further improved by supporting an additional level of parallelism: the capability to concurrently align several query sequences with one or more reference sequences, according to the specific application requirements. Moreover, the accelerator class includes specially designed processing elements that improve resource usage when implemented in a Field Programmable Gate Array (FPGA) and easily provide several different configurations in an Application Specific Integrated Circuit (ASIC) implementation. The obtained results demonstrate that speedups as high as 278 can be obtained in ASIC accelerating structures. An FPGA-based prototyping platform, operating at a 40 times lower clock frequency and incorporating a complete alignment embedded system, still provides significant speedups as high as 27, compared with a pure software implementation.
Figure 4. Enhanced architecture of processor element PE_i.
Therefore, considering that most practical setups involve either a very large number of short-read sequences to be aligned or several medium-sized query sequences to be aligned to different reference sequences, alternative arrangements of the available PEs are now proposed to make maximal use of the whole set of implemented PEs and to perform several alignments at the same time.
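The origin-tracking idea summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design of the paper: the function names (`sw_with_origin`, `traceback_window`), the scoring values, and the linear gap penalty are all hypothetical choices made for illustration. Each cell carries, alongside its score, the coordinates where its local alignment started, so that only the window between the origin and the end of the best alignment would need to be recomputed for traceback.

```python
def sw_with_origin(query, ref, match=2, mismatch=-1, gap=1):
    """Smith-Waterman score pass that also propagates origin coordinates.

    Returns (best_score, origin, end), where origin and end are 1-based
    (query_index, ref_index) pairs, or (0, None, None) if nothing aligns.
    Scoring values and the linear gap model are illustrative assumptions.
    """
    n = len(ref)
    prev_h = [0] * (n + 1)        # scores of the previous row
    prev_o = [None] * (n + 1)     # origins of the previous row
    best = (0, None, None)
    for i in range(1, len(query) + 1):
        cur_h = [0] * (n + 1)
        cur_o = [None] * (n + 1)
        for j in range(1, n + 1):
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            # A diagonal move out of a zero cell starts a new alignment here.
            diag_o = prev_o[j - 1] if prev_h[j - 1] > 0 else (i, j)
            candidates = [
                (prev_h[j - 1] + sub, diag_o),      # match / mismatch
                (prev_h[j] - gap, prev_o[j]),       # gap in the reference
                (cur_h[j - 1] - gap, cur_o[j - 1]), # gap in the query
            ]
            score, origin = max(candidates, key=lambda c: c[0])
            if score > 0:                           # otherwise the cell stays 0
                cur_h[j], cur_o[j] = score, origin
            if cur_h[j] > best[0]:
                best = (cur_h[j], cur_o[j], (i, j))
        prev_h, prev_o = cur_h, cur_o
    return best


def traceback_window(origin, end):
    """Size (rows, cols) of the sub-matrix that traceback must recompute."""
    return end[0] - origin[0] + 1, end[1] - origin[1] + 1
```

With these assumed scores, aligning the query `ACGT` against the reference `TTACGTTT` gives a best score of 8 with origin (1, 3) and end (4, 6), so only a 4 x 4 window rather than the full 4 x 8 matrix would have to be recomputed for the traceback.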
Hence, besides the typical single-stream operation mode, in which one query sequence is aligned to one reference sequence, the class of accelerator architectures now proposed can be easily reconfigured to operate in multiple-stream modes: Single-Reference Multiple-Query (SRMQ) or Multiple-Reference Multiple-Query (MRMQ). This new feature significantly improves the actual performance of the array, because it allows a higher PE occupancy rate, with a consequent increase in the achieved array throughput, leading to a greater speedup than would be achieved with just a single array.
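The occupancy argument above can be made concrete with a rough back-of-the-envelope model. It is a sketch that assumes each PE holds one query symbol, as is common in systolic Smith-Waterman arrays; the source text does not state this mapping here, and the function names and the example array size are hypothetical.

```python
def max_streams(num_pes, query_len):
    """How many equal-length queries fit side by side in the PE array.

    Assumes one query symbol per PE (an assumption of this sketch).
    """
    return max(1, num_pes // query_len)


def pe_occupancy(num_pes, query_len, streams=1):
    """Fraction of PEs doing useful work for the given number of streams."""
    used = min(num_pes, query_len * streams)
    return used / num_pes


# Hypothetical example: a 512-PE array aligning 100-symbol short reads.
single = pe_occupancy(512, 100)                       # one query at a time
multi = pe_occupancy(512, 100, max_streams(512, 100)) # five queries at once
```

Under these assumptions, a single 100-symbol query keeps fewer than 20% of a 512-PE array busy, whereas packing five such queries into the array raises the occupancy to about 98%, which is the effect the multiple-stream modes aim for.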
Single-Reference Multiple-Query operation mode
When the alignment of various short-read sequences with the same large reference sequence is considered, it is possible to optimize the performance of the proposed accelerator architecture by
N. SEBASTIÃO, N. ROMA AND P. FLORES
instance, it is possible to implement an accelerator on the basis of the SRMQ mode of operation by considering specific score and coordinates resolutions. FPGAs also allow the implementation of PE versions specifically optimized for a g...