Fast iterative circuits and RAM-based mergers to accelerate data sort in software/hardware systems

Sklyarov, Valeri; Skliarova, Iouliia; Rjabov, Artjom; Sudnitson, Alexander

doi:10.3176/proc.2017.3.07

Cited by 5 publications

(4 citation statements)

References 0 publications


“…It has been shown that although very appreciable results have already been achieved, with various authors proving the benefits of reconfigurable hardware implementations compared to software solutions, more work is still required in particular in the scope of exploring high-level synthesis potential and reducing the communication bottleneck, which is currently the main limiting factor of the majority of the designs. Different proposals have been carried out to mitigate the bandwidth limitation, such as communication-time data processing [32] and a data compression mechanism [51].…”

Section: Discussion (mentioning)

confidence: 99%

“…Sklyarov et al [30][31][32] performed analysis of different sorting networks and concluded that even-odd transition networks are among the most regular and easily scalable. As the Table 1 confirms, even-odd transition networks are often characterized as considerably slower and more resource consuming comparing with even-odd merge and bitonic merge networks.…”

Section: Implementations Of Sorting Network (mentioning)

confidence: 99%
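The even-odd transition network mentioned in the statement above can be illustrated in software. The following is a minimal sketch, not the cited authors' circuit: the function name `even_odd_transition_sort` is hypothetical, and the loop over levels runs sequentially here, whereas in hardware all comparators of one level would operate in parallel.

```python
def even_odd_transition_sort(items):
    """Sort by alternating even-indexed and odd-indexed compare-and-swap levels.

    Illustrative software model only; each inner loop corresponds to one
    level of independent comparators in a hardware network.
    """
    data = list(items)
    n = len(data)
    # n levels are enough to sort n items with this network.
    for level in range(n):
        start = level % 2  # even levels pair (0,1),(2,3)...; odd levels pair (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


print(even_odd_transition_sort([5, 1, 4, 2, 3]))
```

The regularity noted in the statement shows up in the code: every level uses the same comparator pattern, shifted by one position on alternate levels.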

“…So, trying to increase the number of sortable in parallel elements might be useless if there is no sufficient bandwidth to supply these elements to the sorter. To alleviate this problem, a communication-time sorter has been proposed in [32] that is based on the network from [30] permitting to find minimum and maximum values and enables data sorting to be completely overlapped in time with data transfers so that sorting is completed as soon as the last data item is received. Sorting subsets in an FPGA-based hardware accelerator and merging in software running on a hard processor in a Zynq PSoC has also been explored in [32].…”

Section: Implementations Of Sorting Network (mentioning)

confidence: 99%
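The split described in the statement above (sorting subsets in the accelerator, merging in software) can be sketched as follows. This is an assumption-laden illustration: `sort_in_blocks_then_merge` and `block_size` are hypothetical names, and Python's built-in `sorted` merely stands in for the hardware sorter, which is not modeled here.

```python
import heapq


def sort_in_blocks_then_merge(items, block_size):
    """Sort fixed-size subsets independently, then merge the sorted subsets.

    sorted() is a placeholder for the hardware stage (assumption);
    heapq.merge plays the role of the software merger.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    data = list(items)
    # Stage 1: each block is sorted on its own, as a hardware sorter would do.
    blocks = [sorted(data[i:i + block_size])
              for i in range(0, len(data), block_size)]
    # Stage 2: a k-way merge of the already sorted blocks in software.
    return list(heapq.merge(*blocks))


print(sort_in_blocks_then_merge([8, 3, 5, 1, 9, 2, 7], block_size=3))
```

The block size would in practice be bounded by how many items the sorter can accept at once, which ties back to the bandwidth concern raised in the statement.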


A Survey of Network-Based Hardware Accelerators

Skliarova

2022

Electronics


Many practical data-processing algorithms fail to execute efficiently on general-purpose CPUs (Central Processing Units) due to the sequential matter of their operations and memory bandwidth limitations. To achieve desired performance levels, reconfigurable (FPGA (Field-Programmable Gate Array)-based) hardware accelerators are frequently explored that permit the processing units’ architectures to be better adapted to the specific problem/algorithm requirements. In particular, network-based data-processing algorithms are very well suited to implementation in reconfigurable hardware because several data-independent operations can easily and naturally be executed in parallel over as many processing blocks as actually required and technically possible. GPUs (Graphics Processing Units) have also demonstrated good results in this area but they tend to use significantly more power than FPGA, which could be a limiting factor in embedded applications. Moreover, GPUs employ a Single Instruction, Multiple Threads (SIMT) execution model and are therefore optimized to SIMD (Single Instruction, Multiple Data) operations, while in FPGAs fully custom datapaths can be built, eliminating much of the control overhead. This review paper aims to analyze, compare, and discuss different approaches to implementing network-based hardware accelerators in FPGA and programmable SoC (Systems-on-Chip). The performed analysis and the derived recommendations would be useful to hardware designers of future network-based hardware accelerators.



“…Some examples of parallel data processing networks are sorting networks [1], searching networks [1], and counting networks [2]. It has been shown by various studies that parallel data networks are well suited for implementation in reconfigurable hardware, such as Field-Programmable Gate Arrays (FPGA) and Programmable Systems-on-Chip (PSoC) [3][4][5][6][7][8][9][10][11][12][13][14]. This is because many processing elements can easily be instantiated, synthesized, and implemented according to the required network structure, and modern FPGAs contain plenty of distributed storage elements that can be used for effective pipelining.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis and Comparison of Different Approaches to Implementing a Network-Based Parallel Data Processing Algorithm

Skliarova

2022

JLPEA


It is well known that network-based parallel data processing algorithms are well suited to implementation in reconfigurable hardware recurring to either Field-Programmable Gate Arrays (FPGA) or Programmable Systems-on-Chip (PSoC). The intrinsic parallelism of these devices makes it possible to execute several data-independent network operations in parallel. However, the approaches to designing the respective systems vary significantly with the experience and background of the engineer in charge. In this paper, we analyze and compare the pros and cons of using an embedded processor, high-level synthesis methods, and register-transfer low-level design in terms of design effort, performance, and power consumption for implementing a parallel algorithm to find the two smallest values in a dataset. This problem is easy to formulate, has a number of practical applications (for instance, in low-density parity check decoders), and is very well suited to parallel implementation based on comparator networks.
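The problem named in the abstract above, finding the two smallest values with a comparator network, can be modeled in software as a tree reduction. This is only a sketch under stated assumptions: `two_smallest` and `merge_pairs` are hypothetical names, and the structure shown is one plausible comparator tree, not necessarily the design evaluated in the paper.

```python
import math


def merge_pairs(a, b):
    """Combine two (smallest, second_smallest) pairs into one."""
    lowest = min(a[0], b[0])
    # The runner-up is the smallest of the three remaining candidates.
    second = min(max(a[0], b[0]), a[1], b[1])
    return (lowest, second)


def two_smallest(values):
    """Return the two smallest values using a pairwise reduction tree."""
    nodes = [(v, math.inf) for v in values]
    if len(nodes) < 2:
        raise ValueError("need at least two values")
    while len(nodes) > 1:
        # One tree level: all merges here are independent of each other.
        merged = [merge_pairs(nodes[i], nodes[i + 1])
                  for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            merged.append(nodes[-1])  # an unpaired node passes to the next level
        nodes = merged
    return nodes[0]


print(two_smallest([7, 3, 9, 1, 4]))
```

Because the merges within one level do not depend on each other, each level maps naturally onto a row of parallel comparators.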


Architectures of FPGA-Based Hardware Accelerators and Design Techniques

Skliarova

Sklyarov

2019

Lecture Notes in Electrical Engineering

