A Survey of Network-Based Hardware Accelerators

Skliarova, Iouliia

doi:10.3390/electronics11071029

Cited by 10 publications

(10 citation statements)

References 48 publications

“…However, the communication overheads (related to supplying the input data to the circuit) will always limit the ideal theoretical throughput. To solve this problem, designs have been proposed that allow the processing to be overlapped in time with data transfers [15].…”

Section: Methods (mentioning)

confidence: 99%

“…This is because many processing elements can easily be instantiated, synthesized, and implemented according to the required network structure, and modern FPGAs contain plenty of distributed storage elements that can be used for effective pipelining. The principal characteristics, benefits, and limitations of the different approaches to implementing network-based hardware accelerators in FPGA and PSoC are reviewed in [15]. As indicated in [15], the majority of the analyzed implementations recur to low-level hardware designs (usually in VHDL/Verilog or in a specially developed language whose specifications are later translated to standard HDL (Hardware Description Language) RTL (Register-Transfer Level) descriptions).…”

Section: Introduction (mentioning)

confidence: 99%

“…The principal characteristics, benefits, and limitations of the different approaches to implementing network-based hardware accelerators in FPGA and PSoC are reviewed in [15]. As indicated in [15], the majority of the analyzed implementations recur to low-level hardware designs (usually in VHDL/Verilog or in a specially developed language whose specifications are later translated to standard HDL (Hardware Description Language) RTL (Register-Transfer Level) descriptions). None of the respective authors realized any study or comparison of using different specification methods (ranging from high-level descriptions to low-level code) when implementing a particular data processing network.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis and Comparison of Different Approaches to Implementing a Network-Based Parallel Data Processing Algorithm

Skliarova

2022

JLPEA

Self Cite

It is well known that network-based parallel data processing algorithms are well suited to implementation in reconfigurable hardware recurring to either Field-Programmable Gate Arrays (FPGA) or Programmable Systems-on-Chip (PSoC). The intrinsic parallelism of these devices makes it possible to execute several data-independent network operations in parallel. However, the approaches to designing the respective systems vary significantly with the experience and background of the engineer in charge. In this paper, we analyze and compare the pros and cons of using an embedded processor, high-level synthesis methods, and register-transfer low-level design in terms of design effort, performance, and power consumption for implementing a parallel algorithm to find the two smallest values in a dataset. This problem is easy to formulate, has a number of practical applications (for instance, in low-density parity check decoders), and is very well suited to parallel implementation based on comparator networks.
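The two-smallest-values problem named in the abstract can be illustrated with a small software model of a comparator tree. This is only a sketch under stated assumptions, not the design evaluated in the paper: the function names `merge_pairs` and `two_smallest` are hypothetical, and padding an odd-sized input with +infinity is an illustrative choice. Each level of the tree consists of data-independent merges, which is what would allow them to run in parallel in hardware.

```python
def merge_pairs(a, b):
    """Merge two sorted pairs and keep only the two smallest values.

    Each min/max here corresponds to one comparator in a hardware network.
    """
    a1, a2 = a
    b1, b2 = b
    smallest = min(a1, b1)
    # The runner-up is either the loser of (a1, b1) or the smaller second element.
    second = min(max(a1, b1), min(a2, b2))
    return smallest, second


def two_smallest(values):
    """Return (smallest, second smallest) using a tree of independent merges."""
    items = list(values)
    if len(items) < 2:
        raise ValueError("need at least two values")

    # Level 0: one comparator per input pair (assumption: pad odd input with +inf).
    pairs = []
    for i in range(0, len(items), 2):
        x = items[i]
        y = items[i + 1] if i + 1 < len(items) else float("inf")
        pairs.append((min(x, y), max(x, y)))

    # Higher levels: all merges within one level are independent of each other.
    while len(pairs) > 1:
        merged = [merge_pairs(pairs[i], pairs[i + 1])
                  for i in range(0, len(pairs) - 1, 2)]
        if len(pairs) % 2:
            merged.append(pairs[-1])  # unpaired node passes through unchanged
        pairs = merged
    return pairs[0]


print(two_smallest([7, 3, 9, 1, 4, 8, 2, 6]))  # (1, 2)
```

For N inputs the tree has roughly log2(N) levels, so a fully parallel implementation would need that many comparator stages rather than a sequential scan over all N values.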

“…The S1 receives ECON-T data, unpacks and calibrates it, and routes and sorts TCs in energy into projective 2 ϕ vs. 42 R/z bins per 120° sector. The sorting uses batcher odd-even sorting networks [10][11][12], where on-the-fly truncation reduces the total number of firmware comparators required. Modules sums are here partially summed into module towers, and time multiplexing [13] with a 18 bunch-crossing period is applied before sending the data to S2.…”

Section: The Dataflow of HGCAL Trigger Primitives (mentioning)

confidence: 99%
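The statement above mentions Batcher odd-even sorting networks with truncation. The following is a minimal software sketch of that general idea, not the HGCAL firmware: it generates the standard Batcher odd-even mergesort comparator list for a power-of-two input count, then drops comparators that cannot influence the outputs that are kept. The pruning rule (a backward pass over the comparator list) and the names `batcher_network`, `prune_for_outputs`, and `apply_network` are assumptions made for illustration. The sketch keeps the smallest values on the lowest wires; keeping the largest would be symmetric.

```python
def oddeven_merge(lo, hi, r):
    """Yield comparators merging two sorted halves of wires lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort(lo, hi):
    """Yield comparators that sort wires lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def batcher_network(n):
    """Return the comparator list for n inputs; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return list(oddeven_merge_sort(0, n - 1))


def prune_for_outputs(network, wanted):
    """Keep only comparators that can affect the wanted output wires.

    Assumption for illustration: walk the network backwards and keep a
    comparator only if one of its wires feeds a wanted output.
    """
    needed = set(wanted)
    kept = []
    for i, j in reversed(network):
        if i in needed or j in needed:
            kept.append((i, j))
            needed.update((i, j))
    kept.reverse()
    return kept


def apply_network(network, values):
    """Run the comparators in order: the smaller value ends up on wire i."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


full = batcher_network(8)
top3 = prune_for_outputs(full, range(3))
print(len(full), len(top3))
print(apply_network(top3, [7, 3, 9, 1, 4, 8, 2, 6])[:3])  # [1, 2, 3]
```

Only the first three outputs of the pruned network are meaningful; the remaining wires are left partially ordered, which is the price of using fewer comparators.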

Cluster reconstruction in the HGCAL at the Level 1 trigger

Alves

2024

EPJ Web of Conf.

The CMS collaboration has chosen a novel High Granularity Calorimeter for the endcap regions as part of its planned upgrade for the High Luminosity LHC. The calorimeter will have fine segmentation in both the transverse and longitudinal directions, and its data will be part of the Level 1 trigger of the CMS experiment. The trigger has tight constraints on latency and rate, and will need to be implemented in hardware. The high granularity results in around six million readout channels in total, reduced to one million that are used at 40 MHz as part of the Level 1 trigger, presenting a significant challenge in terms of data manipulation and processing; the trigger data volumes will be an order of magnitude above those currently handled at CMS. In addition, the high luminosity will result in an average of 140 (or more) interactions per bunch crossing. This leads to a huge rate by background processes which must be efficiently rejected by the trigger algorithms. Furthermore, reconstruction of the particle clusters to be used for particle flow in events with high hit rates is also a complex computational problem for the trigger. The status of the cluster reconstruction algorithms developed to tackle these major challenges, as well as the associated trigger architecture, is presented. Methods developed to mitigate the known issue of cluster splitting are described, including an iterative algorithm which has no impact on firmware resources.

“…The computation is executed until the array elements are sorted, and in each iteration two phases occur: the odd phase and the even phase. In the odd phase, we perform a bubble sort on odd indexed elements, and in the even phase we perform a bubble sort on even indexed elements (11). This process continues for the alternating index (odd-even pairs, even-odd pairs) until no swapping operation is performed and finally the array is in sorted form.…”

Section: Introduction (mentioning)

confidence: 99%
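The odd and even phases described in the statement above correspond to what is commonly called odd-even transposition sort. A minimal software sketch follows, assuming 0-based indexing and the hypothetical function name `odd_even_transposition_sort`; it is not taken from the cited paper. Within one phase the compared pairs do not overlap, so in hardware all comparisons of a phase could be performed at the same time.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating odd and even phases until no swap occurs."""
    data = list(values)
    n = len(data)
    swapped = True
    while swapped:
        swapped = False
        # start=1: odd phase, pairs (1,2), (3,4), ...
        # start=0: even phase, pairs (0,1), (2,3), ...
        for start in (1, 0):
            for i in range(start, n - 1, 2):
                if data[i] > data[i + 1]:
                    data[i], data[i + 1] = data[i + 1], data[i]
                    swapped = True
    return data


print(odd_even_transposition_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The loop stops after the first full iteration (odd phase plus even phase) in which no pair was swapped, matching the termination condition given in the quoted text.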

FPGA Design for Low Delay Comparison-free, Odd-even Merge Sorter

Preethi, Mohan, Augustine et al.

2022

IJST

Background/Objective: Reduced Instruction Set Computer (RISC) is one of the most common types of architecture used in microprocessors and has several blocks. There is considerable scope for optimizing the blocks involved in RISC, resulting in better and more effective microprocessors. Method: One of the sub-blocks that plays a prominent role in RISC architecture is the sorter, and it can be improved by modifying the sorting algorithm. Findings: A novel odd-even comparison-free sorter that arranges N data elements in roughly N clock cycles is proposed here. N identical blocks are arranged in a streamlined manner and are built from a handful of primary logic elements, resulting in the sorter computation. In the proposed framework, classification and categorization activities are executed in a channeled fashion. The entire design is amalgamated for numerous data sets, from randomly generated data elements to all distinct elements, to all identical elements, and also from random to completely sorted data elements. It has been observed that the algorithm appears insensitive to the input ordering. Novelty: A comparison-free unit was implemented on an odd-even sorter. Synthesis results indicate that the proposed approach consumes reasonably few FPGA resources. The number of elements considered for sorting was N=8; this architecture has a per-element sorting delay of approximately 2.1 to 4.4 ns (1 clock cycle).
