Modular high-throughput and low-latency sorting units for FPGAs in the Large Hadron Collider

Farmahini-Farahani, Amin; Gregerson, Anthony; Schulte, Michael; Compton, Katherine

doi:10.1109/sasp.2011.5941075

Cited by 14 publications

(12 citation statements)

References 21 publications

“…We also present a generalized platform-independent methodology for designing high-performance pipelined partial sorting and max-set-selection units for which the width of the data to be sorted and the pipeline depth can easily be varied. This research is an extension of our previous work on FPGA-based sorting units in the Large Hadron Collider (LHC) [18]. The main contributions of this dissertation and [18] are:…”

Section: Introduction (mentioning)

confidence: 92%

“…The difference in the number of CAE blocks between bitonic and odd-even merge sorting units is $2^{n-1} \times (n - 2) + 1$, which shows that the difference in the number of CAE blocks increases linearly with the number of inputs. For example, in the Large Hadron Collider [35], built by the European Organization for Nuclear Research (CERN), low-latency max-set-selection units identify important particle interactions that correspond to high-energy collisions [18,55]. In multimedia applications, partial sorting units speed up data sorting algorithms [17].…”

Section: Designing Large Sorting Network (mentioning)

confidence: 99%
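
The CAE-count difference quoted above can be checked numerically. The sketch below is not taken from the cited dissertation: it assumes one CAE (compare-and-exchange) block per comparator, $2^n$ inputs, and the textbook Batcher recurrences for bitonic and odd-even merge sorting networks. All function names are hypothetical.

```python
def bitonic_merge_caes(size):
    """CAE blocks in a bitonic merger with `size` inputs (textbook recurrence)."""
    if size == 1:
        return 0
    # One column of size/2 comparators, then two half-size bitonic mergers.
    return size // 2 + 2 * bitonic_merge_caes(size // 2)


def odd_even_merge_caes(size):
    """CAE blocks in Batcher's odd-even merger with `size` inputs."""
    if size == 2:
        return 1
    # Two half-size mergers, then a final column of size/2 - 1 comparators.
    return 2 * odd_even_merge_caes(size // 2) + size // 2 - 1


def sorter_caes(size, merge_caes):
    """CAE blocks in a merge-based sorting network built from `merge_caes`."""
    if size == 1:
        return 0
    return 2 * sorter_caes(size // 2, merge_caes) + merge_caes(size)


def cae_difference(n):
    """Bitonic minus odd-even merge CAE count for 2**n inputs."""
    size = 2 ** n
    return (sorter_caes(size, bitonic_merge_caes)
            - sorter_caes(size, odd_even_merge_caes))


# Compare against the closed form from the citation statement.
for n in range(1, 11):
    assert cae_difference(n) == 2 ** (n - 1) * (n - 2) + 1

print(cae_difference(4))  # 16 inputs: 80 - 63 = 17
```

Under these assumptions the recurrences agree with the closed form for every $n$ from 1 to 10.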

“…For instance, only 9 to 25 inputs need to be processed in certain filters [11,12], while the number of inputs can vary from 25 to 81 (or even higher) in certain image processing applications [45]. High-speed sorters on FPGAs in HEP applications deal with 128 to 256 data samples in 100 ns processing cycles [18,23]. Thousands of inputs are sorted in video [49] and database applications [20,31].…”

Section: Introduction (mentioning)

confidence: 99%

Modular Design of High-Throughput, Low-Latency Sorting Units

Farmahini-Farahani

Duwe

Schulte

et al. 2013

IEEE Trans. Comput.

“…In [24], they have presented a topK-sorter. Basically, their architecture is an N-input bitonic sorter from which circuit elements unnecessary to find top K keys are removed.…”

Section: Introduction (mentioning)

confidence: 99%

Optimal Parallel Hardware K-Sorter and Top K-Sorter, with FPGA Implementations

Matsumoto

Nakano

Ito

2015

2015 14th International Symposium on Parallel and Distributed Computing

This paper presents a FIFO-based parallel merge sorter optimized for the latest FPGA. More specifically, we show a sorter that sorts $K$ keys in latency $K + \log_2 K - 1$ using $\log_2 K$ comparators. It uses $\frac{K}{M} + \log_2 K + \log_2 M - 1$ memory blocks with capacity $M$ to implement FIFOs. It receives $K$ keys one by one in every clock cycle and outputs their sorted sequence starting $K + \log_2 K - 1$ clock cycles later. Since $K$ clock cycles are necessary to input all $K$ keys, our sorter is almost optimal in terms of the latency. Also, since the total FIFO capacity is only $K + M \log_2 K + M \log_2 M - M$ and at least $K$ keys must be stored in the sorter, our sorter is also almost optimal in terms of the total FIFO capacity if $M$ is small. This paper also presents topK-sorter, which outputs top $K$ keys in $N$ input keys for any large $N$. Our topK-sorter runs in latency $N + \log_2 K$ using $\log_2 K + 1$ comparators. It uses memory blocks of size $M$ and the total FIFO capacity is only $2K + M \log_2 K + M \log_2 M - 2M$. Quite surprisingly, the total FIFO capacity is independent of $N$. Also, since the latency must be at least $N$, that of our topK-sorter is almost optimal in terms of the latency. Finally, we have implemented our K-sorter and topK-sorter in a Xilinx Virtex-7 FPGA using built-in Distributed RAMs and Block RAMs. The implementation results show that our K-sorter reduces the used memory resources by half, and both K-sorter and topK-sorter are practical and efficient.
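
To make the abstract's cost formulas concrete, the following sketch evaluates them for sample parameters. It only restates the arithmetic quoted in the abstract and does not model the sorter itself. It assumes that K and M are powers of two with M ≤ K, and it reads the memory-block count as K/M + log2 K + log2 M − 1, which is the reading under which the block count times M equals the stated total FIFO capacity. The function names are hypothetical.

```python
def log2_exact(value):
    """Return log2(value) for a power-of-two integer, else raise ValueError."""
    if value < 1 or value & (value - 1):
        raise ValueError("expected a power of two")
    return value.bit_length() - 1


def k_sorter_costs(k, m):
    """Costs of the K-sorter as stated in the abstract (assumes m <= k)."""
    if m > k:
        raise ValueError("this sketch assumes M <= K")
    lk, lm = log2_exact(k), log2_exact(m)
    return {
        "latency_cycles": k + lk - 1,
        "comparators": lk,
        "memory_blocks": k // m + lk + lm - 1,  # assumed reading: K/M + ...
        "fifo_capacity": k + m * lk + m * lm - m,
    }


def top_k_sorter_costs(n, k, m):
    """Costs of the topK-sorter as stated in the abstract."""
    lk, lm = log2_exact(k), log2_exact(m)
    return {
        "latency_cycles": n + lk,
        "comparators": lk + 1,
        "fifo_capacity": 2 * k + m * lk + m * lm - 2 * m,  # independent of n
    }


costs = k_sorter_costs(1024, 32)
# The block count times the block capacity reproduces the total capacity.
assert costs["memory_blocks"] * 32 == costs["fifo_capacity"]
print(costs["latency_cycles"], costs["memory_blocks"])  # 1033 46
```

For K = 1024 and M = 32, the topK-sorter capacity evaluates to 2464 keys regardless of N, which matches the abstract's remark that the capacity is independent of N.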
