2006
DOI: 10.1504/ijhpcn.2006.010635

NIC-based reduction algorithms for large-scale clusters

Abstract: Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous algorithms limit processing to the host CPU, we utilize the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and exper…

Cited by 6 publications (11 citation statements)
References 28 publications
“…Past work has shown that performing computation in the NICs during reduction operations increases both scalability and consistency with speedups of up to 121% [20]. However, this has only been applied to double-precision variables because of the complex data structures of libraries with arbitrary precision and the limited programmable logic in terms of both area and clock frequency in modern NICs [20].…”
Section: B Applicability To Network Operations (mentioning)
confidence: 99%
“…However, this has only been applied to double-precision variables because of the complex data structures of libraries with arbitrary precision and the limited programmable logic in terms of both area and clock frequency in modern NICs [20]. For instance, Elan3 in Quadrics QsNet provides a user-programmable, multi-threaded, 32-bit, 100 MHz RISC-based processor with 64 MB local SDRAM, originally targeted for communication protocol modifications [35].…”
Section: B Applicability To Network Operations (mentioning)
confidence: 99%