NIC-based reduction algorithms for large-scale clusters

Petrini, Fabrizio; Moody, Adam; Fernández, J.F.; Frachtenberg, Eitan; Panda, Dhabaleswar K.

doi:10.1504/ijhpcn.2006.010635

Cited by 6 publications

(11 citation statements)

References 28 publications

“…Past work has shown that performing computation in the NICs during reduction operations increases both scalability and consistency with speedups of up to 121% [20]. However, this has only been applied to double-precision variables because of the complex data structures of libraries with arbitrary precision and the limited programmable logic in terms of both area and clock frequency in modern NICs [20].…”

Section: B Applicability To Network Operations (mentioning)

confidence: 99%

“…However, this has only been applied to double-precision variables because of the complex data structures of libraries with arbitrary precision and the limited programmable logic in terms of both area and clock frequency in modern NICs [20]. For instance, Elan3 in Quadrics QsNet provides a user-programmable, multi-threaded, 32-bit, 100MHz RISC-based processor with 64 MB local SDRAM, originally targeted for communication protocol modifications [35].…”

Section: B Applicability To Network Operations (mentioning)

confidence: 99%

“…Further work evaluated conducting reduction operations exclusively in NICs and concluded that doing so increases efficiency, scalability and reduces execution time compared to performing the computations in processors [20]. This is made possible by using the programmable logic in modern NICs [20]. However, this work only applied this method for double-precision variables due to the complex data structures and computation demands of arbitrary-precision libraries and the limited processing power of programmable logic in NICs.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, this work only applied this method for double-precision variables due to the complex data structures and computation demands of arbitrary-precision libraries and the limited processing power of programmable logic in NICs. Even with double-precision variables, the low computational power of programmable logic can pose a significant performance issue [20]. The alternative, adding complex dedicated hardware to NICs for increased or arbitrary precision computations, increases design risk and complexity.…”

Section: Introduction (mentioning)

confidence: 99%

Extending Summation Precision for Network Reduction Operations

Bailey

Shalf

2013

2013 25th International Symposium on Computer Architecture and High Performance Computing

Abstract: Double precision summation is at the core of numerous important algorithms such as Newton-Krylov methods and other operations involving inner products, but the effectiveness of summation is limited by the accumulation of rounding errors, which are an increasing problem with the scaling of modern HPC systems and data sets. To reduce the impact of precision loss, researchers have proposed increased- and arbitrary-precision libraries that provide reproducible error or even bounded error accumulation for large sums, but do not guarantee an exact result. Such libraries can also increase computation time significantly. We propose big integer (BigInt) expansions of double precision variables that enable arbitrarily large summations without error and provide exact and reproducible results. This is feasible with performance comparable to that of double-precision floating point summation, by the inclusion of simple and inexpensive logic into modern NICs to accelerate performance on large-scale systems.
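
The idea of expanding doubles into big integers can be illustrated with a small software sketch. This is not the authors' NIC logic or their BigInt layout; it only assumes that every finite double is an exact integer multiple of 2^-1074 (the smallest subnormal), so the sum can be accumulated in integer arithmetic with no rounding and converted back to a double once at the end. The names `to_fixed`, `from_fixed`, and `exact_sum` are hypothetical.

```python
import math
from fractions import Fraction

# Every finite double is an integer multiple of 2**-1074 (smallest subnormal).
SCALE_BITS = 1074


def to_fixed(x: float) -> int:
    """Expand a finite double into an exact integer count of 2**-1074 units."""
    if not math.isfinite(x):
        raise ValueError("only finite doubles can be expanded")
    num, den = x.as_integer_ratio()  # exact: x == num / den, den is a power of two
    return num * ((1 << SCALE_BITS) // den)


def from_fixed(n: int) -> float:
    """Round the exact integer total back to the nearest double (one rounding).

    Raises OverflowError if the total is too large for a double.
    """
    return float(Fraction(n, 1 << SCALE_BITS))


def exact_sum(values) -> float:
    """Sum doubles with integer accumulation, so the order of addition is irrelevant."""
    return from_fixed(sum(to_fixed(v) for v in values))


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16]
    print(sum(data))        # plain double summation loses the 1.0
    print(exact_sum(data))  # integer accumulation keeps it
```

Because integer addition is associative, any reduction tree over the expanded values produces the same total, which is the property that makes the result reproducible regardless of how the network combines partial sums.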

Extending Summation Precision for Network Reduction Operations

Bailey

Shalf

2014

Int J Parallel Prog

Tuple Spaces in Hardware for Accelerated Implicit Routing

Baker

Tripp

2011

2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PHD Forum

Organizing and optimizing data objects on networks with support for data migration and failing nodes is a complicated problem to handle as systems expand to hundreds of thousands of nodes. The goal of this work is to demonstrate that high levels of speedup can be achieved by moving responsibility for finding, fetching, and staging data into an FPGA-based network interface. We present a system for implicit routing of data via FPGA-based network cards. In this system, data structures are requested by name, and the network cooperatively finds the data and returns the information to the requester. This is achieved through successive examination of hardware hash tables implemented in the individual FPGA network cards. By avoiding the complex network software stacks between nodes, the data is quickly transferred entirely through FPGA-FPGA interaction. The performance of this system is approximately 26x faster vs. the software network on a per-node basis. This is due to the improved speed of the hash tables, higher levels of network abstraction and lowered latency between the network nodes.
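
The lookup-by-name scheme described in this abstract can be sketched in software as follows. This is a toy model only: the abstract does not specify the topology or forwarding rule, so the ring of nodes, the `Node` class, and the `lookup` function are assumptions made for illustration, with a Python dict standing in for each card's hardware hash table.

```python
class Node:
    """Toy stand-in for one FPGA network card (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.table = {}   # stands in for the card's hardware hash table
        self.next = None  # assumed forwarding link to a neighbouring card


def lookup(start, key):
    """Examine each node's table in turn until the named data is found.

    Returns (value, owner_name, hops), or None if no node holds the key.
    """
    node, hops = start, 0
    visited = set()
    while node is not None and id(node) not in visited:
        visited.add(id(node))
        if key in node.table:
            return node.table[key], node.name, hops
        node = node.next  # forward the request to the next card
        hops += 1
    return None


if __name__ == "__main__":
    # Build a three-node ring and store one named object on the last node.
    n0, n1, n2 = Node("n0"), Node("n1"), Node("n2")
    n0.next, n1.next, n2.next = n1, n2, n0
    n2.table["matrix_A"] = [1, 2, 3]
    print(lookup(n0, "matrix_A"))
    print(lookup(n0, "missing"))
```

The requester names the data rather than its location, and the request stops at the first table that holds the key; the visited set simply prevents the toy ring from looping forever on a miss.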
