[1993] Proceedings Seventh International Parallel Processing Symposium
DOI: 10.1109/ipps.1993.262873

Global combine on mesh architectures with wormhole routing

Abstract: Several algorithms are discussed for implementing global combine (summation) on distributed memory computers using a two-dimensional mesh interconnect with wormhole routing. These include algorithms that … data elements per node. We also introduce a hybrid algorithm that is not asymptotically optimal, but in practice outperforms the others for wide ranges of n and p. In addition, we show that a different algorithm, optimized for a hypercube, is often the fastest method for meshes containing p = 2^d nodes, if car…
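To make the hypercube-style approach concrete, here is a minimal Python simulation of recursive doubling ("exchange and add"), a standard global-sum method for p = 2^d nodes. It is a sketch under our own assumptions, not the algorithm or code from the paper: the function name `recursive_doubling_combine` is hypothetical, communication is modelled as a synchronous exchange, and the mesh topology is ignored. At each step, node i swaps its partial sum with node i XOR 2^k, so after d steps every node holds the full n-element sum.

```python
from typing import List

Vector = List[float]


def recursive_doubling_combine(data: List[Vector]) -> List[Vector]:
    """Simulate an exchange-and-add global sum over p = 2**d nodes (hypothetical helper).

    data[i] is the n-element vector held by node i. At each step every node
    swaps its current partial sum with partner node i XOR step and adds the
    two vectors elementwise. After log2(p) steps all nodes hold the total.
    """
    p = len(data)
    if p == 0 or p & (p - 1):
        raise ValueError("number of nodes must be a power of two")
    partial = [list(v) for v in data]  # copy so the caller's data is untouched
    step = 1
    while step < p:
        # Exchanges within one step are simultaneous, so read the old values.
        previous = [list(v) for v in partial]
        for node in range(p):
            partner = node ^ step
            partial[node] = [a + b for a, b in zip(previous[node], previous[partner])]
        step <<= 1  # move to the next hypercube dimension
    return partial
```

For example, `recursive_doubling_combine([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])` leaves `[16.0, 20.0]` on all four nodes. On a mesh, the partner of node i may be several hops away; wormhole routing makes latency relatively insensitive to distance, but messages that share a link can still contend.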

Cited by 62 publications (36 citation statements); references 2 publications.
“…Our results also indicate that the performance of an implementation is influenced by the relationship among parameters of the parallel machine, as well as by the relationship of the parameters to the amount of data involved. This agrees with other research done on the implementation of communication operations [1,2,4,19].…”
Section: Validation Through Communication Operations (supporting)
confidence: 92%
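The dependence on machine parameters and data size noted in this statement can be illustrated with the common linear cost model, where alpha is the per-message startup time, beta the transfer time per element, and gamma the time to add one element. The sketch below is an assumption-laden comparison, not a result from the paper: it ignores link contention and hop distance, and it compares recursive doubling with a scatter-collect scheme (reduce-scatter by recursive halving followed by an allgather by recursive doubling), both for p = 2^d nodes. All function names and parameter values are hypothetical.

```python
import math


def recursive_doubling_cost(p: int, n: int, alpha: float, beta: float, gamma: float) -> float:
    """log2(p) steps, each sending and adding the full n-element vector."""
    d = math.log2(p)
    return d * (alpha + n * beta + n * gamma)


def scatter_collect_cost(p: int, n: int, alpha: float, beta: float, gamma: float) -> float:
    """Recursive-halving reduce-scatter followed by recursive-doubling allgather.

    2 * log2(p) startups, but each node sends only 2 * (p - 1) / p * n
    elements and adds (p - 1) / p * n of them.
    """
    d = math.log2(p)
    frac = (p - 1) / p
    return 2 * d * alpha + 2 * frac * n * beta + frac * n * gamma


def crossover_n(p: int, alpha: float, beta: float, gamma: float) -> float:
    """Vector length at which both models predict the same time.

    Below it recursive doubling is cheaper; above it scatter-collect is.
    Returns infinity when scatter-collect never catches up.
    """
    d = math.log2(p)
    frac = (p - 1) / p
    denom = d * (beta + gamma) - frac * (2 * beta + gamma)
    return math.inf if denom <= 0 else d * alpha / denom


def cheaper_algorithm(p: int, n: int, alpha: float, beta: float, gamma: float) -> str:
    """Name the algorithm the model predicts to be faster (ties go to recursive doubling)."""
    rd = recursive_doubling_cost(p, n, alpha, beta, gamma)
    sc = scatter_collect_cost(p, n, alpha, beta, gamma)
    return "recursive doubling" if rd <= sc else "scatter-collect"
```

With the illustrative values p = 64, alpha = 100, beta = 1 and gamma = 0.5 (arbitrary time units), `crossover_n` is about 92 elements: recursive doubling wins for shorter vectors because it pays half as many startups, while scatter-collect wins for longer ones because each node moves roughly 2n elements instead of n log2 p.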
“…Ideas from our previous work on performing the global combine can be used to obtain an alternative tradeoff between the startup cost and the transfer cost [4]. We first present a simple algorithm for one-dimensional meshes, and then extend it to the two-dimensional case.…”
Section: Alternative Algorithm: Scatter-collect (mentioning)
confidence: 99%
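The startup-versus-transfer tradeoff described in this statement can be sketched with a ring-based scatter-collect in Python. This is a minimal simulation under stated assumptions, not the algorithm from the citing work: each vector is split into p blocks, a reduce-scatter phase of p - 1 neighbour steps leaves each node owning one fully summed block, and an allgather phase of p - 1 more steps distributes those blocks. That costs 2(p - 1) startups but only about 2n elements sent per node. On a one-dimensional mesh without wraparound, the ring's closing message travels in the opposite direction to all the others, so with full-duplex links no two messages in a step share a link direction. The 2D extension shown (combine along rows, then along columns) is one straightforward option we assume; the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def ring_scatter_collect(data: List[Vector]) -> List[Vector]:
    """Simulated global sum over p nodes arranged in a ring (hypothetical helper).

    Phase 1 (scatter / reduce-scatter): in p - 1 steps each node forwards one
    partially summed block to its right neighbour, which adds its own part.
    Afterwards node i owns the complete sum of block (i + 1) % p.
    Phase 2 (collect / allgather): in p - 1 more steps the finished blocks
    are passed around the ring until every node has all of them.
    """
    p = len(data)
    if p == 0:
        return []
    n = len(data[0])
    # Block boundaries; block sizes differ by at most one element.
    bounds = [(b * n // p, (b + 1) * n // p) for b in range(p)]
    buf = [list(v) for v in data]  # copy so the caller's data is untouched

    # Phase 1: at step s, node i sends block (i - s) % p to node (i + 1) % p.
    for s in range(p - 1):
        sends = [buf[i][slice(*bounds[(i - s) % p])] for i in range(p)]
        for i in range(p):
            dst = (i + 1) % p
            lo, _ = bounds[(i - s) % p]
            for k, x in enumerate(sends[i]):
                buf[dst][lo + k] += x  # receiver adds its own contribution

    # Phase 2: at step s, node i forwards finished block (i + 1 - s) % p.
    for s in range(p - 1):
        sends = [buf[i][slice(*bounds[(i + 1 - s) % p])] for i in range(p)]
        for i in range(p):
            dst = (i + 1) % p
            lo, hi = bounds[(i + 1 - s) % p]
            buf[dst][lo:hi] = sends[i]  # receiver overwrites with the final block
    return buf


def mesh_2d_combine(grid: List[List[Vector]]) -> List[List[Vector]]:
    """grid[r][c] is the vector on the node in row r, column c.

    Combine along every row first, then along every column of row sums.
    """
    rows = [ring_scatter_collect(row) for row in grid]
    for c in range(len(rows[0])):
        column = ring_scatter_collect([rows[r][c] for r in range(len(rows))])
        for r in range(len(rows)):
            rows[r][c] = column[r]
    return rows
```

For instance, three nodes holding `[1.0, 2.0, 3.0, 4.0]`, `[10.0, 20.0, 30.0, 40.0]` and `[100.0, 200.0, 300.0, 400.0]` all end with `[111.0, 222.0, 333.0, 444.0]`. Splitting into p blocks is what trades the extra startups for a transfer volume that no longer grows with p.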
“…Reduction collectives entail both communication (data transfer) and processing (data reduction operations), and therefore efficient implementations must consider the characteristics of the network, the processor, and the interactions between them. Over the years, many researchers have dedicated significant effort to derive optimal and scalable algorithms [1,2,3,4,5,8]. However, with respect to the underlying system characteristics, all of this work commonly assumed reduction processing must be performed by the host CPU.…”
Section: Introduction (mentioning)
confidence: 99%