“…algorithm: ring, Rabenseifner, pre-reduced ring (PRR), and sorted linear tree (SLT); size (of the data vector): 128 K, 512 K, 1 M, 2 M, 4 M, and 8 M floats (4 bytes each); mode (of process delay): one-late (only one process is delayed, by maxDelay) and rand-late (all processes are delayed randomly, by up to maxDelay); maxDelay (of process arrival times): 0, 1, 5, 10, 50, 100, 500, and 1000 ms; P (number of processes/nodes): 4, 6, 8, 10, 12, 16, 20, 24, 28, 32, 36, 40, 44, and 48; N (number of iterations): 64 to 256, depending on maxDelay (more iterations for lower delays). Table 3 presents the benchmark results for 1 M floats of reduced data on a 1 Gbps Ethernet network, with only one process delayed (one-late mode), on 48 nodes of the Tryton [14] HPC cluster. The results are given as the absolute average elapsed time $\bar{e}_{alg}$ and the speedup relative to the ring algorithm, $s_{alg} = \bar{e}_{ring} / \bar{e}_{alg}$, where $alg$ is the evaluated algorithm.…”
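The two delay modes and the speedup metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual code: the function names `arrival_delays` and `speedup` are hypothetical, the late process in one-late mode is assumed to be picked at random, and rand-late delays are assumed to be uniformly distributed over [0, maxDelay], since the text does not state either choice. The timing values in the usage lines are made up for illustration and are not taken from Table 3.

```python
import random
from statistics import mean


def arrival_delays(mode, num_procs, max_delay_ms, rng=None):
    """Return one artificial arrival delay (in ms) per process for one iteration."""
    rng = rng or random.Random()
    if mode == "one-late":
        # Exactly one process is delayed by the full maxDelay.
        # Assumption: the late process is chosen at random.
        delays = [0.0] * num_procs
        delays[rng.randrange(num_procs)] = float(max_delay_ms)
        return delays
    if mode == "rand-late":
        # Every process is delayed by a random amount up to maxDelay.
        # Assumption: the delays are uniform over [0, maxDelay].
        return [rng.uniform(0.0, max_delay_ms) for _ in range(num_procs)]
    raise ValueError(f"unknown mode: {mode!r}")


def speedup(ring_times, alg_times):
    """s_alg = mean elapsed time of ring / mean elapsed time of the algorithm."""
    return mean(ring_times) / mean(alg_times)


# Illustrative per-iteration elapsed times in ms (hypothetical values).
ring_times = [10.0, 12.0, 14.0]
alg_times = [5.0, 6.0, 7.0]
print(speedup(ring_times, alg_times))  # 2.0

# One-late delays for P = 48 processes and maxDelay = 100 ms.
print(arrival_delays("one-late", 48, 100, random.Random(0)))
```

A speedup above 1 means the evaluated algorithm finished faster on average than the ring algorithm under the same delay pattern; by definition the ring algorithm itself has a speedup of 1.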