1.1 Implementing Parallel Algorithms on Contemporary Hardware

Communication mechanisms within concurrent computer systems are extremely hostile to optimizing compilers. Vector machines also have fundamental performance bottlenecks [33][35], and their sustained average performance is several orders of magnitude lower than their peak rate [15, 33], even when creative coding techniques help the compiler [34]. VLIW (Very Long Instruction Word) architectures [11, 7] are much more optimizer-friendly because they exploit parallelism at a lower level, the instruction level [4, 27, 14]. Relatively good optimization results have been reported [4], but only for algorithms with locally regular data dependencies (systolic or systolizable algorithms), so VLIW architectures still have substantial drawbacks.

Data flow machines are also optimizer-hostile, since their nondeterministic operation does not permit compile-time optimization. Their throughput is affected by further drawbacks: several new kinds of bottlenecks have been introduced, and code causes enormous addressing overhead and data access conflicts [13].

A higher degree of parallelism may be achieved by Application-specific Array Processors (ASAPs). Even ASAPs have substantial drawbacks: scrambling and unscrambling of data streams causes extensive I/O overhead, and the design of special hardware is expensive. A more important drawback is that only algorithms with locally regular data dependencies (systolic or systolizable algorithms, see [32] and others) are supported. This drawback also holds for parallel computer architectures for systolic

Abstract. This paper introduces a novel (non-von Neumann) paradigm of parallel computation that supports a much more efficient implementation of parallel algorithms.
Acceleration factors of up to 2000 and more have been obtained experimentally on the MoM architecture for a number of important applications, although the hardware used is simpler than that of a single RISC microprocessor. The machine organization and the most important hardware features of xputers are briefly introduced. The programming paradigm and its flexibility are illustrated by simple DSP and image processing examples.