This paper summarizes the results of a theoretical and experimental study of a general technique for the implementation of recursive and nonrecursive signal flow graphs and other arithmetic algorithms on synchronous digital machines composed of many identical programmable processors. This technique can be characterized by two fundamental properties. First, it uses the Skewed Single Instruction Multiple Data (SSIMD) mode, in which exactly the same program is executed on all the processors [1], and that program is exactly a single processor realization of the entire algorithm being implemented. Second, all the data precedence relations among the processors are automatically maintained by the inherent synchrony of the system. This often results in processor-optimum solutions, in which the use of N processors leads exactly to an N-fold increase in the system throughput. In the final analysis, the techniques discussed here result in a procedure in which the algorithm is specified in some simple notation, such as a set of difference equations, and from this a completely parallel multiprocessor implementation for the algorithm is generated. The resulting implementation is always either processor-optimum or time-optimum, in which the absolute throughput limit for the technique has been reached. In addition, for a large class of recursive signal flow graphs, the implementations are absolute-optimum in the sense that there is no other implementation for a particular signal flow graph and a particular constituent processor that leads to greater system throughput. The techniques discussed here have been tested on a synchronous multiprocessor
THE SSIMD MODE
The fundamental computational mode utilized in these implementations is the Skewed Single Instruction Multiple Data mode. In this mode, exactly the same instruction stream is executed on all processors, but with a fixed time skew maintained between the instruction execution times on the separate processors. The fundamental concept is illustrated by the simple example of Fig. 1. In this example, the second order direct form filter of Fig. 1a is implemented by a single processor program as shown in Fig. 1b. In this single processor realization, none of the delay elements is realized directly; rather, the output from each delay element becomes an input to the program, and the input to each delay element becomes an output of the program. In the SSIMD realization, these delayed values are not computed by this processor, but are supplied from identical computations on other processors. Fig. 2 shows a diagram of a one processor, a two processor, and a five processor SSIMD realization for the signal flow graph of Fig. 1. In the single processor solution of Fig. 2a, all of the past values of r(n) are supplied by the same processor, and there is never an issue of data availability. In the two processor realization of Fig. 2b, alternate points are supplied by each processor, and the two processors must be skewed such that the data requirements of each are always met by the ...
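
The idea of one program shared by all processors, with delayed values passed in as inputs, can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the implementation described in the paper. The difference equation y(n) = x(n) + A1*y(n-1) + A2*y(n-2), the coefficient values, and the names `program`, `run_single`, and `run_ssimd` are all hypothetical. The time skew is modeled simply by processing samples in order and assigning sample n to processor n mod P; instruction-level timing is not modeled.

```python
from typing import Dict, List

# Hypothetical coefficients, chosen only for illustration.
A1, A2 = 0.5, -0.25


def program(x_n: float, y_nm1: float, y_nm2: float) -> float:
    """One pass of the single processor program.

    The delay elements are not stored here: their outputs arrive as
    inputs (y_nm1, y_nm2) and the new value is returned as an output.
    """
    return x_n + A1 * y_nm1 + A2 * y_nm2


def run_single(x: List[float]) -> List[float]:
    """One processor supplies all of its own past values."""
    y: List[float] = []
    for n, x_n in enumerate(x):
        y_nm1 = y[n - 1] if n >= 1 else 0.0
        y_nm2 = y[n - 2] if n >= 2 else 0.0
        y.append(program(x_n, y_nm1, y_nm2))
    return y


def run_ssimd(x: List[float], num_procs: int) -> List[float]:
    """Assumed SSIMD model: processor p computes samples n with n % num_procs == p.

    Visiting n in increasing order stands in for the fixed time skew, so
    every delayed value has been published before it is read.
    """
    published: Dict[int, float] = {}  # sample index -> output value
    owner: Dict[int, int] = {}        # sample index -> producing processor
    for n, x_n in enumerate(x):
        p = n % num_procs
        # Delayed values come from whichever processor produced them.
        y_nm1 = published[n - 1] if n >= 1 else 0.0
        y_nm2 = published[n - 2] if n >= 2 else 0.0
        published[n] = program(x_n, y_nm1, y_nm2)
        owner[n] = p
    return [published[n] for n in range(len(x))]


if __name__ == "__main__":
    samples = [1.0, 0.0, 0.0, 2.0, -1.0, 0.5, 0.0, 3.0]
    reference = run_single(samples)
    for procs in (1, 2, 5):
        print(procs, run_ssimd(samples, procs) == reference)
```

In this model every processor runs the same `program`, and the one, two, and five processor runs reproduce the single processor output exactly because the arithmetic performed for each sample is identical; only the processor that performs it changes.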