The Cameron project has developed a language called single assignment C (SA-C), and a compiler for mapping image-based applications written in SA-C to field programmable gate arrays (FPGAs). The paper tests this technology by implementing several applications in SA-C and compiling them to an Annapolis Microsystems (AMS) WildStar board with a Xilinx XV2000E FPGA. The performance of these applications on the FPGA is compared to the performance of the same applications written in assembly code or C for an 800 MHz Pentium III. (Although no comparison across processors is perfect, these chips were the first of their respective classes fabricated at 0.18 microns, and are therefore of comparable ages.) We find that applications written in SA-C and compiled to FPGAs are between 8 and 800 times faster than the equivalent program run on the Pentium III.
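The abstract does not list the specific applications benchmarked, but they are windowed image operations of the kind sketched below. As a rough, purely illustrative example (all names and sizes here are hypothetical), a plain-C 3x3 convolution like this would serve as the Pentium-side baseline; the SA-C version of the same loop nest is what the Cameron compiler would map to FPGA hardware.

#include <stdint.h>

#define H 480
#define W 640

/* Hypothetical Pentium-side baseline: a 3x3 convolution over an 8-bit image,
 * the style of sliding-window image operation that SA-C expresses as a loop
 * over windows and the Cameron compiler maps to an FPGA configuration. */
void convolve3x3(const uint8_t in[H][W], int16_t out[H][W], const int8_t k[3][3])
{
    for (int r = 1; r < H - 1; r++) {
        for (int c = 1; c < W - 1; c++) {
            int32_t acc = 0;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    acc += k[dr + 1][dc + 1] * in[r + dr][c + dc];
            out[r][c] = (int16_t)acc;
        }
    }
}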
The Cameron project has developed a language and compiler for mapping image-based applications to field programmable gate arrays (FPGAs). This paper tests this technology on several applications and finds that FPGAs are between 8 and 800 times faster than comparable Pentiums for image-based tasks.
Multithreaded or hybrid von Neumann/dataflow execution models have an advantage over the fine-grain dataflow model in that they significantly reduce the run-time overhead incurred by matching. In this paper, we look at two issues related to the evaluation of a coarse-grain dataflow model of execution. The first issue concerns the compilation of coarse-grain code from fine-grain code. In this study, the concept of coarse-grain code is captured by clusters, which can be thought of as mini-dataflow graphs that execute strictly, deterministically, and without blocking. We look at two bottom-up algorithms, the basic block and the dependence sets methods, for partitioning dataflow graphs into clusters. The second issue is the actual performance of cluster-based execution as several architecture parameters are varied (e.g., number of processors, matching cost, network latency). From the extensive simulation data we evaluate (1) the potential speedup over fine-grain execution and (2) the effects of the various architecture parameters on coarse-grain execution time, allowing us to draw conclusions on their effectiveness. The results indicate that even with a simple bottom-up algorithm for generating clusters, cluster execution offers a good speedup over fine-grain execution across a wide range of architectures. They also indicate that coarse-grain execution is scalable and tolerates network latency and high matching cost well; it can benefit from higher processor output bandwidth; and, finally, a simple superscalar processor with an issue rate of two is sufficient to exploit the internal parallelism of a cluster.
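The abstract does not spell out either partitioning algorithm, so the following is only a minimal sketch of the basic-block idea it names: start with one cluster per dataflow node and repeatedly merge a node with its unique consumer when that consumer has no other producers, yielding straight-line clusters that can execute strictly and without blocking. The graph representation and all names are assumptions made for illustration, not the paper's implementation.

#include <stdio.h>

#define MAX_NODES 16

/* Hypothetical dataflow graph: edge[i][j] != 0 means node i feeds node j. */
static int edge[MAX_NODES][MAX_NODES];
static int n_nodes = 6;
static int cluster_of[MAX_NODES];   /* cluster id assigned to each node */

static int out_degree(int i) { int d = 0; for (int j = 0; j < n_nodes; j++) d += edge[i][j]; return d; }
static int in_degree(int j)  { int d = 0; for (int i = 0; i < n_nodes; i++) d += edge[i][j]; return d; }

/* Bottom-up basic-block-style clustering: merge a node with its unique
 * successor whenever that successor has exactly one producer, so each
 * cluster is a straight-line chain of operations. */
static void cluster_basic_blocks(void)
{
    for (int i = 0; i < n_nodes; i++) cluster_of[i] = i;

    for (int i = 0; i < n_nodes; i++) {
        if (out_degree(i) != 1) continue;        /* need a unique consumer */
        int succ = 0;
        while (!edge[i][succ]) succ++;
        if (in_degree(succ) != 1) continue;      /* consumer must have one producer */
        int from = cluster_of[succ], to = cluster_of[i];
        for (int k = 0; k < n_nodes; k++)        /* merge the two clusters */
            if (cluster_of[k] == from) cluster_of[k] = to;
    }
}

int main(void)
{
    /* Example graph: 0->1->2 and 3->4, with 2 and 4 both feeding 5. */
    edge[0][1] = edge[1][2] = edge[3][4] = edge[2][5] = edge[4][5] = 1;

    cluster_basic_blocks();
    for (int i = 0; i < n_nodes; i++)
        printf("node %d -> cluster %d\n", i, cluster_of[i]);
    return 0;
}

On this example the sketch produces clusters {0,1,2}, {3,4}, and {5}; node 5 stays separate because it has two producers, which is exactly the blocking case a strict cluster must avoid.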