This paper outlines the basic structure of a dynamic dataflow architecture based on the argument-fetching dataflow principle [4]. In particular, we present a scheme to exploit fine-grain parallelism in function invocation based on the argument-fetching principle. We extend the static architecture by associating a frame of consecutive memory space, called a function overlay, with each parallel function invocation, and identify each invocation instance with the base address of its overlay. Furthermore, our scheme gains efficiency by making effective use of the power provided by the argument-fetching dataflow principle: the separation of the instruction scheduling mechanism from instruction execution. While no loop unravelling is allowed, the architecture inherits the power of dataflow software pipelining [5] from the static architecture. The proposed architecture will have a memory overlay manager, separate from the pipelined execution unit, to handle function applications and memory management. To verify our design, a set of standard benchmark programs was mapped onto the new architecture and executed on an experimental general-purpose dataflow architecture simulation testbed.
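The overlay idea described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the `OverlayManager` class and its method names are invented for exposition. The key property it demonstrates is that each concurrent invocation receives a frame of consecutive memory and is identified solely by that frame's base address.

```python
# Hypothetical sketch of function overlays: each parallel function
# invocation is given a frame of consecutive memory (an "overlay"),
# and the invocation instance is identified by the frame's base address.

class OverlayManager:
    """Toy overlay manager, separate from the execution unit."""

    def __init__(self, memory_size, frame_size):
        self.frame_size = frame_size
        # Free list of base addresses of fixed-size frames.
        self.free = list(range(0, memory_size, frame_size))

    def invoke(self):
        """Allocate an overlay for a new invocation.

        The returned base address *is* the identity of the
        invocation instance.
        """
        return self.free.pop()

    def release(self, base):
        """Return an overlay to the free list when the invocation ends."""
        self.free.append(base)


mgr = OverlayManager(memory_size=1024, frame_size=64)
a = mgr.invoke()
b = mgr.invoke()
# Two concurrent invocations are distinguished purely by base address;
# a slot at offset k in either frame is addressed as base + k.
```

Because instances are named by base addresses, operand slots of a specific invocation can be addressed as `base + offset` with no per-token tag matching, which is how the scheme stays compatible with the argument-fetching principle.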
Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources [9]. In this paper, we investigate the effects of software pipelining under realistic architecture models with finite resources. Our target architecture is the McGill Dataflow Architecture, which employs conventional pipelined techniques to achieve fast instruction execution while exploiting fine-grain parallelism via a data-driven instruction scheduler. To achieve optimal execution efficiency, the compiled code must make balanced use of both the parallelism in the instruction execution unit and the fine-grain synchronization power of the machine. A detailed analysis based on simulation results is presented, focusing on two key architectural factors: the fine-grain synchronization capacity and the scheduling mechanism for enabling instructions. On one hand, our results provide experimental evidence that software pipelining is an effective method for exploiting fine-grain parallelism in loops. On the other hand, the experiments have also revealed the (somewhat pessimistic) fact that even fully software-pipelined code may not achieve good performance if the overhead for fine-grain synchronization exceeds the capacity of the machine.
Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources [7]. In this paper, we address some issues of software pipelining under a realistic architecture model with finite resources. A general framework for fine-grain code scheduling in pipelined machines is developed which simultaneously addresses both time and space efficiency issues for loops typically found in general-purpose scientific computations. This scheduling method exploits fine-grain parallelism through a loop optimization technique which balances the program graph to a limited extent at compile time, while the instruction-level scheduling is done dynamically at runtime in a data-driven manner.
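The interplay of compile-time structure and runtime data-driven scheduling can be illustrated with a toy simulator. This is a minimal sketch under simplifying assumptions (unit-latency instructions, unbounded enabled-instruction capacity), not the framework developed in the paper: an instruction fires as soon as all its operands have arrived, so successive loop iterations overlap like stages of a pipeline.

```python
# Minimal sketch of dataflow software pipelining: a data-driven
# scheduler fires every instruction whose operands are all available,
# one cycle per firing, so independent stages of different loop
# iterations execute concurrently.
from collections import defaultdict


def simulate(edges, sources):
    """Return {node: cycle_fired} for a data-driven schedule.

    edges   -- list of (producer, consumer) dependence arcs
    sources -- nodes with no predecessors, enabled at cycle 0
    """
    preds = defaultdict(int)      # operand count per node
    succs = defaultdict(list)
    for u, v in edges:
        preds[v] += 1
        succs[u].append(v)

    arrived = defaultdict(int)    # operands received so far
    ready = list(sources)
    fired = {}
    cycle = 0
    while ready:
        next_ready = []
        for n in ready:
            fired[n] = cycle
            for s in succs[n]:    # deliver result tokens
                arrived[s] += 1
                if arrived[s] == preds[s]:
                    next_ready.append(s)
        ready = next_ready
        cycle += 1
    return fired


# A loop of 4 iterations, each a 3-instruction chain a -> b -> c.
# The only loop-carried dependence is the index update a_i -> a_{i+1},
# so iteration i+1 can start before iteration i finishes.
N = 4
edges = []
for i in range(N):
    edges += [((i, "a"), (i, "b")), ((i, "b"), (i, "c"))]
    if i + 1 < N:
        edges.append(((i, "a"), (i + 1, "a")))

fired = simulate(edges, [(0, "a")])
makespan = max(fired.values()) + 1
# Pipelined makespan is depth + (N - 1) = 3 + 3 = 6 cycles,
# versus depth * N = 12 cycles if iterations ran back to back.
```

A fully balanced graph keeps this overlap steady across iterations; the "limited" balancing mentioned above trades some of that overlap for the buffer space that full balancing would require.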