“…Two important characteristics of this abstract form of representing a computation are that (1) there is no specification of a particular order of execution of the operations: although the program executes the operations in a specific sequential order, the CDAG abstracts the schedule of operations by only specifying partial ordering constraints as edges in the graph; (2) there is no association of memory locations with the source operands or result of any operation. We use the notation of Bilardi & Peserico [5] to formally describe CDAGs. We begin with the model of CDAG used by Hong & Kung:…”
“…Several works followed Hong & Kung's work on I/O complexity in deriving lower bounds on data accesses [2,1,18,6,5,23,24,19,20,29,13,3,4,8,28,26]. Aggarwal et al provided several lower bounds for sorting algorithms [2].…”
Section: Related Work (mentioning)
confidence: 99%
“…More recently, Demmel et al have developed lower bounds as well as optimal algorithms for several linear algebra computations including QR and LU decomposition and all-pairs shortest paths problem [3,4,13,28]. Bilardi et al [6,5] develop the notion of access complexity and relate it to space complexity. Bilardi and Preparata [7] developed the notion of the closed-dichotomy size of a DAG G that is used to provide a lower bound on the data access complexity in those cases where recomputation is not allowed.…”
Section: Related Work (mentioning)
confidence: 99%
“…Valiant proposed a hierarchical computational model [29] that offers the possibility to reason in an arbitrarily complex parameterized memory hierarchy model. Unlike Hong & Kung's original model, several models have been proposed that do not allow recomputation of values (also referred to as "no repebbling") [3,4,5,27,19,23,24,26,9,18,20,21]. Savage [23] develops results for FFT using no repebbling.…”
Section: Related Work (mentioning)
confidence: 99%
“…Savage [23] develops results for FFT using no repebbling. Bilardi and Peserico [5] explore the possibility of coding a given algorithm so that it is efficiently portable across machines with different hierarchical memory systems, without the use of recomputation. Ballard et al [3,4] assume no recomputation is allowed in deriving lower bounds for linear algebra computations.…”
Abstract: Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to the total computational power (also referred to as the machine balance parameter) in parallel computer systems is decreasing. It is therefore of considerable importance to characterize the inherent data movement requirements of parallel algorithms, so that the minimal architectural balance parameters required to support them on future systems can be well understood. In this paper, we develop an extension of the well-known red-blue pebble game to develop lower bounds on the data movement complexity for the parallel execution of computational directed acyclic graphs (CDAGs) on parallel systems. We model multi-node multi-core parallel systems, with the total physical memory distributed across the nodes (that are connected through some interconnection network) and in a multi-level shared cache hierarchy for processors within a node. We also develop new techniques for lower bound characterization of non-homogeneous CDAGs. We demonstrate the use of the methodology by analyzing the CDAGs of several numerical algorithms, to develop lower bounds on data movement for their parallel execution.
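To make the red-blue pebble game mentioned in the abstract concrete, the following is a minimal sketch of the original sequential Hong & Kung game, not the parallel extension developed in the paper. It counts the loads and stores incurred by one fixed schedule of a CDAG with a given number of red pebbles (fast-memory slots). The function name `pebble_io`, the dictionary-based CDAG encoding, the LRU eviction policy, and the no-recomputation rule are all assumptions chosen for illustration. The count it returns is an upper bound for that single schedule, whereas the paper is concerned with lower bounds that hold over all valid schedules.

```python
from collections import OrderedDict


def pebble_io(preds, inputs, outputs, schedule, num_red):
    """Count loads + stores for one red-blue pebbling of a CDAG.

    preds:    dict mapping each computed vertex to its operand vertices
    inputs:   vertices that start with a blue pebble (in slow memory)
    outputs:  vertices that must end with a blue pebble
    schedule: computed vertices in a valid topological order
    num_red:  number of red pebbles (fast-memory capacity)

    Assumptions (illustrative only): LRU eviction, no recomputation.
    """
    max_in = max((len(preds[v]) for v in schedule), default=0)
    if num_red < max_in + 1:
        raise ValueError("need at least max in-degree + 1 red pebbles")

    # Remaining number of times each vertex is still needed as an operand.
    uses = {}
    for v in schedule:
        for p in preds[v]:
            uses[p] = uses.get(p, 0) + 1

    blue = set(inputs)
    red = OrderedDict()  # insertion order doubles as LRU order
    io = 0

    def make_room(pinned):
        """Evict least recently used unpinned vertices until a slot is free."""
        nonlocal io
        while len(red) >= num_red:
            victim = next(u for u in red if u not in pinned)
            del red[victim]
            live = uses.get(victim, 0) > 0 or victim in outputs
            if live and victim not in blue:
                blue.add(victim)  # store: blue pebble on a red vertex
                io += 1

    for v in schedule:
        pinned = set(preds[v])  # operands must stay red until v fires
        for p in preds[v]:
            if p in red:
                red.move_to_end(p)
            else:
                assert p in blue, "schedule is not a valid topological order"
                make_room(pinned)
                red[p] = True  # load: red pebble on a blue vertex
                io += 1
        make_room(pinned)
        red[v] = True  # compute: all operands hold red pebbles
        for p in preds[v]:
            uses[p] -= 1

    for o in outputs:
        if o not in blue:
            blue.add(o)  # final store of an output still in fast memory
            io += 1
    return io


# Hypothetical example: a two-level reduction tree over four inputs.
tree = {"s1": ["x1", "x2"], "s2": ["x3", "x4"], "s3": ["s1", "s2"]}
leaves = ["x1", "x2", "x3", "x4"]
order = ["s1", "s2", "s3"]
io_small = pebble_io(tree, leaves, ["s3"], order, num_red=3)
io_large = pebble_io(tree, leaves, ["s3"], order, num_red=4)
print(io_small, io_large)
```

With only three red pebbles the partial sum `s1` has to be spilled to slow memory and reloaded, so `io_small` exceeds `io_large`; with four red pebbles the schedule touches each input and the output exactly once.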
We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v'-processor configuration with v' ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v') slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v' = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.