Paolo D'Alberto scite author profile

et al. 2005

Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel loop are distributed across different processors. Thus, partitioning of parallel loops is of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of rectangular iteration spaces, the problem of partitioning of non-rectangular iteration spaces (e.g., triangular or trapezoidal iteration spaces) has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
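To make the partitioning problem concrete, the following is a minimal sketch for the simplest case only: a two-dimensional triangular loop nest whose outer rows are cut into contiguous blocks of roughly equal iteration count. It is not the geometric method of the paper; the function names `triangular_row_bounds` and `block_sizes` are hypothetical, and the sketch assumes every iteration costs the same.

```python
import math

def triangular_row_bounds(n, p):
    """Row boundaries that split the triangular loop nest

        for i in range(n):
            for j in range(i + 1):
                ...

    into p contiguous row blocks with roughly equal iteration counts.
    Assumption (not from the source): uniform cost per iteration.
    """
    total = n * (n + 1) // 2          # iterations in the whole triangle
    bounds = [0]
    for k in range(1, p):
        target = k * total / p        # iterations wanted before boundary k
        # Rows 0..r-1 contain r*(r+1)/2 iterations; solve that for r.
        r = round((math.sqrt(1 + 8 * target) - 1) / 2)
        bounds.append(min(n, max(bounds[-1], r)))
    bounds.append(n)
    return bounds

def block_sizes(bounds):
    """Iteration count of each row block [lo, hi)."""
    return [hi * (hi + 1) // 2 - lo * (lo + 1) // 2
            for lo, hi in zip(bounds, bounds[1:])]

if __name__ == "__main__":
    b = triangular_row_bounds(1000, 4)
    print(b, block_sizes(b))
```

Cutting the same 1000 rows into four equal-height blocks would give the last processor about seven times the work of the first, which is the imbalance that a partitioning scheme for non-rectangular spaces has to avoid.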

On the Space and Access Complexity of Computation DAGs

Bilardi

Pietracaprina

2000

R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks

2007

Algorithmica

We propose a novel divide-and-conquer algorithm for the solution of the all-pair shortest-path problem for directed and dense graphs with no negative cycles. We propose R-Kleene, a compact and in-place recursive algorithm inspired by Kleene's algorithm. R-Kleene delivers a better performance than previous algorithms for randomly generated graphs represented by highly dense adjacency matrices, in which the matrix components can have any integer value. We show that R-Kleene, unchanged and without any machine tuning, yields consistently between 1/7 and 1/2 of the peak performance running on five very different uniprocessor systems.

Introduction. The all-pair shortest-paths problem (APSP) is a well-studied and basic problem in graph theory but it is also a crucial and real problem in large networks such as sensor networks, switch networks or complex targeting systems.

Consider the scenario where many thousands of nodes are located across a large area and every node has a processor with little memory space and computational power. In this scenario the computation of APSP is neither feasible nor practical by a single node, nonetheless it is a key feature for efficient data routing and broadcasting. Despite the node-processor computational/memory limitations, a node in the network is able to determine the locations and distances of its neighbors rather easily. Such local information can be coded, sent on the network and collected by an observer node such as a satellite, a global router or a computer cluster. Then the observer node may construct the adjacency matrix, compute the solution and send the result back on the network where each node will store the necessary local information.

Any network is naturally represented by a directed graph and we formalize APSP as follows.
Given a graph G = (V, E) where V is a set of nodes and E is a set of directed edges, we label every node in the graph by an integer ι ∈ [0, n − 1], where n = |V| is the cardinality of the set V, and an edge in E is defined by a unique ordered pair of integers (i, j) with i, j ∈ [0, n − 1]. In fact, we assume that there is at most one directed edge connecting two nodes and, therefore, the graph has
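As a point of reference for the formalization above, here is a minimal sketch of APSP on an adjacency matrix. It uses the classic Floyd-Warshall iteration, not R-Kleene, and is meant only to show what the observer node would compute; the function name `all_pairs_shortest_paths` and the use of `math.inf` for a missing edge are assumptions of this sketch.

```python
import math

INF = math.inf  # assumed encoding for "no edge from i to j"

def all_pairs_shortest_paths(adj):
    """Floyd-Warshall on an n x n adjacency matrix (list of lists).

    adj[i][j] is the weight of edge (i, j), or INF if the edge is absent.
    Assumes no negative cycles. Returns a new matrix of shortest distances.
    """
    n = len(adj)
    dist = [row[:] for row in adj]            # do not modify the input
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)       # a node reaches itself at cost 0
    for k in range(n):                        # allow node k as an intermediate
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue                      # no path i -> k, nothing to relax
            row_i = dist[i]
            for j in range(n):
                cand = d_ik + row_k[j]
                if cand < row_i[j]:
                    row_i[j] = cand
    return dist

if __name__ == "__main__":
    graph = [[0, 3, INF],
             [INF, 0, 1],
             [2, INF, 0]]
    print(all_pairs_shortest_paths(graph))
```

The triple loop touches the whole matrix once per intermediate node, so it makes no attempt at the data locality that a recursive, divide-and-conquer formulation is designed to obtain.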

Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance

Bilardi

2001

Static Analysis of Parameterized Loop Nests for Energy Efficient Use of Data Caches

Veidenbaum

et al. 2003