Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited, and the amount of overhead incurred during parallel execution of a nested loop, depend directly on partitioning, i.e., the way the iterations of a parallel loop are distributed across processors. Partitioning of parallel loops is therefore of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done on partitioning and scheduling of rectangular iteration spaces, the problem of partitioning non-rectangular iteration spaces (e.g., triangular or trapezoidal ones) has not received enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces to optimize performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
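To illustrate why the shape of the iteration space matters, the following minimal Python sketch splits the rows of a triangular loop nest (`for i in range(n): for j in range(i + 1)`) into contiguous bands of nearly equal work. It is not the geometric algorithm of the paper, only a simple two-dimensional example; the function names `triangular_row_bounds` and `band_work` are hypothetical.

```python
import math


def triangular_row_bounds(n, p):
    """Return p + 1 row boundaries that split the triangle
    {(i, j): 0 <= j <= i < n} into p contiguous row bands of
    nearly equal iteration count (assumes n >= 0 and p >= 1)."""
    total = n * (n + 1) // 2          # iterations in the whole triangle
    bounds = [0]
    for k in range(1, p):
        target = k * total / p
        # Rows 0..r-1 contain r(r+1)/2 iterations; take the smallest r
        # whose prefix reaches the target (solve r^2 + r - 2*target = 0).
        r = math.ceil((-1 + math.sqrt(1 + 8 * target)) / 2)
        bounds.append(min(max(r, bounds[-1]), n))
    bounds.append(n)
    return bounds


def band_work(bounds):
    """Number of (i, j) iterations inside each row band."""
    return [hi * (hi + 1) // 2 - lo * (lo + 1) // 2
            for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    n, p = 1000, 4
    equal_rows = [k * n // p for k in range(p + 1)]
    print("equal row counts:", band_work(equal_rows))
    print("balanced bands:  ", band_work(triangular_row_bounds(n, p)))
```

Giving every processor the same number of rows leaves the last band with several times the work of the first, whereas the square-root spacing keeps the bands within a row's worth of iterations of each other.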
We propose a novel divide-and-conquer algorithm for the solution of the all-pair shortest-path problem for directed and dense graphs with no negative cycles. The algorithm, R-Kleene, is a compact and in-place recursive algorithm inspired by Kleene's algorithm. R-Kleene delivers better performance than previous algorithms for randomly generated graphs represented by highly dense adjacency matrices, in which the matrix components can have any integer value. We show that R-Kleene, unchanged and without any machine tuning, consistently yields between 1/7 and 1/2 of the peak performance on five very different uniprocessor systems.
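As an illustration of the divide-and-conquer idea, the following Python sketch computes the shortest-path closure of a dense distance matrix recursively and in place over the (min, +) semiring. It is a generic Kleene-style recursion written under stated assumptions (zero diagonal, no negative cycles, `math.inf` for missing edges), not necessarily the authors' exact formulation; the names `relax` and `r_kleene` and the `base` cutoff are hypothetical.

```python
import math

INF = math.inf


def relax(d, rows, mids, cols):
    """In-place (min, +) update:
    d[i][j] = min(d[i][j], d[i][k] + d[k][j]) for i in rows, k in mids, j in cols."""
    for i in rows:
        row_i = d[i]
        for k in mids:
            dik = row_i[k]
            if dik == INF:
                continue                  # no known path i -> k yet
            row_k = d[k]
            for j in cols:
                cand = dik + row_k[j]
                if cand < row_i[j]:
                    row_i[j] = cand


def r_kleene(d, lo=0, hi=None, base=2):
    """Recursive all-pair shortest paths on the index range [lo, hi) of d.
    Assumes d[i][i] == 0, no negative cycles, and base >= 1."""
    if hi is None:
        hi = len(d)
    if hi - lo <= base:
        # Small block: plain Floyd-Warshall restricted to [lo, hi).
        block = range(lo, hi)
        for k in block:
            relax(d, block, (k,), block)
        return
    mid = (lo + hi) // 2
    top, bot = range(lo, mid), range(mid, hi)
    # View d as [[A, B], [C, D]] with A = top x top and D = bot x bot.
    r_kleene(d, lo, mid, base)            # A = closure of A
    relax(d, top, top, bot)               # B = min(B, A (x) B)
    relax(d, bot, top, top)               # C = min(C, C (x) A)
    relax(d, bot, top, bot)               # D = min(D, C (x) B)
    r_kleene(d, mid, hi, base)            # D = closure of D
    relax(d, top, bot, bot)               # B = min(B, B (x) D)
    relax(d, bot, bot, top)               # C = min(C, D (x) C)
    relax(d, top, bot, top)               # A = min(A, B (x) C)


if __name__ == "__main__":
    dist = [[0, 3, INF, 7],
            [8, 0, 2, INF],
            [5, INF, 0, 1],
            [2, INF, INF, 0]]
    r_kleene(dist)
    for row in dist:
        print(row)
```

The six block updates are all (min, +) matrix products, which is where a recursive formulation of this kind can reuse the blocking and locality techniques developed for matrix multiplication.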
Introduction. The all-pair shortest-paths problem (APSP) is a well-studied, basic problem in graph theory, but it is also a crucial practical problem in large networks such as sensor networks, switch networks, or complex targeting systems.

Consider the scenario where many thousands of nodes are located across a large area and every node has a processor with little memory space and computational power. In this scenario, the computation of APSP by a single node is neither feasible nor practical; nonetheless, it is a key feature for efficient data routing and broadcasting. Despite the computational and memory limitations of its processor, a node in the network is able to determine the locations and distances of its neighbors rather easily. Such local information can be encoded, sent over the network, and collected by an observer node such as a satellite, a global router, or a computer cluster. The observer node may then construct the adjacency matrix, compute the solution, and send the result back over the network, where each node will store the necessary local information.

Any network is naturally represented by a directed graph, and we formalize APSP as follows. Given a graph G = (V, E), where V is a set of nodes and E is a set of directed edges, we label every node in the graph by an integer i ∈ [0, n − 1], where n = |V| is the cardinality of the set V, and an edge in E is defined by a unique ordered pair of integers (i, j) with i, j ∈ [0, n − 1]. In fact, we assume that there is at most one directed edge connecting two nodes and, therefore, the graph has