There is a significant, established code base in the scientific computing community. Some of these codes have been parallelized already but are now encountering scalability issues due to poor data locality, inefficient data distributions, or load imbalance. In this work, we introduce a new abstraction called loop chaining, in which a sequence of parallel and/or reduction loops that explicitly share data are grouped together into a chain. Once specified, a chain of loops can be viewed as a set of iterations under a partial ordering. This partial ordering is dictated by data dependencies that, as part of the abstraction, are exposed, thereby avoiding inter-procedural program analysis. Thus a loop chain is a partially ordered set of iterations that makes scheduling and determining data distributions across loops possible for a compiler and/or run-time system. The flexibility of being able to schedule across loops enables better management of the data locality and parallelism tradeoff. In this paper, we define the loop chaining concept and present three case studies using loop chains in scientific codes: the sparse-matrix Jacobi benchmark; OP2, a domain-specific library used in full applications with unstructured grids; and Chombo, a domain-specific library used in full applications with structured grids. Preliminary results for the Jacobi benchmark show that a loop-chain-enabled optimization, full sparse tiling, yields a speedup of as much as 2.68x over a parallelized, blocked implementation on a multicore system with 40 cores.
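As an illustrative, hedged sketch (not code from the paper, and not the OP2 or Chombo APIs), the following C fragment shows the kind of loop sequence a loop chain groups together: two loops that explicitly share the array u, where the cross-loop data dependences form the partial order over iterations that a scheduler could exploit, for example via sparse tiling.

    /* Illustrative only: two loops that explicitly share `u`. */
    #include <stddef.h>

    void chain_example(size_t n, const double *a, double *u, double *v) {
        /* Loop 1: produces every u[i]. */
        for (size_t i = 0; i < n; ++i)
            u[i] = 2.0 * a[i];

        /* Loop 2: consumes u[i-1..i+1]; each iteration here depends on only
         * three iterations of loop 1, so chaining the two loops exposes a
         * partial order that can be tiled across loops for locality instead
         * of being executed as two full sweeps over memory. */
        for (size_t i = 1; i + 1 < n; ++i)
            v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0;
    }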
Data-flow analysis is a common technique for gathering program information for use in program transformations such as register allocation, dead-code elimination, common subexpression elimination, and scheduling. Current tools for generating data-flow analysis implementations enable analysis details to be specified orthogonally to the iterative analysis algorithm, but still require the analysis writer to supply implementation details regarding the may and must variants of the use and definition sets that arise from pointers, side effects, arrays, and user-defined structures. This paper presents the Data-Flow Analysis Generator tool (DFAGen), which enables analysis writers to generate pointer-, aggregate-, and side-effect-cognizant implementations of separable and nonseparable data-flow analyses from a specification that assumes only scalars. By hiding the compiler-specific details behind predefined set definitions, the analysis specifications for the DFAGen tool are typically less than ten lines long and similar to those in standard compiler textbooks. The main contribution of this work is the automatic determination of when to use the may or must variant of a predefined set usage in the analysis specification.
Data-flow analysis is a common technique for gathering program information for use in transformations such as register allocation, dead-code elimination, common subexpression elimination, and scheduling. Tools for generating data-flow analysis implementations remove the need for implementers to explicitly write code that iterates over statements in a program, but still require them to implement details regarding the effects of aliasing, side effects, arrays, and user-defined structures. This paper presents the DFAGen tool, which generates implementations for locally separable (e.g., bit-vector) data-flow analyses that are pointer, side-effect, and aggregate cognizant from an analysis specification that assumes only scalars. Analysis specifications are typically seven lines long and similar to those in standard compiler textbooks. The main contribution of this work is the automatic determination of may and must set usage within automatically generated data-flow analysis implementations.
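For context, the scalar, textbook-style formulation that such a specification resembles can be written for reaching definitions, a forward, may, bit-vector analysis (the equations below are the standard ones, not DFAGen's actual specification syntax):

    \mathit{IN}[s]  = \bigcup_{p \in \mathit{pred}(s)} \mathit{OUT}[p]
    \mathit{OUT}[s] = \mathit{GEN}[s] \cup \bigl(\mathit{IN}[s] \setminus \mathit{KILL}[s]\bigr)

Once pointers and aggregates enter the picture, the standard treatment is that a statement that may define x contributes a may-definition to GEN, while only a statement that must define x can kill earlier definitions of x; selecting those may/must variants automatically is the determination that DFAGen performs.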
Modern routing protocols for the Internet implement complex policies that take into account more than just path length. However, current routing protocol simulators are limited to either working with hard-coded policies or working on small networks (1,000 nodes or fewer). It is currently not possible to ask how the routing tables on all of the autonomous systems (e.g., AT&T, Sprint) in the Internet would change, given a change in the routing protocol. This paper presents a routing policy simulation framework that enables such simulations to be done on resources that are readily available to researchers, such as a small set of typical desktops. We base the policy simulation framework on the Routing Algebra Meta-Language (RAML), a formal framework for specifying routing policies. Our theoretical contributions include proving that the signatures and the meet operation induced by the preference operator in RAML define a semilattice, and that routing policy simulation frameworks are analogous to dataflow analysis frameworks. The main problem we address is that a direct implementation of routing policy simulation has scaling issues due to the O(n^2) memory requirements for routing tables. However, due to properties of routing algebras specified in RAML, we are able to segment the simulation problem into multiple runs that propagate route information for subsets of the network on each run. This strategy enables us to perform a simulation that does not exceed system memory on typical desktops and enables a 43-minute parallel simulation of a real network topology (33k nodes) and an approximation of the common BGP protocol.
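As a hedged sketch of the semilattice structure (RAML signatures and policies are richer than this; the types below are illustrative stand-ins, not the paper's definitions), preference over routes induces a meet that keeps the more preferred signature; because that meet is idempotent, commutative, and associative, route propagation can be iterated to a fixed point in the same way a dataflow analysis framework is solved.

    /* Toy signature preferred by path length; INT_MAX means "no route known". */
    #include <limits.h>

    typedef struct { int path_len; } route_sig;

    /* Meet induced by the preference operator: keep the preferred signature. */
    route_sig meet(route_sig a, route_sig b) {
        return (a.path_len <= b.path_len) ? a : b;
    }

    /* Propagating a route across one link lengthens the path by one hop;
     * a node's table entry is the meet over all routes its neighbors offer. */
    route_sig extend(route_sig a) {
        route_sig r = { a.path_len == INT_MAX ? INT_MAX : a.path_len + 1 };
        return r;
    }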
Miniapps serve as test beds for prototyping and evaluating new algorithms, data structures, and programming models before incorporating such changes into larger applications. For a miniapp to accurately predict how a prototyped change would affect a larger application, it must be shown to serve as a proxy for that larger application. Although many benchmarks claim to proxy the performance of a set of large applications, little work has explored what criteria must be met for a benchmark to serve as a proxy for examining programmability. In this poster we describe criteria that can be used to establish that a miniapp serves as a performance and programmability proxy.