Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and runtime speculative execution have failed to uncover much parallelism in these applications, despite much effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications.

In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering. By studying the algorithms and data structures used in these applications, we show that there is substantial coarse-grain data parallelism in these applications, but that this parallelism is highly dependent on the input data and therefore cannot be uncovered by compiler analysis. In principle, optimistic techniques such as thread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish this because they do not use the proper abstractions for the data structures in these programs.

These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2) assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation.

We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.
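The core Galois construct is iteration over a worklist whose iterations may execute speculatively in parallel. As a concrete illustration, here is a minimal Java sketch of worklist-style Delaunay mesh refinement in the sequential form the abstract describes; Mesh, Triangle, and Cavity are hypothetical stand-ins for the application's data structures, and the while-loop marks where a Galois unordered-set iterator would go.

import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;

// Hypothetical application types; not the Galois API.
interface Triangle {}

interface Cavity {
    void expand();                            // grow to all triangles affected by the fix
    void retriangulate();                     // compute the replacement triangles
    Collection<Triangle> newBadTriangles();   // refinement can create new bad triangles
}

interface Mesh {
    Collection<Triangle> badTriangles();      // triangles violating the quality constraint
    boolean contains(Triangle t);
    Cavity cavityOf(Triangle t);
    void update(Cavity c);                    // swap the cavity's old triangles for new ones
}

class RefineSketch {
    static void refine(Mesh mesh) {
        Deque<Triangle> worklist = new ArrayDeque<>(mesh.badTriangles());
        while (!worklist.isEmpty()) {         // Galois: foreach over an unordered set
            Triangle bad = worklist.poll();
            if (!mesh.contains(bad)) continue; // may already be gone: fixed by an earlier cavity
            Cavity cavity = mesh.cavityOf(bad);
            cavity.expand();
            cavity.retriangulate();
            mesh.update(cavity);
            worklist.addAll(cavity.newBadTriangles());
        }
    }
}

The parallelism is input-dependent in exactly the sense the abstract argues: two iterations conflict only if their cavities overlap, which depends on the geometry of the mesh being refined and cannot be decided at compile time.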
Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations.

The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors.

In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
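To make the interaction between lock coarsening and over-decomposition concrete, the following Java sketch guards a partitioned data structure with one lock per partition; the class and the hash-based partition function are illustrative assumptions, not the Galois implementation, which would use the graph partitioner's actual assignment.

import java.util.concurrent.locks.ReentrantLock;

// Sketch of lock coarsening over a partitioned data structure. With
// P partitions for C cores and P >> C (over-decomposition), an
// iteration acquires one coarse lock per partition it touches rather
// than one fine-grain lock per element.
class PartitionLocks {
    private final ReentrantLock[] locks;

    PartitionLocks(int numPartitions) {
        locks = new ReentrantLock[numPartitions];
        for (int i = 0; i < numPartitions; i++) locks[i] = new ReentrantLock();
    }

    // Hypothetical partition map; a real system would use the data
    // partitioner's element-to-partition assignment, not a hash.
    int partitionOf(Object element) {
        return Math.floorMod(element.hashCode(), locks.length);
    }

    // tryLock rather than lock: failure signals a potential conflict,
    // so a speculative iteration can roll back instead of blocking.
    boolean tryAcquire(Object element) {
        return locks[partitionOf(element)].tryLock();
    }

    void release(Object element) {
        locks[partitionOf(element)].unlock();
    }
}

Over-decomposition is what keeps this cheap: because there are many more partitions than cores, iterations assigned to different cores usually touch different partitions, so coarse locks rarely collide even though each lock covers many elements.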
Recent application studies have shown that many irregular applications have a generalized data parallelism that manifests itself as iterative computations over worklists of different kinds. In general, there are complex dependencies between iterations. These dependencies cannot be elucidated statically because they depend on the inputs to the program; thus, optimistic parallel execution is the only tractable approach to parallelizing these applications.

We have built a system called Galois that supports this style of parallel execution. Its main features are (i) set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed.

Our work builds on the Galois system, and it addresses the problem of scheduling iterations of set iterators on multiple cores. The policy used by the base Galois system is to assign an iteration to a core whenever it needs work to do, but we show in this paper that this policy is not optimal for many applications. We also argue that OpenMP-style DO-ALL loop scheduling directives such as chunked and guided self-scheduling are too simplistic for irregular programs. These difficulties led us to develop a general scheduling framework for irregular problems; OpenMP-style scheduling strategies are special cases of this general approach. We also provide hooks into our framework, allowing the programmer to leverage application knowledge to further tune a schedule for a particular application.

To evaluate this framework, we implemented it as an extension of the Galois system. We then tested the system using five real-world, irregular, data-parallel applications. Our results show that the optimal scheduling policy can be different for different applications.
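One way such a framework can subsume OpenMP-style strategies is to factor a schedule into two decisions: where newly created work is placed, and where a core looks for its next iteration. The Java sketch below illustrates this factoring with chunked self-scheduling as one pluggable instance; the interface and names are an assumed design for illustration, not the paper's actual API.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedQueue;

// A scheduling policy answers two questions: which queue receives new
// work (placement) and what a given core runs next (selection).
interface SchedulingPolicy<T> {
    void addWork(int core, T item);
    T nextWork(int core);   // null when no work remains
}

// OpenMP-style chunking as one instance: each core refills a private
// buffer with `chunkSize` items at a time from a shared queue,
// amortizing synchronization over the whole chunk.
class ChunkedPolicy<T> implements SchedulingPolicy<T> {
    private final ConcurrentLinkedQueue<T> global = new ConcurrentLinkedQueue<>();
    private final Deque<T>[] local;   // per-core buffers, touched only by their owner
    private final int chunkSize;

    @SuppressWarnings("unchecked")
    ChunkedPolicy(int cores, int chunkSize) {
        this.chunkSize = chunkSize;
        this.local = (Deque<T>[]) new ArrayDeque[cores];
        for (int i = 0; i < cores; i++) local[i] = new ArrayDeque<>();
    }

    @Override
    public void addWork(int core, T item) { global.add(item); }

    @Override
    public T nextWork(int core) {
        Deque<T> mine = local[core];
        if (mine.isEmpty()) {
            for (int i = 0; i < chunkSize; i++) {  // grab a whole chunk at once
                T item = global.poll();
                if (item == null) break;
                mine.push(item);
            }
        }
        return mine.poll();
    }
}

An application-specific schedule would supply a different placement rule, for example routing new work to the queue of the core whose data partition contains it, without changing the user's iterator code.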