Effective out-of-core parallel Delaunay mesh refinement using off-the-shelf software

Kot,; Chernikov,; Chrisochoides,

doi:10.1109/ipdps.2006.1639361

Cited by 5 publications

(3 citation statements)

References 12 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…On SMP nodes, the OHPDR (listed as OHPDR2) performs slightly worse. This is the opposite of the results we have seen on another system [Kot et al 2006]. It is likely due to smaller cache (per core) and/or different implementations of MPI and OpenMP.…”

Section: Performance Evaluationcontrasting

confidence: 94%

Effective out-of-core parallel delaunay mesh refinement using off-the-shelf software

Kot

Chernikov

Chrisochoides

2011

ACM J. Exp. Algorithmics

View full text Add to dashboard Cite

We present three related out-of-core parallel mesh generation algorithms and their implementations for small size computational clusters. Computing out-of-core permits to solve larger problems than otherwise possible on the same hardware setup. Also, when using shared computing resources with high demand, a problem can take longer to compute in terms of wall-clock time when using an in-core algorithm on many nodes instead of using an out-of-core algorithm on few nodes. The difference is due to wait-in-queue delays that can grow exponentially to the number of requested nodes. In one specific case, using our best method and only 16 nodes it can take several times less wall-clock time to generate a 2 billion element mesh than to generate the same size mesh in-core with 121 nodes.Although our best out-of-core method exhibits unavoidable overheads (could be as low as 19% in some cases) over the corresponding in-core method (for mesh sizes that fit completely in-core), this is a modest and expected performance penalty. We evaluated our methods on traditional clusters of workstations as well as presented preliminary performance evaluation on emerging BlueWaters supercomputer.

show abstract

Section: Performance Evaluationcontrasting

confidence: 94%

Effective out-of-core parallel delaunay mesh refinement using off-the-shelf software

Kot

Chernikov

Chrisochoides

2011

ACM J. Exp. Algorithmics

View full text Add to dashboard Cite

show abstract

“…There are many implementations of parallel graph algorithms on a variety of architectures, including distributedmemory supercomputers [30], shared-memory supercomputers [1], and multicore machines [16]. DMR in particular has been extensively studied in sequential [6], multicore [3] and distributedmemory settings [15].…”

Section: Related Workmentioning

confidence: 99%

Auto-parallelization of data structure operations for GPUs

Nasre

2014

Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems

View full text Add to dashboard Cite

We present an auto-parallelization technique for generating GPU implementation of data-structure operations from a sequential specification. The technique partitions the data-structure operations into barrier-separated phases such that each phase executes only homogeneous operations. Homogeneity is dictated by the method type, which is derived from the specification. Two key aspects of our technique are: (i) it ensures linearizability of the data-structure, and (ii) it is capable of composing multiple data-structure operations with the guarantee of optimal barrier placement, which we formally prove. We illustrate the usefulness of our techniques by synthesizing efficient GPU implementations of practical graph algorithms like single-source shortest paths which uses a concurrent worklist, Delaunay mesh refinement that uses a worklist and a mesh, and a doubly linked-list supporting arbitrary insertion and deletion.

show abstract

“…In addition to the parallel Delaunay mesh generation methods that are directly related to this work, there are many more classes of parallel mesh generation methods: (1) octree/quadtree based methods [de Cougny et al 1994;Löhner and Cebral 1999;Rypl and Bittnar 2001], (2) edge-subdivision based methods [Jones and Plassmann 1994;Rivara et al 2006], and (3) a class of parallel out-of-core methods [Isenburg et al 2006;Kot et al 2006]. In [Isenburg et al 2006] the authors compute a billion triangle mesh on a laptop in 48 minutes.…”

mentioning

confidence: 99%

Algorithm 872

Chernikov

Chrisochoides

2008

ACM Trans. Math. Softw.

View full text Add to dashboard Cite

Delaunay refinement is a widely used method for the construction of guaranteed quality triangular and tetrahedral meshes. We present an algorithm and a software for the parallel constrained Delaunay mesh generation in two dimensions. Our approach is based on the decomposition of the original mesh generation problem into N smaller subproblems which are meshed in parallel. The parallel algorithm is asynchronous with small messages which can be aggregated and exhibits low communication costs. On a heterogeneous cluster of more than 100 processors our implementation can generate over one billion triangles in less than 3 minutes, while the single-node performance is comparable to that of the fastest to our knowledge sequential guaranteed quality Delaunay meshing library (the Triangle).

show abstract

Effective out-of-core parallel Delaunay mesh refinement using off-the-shelf software

Cited by 5 publications

References 12 publications

Effective out-of-core parallel delaunay mesh refinement using off-the-shelf software

Effective out-of-core parallel delaunay mesh refinement using off-the-shelf software

Auto-parallelization of data structure operations for GPUs

Algorithm 872

Contact Info

Product

Resources

About