2011 IEEE Symposium on Large Data Analysis and Visualization
DOI: 10.1109/ldav.2011.6092324
Scalable parallel building blocks for custom data analysis

Abstract: We present a set of building blocks that provide scalable data movement capability to computational scientists and visualization researchers for writing their own parallel analysis. The set includes scalable tools for domain decomposition, process assignment, parallel I/O, global reduction, and local neighborhood communication: tasks that are common across many analysis applications. The global reduction is performed with a new algorithm, described in this paper, that efficiently merges blocks of analysis resul…

Cited by 51 publications (28 citation statements)
References 36 publications
“…It is the job of the runtime to map those instructions to code running on a process and messages exchanged between processes. The starting point of our work is DIY1 [32,34], a C library that structures data into blocks. It expects the computation to be organized in a bulk-synchronous pattern, but does not enforce this structure through programming convention.…”
Section: Data Parallelism and Block-structured Abstractions (mentioning)
confidence: 99%
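The block-structured, bulk-synchronous organization described in this statement can be pictured with a short C/MPI sketch. This is an illustration only, not the DIY1 API: the block count, the round count, and compute_on_block are hypothetical placeholders; only the general pattern (local work on each owned block, then a global synchronization step) follows the description above.

```c
/* Minimal sketch (not the DIY1 API) of a block-structured,
 * bulk-synchronous computation: each MPI process owns several
 * blocks, computes on each block locally, then all processes
 * take part in a global exchange before the next round. */
#include <mpi.h>
#include <stdio.h>

#define BLOCKS_PER_RANK 4   /* hypothetical blocks per process */
#define ROUNDS 3            /* hypothetical number of BSP rounds */

static void compute_on_block(int gid, double *val)
{
    *val += gid;            /* placeholder for per-block analysis work */
}

int main(int argc, char **argv)
{
    int rank;
    double block_val[BLOCKS_PER_RANK] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int round = 0; round < ROUNDS; round++) {
        /* local phase: iterate over the blocks this process owns */
        for (int b = 0; b < BLOCKS_PER_RANK; b++) {
            int gid = rank * BLOCKS_PER_RANK + b;   /* global block id */
            compute_on_block(gid, &block_val[b]);
        }

        /* global phase: bulk-synchronous exchange between rounds */
        double local_sum = 0.0, global_sum = 0.0;
        for (int b = 0; b < BLOCKS_PER_RANK; b++)
            local_sum += block_val[b];
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("round %d: global sum %.1f\n", round, global_sum);
    }

    MPI_Finalize();
    return 0;
}
```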
“…We compare the performance with DIY1 [32,34] (which only supports in-core blocks) and, when possible, with equivalent collective functions in MPI itself (by assigning one block per MPI rank).…”
(Figure 2 of the citing paper: Cian mini-app merge- and swap-reduce using DIY2 compared with MPI using reduce and reduce-scatter, respectively.)
Section: Benchmark Applications (mentioning)
confidence: 99%
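The MPI baselines named in this comparison are the standard MPI_Reduce and MPI_Reduce_scatter collectives. The plain C/MPI sketch below only illustrates those two baselines (one rooted result versus a result scattered across ranks); the array size and the sum operation are illustrative choices, not taken from the benchmark.

```c
/* Sketch of the two MPI baselines named in the comparison:
 * a rooted MPI_Reduce (counterpart of a merge-style reduction)
 * and MPI_Reduce_scatter (counterpart of a swap-style reduction,
 * which leaves the result distributed across ranks). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 8 * size;                        /* illustrative element count */
    float *in  = malloc(n * sizeof(float));
    float *out = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) in[i] = (float)rank;

    /* merge-style baseline: rank 0 ends up with the full reduced array */
    MPI_Reduce(in, out, n, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* swap-style baseline: every rank receives an equal slice of the result */
    int *counts = malloc(size * sizeof(int));
    for (int r = 0; r < size; r++) counts[r] = n / size;
    float *slice = malloc((n / size) * sizeof(float));
    MPI_Reduce_scatter(in, slice, counts, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    free(in); free(out); free(counts); free(slice);
    MPI_Finalize();
    return 0;
}
```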
“…It also uses the DIY [23] data-parallel programming library for the multi-GPU extension, the HDF5 library for data I/O, and the Simple DirectMedia Layer (SDL) [24] library for OpenGL visualization. The programming language used within CUDA (CUDA-C) is an extension of the C programming language which allows one to implement GPU-based parallel functions, called kernels, which, when called, are executed n times in parallel by n different CUDA threads.…”
Section: Parallel Implementation (mentioning)
confidence: 99%
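The execution model this statement refers to (a kernel launched once but executed n times in parallel, once per CUDA thread) can be shown with a minimal CUDA-C sketch. The kernel name, data, and launch configuration here are illustrative and unrelated to the citing paper's code.

```c
/* Minimal CUDA-C sketch of the kernel model described above:
 * one __global__ function, launched once, runs n times in
 * parallel, with each CUDA thread handling one element. */
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* unique thread index */
    if (i < n)
        data[i] *= factor;                          /* one element per thread */
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;       /* enough threads to cover n */
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```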
“…Given any of these patterns, spatially contiguous scan subregions can be defined such that the degree of overlap between adjacent scan points is preserved. Data partitioning is achieved using the DIY parallel programming library [23] that is written on top of MPI to facilitate communication between parallel processes. In DIY terminology, we assign a DIY block (not to be confused with CUDA blocks) to each GPU.…”
Section: Multi-GPU Algorithm (mentioning)
confidence: 99%
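A rough sketch of the block-to-GPU mapping this statement describes: each MPI process is handed one spatial block of the scan region and binds it to one GPU. This is not the DIY API; the 1-D decomposition, the scan size, and device selection by rank are assumptions made purely for illustration.

```c
/* Sketch (not the DIY API) of assigning one block per process
 * and one GPU per block: each MPI rank picks a device and owns
 * a contiguous slice of the scan region. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, ngpus;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cudaGetDeviceCount(&ngpus);
    if (ngpus < 1) ngpus = 1;           /* guard for GPU-less nodes */
    cudaSetDevice(rank % ngpus);        /* one block/process per GPU */

    /* hypothetical 1-D decomposition of the scan region into blocks */
    int total_points = 1000;            /* illustrative scan size */
    int points_per_block = total_points / size;
    int start = rank * points_per_block;
    printf("rank %d: scan points [%d, %d) on GPU %d\n",
           rank, start, start + points_per_block, rank % ngpus);

    MPI_Finalize();
    return 0;
}
```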