An Iteration Aware Multidimensional Data Distribution Prototype for Computing Clusters

Yan, Bin; Rhodes, Philip J.

doi:10.1109/clustr.2006.311863

Cited by 3 publications

(5 citation statements)

References 13 publications

“…It does not work well for cases where the gaps between the requests are large, which can be remedied when combined with collective I/O such as in [30]. Granite can effectively take advantage of data sieving while doing gapped spatial query and aggregate I/O when applied in cluster computing [31].…”

Section: Spatial Data and Non-contiguous I/O (mentioning)

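Data sieving, mentioned in the excerpt above, serves several noncontiguous requests with one contiguous read that covers them and then extracts the requested pieces in memory. The sketch below is a minimal 1D illustration, assuming requests are (offset, length) pairs against an in-memory byte string that stands in for a file. The function name `data_sieve` and the `max_gap` merge threshold are hypothetical; they are not part of Granite or of any MPI-IO implementation. The sketch also shows the weakness the excerpt notes: once a gap exceeds `max_gap`, requests are no longer merged and each needs its own read.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (offset, length) in bytes


def data_sieve(data: bytes, requests: List[Request], max_gap: int) -> Tuple[List[bytes], int]:
    """Serve gapped requests with a few covering reads; return (pieces, number_of_reads).

    Requests whose gap to the current extent is at most max_gap (a hypothetical
    threshold) are merged into one contiguous read; a larger gap starts a new read.
    """
    order = sorted(range(len(requests)), key=lambda i: requests[i][0])
    pieces: List[bytes] = [b""] * len(requests)
    reads = 0
    i = 0
    while i < len(order):
        start = requests[order[i]][0]
        end = start + requests[order[i]][1]
        group = [order[i]]
        j = i + 1
        # Grow the covering extent while the next request is close enough.
        while j < len(order) and requests[order[j]][0] - end <= max_gap:
            end = max(end, requests[order[j]][0] + requests[order[j]][1])
            group.append(order[j])
            j += 1
        block = data[start:end]  # one large contiguous read (slice stands in for file I/O)
        reads += 1
        # Extract each requested piece from the sieved block, in original request order.
        for k in group:
            off, length = requests[k]
            pieces[k] = block[off - start: off - start + length]
        i = j
    return pieces, reads
```

With `max_gap=8`, requests at offsets 0 and 10 share one read while the request at offset 80 needs a second one; raising the threshold trades wasted bytes for fewer reads.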

“…By using a cache iterator, the iteration aware spatial data distribution system [31] reduces both disk and network latency by transforming a large number of small requests into a small number of large requests that fill an n-dimensional collective cache block on the cluster head node. The job iterator is responsible for the job extraction out of the cache and job distribution to compute nodes for data parallelization.…”

Section: Pattern Converter (mentioning)

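The excerpt describes two iterators: a cache iterator that turns many small requests into a few large reads, each filling a cache block on the head node, and a job iterator that cuts jobs out of the cache and hands them to compute nodes. A minimal 2D sketch of that division of labor follows. It assumes a row-major tiling with fixed block and job shapes and round-robin job assignment. `Disk`, `cache_iterator`, and `job_iterator` are hypothetical names that only mirror the excerpt's terminology, and the in-memory grid stands in for disk or remote storage; this is not the authors' implementation.

```python
from typing import Iterator, List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (row0, row1, col0, col1), half-open


class Disk:
    """Hypothetical stand-in for disk/remote storage that counts read requests."""

    def __init__(self, grid: Grid) -> None:
        self.grid = grid
        self.reads = 0

    def read(self, box: Box) -> Grid:
        r0, r1, c0, c1 = box
        self.reads += 1
        return [row[c0:c1] for row in self.grid[r0:r1]]


def tiles(rows: int, cols: int, tr: int, tc: int, r_off: int = 0, c_off: int = 0) -> Iterator[Box]:
    """Row-major tiling of a rows x cols region into tr x tc boxes (edge tiles may be smaller)."""
    for r in range(0, rows, tr):
        for c in range(0, cols, tc):
            yield (r_off + r, r_off + min(r + tr, rows), c_off + c, c_off + min(c + tc, cols))


def cache_iterator(disk: Disk, rows: int, cols: int, block: Tuple[int, int]) -> Iterator[Tuple[Box, Grid]]:
    """Load the dataset one large cache block at a time: one read per block."""
    for box in tiles(rows, cols, *block):
        yield box, disk.read(box)


def job_iterator(cached: Iterator[Tuple[Box, Grid]], job: Tuple[int, int],
                 nodes: int) -> Iterator[Tuple[int, Box, Grid]]:
    """Cut each cached block into small jobs and assign them round-robin to compute nodes."""
    n = 0
    for (r0, r1, c0, c1), data in cached:
        for jr0, jr1, jc0, jc1 in tiles(r1 - r0, c1 - c0, *job, r_off=r0, c_off=c0):
            # Slice the job out of the cached block; no further storage reads are issued.
            sub = [row[jc0 - c0: jc1 - c0] for row in data[jr0 - r0: jr1 - r0]]
            yield n % nodes, (jr0, jr1, jc0, jc1), sub
            n += 1
```

On an 8 x 8 grid with 4 x 8 cache blocks and 2 x 2 jobs, the 16 jobs are served by 2 storage reads rather than 16 separate small requests.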

“…An incomplete access pattern must eventually be turned into a complete one through a pattern converter…” (Footnote 1: “We used the term indefinite access pattern in [31].”)

Section: Pattern Converter (mentioning)

“…Counting the number of 1-bits in B_e yields the dimensionality of a dependency pattern. The work described in [31] applied to 1D dependencies. Our current work concentrates on 2D dependencies, and also applies to the 3D case.…”

Section: Dependency Descriptor (mentioning)

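The excerpt states that the number of 1-bits in B_e gives the dimensionality of a dependency pattern. The encoding of B_e is not spelled out here, so the sketch below assumes one bit per dataset dimension, set when the dependency reaches neighboring cells along that dimension. `make_b_e` and the per-dimension extents are hypothetical.

```python
from typing import Sequence


def make_b_e(extents: Sequence[int]) -> int:
    """Hypothetical encoding: bit i is set if the dependency extends along dimension i."""
    b_e = 0
    for dim, extent in enumerate(extents):
        if extent > 0:
            b_e |= 1 << dim
    return b_e


def dependency_dimensionality(b_e: int) -> int:
    """Count the 1-bits in B_e (works on any Python 3; int.bit_count needs 3.10+)."""
    return bin(b_e).count("1")
```

For a 3D dataset whose dependency reaches one cell along x and y but none along z, `make_b_e((1, 1, 0))` is `0b011`, so the pattern is a 2D dependency.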

“…Rather than transfer entire datasets on multiple disks via FedEx, a scientist should be able to visualize or otherwise process a subset of interest via remote access. In addition to determining an effective partitioning strategy for an application, our system aggregates cluster I/O requests to minimize the effect of disk and network latency on performance [31].…”

Section: Introduction (mentioning)

Toward automatic parallelization of spatial computation for computing clusters

Yan; Rhodes

2008

Proceedings of the 17th International Symposium on High Performance Distributed Computing

Self Cite

High-performance parallel computing infrastructures, such as computing clusters, have recently become freely available to scientific researchers, who can use them to solve problems of unprecedented scale through data parallelization. However, scientists are not necessarily skilled at writing efficient parallel code, especially when dealing with spatial datasets. Two important performance issues are heavy I/O costs and communication overhead. To address these issues, we are developing a scheme that helps scientists realize I/O-friendly, scalable data parallelization for spatial computation. Built upon our iteration-aware spatial prefetching and caching techniques, this data parallelization scheme takes an explicit specification of data dependency, identifies the best feasible access patterns while applying a set of I/O efficiency rules, and then wraps them in separate spatial data iterators for efficient cache loading and data partitioning, respectively. The scheme prioritizes, while also reconciling, the I/O costs of the different stages of a data-intensive cluster application to achieve the best overall I/O performance while maintaining fair computational scalability.
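The abstract does not list the I/O efficiency rules themselves. As a hypothetical illustration of how an explicit dependency specification can steer data partitioning, the sketch below assumes each job must read its own tile plus a halo whose width is the dependency radius in each dimension, and ranks candidate tilings by total cells read. `partition`, `read_region`, `cells_read`, and `best_tile` are invented names, and the "fewest cells read" rule is an assumption, not the authors' rule set.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (row0, row1, col0, col1), half-open
Shape = Tuple[int, int]  # (rows, cols)


def partition(shape: Shape, tile: Shape) -> List[Box]:
    """Split the dataset into row-major tiles of the given shape (edge tiles may be smaller)."""
    rows, cols = shape
    tr, tc = tile
    return [(r, min(r + tr, rows), c, min(c + tc, cols))
            for r in range(0, rows, tr) for c in range(0, cols, tc)]


def read_region(tile: Box, radius: Shape, shape: Shape) -> Box:
    """Expand a job tile by the dependency radius in each dimension, clipped to the dataset."""
    r0, r1, c0, c1 = tile
    dr, dc = radius
    rows, cols = shape
    return (max(r0 - dr, 0), min(r1 + dr, rows), max(c0 - dc, 0), min(c1 + dc, cols))


def cells_read(shape: Shape, tile: Shape, radius: Shape) -> int:
    """Total cells fetched when every job reads its tile plus its dependency halo."""
    total = 0
    for t in partition(shape, tile):
        r0, r1, c0, c1 = read_region(t, radius, shape)
        total += (r1 - r0) * (c1 - c0)
    return total


def best_tile(shape: Shape, candidates: List[Shape], radius: Shape) -> Shape:
    """Hypothetical I/O rule: pick the candidate tiling that reads the fewest cells."""
    return min(candidates, key=lambda t: cells_read(shape, t, radius))
```

For a 64 x 64 dataset whose dependency reaches one row up and down (radius `(1, 0)`), each candidate yields four jobs, but four 64 x 16 column strips read 4096 cells (no redundancy), four 32 x 32 tiles read 4224, and four 16 x 64 row strips read 4480. Cutting only along dimensions the dependency does not span avoids halo reads entirely.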
