Some results in memory conflict analysis

Calahan, D. A.

doi:10.1145/76263.76351

Cited by 15 publications

(10 citation statements)

References 4 publications


“…For example, we found that a straightforward shared-memory variant of the BSP does not properly account for contention at the banks, since there is no way to account for the relative speed of memory banks and processors. On the other hand, previous models of multibank memory systems [4], [5], [9], [10], [11], [13], [15], [18], [28], [29], [47], [48], [49], [52] are highly detailed, and the studies have only considered either regular or random access patterns. In this paper, we are interested in modeling algorithms with irregular, but not necessarily random, access patterns without requiring a complicated model.…”

Section: Introduction (mentioning)

confidence: 99%

Accounting for memory bank contention and delay in high-bandwidth multiprocessors

Blelloch

Gibbons

Matias

et al. 1995

Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures - SPAA '95


Abstract: For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors have more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant's bulk-synchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.
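As a rough illustration of what the two added parameters capture, the sketch below maps one superstep's requests onto x·p banks and compares a per-processor issue term against a bank-delay term. The cost expression max(g·h, d·R), the interleaved address-to-bank mapping, and the names `superstep_memory_time` and `g` are assumptions made for illustration, not the paper's exact cost function.

```python
from collections import Counter
from typing import Sequence


def superstep_memory_time(
    requests: Sequence[Sequence[int]],  # requests[i] = addresses issued by processor i
    x: int,          # bank expansion: banks per processor (assumed integer here)
    d: float,        # bank delay: minimum time a bank needs per request
    g: float = 1.0,  # hypothetical per-request issue cost at a processor
) -> float:
    """Hypothetical memory cost of one superstep: max(g * h, d * R).

    h = largest number of requests issued by any single processor
    R = largest number of requests that land on any single bank
    Addresses are assigned to banks by simple interleaving (address mod banks).
    """
    p = len(requests)
    banks = x * p
    h = max((len(r) for r in requests), default=0)
    load = Counter(addr % banks for r in requests for addr in r)
    R = max(load.values(), default=0)
    return max(g * h, d * R)


# 4 processors, 16 banks, bank delay 6.
spread = [[i * 8 + k for k in range(8)] for i in range(4)]  # 2 requests per bank
hot = [[0] * 8 for _ in range(4)]                          # every request hits bank 0
print(superstep_memory_time(spread, x=4, d=6))  # bank term dominates: 6 * 2 = 12
print(superstep_memory_time(hot, x=4, d=6))     # hot spot: 6 * 32 = 192
```

Even when requests are spread evenly across banks, a large enough d makes the bank term dominate, which is the effect a plain BSP cost cannot express; a hot spot makes it dominate by a wide margin.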


“…A heuristic delay model with unit-stride vector load accesses is constructed from two arguments (see [5]).…”

Section: 2 Sequential Vector Load Accesses: a Heuristic Model (mentioning)

confidence: 99%

“…After startup, a vector is affected only by other vector startups. Each of these … (30). This function is similar to the delay given in [5], except for the presence of the (1 − u) denominator term.…”

Section: 2 Sequential Vector Load Accesses: a Heuristic Model (mentioning)

confidence: 99%

Access conflicts in multiprocessor memories: queueing models and simulation studies

Bucher

Calahan

1990

Proceedings of the 4th International Conference on Supercomputing


“…Some studies evaluate commercial systems [14,15,16,17], and some other works evaluate the global system composed of processors running real applications, with different classes of interconnection networks and memory organization [18,19].…”

Section: Introduction (mentioning)

confidence: 99%

Vector multiprocessors with arbitrated memory access

Peiron

Valero

Ayguadé

et al. 1995

SIGARCH Comput. Archit. News


The high latency of memory accesses is one of the factors that contribute most to reducing the performance of current vector supercomputers. Conflicts in the memory modules, plus collisions in the interconnection network in the case of multiprocessors, significantly increase the execution time of applications. In this work we propose a memory access method that, for both vector uniprocessors and multiprocessors, allows stream accesses to be performed with the smallest possible latency in the majority of cases. The basic idea is to arbitrate the memory access by defining the order in which the memory modules are visited. The stream elements are requested out of order. In addition, the access method also reduces the cost of the interconnection network.

The high latency of memory accesses is one of the main factors that reduces the performance of current vector supercomputers. In such systems, to achieve the required bandwidth, the memory is organized into a set of M = 2^m independent modules that are accessed in parallel. The latency of each memory module is T processor cycles. Conflicts occur between different accesses that visit the same memory module whenever these accesses are separated by fewer cycles than the module latency. Moreover, in multiprocessor systems, collisions can also occur in the interconnection network. These two facts make it difficult to perform accesses with low latency and to use the available memory bandwidth effectively. For the case of a single vector processor with one memory port and a matched memory system (M = T), several storage schemes have been proposed to efficiently access streams with the most frequent strides. The basic scheme is interleaving [1], in which the module number is obtained from the m lowest bits of the address; this storage scheme allows minimum-latency in-order access for streams of odd stride, but results in degraded performance for even strides. Other storage schemes, such as skewing [2] and linear transformations [3], also allow conflict-free access for one family of strides, where the family x is defined as the set of strides S = σ·2^x with σ odd [4]. However, these latter schemes have the advantage that the degradation for families that are not conflict-free can be reduced by the use of buffers [5]. To increase the number of conflict-free families, proposals have been made in two directions: more modules are added to the memory system, resulting in an unmatched memory system, or a block-interleaved storage sch…

(ISCA '95, Santa Margherita Ligure, Italy; © 1995 ACM 0-89791-698-0/95/0006)
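The interleaving argument above can be checked with a small simulation. This is a sketch under stated assumptions, not the paper's model: a single in-order stream issues one request per cycle, the module number is the lowest m address bits, and a module stays busy for T cycles after accepting a request. The function names `interleaved_stalls` and `stride_family` are hypothetical.

```python
def stride_family(stride: int) -> int:
    """Family x of a stride S = sigma * 2**x with sigma odd (x = trailing zero bits)."""
    if stride == 0:
        raise ValueError("stride must be nonzero")
    s = abs(stride)
    return (s & -s).bit_length() - 1  # position of the lowest set bit


def interleaved_stalls(stride: int, n: int, m: int, T: int) -> int:
    """Total stall cycles for an in-order stream of n elements under interleaving.

    Assumptions (illustrative): M = 2**m modules, module = lowest m address bits,
    one request issued per cycle, each module busy for T cycles per request.
    """
    M = 1 << m
    free_at = [0] * M  # cycle at which each module can accept its next request
    cycle = 0
    stalls = 0
    for i in range(n):
        module = (i * stride) & (M - 1)  # lowest m bits of the element's address
        if free_at[module] > cycle:      # module still busy: wait for it
            stalls += free_at[module] - cycle
            cycle = free_at[module]
        free_at[module] = cycle + T
        cycle += 1
    return stalls


# Matched system: M = 8 modules, T = 8 cycles.
for s in (1, 3, 2, 8):
    print(s, stride_family(s), interleaved_stalls(s, n=64, m=3, T=8))
```

With m = 3 and T = 8, the odd strides 1 and 3 (family 0) visit all eight modules before returning to any of them and produce no stalls. Stride 2 cycles through only four modules, so after the first four elements every fourth element waits 4 cycles. Stride 8 sends every request to module 0, so each request after the first waits 7 cycles.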
