25 Years of the International Symposia on Computer Architecture (Selected Papers) 1998
DOI: 10.1145/285930.285991

Memory access buffering in multiprocessors

Abstract: In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access latency and to take advantage of memory interleaving. Lockup-free caches are designed to avoid processor blocking on a cache miss. Write buffers are often included in a pipelined machine to avoid processor waiting on writes. In a shared memory multiprocessor, there are more advantages in buffering memory requests, since each memory access has to tr…



Cited by 50 publications (42 citation statements) · References 6 publications
“…On the other hand, if the consistency model is relaxed, i.e. not all possible orderings between memory operations are enforced, propagation of unordered memory operations can be delayed until an order can be re-established through synchronization boundaries [15,25,37]. In other words, lazy coherence protocols exploit the fact that relaxed consistency models require memory to be consistent only at synchronization boundaries.…”
Section: Eager Versus Lazy Coherence
confidence: 99%
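The lazy-coherence idea quoted above can be illustrated with a small sketch (not from the paper; names are illustrative): under a relaxed model, plain writes may sit unordered in buffers, and they only need to become visible to another processor once a synchronization boundary is crossed. In Java, a volatile write/read pair acts as such a release/acquire boundary.

```java
// Sketch, assuming Java's volatile release/acquire semantics stand in for
// the synchronization boundaries discussed in lazy coherence protocols.
public class LazyPropagation {
    static int payloadA, payloadB;     // plain (unordered) shared data
    static volatile boolean ready;     // the synchronization boundary

    static void producer() {
        payloadA = 1;                  // may be buffered and delayed...
        payloadB = 2;
        ready = true;                  // ...but must be visible before this release write
    }

    static void consumer() {
        while (!ready) { }             // acquire read: re-establishes order
        // After observing ready == true, the earlier plain writes are visible.
        System.out.println(payloadA + " " + payloadB);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread c = new Thread(LazyPropagation::consumer);
        Thread p = new Thread(LazyPropagation::producer);
        c.start();
        p.start();
        p.join();
        c.join();
    }
}
```

The point of the sketch is that nothing forces `payloadA` and `payloadB` to propagate eagerly; correctness is only required at the volatile boundary, which is exactly the latitude lazy protocols exploit.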
“…However, it is possible to degrade performance for infrequently written but frequently read lines, suggested by our implementation of CC-shared-to-L2. Coherence for relaxed consistency: Dubois and Scheurich [15,37] first gave insight into reducing coherence overhead in relaxed consistency models, particularly that the requirement of "coherence on synchronization points" is sufficient. Instead of enforcing coherence at every write (also referred as the SWMR property [41]), recent works [7,12,17,21,28,35,42] enforce coherence at synchronization boundaries by self-invalidating shared data in private caches.…”
Section: Related Work
confidence: 99%
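The self-invalidation scheme mentioned in the quote can be sketched as a toy simulation (entirely illustrative; the class and method names are assumptions, not from any cited work): instead of invalidating a line on every remote write (the SWMR property), a core drops its privately cached shared data only when it performs an acquire.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of self-invalidation at synchronization boundaries: private
// caches may serve stale data between boundaries, and are cleared only
// when the core performs an acquire.
public class SelfInvalidation {
    static final Map<String, Integer> sharedMemory = new HashMap<>();

    static class Core {
        final Map<String, Integer> privateCache = new HashMap<>();

        int read(String addr) {             // hit privately, else fetch from shared memory
            return privateCache.computeIfAbsent(addr, sharedMemory::get);
        }
        void write(String addr, int v) {    // write-through, for simplicity
            privateCache.put(addr, v);
            sharedMemory.put(addr, v);
        }
        void acquire() {                    // synchronization boundary:
            privateCache.clear();           // self-invalidate possibly stale shared data
        }
    }

    public static void main(String[] args) {
        Core a = new Core(), b = new Core();
        a.write("x", 1);
        System.out.println(b.read("x"));    // b caches the value 1
        a.write("x", 2);                    // no eager invalidation of b's copy
        System.out.println(b.read("x"));    // still 1: stale but permitted between boundaries
        b.acquire();
        System.out.println(b.read("x"));    // fresh value after self-invalidation
    }
}
```

This also shows the trade-off the quote raises: a line that is read often but written rarely gets needlessly re-fetched after every acquire.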
“…Memory consistency models describe the rules that guarantee memory accesses will be predictable. Several memory consistency models have been proposed, including sequential consistency (SC) [17], weak consistency (WC) [18], processor consistency (PC) [19], release consistency (RC) [15], entry consistency (EC) [6], and scope consistency (ScC) [14].…”
Section: Related Work
confidence: 99%
“…The Java shared-memory model is not the sequentially consistent model [13], which is commonly used for writing multi-threaded programs. Instead, Java employs a form of weak consistency [14], to allow for shared-data optimization opportunities. In the sequential consistency model, any update to a shared variable must be visible to all other threads.…”
Section: The Java Parallel Programming Model
confidence: 99%
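The contrast drawn in the quote can be made concrete with a minimal Java sketch (names are illustrative): under Java's weak model, updates to a plain field are only guaranteed visible to other threads across synchronization actions, so shared state is guarded with a monitor, whose exit acts as a release and whose entry acts as an acquire.

```java
// Sketch: monitor enter/exit as the synchronization actions that make
// plain-field updates visible under Java's weak consistency model.
public class MonitorVisibility {
    private int counter = 0;                // plain shared field
    private final Object lock = new Object();

    void increment() {
        synchronized (lock) {               // monitor exit = release
            counter++;
        }
    }

    int get() {
        synchronized (lock) {               // monitor enter = acquire
            return counter;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MonitorVisibility m = new MonitorVisibility();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) m.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(m.get());        // all 4000 updates are visible
    }
}
```

Without the `synchronized` blocks, a sequentially consistent model would still make every update visible immediately; Java's weaker model deliberately does not, which is the optimization opportunity the quote refers to.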