Reducing Verification Complexity of a Multicore Coherence Protocol Using Assume/Guarantee

Chen, Xiaofang; Yang, Yi; Gopalakrishnan, Ganesh; Chou, Ching-Tsun

doi:10.1109/fmcad.2006.28

Cited by 20 publications

(30 citation statements)

References 9 publications


“…We applied the standard method for debugging cache coherence protocols: we built a formal model of our protocols and performed an exhaustive reachability analysis of the model for a small configuration size [28,25,8] using explicit-state model checking with the Murphi [11] model checker. We extended the DASH protocol model provided as part of the Murphi release, ran the resulting model through Murphi, and found that none of the invariants provided in the DASH model were violated by our changes.…”

Section: Verification (mentioning)

confidence: 99%

An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

Cheng

Carter

Dai

2007

2007 IEEE 13th International Symposium on High Performance Computer Architecture


“…This is because exclusive [4] or non-inclusive [5] caching both incur three-way communication (i.e. among data requester, directory, and owner) to locate shared data and they also introduce nontrivial design and verification complexity [6].…”

Section: Introduction (mentioning)

confidence: 99%

Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling

Meng

Skadron

2009

2009 IEEE International Conference on Computer Design


Abstract: Without high-bandwidth broadcast, large numbers of cores require a scalable point-to-point interconnect and a directory protocol. In such cases, a shared, inclusive last level cache (LLC) can improve data sharing and avoid three-way communication for shared reads. However, if inclusion encompasses thread-private data, two problems arise with the shared LLC. First, current memory allocators align stack bases on page boundaries, which emerges as a source of severe conflict misses for large numbers of threads on data-parallel applications. Second, correctness does not require the private data to reside in the shared directory or the LLC. This paper advocates stack-base randomization that eliminates the major source of conflict misses for large numbers of threads. However, when capacity becomes a limitation for the directory or last-level cache, this is not sufficient. We then propose non-inclusive, semi-coherent cache organization (NISC) that removes the requirement for inclusion of private data and reduces capacity misses. Our data-parallel benchmarks show that these limitations prevent scaling beyond 8 cores, while our techniques allow scaling to at least 32 cores for most benchmarks. At 8 cores, stack randomization provides a mean speedup of 1.2X, but stack randomization with 32 cores gives a speedup of 2.7X over the best baseline configuration. Comparing to conventional performance with a 2 MB LLC, our technique achieves similar performance with a 256 KB LLC, suggesting LLCs may be typically overprovisioned. When very limited LLC resources are available, NISC can further improve system performance by 1.8X.


“…In turn, Abs #j employs assumptions that are justified by verifying some number of Abs #i's (further details of such meta-circular dependencies are discussed in Section II-A). In [1], we show that (i) each Abs #i has far less states than the original hierarchical protocol, (ii) the additive complexity of verifying the Abs #i's in turn is also far less than the complexity of the original protocol. However, as will be seen from Figure 3, even one Abs #i involves the product state space of one entire unit (such as 'Home cluster' in Figure 3) and two simplified units (such as 'Remote clusters' in Figure 3).…”

Section: Introduction (mentioning)

confidence: 94%

“…This is bad news, considering that the intra cluster protocol state space itself will be very large, and that any product of the states of three intra cluster protocols and one inter cluster protocol (Figure 1) would be unacceptably large. In our previous work [1], we presented a compositional approach for partly mitigating this problem. The workflow of the approach is shown in Figure 2.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical cache coherence protocol verification one level at a time through assume guarantee

Chen

Yang

DeLisi

et al.

2007

2007 IEEE International High Level Design Validation and Test Workshop


Abstract: Due to the error-prone nature of modern cache coherence protocols, in all modern processor design flows these protocols are formally specified at the level of interleaving atomic transactions and model checked. Explicit state enumeration methods are almost always used for coherence protocol verification, as symbolic methods have failed to deliver advantages in this area. The move towards multicores implies that hierarchical organizations of several different cache coherence protocols will be employed in future. The product state space of all these protocols jointly operating in a multicore cache hierarchy is beyond the reach of all available explicit state model checkers. In [1], an assume guarantee technique that allowed these protocols to be handled for the first time was reported. In this approach, a method was proposed to create a set of initial abstract protocols {Abs #i} where each Abs #i simulates the given hierarchical protocol. Since the various Abs #i depend on each other, verification consists of dealing with the set {Abs #i} in an assume guarantee manner, refining Abs #i in the process. The drawbacks of [1] were: (i) even one single Abs #i modeled more than one cluster; in particular, portions of other clusters and directory structures were also modeled, thus still creating very large product spaces, (ii) details such as non-inclusive caching hierarchies could not be handled. This paper overcomes both these limitations, handling non-inclusive caching hierarchies, and bringing about a 95% reduction in the total state space encountered during any single explicit enumeration search, and requiring only a few such runs to finish verification.
