Mathematizing C++ concurrency

Batty, Mark; Owens, Scott; Sarkar, Susmit; Sewell, Peter; Weber, Tjark

doi:10.1145/1926385.1926394

Cited by 270 publications

(333 citation statements)

References 20 publications

Citation types: Supporting, Mentioning (329), Contrasting, Unclassified


“…This increases the demands on compilers that implement these memory models, so it is important to check that our changes do not invalidate existing compilation schemes. To this end, we prove that all of the formalised C11 compilation schemes of which we are aware (namely, those for Power [8] and x86 [7] machines) remain sound after our changes, and we argue informally that our OpenCL changes preserve the soundness of the only formalised OpenCL compilation scheme (namely, that for AMD GPUs [41]). …”

Section: Main Contributions (mentioning)

confidence: 74%

“…Correctness in any relaxed memory setting is notoriously evasive; indeed, the subtleties of relaxed memory have previously led to confirmed bugs in language specifications [7,10], deployed processors [1], compilers [27,40] and vendor-endorsed programming guides [3]. The importance of correctness in the context of C11 is well-known.…”

Section: Introduction (mentioning)

confidence: 99%

“…The C11 memory model has been formalised by several researchers, in varying degrees of completeness, and with varying degrees of fidelity to the standard [2,7,38]. These formalisation efforts have proved fruitful; they have, for instance, enabled the construction of simulators that automatically explore the allowed behaviours of small C11 programs (called litmus tests) [2,7,13,29], underpinned the design of program logics for specifying and verifying C11 programs [37,38], and they provide a firm foundation for ongoing debate about the design of the C11 memory model itself [10,39]. The OpenCL memory model (introduced in version 2.0 of the standard) has received comparatively little academic attention, with the notable exception of the work of Gaster et al [17], which we discuss further in §7.…”

Section: Introduction (mentioning)

confidence: 99%
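As a rough illustration of the kind of litmus test mentioned in the excerpt above, the following C++ sketch runs the classic store-buffering shape with sequentially consistent atomics. The names `SbOutcome`, `run_sb_once`, and `count_both_zero` are hypothetical and do not come from the cited papers. Running the test natively only samples some executions; it does not explore all allowed behaviours the way the simulators described in the excerpt do.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by the two threads in one run (hypothetical helper type).
struct SbOutcome {
    int r1;  // what thread 1 read from y
    int r2;  // what thread 2 read from x
};

// One run of the store-buffering litmus test using seq_cst accesses.
SbOutcome run_sb_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();

    // Each load can only see the initial value 0 or the stored value 1.
    assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
    return {r1, r2};
}

// Counts how many of `runs` executions ended with both loads reading 0.
// With seq_cst accesses this outcome is forbidden, so the count should be 0.
int count_both_zero(int runs) {
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        SbOutcome o = run_sb_once();
        if (o.r1 == 0 && o.r2 == 0) {
            ++hits;
        }
    }
    return hits;
}
```

Calling `count_both_zero(1000)` from a `main` function is one way to exercise the sketch. Replacing `std::memory_order_seq_cst` with `std::memory_order_relaxed` makes the both-zero outcome permitted by the language, although whether it is actually observed depends on the compiler and hardware.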


Overhauling SC atomics in C11 and OpenCL

2016

Self Cite


Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is that the SC operations in an execution no longer need to be totally ordered. This relaxation enables, for the first time, efficient and exhaustive simulation of litmus tests that use SC atomics. We extend our improved C11 model to obtain the first rigorous memory model formalisation for OpenCL (which extends C11 with support for heterogeneous many-core programming). In the OpenCL setting, we refine the SC axioms still further to give a sensible semantics to SC operations that employ a ‘memory scope’ to restrict their visibility to specific threads. Our overhaul requires slight strengthenings of both the C11 and the OpenCL memory models, causing some behaviours to become disallowed. We argue that these strengthenings are natural, and that all of the formalised C11 and OpenCL compilation schemes of which we are aware (Power and x86 CPUs for C11, AMD GPUs for OpenCL) remain valid in our revised models. Using the HERD memory model simulator, we show that our overhaul leads to an exponential improvement in simulation time for C11 litmus tests compared with the original model, making *exhaustive* simulation competitive, time-wise, with the *non-exhaustive* CDSChecker tool.


“…Additionally, it defines a threaded concurrency model that communicates via shared memory, and a relaxed, but rather complex, memory model [11]. In contrast to C99, handling concurrency and memory consistency mostly relies on the programmer.…”

Section: C++11 (mentioning)

confidence: 99%

“…It assumes that the programmer can identify variables that should be declared atomic and access it accordingly. However, Batty et al [11] conclude that this model is not clearly defined by the standard, and the corresponding mathematical model might not be 'sufficiently widely accessible'. Because of the complexity of the model, it is unclear whether it defines the weakest (usable) model.…”

Section: Steinke and Nutt (mentioning)

confidence: 99%

Programming models for many-core architectures: a co-design approach

Rutgers


It is unlikely that general-purpose single-core performance will improve much in the coming years. The clock speed is limited by physical constraints, and recent architectural improvements are not as beneficial for performance as those were several years ago. However, the transistor count and density per chip still increase, as feature sizes reduce, and material and processing techniques improve. Given a limited single-core performance, but plenty of transistors, the logical next step is towards many-core.

A many-core processor contains at least tens of cores and usually distributed memory, which are connected (but physically separated) by an interconnect that has a communication latency of multiple clock cycles. In contrast to a multicore system, which only has a few tightly coupled cores sharing a single bus and memory, several complex problems arise. Notably, many cores require many parallel tasks to fully utilize the cores, and communication happens in a distributed and decentralized way. Therefore, programming such a processor requires the application to exhibit concurrency. Moreover, a concurrent application has to deal with memory state changes with an observable (non-deterministic) intermediate state, whereas single-core applications observe all state changes to happen atomically. The complexity introduced by these problems makes programming a many-core system with a single-core-based programming approach notoriously hard.

The central concept of this thesis is that abstractions, which are related to (many-core) programming, are structured in a single platform model. A platform is a layered view of the hardware, a memory model, a concurrency model, a model of computation, and compile-time and run-time tooling. Then, a programming model is a specific view on this platform, which is used by a programmer. In this view, some details can be hidden from the programmer's perspective, some details cannot. For example, an operating system presents an infinite number of parallel virtual execution units to the application; details regarding scheduling and context switching of processes on one core are hidden from the programmer. On the other hand, a programmer usually has to take full control over separation, distribution, and balancing of workload among different worker threads. To what extent a programmer can rely on automated control over low-level platform-specific details is part of the programming model. This thesis presents modifications to different abstraction layers of a many-core architecture, in order to make the system as a whole more efficient, and to reduce the complexity that is exposed to the programmer via the programming model.

For evaluation of many-core hardware and corresponding (concurrent) programming techniques, a 32-core MicroBlaze system, named Starburst, is designed and implemented on FPGA. On the hardware architecture level, a network-on-chip is presented that is tailored towards a typical many-core application communication pattern. All cores can access a shared memory, but as this ...


High‐coverage metamorphic testing of concurrency support in C compilers

Windsor; Donaldson; Wickerson

2022

Software Testing Verif & Rel


Summary: We present a technique and automated toolbox for randomized testing of C compilers. Unlike prior compiler‐testing approaches, we generate concurrent test cases in which threads communicate using fine‐grained atomic operations, and we study actual compiler implementations rather than abstract mappings. Our approach is (1) to generate test cases with precise oracles directly from an axiomatization of the C concurrency model; (2) to apply metamorphic fuzzing to each test case, aiming to amplify the coverage they are likely to achieve on compiler codebases; and (3) to execute each fuzzed test case extensively on a range of real machines. Our tool, C4, benefits compiler developers in two ways. First, test cases generated by C4 can achieve line coverage of parts of the LLVM C compiler that are reached by neither the LLVM test suite nor an existing (sequential) C fuzzer. This information can be used to guide further development of the LLVM test suite and can also shed light on where and how concurrency‐related compiler optimizations are implemented. Second, C4 can be used to gain confidence that a compiler implements concurrency correctly. As evidence of this, we show that C4 achieves high strong mutation coverage with respect to a set of concurrency‐related mutants derived from a recent version of LLVM and that it can find historic concurrency‐related bugs in GCC. As a by‐product of concurrency‐focused testing, C4 also revealed two previously unknown sequential compiler bugs in recent versions of GCC and the IBM XL compiler.
