Speculative execution is a technique that allows serial tasks to execute in parallel. An implementation of speculative execution can be divided into two parts: (1) a policy that specifies what operations and values to predict, what actions to allow during speculation, and how to compare results; and (2) the mechanisms that support speculative execution, such as checkpointing, rollback, causality tracking, and output buffering.In this paper, we show how to separate policy from mechanism. We implement a speculation mechanism in the operating system, where it can coordinate speculations across all applications and kernel state. Policy decisions are delegated to applications, which have the most semantic information available to direct speculation.We demonstrate how custom policies can be used in existing applications to add new features that would otherwise be difficult to implement. Using custom policies in our separated speculation system, we can hide 85% of program load time by predicting the program's launch, decrease SSL connection latency by 15% in Firefox, and increase a BFT client's request rate by 82%. Despite the complexity of the applications, small modifications can implement these features since they only specify policy choices and rely on the system to realize those policies. We provide this increased programmability with a modest performance trade-off, executing only 8% slower than an optimized, applicationimplemented speculation system.
Detecting data races in multithreaded programs is a crucial part of debugging such programs, but traditional data race detectors are too slow to use routinely. This paper shows how to speed up race detection by spreading the work across multiple cores. Our strategy relies on uniparallelism, which executes time intervals of a program (called epochs) in parallel to provide scalability, but executes all threads from a single epoch on a single core to eliminate locking overhead. We use several techniques to make parallelization effective: dividing race detection into three phases, predicting a subset of the analysis state, eliminating sequential work via transitive reduction, and reducing the work needed to maintain multiple versions of analysis via factorization. We demonstrate our strategy by parallelizing a happens-before detector and a lockset-based detector. We find that uniparallelism can significantly speed up data race detection. With 4× the number of cores as the original application, our strategy speeds up the median execution time by 4.4× for a happens-before detector and 3.3× for a lockset race detector. Even on the same number of cores as the conventional detectors, the ability for uniparallelism to elide analysis locks allows it to reduce the median overhead by 13% for a happens-before detector and 8% for a lockset detector.
Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, replaying shared memory multiprocessor systems at low overhead on commodity hardware is still an open problem. This paper presents Respec, a new way to support deterministic replay of shared memory multithreaded programs on commodity multiprocessor hardware. Respec targets online replay in which the recorded and replayed processes execute concurrently.Respec uses two strategies to reduce overhead while still ensuring correctness: speculative logging and externally deterministic replay. Speculative logging optimistically logs less information about shared memory dependencies than is needed to guarantee deterministic replay, then recovers and retries if the replayed process diverges from the recorded process. Externally deterministic replay relaxes the degree to which the two executions must match by requiring only their system output and final program states match. We show that the combination of these two techniques results in low recording and replay overhead for the common case of datarace-free execution intervals and still ensures correct replay for execution intervals that have data races.We modified the Linux kernel to implement our techniques. Our software system adds on average about 18% overhead to the execution time for recording and replaying programs with two threads and 55% overhead for programs with four threads.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.