SyncProbe improves the end-to-end predictability of distributed systems by providing applications with a real-time estimate of the maximum expected message delay (upper bound on communication latency) for network paths. The upper bound is adjusted over time in response to the monitored network latency and serves as a real-time assurance of synchrony. We deployed SyncProbe on PlanetLab and assessed its performance with respect to violations, duration of synchrony, upper bound cost, and recoverability. Experiments showed that SyncProbe successfully provides a real-time upper bound estimate for a variety of paths. Applications can use the estimated upper bound as a more formal basis for reasoning about timeouts, the ordering of events, or knowledge of global states. We describe the design and methodology of SyncProbe and discuss various issues related to its performance.
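To make the idea of an upper bound that is "adjusted over time in response to the monitored network latency" concrete, here is a minimal Python sketch. It is not SyncProbe's algorithm, which this section does not specify. The class `AdaptiveUpperBound`, the sliding-window maximum rule, and the `window`, `margin`, and `initial_bound` parameters are all hypothetical choices made for illustration. The sketch also shows one plausible way to count violations (observed delays that exceed the bound in force when they arrive) and to measure duration of synchrony as the longest run of consecutive delays that stay within the bound.

```python
from collections import deque


class AdaptiveUpperBound:
    """Hypothetical sliding-window upper-bound estimator (not SyncProbe's method).

    Assumption: the bound is the largest delay among the last `window`
    samples, inflated by a relative safety `margin`.
    """

    def __init__(self, window: int = 20, margin: float = 0.25,
                 initial_bound: float = 1.0) -> None:
        self.samples = deque(maxlen=window)  # most recent delays, in seconds
        self.margin = margin                 # relative headroom added to the max
        self.bound = initial_bound           # bound in force before any samples
        self.violations = 0                  # delays that exceeded the bound in force
        self.current_run = 0                 # consecutive delays within the bound
        self.longest_run = 0                 # longest such run seen so far

    def observe(self, delay: float) -> float:
        """Check `delay` against the current bound, then adapt the bound."""
        if delay > self.bound:
            self.violations += 1
            self.current_run = 0
        else:
            self.current_run += 1
            self.longest_run = max(self.longest_run, self.current_run)
        # Adapt: recompute the bound from the recent window only.
        self.samples.append(delay)
        self.bound = max(self.samples) * (1.0 + self.margin)
        return self.bound


if __name__ == "__main__":
    est = AdaptiveUpperBound(window=3, margin=0.5, initial_bound=0.2)
    for d in [0.10, 0.12, 0.11, 0.30, 0.12]:
        est.observe(d)
    print(est.violations, est.longest_run, round(est.bound, 3))  # prints: 1 3 0.45
```

In this sketch a larger `margin` or `window` yields a looser (higher) bound and fewer violations, while smaller values track the path more tightly but are violated more often when latency jumps abruptly, as the 0.30 s sample does above.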
Introduction

Distributed systems deployed on the Internet exhibit low predictability in communication latency. That is, communication latency differs from path to path and varies over time, and an application has little knowledge of the expected message delay. The absence of a bound on message latency leads to asynchrony¹ and results in low or unpredictable performance for the many applications that infer timeouts, order events, or assess information about global states. Several distributed systems estimate the maximum delay or timeouts from past knowledge. However, because Internet latency varies over time and can change abruptly, this does not provide the most accurate mechanism. If an application is provided with a real-time estimate of an upper bound, then its performance can improve, because the application has a more formal basis for correctness. In addition, many formal distributed systems problems, including consensus, leader election, and predicate detection, require an assurance of the maximum expected delay (upper bound on communication latency) in order to admit a solution [6]; given such an estimate, they are more likely to execute correctly and efficiently. A real-time estimate of the upper bound can also help applications meet QoS requirements and adapt better to network behavior. Our work is motivated, in part, by the Timed Asynchronous (TA) model [4], which provides bounded message latency, uses a static upper bound, and has been verified only for local area networks. However, because communication on the Internet experiences a much higher degree of asynchrony and uncertainty, estimating an upper bound becomes a more significant challenge for communication over Internet paths.

This material is based on work supported by the National Science Foundation under CAREER grant ANI-0347222.

¹ A system is synchronous if there is a fixed bound on the communication latency between two processes [6].
These include: Prompt responsiveness to changing network behavior. Communication on the Internet experiences various uncertainties that could arise from route changes, transient and persiste...