Proceedings of the 40th Annual International Symposium on Computer Architecture 2013
DOI: 10.1145/2485922.2485974
Bubble-Flux

Abstract: Ensuring the quality of service (QoS) for latency-sensitive applications while allowing co-locations of multiple applications on servers is critical for improving server utilization and reducing cost in modern warehouse-scale computers (WSCs). Recent work relies on static profiling to precisely predict the QoS degradation that results from performance interference among co-running applications to increase the number of "safe" co-locations. However, these static profiling techniques have several critical limitations…
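The abstract's core idea — dynamically probing how much memory-subsystem pressure a latency-sensitive application can tolerate before admitting a co-location — can be illustrated with a minimal sketch. All function names, IPC numbers, and the 95% QoS threshold below are hypothetical, not taken from the paper:

```python
# Illustrative sketch of the dynamic probing idea: apply a short burst of
# memory pressure (a "bubble"), measure how much the latency-sensitive
# workload's IPC dips, and admit a co-location only if enough performance
# is retained. Names and thresholds are assumptions for illustration.

def qos_retained(ipc_baseline: float, ipc_with_bubble: float) -> float:
    """Fraction of baseline performance retained under memory pressure."""
    return ipc_with_bubble / ipc_baseline

def safe_to_colocate(ipc_baseline: float, ipc_with_bubble: float,
                     qos_target: float = 0.95) -> bool:
    """Admit a co-location only if the probed dip stays within the target."""
    return qos_retained(ipc_baseline, ipc_with_bubble) >= qos_target

# A 2% dip (2.0 -> 1.96 IPC) is within a 95% QoS target; a 15% dip is not.
print(safe_to_colocate(2.0, 1.96))
print(safe_to_colocate(2.0, 1.70))
```

Unlike static profiling, such a probe can be re-run periodically, so the admission decision tracks the application's current load rather than a one-time measurement.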

Cited by 237 publications (21 citation statements) · References 47 publications
“…Apart from software heterogeneity, datacenter hardware is also becoming increasingly heterogeneous as special-purpose architectures [20-22, 39, 55] and FPGAs are used to accelerate critical operations [19,25,40,75]. This adds to the existing server heterogeneity in the cloud where servers are progressively replaced and upgraded over the datacenter's provisioned lifetime [31,33,65,68,95], and further complicates the effort to guarantee predictable performance.…”
Section: Netflix (mentioning)
confidence: 99%
“…A second line of work tries to identify resources that will allow a new, potentially-unknown application to meet its performance (throughput or tail latency) requirements [29,31,32,34,66,68,95]. Paragon uses classification to determine the impact of platform heterogeneity and workload interference on an unknown, incoming workload [30,31].…”
Section: Related Work (mentioning)
confidence: 99%
“…Again, this work can be done at the processor level [23,24,44,56], in DRAM [11], in storage [30], and across a datacenter [33,39]. At the datacenter or cluster level, power can be saved by consolidating workloads onto fewer physical machines [14,27,34,35,37,54], coordinating co-existing applications [45], and scheduling with green power [20]. Scheduling jobs under a power cap has recently become a major concern for HPC operating systems [7,15] and job schedulers [3,19]. Recent work suggests that HPC workloads can actually achieve higher performance by over-provisioning large-scale installations, such that using all nodes at full capacity would drastically violate the power budget, and severely power-capping the individual nodes [47].…”
Section: Related Work (mentioning)
confidence: 99%
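The power-capping scheme this excerpt describes — an over-provisioned cluster whose nodes together would exceed the budget at full power — implies some policy for dividing the cluster-wide cap among nodes. The following is a minimal sketch of one such policy (proportional scaling); the function name and the proportional rule are illustrative assumptions, not the cited work's algorithm:

```python
# Illustrative sketch: split a cluster-wide power cap across nodes in
# proportion to their demand. In an over-provisioned installation the sum
# of demands exceeds the cap, so each node is scaled down uniformly.
# The proportional policy here is an assumption for illustration.

def allocate_power(cluster_cap_w: float, demands_w: list[float]) -> list[float]:
    """Return a per-node power budget (watts) that never exceeds the cap."""
    total = sum(demands_w)
    if total <= cluster_cap_w:
        return list(demands_w)  # cap is not binding; grant full demand
    scale = cluster_cap_w / total
    return [d * scale for d in demands_w]

# Two nodes asking for 200 W each under a 300 W cap get 150 W apiece.
print(allocate_power(300.0, [200.0, 200.0]))
```

Real schedulers typically refine this with per-node minimums and feedback from measured (rather than requested) draw, but the proportional split captures the over-provisioning trade-off the excerpt describes.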
“…Our approach explores this issue at higher throughputs and with tighter latency SLOs. Bubble-Flux [Yang et al. 2013] additionally controls background threads; we control both background and latency-sensitive threads. CPI² [Zhang et al. 2013] detects performance interference by observing changes in CPI and throttles offending jobs.…”
Section: Related Work (mentioning)
confidence: 99%
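The CPI-based detection this last excerpt summarizes — flag a job as suffering interference when its cycles-per-instruction drifts above its historical norm — can be sketched in a few lines. The function name and the two-standard-deviation threshold are illustrative assumptions, not the actual CPI² policy:

```python
# Illustrative sketch of CPI-based interference detection: compare the
# recent mean CPI of a job against its historical distribution and flag
# interference when it exceeds the mean by `sigma` standard deviations.
# The threshold (sigma=2.0) is an assumption for illustration.
import statistics

def detect_interference(recent_cpi: list[float],
                        history_cpi: list[float],
                        sigma: float = 2.0) -> bool:
    """True if recent CPI is anomalously high versus the job's history."""
    mean = statistics.mean(history_cpi)
    std = statistics.stdev(history_cpi)
    return statistics.mean(recent_cpi) > mean + sigma * std

history = [1.0, 1.1, 0.9, 1.0, 1.05]   # CPI samples under normal operation
print(detect_interference([1.5, 1.6], history))   # large CPI jump
print(detect_interference([1.0, 1.02], history))  # within normal range
```

In a real deployment the history would come from hardware performance counters aggregated across many tasks of the same job; once a victim is flagged, the scheduler can throttle whichever co-runner's CPU usage correlates with the CPI spike.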