2019
DOI: 10.14778/3311880.3311887
|View full text |Cite
|
Sign up to set email alerts
|

Cache-aware load balancing of data center applications

Abstract: Our deployment of cache-aware load balancing in the Google web search backend reduced cache misses by ∼0.5x, contributing to a double-digit percentage increase in the throughput of our serving clusters by relieving a bottleneck. This innovation has benefited all production workloads since 2015, serving billions of queries daily. A load balancer forwards each query to one of several identical serving replicas. The replica pulls each term's postings list into RAM from flash, either locally or over the network. F… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
5
0

Year Published

2019
2019
2024
2024

Publication Types

Select...
6
3
1

Relationship

0
10

Authors

Journals

citations
Cited by 18 publications
(5 citation statements)
references
References 67 publications
0
5
0
Order By: Relevance
“…A similar application is to boost IO throughput in the Google search engine backend by improving cache utilization. Archer et al [15] initialize a voting table that assigns search requests, using graph partitioning on a bipartite graph where vertices are search terms and queries, and an edge exists for each term contained in a query. Subsequent simulation-and-refinement further boosts the prediction accuracy (item in cache) of the voting table.…”
Section: Applicationsmentioning
confidence: 99%
“…A similar application is to boost IO throughput in the Google search engine backend by improving cache utilization. Archer et al [15] initialize a voting table that assigns search requests, using graph partitioning on a bipartite graph where vertices are search terms and queries, and an edge exists for each term contained in a query. Subsequent simulation-and-refinement further boosts the prediction accuracy (item in cache) of the voting table.…”
Section: Applicationsmentioning
confidence: 99%
“…In this work, we consider the connectivity metric e∈E (λ(e) − 1)ω(e) where λ(e) := {V i | e ∩ V i = ∅} denotes the number of different blocks connected by hyperedge e ∈ E and ω(e) denotes its weight. Often balanced partitioning is used as an acceleration technique for other applications, such as quantum circuit simulation [31], sharding distributed databases [15,36], load balancing (for scientific computing) [13], route planning [17,32], or boosting cache utilization in a search engine backend [7].…”
Section: Introductionmentioning
confidence: 99%
“…Caching is a fundamental technique for boosting systems performance. In particular, software-managed caches, aka software caches, are employed in multiple data-stores and databases [2,12,15,19,20,21,36,40,42,41], operating systems, middleware, streaming services, and is a major capability of edge computing. The common motivation behind caching is to store data closer to the application than its source and avoid recalculating queries, query plans, and temporal indices.…”
Section: Introductionmentioning
confidence: 99%