Scale-out NUMA

Novaković, Stanko; Daglis, Alexandros; Bugnion, Edouard; Falsafi, Babak; Grot, Boris

doi:10.1145/2541940.2541965

Cited by 89 publications

(54 citation statements)

References 50 publications

“…10 describes the emulation platform for the so-NUMA architecture. This platform [47] is designed to (i) run server nodes at regular wall-clock speed, and (ii) approximate the latency and bandwidth of the fabric. The emulation platform relies on hardware virtualization to create a RackOut unit of up to 16 nodes.…”

Section: Experimental Methodology (mentioning)

confidence: 99%

“…EMC/Isilon [24]) solutions to clients connected via a conventional network. AppliedMicro's X-Gene2 server SoC [40] and Oracle's Sonoma [41] integrate the RDMA controller directly on chip, HP Moonshot [30] combines low-power processors with RDMA NICs, and research proposals further argue for on-chip support for one-sided remote access primitives [18,47]. Building larger logical entities using such rack-scale memory pooling approaches instead of the cache-coherent NUMA approach comes at a lower cost and complexity.…”

Section: Architectural Building Blocks (mentioning)

confidence: 99%

“…A representative of such emerging tightly integrated solutions is Scale-Out NUMA (soNUMA) [18,47], which delivers remote memory access latency within a small factor over local memory access. soNUMA is an architecture and protocol that supports one-sided remote read and write operations, i.e., a strict subset of RDMA operations.…”

Section: The Impact Of Faster Remote Reads (mentioning)

confidence: 99%

“…We evaluate the benefits of RackOut as a function of server count, size of the RackOut unit, and read/write ratio for datasets following a power-law popularity distribution. For a Zipfian read-only distribution with α = 0.99, the model predicts that a RackOut deployment of 512 servers grouped into 16-server RackOut units increases throughput by 6× with RDMA and 8.6× with Scale-Out NUMA (soNUMA) [18,19,47] without violating SLO.…”

Section: Introduction (mentioning)

confidence: 99%

The Case for RackOut

Novaković, Daglis, Bugnion et al. 2016

Proceedings of the Seventh ACM Symposium on Cloud Computing

Self Cite

To provide low latency and high throughput guarantees, most large key-value stores keep the data in the memory of many servers. Despite the natural parallelism across lookups, the load imbalance, introduced by heavy skew in the popularity distribution of keys, limits performance. To avoid violating tail latency service-level objectives, systems tend to keep server utilization low and organize the data in micro-shards, which provides units of migration and replication for the purpose of load balancing. These techniques reduce the skew, but incur additional monitoring, data replication and consistency maintenance overheads. In this work, we introduce RackOut, a memory pooling technique that leverages the one-sided remote read primitive of emerging rack-scale systems to mitigate load imbalance while respecting service-level objectives. In RackOut, the data is aggregated at rack-scale granularity, with all of the participating servers in the rack jointly servicing all of the rack's micro-shards. We develop a queuing model to evaluate the impact of RackOut at the datacenter scale. In addition, we implement a RackOut proof-of-concept key-value store, evaluate it on two experimental platforms based on RDMA and Scale-Out NUMA, and use these results to validate the model. Our results show that RackOut can increase throughput up to 6× for RDMA and 8.6× for Scale-Out NUMA compared to a scale-out deployment, while respecting tight tail latency service-level objectives.
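The effect of aggregating data at rack granularity can be illustrated with a small static calculation. The sketch below is not the paper's queuing model and does not reproduce its throughput figures. It only compares the ratio of the hottest unit's load to the mean load when keys are spread over 512 individual servers versus pooled into 16-server units. The key count, the round-robin placement by popularity rank, and all function names are assumptions made for illustration; α = 0.99, 512 servers, and 16-server units are taken from the citation statements above.

```python
# Illustrative sketch only: static load imbalance under Zipfian popularity.
# Key count and round-robin placement are assumptions, not the paper's setup.

def zipf_weights(num_keys, alpha):
    """Normalized Zipfian popularity: weight of rank r is proportional to 1 / r**alpha."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def server_loads(weights, num_servers):
    """Place keys round-robin by popularity rank and sum the load per server."""
    loads = [0.0] * num_servers
    for index, weight in enumerate(weights):
        loads[index % num_servers] += weight
    return loads

def rack_loads(loads, rack_size):
    """Pool consecutive servers into units that jointly serve their combined load."""
    return [sum(loads[i:i + rack_size]) for i in range(0, len(loads), rack_size)]

def imbalance(loads):
    """Load of the hottest unit divided by the mean load (1.0 means perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))

weights = zipf_weights(100_000, 0.99)    # hypothetical dataset size
per_server = server_loads(weights, 512)  # scale-out: each server serves only its own keys
per_rack = rack_loads(per_server, 16)    # pooled: 16 servers share one unit's keys

server_imbalance = imbalance(per_server)
rack_imbalance = imbalance(per_rack)
print(f"per-server imbalance: {server_imbalance:.2f}")
print(f"per-rack imbalance:   {rack_imbalance:.2f}")
```

Because a pooled unit's load is the sum of its members' loads, its imbalance ratio can never exceed that of the hottest single server, which is the intuition behind pooling. The calculation ignores queuing, replication, and remote-access cost, all of which the paper's model accounts for.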

“…Emerging low-latency rack-scale fabrics [7] may provide a way to efficiently aggregate memory to reduce the pressure on dynamic replication.…”

Section: Dynamic Replication (mentioning)

confidence: 99%

An Analysis of Load Imbalance in Scale-out Data Serving

Novaković, Stanko; Daglis, Alexandros; Bugnion, Edouard et al. 2016

SIGMETRICS Perform. Eval. Rev.

Self Cite

Despite the natural parallelism across lookups, performance of distributed key-value stores is often limited due to load imbalance induced by heavy skew in the popularity distribution of the dataset. To avoid violating service level objectives expressed in terms of tail latency, systems tend to keep server utilization low and organize the data in micro-shards, which in turn provides units of migration and replication for the purpose of load balancing. These techniques reduce the skew, but incur additional monitoring, data replication and consistency maintenance overheads. This work shows that the trend towards extreme scale-out will further exacerbate the skew-induced load imbalance, and hence the overhead of migration and replication.
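The claim that extreme scale-out exacerbates skew-induced imbalance can be seen in a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not the analysis from the paper: it assumes a hypothetical 100,000-key dataset with Zipfian popularity (α = 0.99), no replication, and round-robin placement by popularity rank. The function name `hottest_over_mean` is made up for this example.

```python
# Illustrative sketch only: how the hottest server's relative load grows
# with server count. Dataset size and placement policy are assumptions.

def hottest_over_mean(num_servers, num_keys=100_000, alpha=0.99):
    """Return the hottest server's load divided by the mean server load."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_keys + 1)]
    total = sum(raw)
    loads = [0.0] * num_servers
    for index, weight in enumerate(raw):
        # Round-robin by popularity rank; no replication or migration.
        loads[index % num_servers] += weight / total
    return max(loads) / (1.0 / num_servers)

ratios = {n: hottest_over_mean(n) for n in (64, 512, 4096)}
for n, ratio in ratios.items():
    print(f"{n:5d} servers: hottest/mean = {ratio:.1f}")
```

As the server count grows, the mean load per server shrinks while the single most popular key still lands on one server, so the ratio rises. Real deployments counter this with the replication and migration mechanisms whose overhead the abstract discusses.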

Post‐Moore Datacenter Server Architecture

Falsafi

2021

Multi‐Processor System‐on‐Chip 2
