Proceedings of the 42nd Annual International Symposium on Computer Architecture 2015
DOI: 10.1145/2749469.2750415

Manycore network interfaces for in-memory rack-scale computing

Abstract: Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters, capitalizing on commodity SoCs, low-latency and high-bandwidth communication fabrics, and a remote memory access model to enable aggregation of a rack's memory for critical data-intensive applications such as graph processing or key-value stores. Low latency and high bandwidth not …

Cited by 22 publications (14 citation statements)
References 37 publications
“…Another drawback is that it is currently only possible for such protocols to work with devices and device drivers that explicitly support them. A proposed approach for overcoming the protocol translation overhead would be to integrate network interface functionality directly into SoCs [7], but the improvement only takes effect when the SoCs are in communication with each other. This idea is followed in the rack-scale architecture [6], which generalizes a trend returning from switched cluster architectures to hypercube architectures [11,32].…”
Section: Distributed I/O Using RDMA
confidence: 99%
“…EMC/Isilon [23]) solutions to clients connected via a conventional network. AppliedMicro's X-Gene2 server SoC [49] and Oracle's Sonoma [34] integrate the RDMA controller directly on chip, HP Moonshot [36] combines low-power processors with RDMA NICs, and research proposals further argue for on-chip support for one-sided remote access primitives [17,56]. The benefit of such rack-scale memory pooling approaches is that building larger logical entities comes at a lower cost and complexity as compared to the cache-coherent NUMA (ccNUMA) approach.…”
Section: Architectural Building Blocks
confidence: 99%
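
The one-sided remote access primitives in the excerpt above let a node read or write another node's memory without involving the remote CPU, which is what makes rack-scale memory pooling cheaper and simpler than ccNUMA. Below is a minimal sketch of the addressing idea only; the names (RackMemoryPool, remote_read, remote_write) are hypothetical illustrations, not the API of soNUMA, FaRM, or RDMA verbs, all of which implement this path in NIC hardware rather than software.

    # Toy model of a rack-wide pooled address space served by one-sided
    # accesses. All names are hypothetical; real systems do this in the NIC.

    class RackMemoryPool:
        """Aggregates each node's local memory into one flat global address space."""

        def __init__(self, nodes: int, bytes_per_node: int):
            self.bytes_per_node = bytes_per_node
            self.memory = [bytearray(bytes_per_node) for _ in range(nodes)]

        def _locate(self, global_addr: int):
            # A global address decomposes into (target node, local offset).
            return divmod(global_addr, self.bytes_per_node)

        def remote_read(self, global_addr: int, length: int) -> bytes:
            # One-sided read: the target node's CPU is never interrupted.
            node, offset = self._locate(global_addr)
            return bytes(self.memory[node][offset:offset + length])

        def remote_write(self, global_addr: int, data: bytes) -> None:
            # One-sided write into the owning node's memory.
            node, offset = self._locate(global_addr)
            self.memory[node][offset:offset + len(data)] = data

    pool = RackMemoryPool(nodes=16, bytes_per_node=1 << 20)
    addr = 5 * (1 << 20) + 64            # byte 64 of node 5's region
    pool.remote_write(addr, b"hello")
    assert pool.remote_read(addr, 5) == b"hello"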
“…• We implement RackOut KVS (RO-KVS), a proof-of-concept KVS using a conventional network for client access and an RDMA fabric for memory access. RO-KVS is based on FaRM [22] and is ported to both Mellanox RDMA [52] and Scale-Out NUMA [17,56]. We evaluate RO-KVS using RackOut_static scheduling in terms of its 99th percentile tail latency for the hottest rack of a 512-server deployment.…”
Section: Introduction
confidence: 99%
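
The 99th-percentile tail latency used as the evaluation metric in the excerpt above is computed from the sorted distribution of per-request latencies. A minimal sketch using the nearest-rank method; the exponential workload and its parameters are invented for illustration.

    import math
    import random

    def tail_latency(samples, percentile=99.0):
        """Nearest-rank percentile: smallest value >= the given fraction of samples."""
        ordered = sorted(samples)
        rank = math.ceil(percentile / 100.0 * len(ordered))
        return ordered[rank - 1]

    # Synthetic per-request latencies in microseconds: 10 us mean, heavy tail.
    random.seed(42)
    samples = [random.expovariate(1 / 10.0) for _ in range(100_000)]

    mean = sum(samples) / len(samples)
    print(f"mean = {mean:.1f} us, p99 = {tail_latency(samples):.1f} us")
    # The p99 comes out several times the mean, which is why tail latency,
    # not average latency, is the binding constraint for datacenter services.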
“…In-memory processing and the use of remote direct memory access as the underlying communications system is a growing trend in large-scale computing. Architectures such as scale-out non-uniform memory access (NUMA) [30] for rack-scale computers are very sensitive to latency and thus have latency-reducing designs [31]. However, they have limited scalability due to intrinsic physical limitations of the propagation delay among different elements of the system.…”
Section: Limitations Of Current-day Architectures
confidence: 99%
“…A fibre used for inter-server connection has a propagation delay of 5 ns/m; thus, within a standard-height rack, the propagation delay between the top and bottom rack units is approximately 9 ns, and the round-trip time to fetch remote data is 18 ns. While for current-generation architectures this order of latency is reasonable [31], it indicates scale-out NUMA machines at data-centre scale (with each round trip taking at least 1 μs) are not plausible, as the round-trip latency alone is many times the time-scale for memory retrieval from local random access memory or the latency contribution of any other element in the system. With latencies aggressively reduced across all other elements of in-memory architectures, such propagation delays set a limit on the physical size and thus the scalability of such an architecture.…”
Section: Limitations Of Current-day Architectures
confidence: 99%
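
The latency budget in the excerpt above follows directly from the 5 ns/m fibre delay it quotes. A quick check of that arithmetic; the 1.8 m rack span and 100 m datacentre cable run are assumed distances for illustration, not figures from the cited paper.

    FIBRE_DELAY_NS_PER_M = 5.0   # propagation delay in fibre, from the text

    def round_trip_ns(distance_m: float) -> float:
        """Round-trip propagation delay over a fibre run of the given length."""
        return 2.0 * distance_m * FIBRE_DELAY_NS_PER_M

    rack_span_m = 1.8        # assumed top-to-bottom cabling distance in a 42U rack
    datacentre_span_m = 100  # assumed cross-datacentre cable run

    print(f"in-rack one way:       {rack_span_m * FIBRE_DELAY_NS_PER_M:.0f} ns")     # ~9 ns
    print(f"in-rack round trip:    {round_trip_ns(rack_span_m):.0f} ns")             # ~18 ns
    print(f"datacentre round trip: {round_trip_ns(datacentre_span_m) / 1e3:.1f} us") # ~1 us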