DiME

Buragohain, Dhantu; Ghogare, Abhishek; Patel, Trishal; Vutukuru, Mythili; Kulkarni, Purushottam

doi:10.1145/3124680.3124731

Cited by 10 publications

(4 citation statements)

References 11 publications


“…The realization of fully disaggregated DC architecture with rack‐scale memory disaggregation is very challenging because the disaggregation of memory from blade scale to rack scale inevitably increases memory access time. Memory access time between CPU and disaggregated memory may increase to few microseconds, whereas memory access time of legacy servers is few nanoseconds [1,6–10]. When 25% local memory is used with remote memory and the memory access time is 5 μs, the application performance is degraded by approximately 5%.…”

Section: Introduction (mentioning)

confidence: 99%

Distributed memory access architecture and control for fully disaggregated datacenter network

et al. 2022


In this paper, we propose a novel disaggregated memory module (dMM) architecture and memory access control schemes to solve the collision and contention problems of memory disaggregation, reducing the average memory access time to less than 1 μs. In these schemes, the distributed scheduler in each dMM determines the order of memory read/write accesses based on delay‐sensitive priority requests in the disaggregated memory access frame (dMAF). We used the memory‐intensive first (MIF) algorithm and the priority‐based MIF (p‐MIF) algorithm, which prioritize delay‐sensitive and/or memory‐intensive (MI) traffic over CPU‐intensive (CI) traffic. We evaluated the performance of the proposed schemes through simulation using OPNET and through a hardware implementation. Our results showed that when the offered load was below 0.7 and the dMAF payload was 256 bytes, the average round trip time (RTT) was the lowest, ~0.676 μs. The dMM scheduling algorithms, MIF and p‐MIF, achieved a delay of less than 1 μs for all MI traffic with less than 10% transmission overhead.
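The idea of serving memory‐intensive traffic ahead of CPU‐intensive traffic can be illustrated with a strict-priority queue. The sketch below is not the MIF or p‐MIF algorithm from the paper, whose details are not given in the abstract; the class name `PriorityScheduler`, the constants `MI` and `CI`, and the first-come-first-served tie-break are assumptions made purely for illustration.

```python
import heapq
from itertools import count

# Assumed encoding: a lower value is served first.
MI = 0  # memory-intensive traffic
CI = 1  # CPU-intensive traffic


class PriorityScheduler:
    """Hypothetical strict-priority scheduler: MI requests before CI requests."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter, keeps FIFO order within a class

    def submit(self, traffic_class, request):
        # Heap entries sort by (class, arrival order); the payload is never compared.
        heapq.heappush(self._heap, (traffic_class, next(self._seq), request))

    def next_request(self):
        """Return the next request to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, request = heapq.heappop(self._heap)
        return request


scheduler = PriorityScheduler()
scheduler.submit(CI, "ci-read-1")
scheduler.submit(MI, "mi-read-1")
scheduler.submit(CI, "ci-write-1")
scheduler.submit(MI, "mi-write-1")

served = []
while (req := scheduler.next_request()) is not None:
    served.append(req)
```

With these four submissions, both MI requests are served before either CI request, and arrival order is preserved within each class. A real scheduler would also need to bound how long CI traffic can wait, which this sketch does not attempt.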


“…They leverage the idle memory present in the host or other VMs on the host to implement the proposed disaggregated memory system. Buragohain et al [21] present a performance emulator for disaggregated memory architectures. It works by injecting delays to protected portions of the virtual address space of the process under emulation that correspond to the remote disaggregated memory.…”

Section: Disaggregated Solutions (mentioning)

confidence: 99%
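The delay-injection idea described in the citation statement above can be modelled in a few lines. This is a user-space toy, not DiME's implementation: the class `DelayInjectionEmulator`, the 4096-byte page size, and the delay values are all assumptions, and simulated time is accumulated in a counter rather than enforced by real page protection.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page granularity


@dataclass
class DelayInjectionEmulator:
    """Toy model: accesses to 'protected' (remote) pages are charged extra delay."""

    remote_pages: set             # page numbers standing in for remote memory
    remote_delay_ns: int = 5000   # assumed injected delay per remote access
    local_delay_ns: int = 100     # assumed cost of a local access
    elapsed_ns: int = 0           # simulated time accumulated so far
    remote_accesses: int = 0      # number of accesses that hit remote pages

    def access(self, address: int) -> None:
        # Map the address to its page and charge the matching delay.
        page = address // PAGE_SIZE
        if page in self.remote_pages:
            self.remote_accesses += 1
            self.elapsed_ns += self.remote_delay_ns
        else:
            self.elapsed_ns += self.local_delay_ns


emu = DelayInjectionEmulator(remote_pages={1})
for addr in (0, 4096, 8191):  # page 0 is local; the other two fall in page 1
    emu.access(addr)
```

Integer nanoseconds are used so that the accumulated time stays exact. Varying `remote_delay_ns` or the set of remote pages gives a rough feel for how sensitive an access trace is to remote-memory latency.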

“…Since the concept of remote memory decoupled from the processor is not easy to implement in real prototypes and due to the absence of an available prototype [21], we emulate a disaggregated shared memory architecture (following the concept presented in Section 2.2), without the need for real hardware, using a conventional multi-socket server. This approach takes advantage of a two-socket server and its separated LLC to create pressure only in the desired shared resource, i.e.…”

Section: Problem Definition: Global Memory Emulation (mentioning)

confidence: 99%

“…These resources exhibit different trends in terms of cost, performance, and power scaling [12]. While storage has been one of the first resources to be disaggregated, memory is much more challenging [21]. However, advances in network speed and scalability with new technologies, are enabling fast access to hardware components that are disaggregated across the network [22].…”

mentioning

confidence: 99%


Job scheduling for disaggregated memory in high performance computing systems

Vieira Zacarias


(English) In a typical HPC cluster system, a node is the elemental component unit of the architecture. Memory and compute resources are tightly coupled in each node, and the rigid boundaries between nodes limit compute and memory resource utilization. The problem is exacerbated by the fact that HPC applications have a widely varying per-node memory footprint due to diverse application characteristics, differing problem sizes, and strong scaling. In fact, 25% to 76% of the system's total memory capacity typically remains idle. Disaggregated memory offers a way to improve memory utilization, as memory becomes a pool that can be dynamically composed to match the needs of the workloads. It enables fine-grained allocation of memory capacity to jobs while maintaining the cost-effectiveness and scalability of a cluster architecture. A key component for the distribution of computing power within the cluster infrastructure is the RJMS, or simply the resource manager. Its goal is to satisfy users' demands and achieve acceptable performance in overall system utilization by efficiently matching requests to resources. Even though several studies on RJMS have addressed problems in current state-of-the-art HPC systems, memory disaggregation is still under development. Therefore, adopting a disaggregated architecture means redesigning the resource manager services. In this thesis we propose an efficient memory disaggregated infrastructure for a cluster resource manager and evaluate it at scale through a structured simulated experimental methodology employing a contention model that captures the impact of shared resources in disaggregated scenarios. Sharing common memory devices or interfaces in a disaggregated infrastructure may incur an unsatisfactory loss of performance because concurrent memory access can saturate the resource; we start our study by introducing a systematic methodology to build a contention model. 
Extensive real-machine experimentation and workload results have shown that our contention model predicts performance degradation with an average error of at most 1.19% and a maximum error of 14.6%. Compared with the state of the art, the relative improvements are almost 24% on average and 33% in the worst case. Next, we argue that it is possible to increase throughput and utilization by using disaggregated memory in a resource manager. We show that, depending on the level of imbalance between the system and the memory demands of scheduled jobs, memory disaggregation enables resource savings of up to 33% compared to the state-of-the-art resource manager. In addition, on average, it can increase memory utilization by a factor of 1.6 while achieving almost 90% CPU utilization. In our study, we also investigate how critical memory demand bounds are for maximising system throughput and minimising job response time. We analyse to what degree users would have a natural incentive to provide accurate memory bounds. We demonstrate that even when there is a large effect on system throughput (-25%) and response time (5 times higher), there is very little direct incentive for users to be accurate in their estimates, with only an 8% increase in response time. We further demonstrate that taking advantage of temporal and spatial memory imbalance among jobs delivers improvements of up to 18% in throughput, 38% in throughput per dollar, and up to a 69% reduction in median job response time when memory usage is imbalanced and demands are overestimated on underprovisioned systems. Overall, we believe our study provides valuable insights into the importance of design space exploration for disaggregated memory HPC systems. We demonstrate that by understanding disruptive architectural changes in future systems and the demands of the workloads, system provisioning can be carefully designed to achieve the best cost-benefit.
(Translated from Spanish) In an HPC cluster, a node is the elemental unit of the architecture. Because memory and compute are tightly coupled, the rigid boundaries between nodes limit resource utilization. The problem is aggravated by the fact that HPC applications have highly variable memory requirements. In fact, between 25% and 76% of the system's total memory capacity typically remains idle. Disaggregated memory offers a way to improve memory utilization, since it turns memory into a pool that can be composed dynamically. It enables fine-grained allocation of memory capacity to jobs while maintaining the cost-effectiveness and scalability of a cluster architecture. A key component for distributing resources in a cluster is the RJMS, or simply the resource manager. Its goal is to satisfy users' demands and achieve acceptable performance in overall system utilization through efficient allocation of resources. Although several studies on RJMS have been carried out, disaggregated memory is still under development. In this thesis, we propose an efficient disaggregated memory infrastructure for a cluster resource manager and evaluate its performance at scale using a structured, simulated experimental methodology that employs a contention model simulating the impact of shared resources in disaggregated scenarios. Sharing memory interfaces in a disaggregated infrastructure can cause a loss of performance under concurrent memory access. We begin our study by presenting a systematic methodology for building a contention model. Extensive experimentation on real machines and workload results have shown that our contention model predicts performance degradation with an average error of 1.19% and a maximum error of 14.6%. Compared with the state of the art, the relative improvements are almost 24% on average and 33% in the worst case. Next, we argue that it is possible to increase cluster throughput and utilization by using disaggregated memory in a resource manager. We show that, depending on the level of imbalance between the system and the memory demands of the jobs, memory disaggregation enables resource savings of up to 33% compared with the state of the art. In addition, on average, it can increase memory utilization by a factor of 1.6 while using almost 90% of the CPU. In our study, we also investigate how crucial memory demands are for maximising system throughput and minimising job response time. We analyse to what extent users would have a natural incentive to be accurate. We demonstrate that even when there is a large impact on system throughput (-25%) and response time (5 times higher), there is very little direct incentive for users to be accurate in their estimates, with only an 8% increase in response time. Furthermore, we demonstrate that exploiting the temporal and spatial memory imbalance among jobs delivers improvements of up to 18% in throughput, 38% in throughput per dollar, and up to a 69% reduction in median job response time when memory usage is imbalanced and demands are overestimated on underprovisioned systems. Overall, we believe our study provides valuable insight into the importance of design space exploration for HPC systems with disaggregated memory. We demonstrate that by understanding disruptive architectural changes in future systems and the demands of the workloads, system provisioning can be carefully designed to achieve the best cost-benefit.
