1992
DOI: 10.1145/146628.139705

Comparative performance evaluation of cache-coherent NUMA and COMA architectures

Abstract: Two interesting variations of large-scale shared-memory machines that have recently emerged are cache-coherent nonuniform-memory-access machines (CC-NUMA) and cache-only memory architectures (COMA). They both have distributed main memory and use directory-based cache coherence. Unlike CC-NUMA, however, COMA machines automatically migrate and replicate data at the main-memory level in cache-line sized chunks. This paper compares the performance of these two classes of machines. We first present a qualitative model …

Cited by 11 publications (1 citation statement)
References 7 publications
“…Then, the modification of this shared variable results in invalidating the cache lines including the lock variable of waiting threads and even lock holders. In particular, when a cache line is shared among threads in a NUMA environment, frequent invocations of the cache line invalidation to maintain the cache coherence become a significant overhead [19], [29]. In the case of SRL(tree,i), when it dynamically allocates slvs, the cache line for atomic operations can be shared.…”
Section: B. Performance Evaluation, 1) Lock Acquisition Latency
confidence: 99%
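To make the overhead described in the citation statement concrete, below is a minimal C sketch (not taken from the cited papers) of why packing several threads' lock flags into one cache line triggers the invalidation traffic on a NUMA machine, and how per-line padding avoids it. The 64-byte line size, the padded_flag struct, and the NUM_WAITERS count are illustrative assumptions, not details of SRL(tree,i).

#include <stdatomic.h>
#include <stdalign.h>

#define CACHE_LINE_SIZE 64   /* assumed line size; query the target machine in practice */
#define NUM_WAITERS 8        /* illustrative */

/* Problematic layout: all waiters' flags live in one cache line, so a single
 * write by the lock holder invalidates that line in every waiter's cache, and
 * on NUMA each refill crosses the interconnect. (Shown only for contrast.) */
static atomic_int packed_flags[NUM_WAITERS];

/* Padded layout: each flag occupies its own cache line, so releasing one
 * waiter invalidates only that waiter's line. */
struct padded_flag {
    alignas(CACHE_LINE_SIZE) atomic_int flag;
};
static struct padded_flag padded_flags[NUM_WAITERS];

/* Waiter i spins locally on its own cache-line-private flag. */
static void wait_on(int i) {
    while (atomic_load_explicit(&padded_flags[i].flag, memory_order_acquire) == 0)
        ;  /* spin; only this thread's line is re-read */
}

/* The lock holder hands the lock to waiter i with a single store. */
static void release_to(int i) {
    atomic_store_explicit(&padded_flags[i].flag, 1, memory_order_release);
}

Queue-based locks such as MCS follow the same idea: each waiter spins on a flag in its own node, so a release touches only one remote cache line instead of invalidating a line shared by every waiting thread.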