Matthias Diener scite author profile

Shared memory architectures have recently experienced a large increase in thread-level parallelism, leading to complex memory hierarchies with multiple cache memory levels and memory controllers. These new designs created a Non-Uniform Memory Access (NUMA) behavior, where the performance and energy consumption of memory accesses depend on the place where the data is located in the memory hierarchy. Accesses to local caches or memory controllers are generally more efficient than accesses to remote ones. A common way to improve the locality and balance of memory accesses is to determine the mapping of threads to cores and data to memory controllers based on the affinity between threads and data. Such mapping techniques can operate at different hardware and software levels, which impacts their complexity, applicability, and the resulting performance and energy consumption gains. In this article, we introduce a taxonomy to classify different mapping mechanisms and provide a comprehensive overview of existing solutions.
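The idea of placing data on memory controllers according to thread affinity can be illustrated with a small sketch. This is not the method from the article: it assumes a hypothetical access trace of `(thread_id, page_id)` pairs and a hypothetical `thread_to_node` table, and simply assigns each page to the NUMA node whose threads access it most often.

```python
from collections import Counter

def map_pages_to_nodes(accesses, thread_to_node):
    """Assign each page to the NUMA node that accesses it most often.

    accesses: iterable of (thread_id, page_id) pairs (hypothetical trace format).
    thread_to_node: dict mapping thread_id -> NUMA node id (assumed to be known).
    Returns a dict mapping page_id -> chosen node id.
    """
    per_page = {}
    for thread_id, page_id in accesses:
        node = thread_to_node[thread_id]
        per_page.setdefault(page_id, Counter())[node] += 1
    # Pick the node with the highest access count; ties go to the lowest node id.
    return {
        page: min(counts, key=lambda n: (-counts[n], n))
        for page, counts in per_page.items()
    }

if __name__ == "__main__":
    trace = [(0, 10), (0, 10), (1, 10), (1, 11)]
    print(map_pages_to_nodes(trace, {0: 0, 1: 1}))
```

A policy like this improves locality only; balancing load across memory controllers, which the abstract also mentions, would need an additional constraint.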


High Performance Computing in the cloud: Deployment, performance and cost efficiency

Roloff, Diener, Carissimi, et al. 2012


High-Performance Computing (HPC) in the cloud has reached the mainstream and is currently a hot topic in the research community and the industry. The attractiveness of cloud for HPC is the capability to run large applications on powerful, scalable hardware without needing to actually own or maintain this hardware. In this paper, we conduct a detailed comparison of HPC applications running on three cloud providers: Amazon EC2, Microsoft Azure and Rackspace. We analyze three important characteristics of HPC (deployment facilities, performance and cost efficiency) and compare them to a cluster of machines. For the experiments, we used the well-known NAS parallel benchmarks as an example of general scientific HPC applications to examine the computational and communication performance. Our results show that HPC applications can run efficiently on the cloud. However, care must be taken when choosing the provider, as the differences between them are large. The best cloud provider depends on the type and behavior of the application, as well as the intended usage scenario. Furthermore, our results show that HPC in the cloud can have a higher performance and cost efficiency than a traditional cluster, up to 27% and 41%, respectively.

LU (Lower and Upper Triangular): regular communication; Fortran; OpenMP, MPI
MG (Multigrid): regular communication; Fortran; OpenMP, MPI
SP (Scalar Pentadiagonal): floating point performance; Fortran; OpenMP, MPI
UA (Unstructured Adaptive): irregular communication; Fortran; OpenMP


Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors

Diener, Madruga, Rodrigues, et al. 2010


Abstract. Process placement is a technique widely used on parallel machines with heterogeneous interconnections to reduce the overall communication time. For instance, two processes which communicate frequently are mapped close to each other. Finding the optimal mapping between threads and cores in a shared-memory environment (for example, OpenMP and Pthreads) is an even more complex task due to implicit communication. In this work, we examine data sharing patterns between threads in different workloads and use those patterns in a similar way as messages are used to map processes in cluster computers. We evaluated our technique on two state-of-the-art multi-core processors and achieved moderate improvements in the common case and considerable improvements in some cases, reducing execution time by up to 45%.
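As a rough illustration of using data sharing patterns for thread placement, the sketch below builds a sharing matrix from per-thread page sets and greedily pairs the threads that share the most pages, so that each pair could be placed on cores sharing a cache. The input format, the function names, and the greedy pairing heuristic are all assumptions made for this example; the paper's actual detection and mapping method is not described in the abstract.

```python
from itertools import combinations

def sharing_matrix(pages_by_thread):
    """Count pages touched by both threads, for every pair of threads.

    pages_by_thread: list of sets, one set of page ids per thread
    (a hypothetical trace summary).
    """
    n = len(pages_by_thread)
    matrix = [[0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        shared = len(pages_by_thread[a] & pages_by_thread[b])
        matrix[a][b] = matrix[b][a] = shared
    return matrix

def greedy_pairs(matrix):
    """Greedily pair the threads with the highest sharing first.

    Each returned pair is a candidate for two cores that share a cache.
    With an odd thread count, one thread is left unpaired.
    """
    n = len(matrix)
    candidates = sorted(
        combinations(range(n), 2),
        key=lambda p: (-matrix[p[0]][p[1]], p),  # most sharing first, then by id
    )
    placed, pairs = set(), []
    for a, b in candidates:
        if a not in placed and b not in placed:
            pairs.append((a, b))
            placed.update((a, b))
    return pairs

if __name__ == "__main__":
    pages = [{1, 2, 3}, {7, 8}, {2, 3, 4}, {8, 9}]
    print(greedy_pairs(sharing_matrix(pages)))
```

Greedy pairing is simple but not optimal in general; a matching or graph partitioning algorithm would be the more thorough choice for larger thread counts.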


SiNUCA: A Validated Micro-Architecture Simulator

Alves, Villavieja, Diener, et al. 2015




Matthias Diener

Characterizing communication and page usage of parallel applications for thread and data mapping

Affinity-Based Thread and Data Mapping in Shared Memory Systems

High Performance Computing in the cloud: Deployment, performance and cost efficiency

Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors

SiNUCA: A Validated Micro-Architecture Simulator
