Yves Lhuillier scite author profile

Since processor performance scalability will now mostly be achieved through thread-level parallelism, there is a strong incentive to parallelize a broad range of applications, including those with complex control flow and data structures. And writing parallel programs is a notoriously difficult task. Beyond processor performance, the architect can help by facilitating the task of the programmer, especially by simplifying the model exposed to the programmer. In this article, among the many issues associated with writing parallel programs, we focus on finding the appropriate parallelism granularity, and efficiently mapping tasks with complex control and data flow to threads. We propose to relieve the user and compiler of both tasks by delegating the parallelization decision to the architecture at run-time, through a combination of hardware and software support and a tight dialogue between both. For the software support, we leverage an increasingly popular approach in software engineering, called component-based programming; the component contract assumes tight encapsulation of code and data for easy manipulation. Previous research works have shown that it is possible to augment components with the ability to split/spawn, providing a simple and fitting approach for programming parallel applications with complex control and data structures. However, such environments still require the programmer to determine the appropriate granularity of parallelism, and spawning incurs significant overheads due to software run-time system management. For that purpose, we provide an environment with the ability to spawn conditionally depending on available hardware resources, and we delegate spawning decisions and actions to the architecture. This conditional spawning is implemented through frequent hardware resource probing by the program. This, in turn, enables rapid adaptation to varying workload conditions, data sets and hardware resources. Furthermore, thanks to appropriate combined hardware and compiler support, the probing has no significant overhead on program performance. We demonstrate this approach on an 8-context SMT, several non-trivial algorithms and re-engineered SPEC CINT2000 benchmarks, written using component syntax processed by our toolchain. We achieve speedups ranging from 1.1 to 3.0 on our test suite.

show abstract

Parallel Architecture Benchmarking: From Embedded Computing to HPC, a FiPS Project Perspective

Lhuillier

Philippe

Guerre

et al. 2014

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yves Lhuillier

Two‐Terminal Carbon Nanotube Programmable Devices for Adaptive Architectures

CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs

Parallel Architecture Benchmarking: From Embedded Computing to HPC, a FiPS Project Perspective

Contact Info

Product

Resources

About