Basilio B. Fraguela scite author profile

Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or sequential components of parallel programs is enhanced.In this paper, a data type -Hierarchically Tiled Arrays or HTAs -that facilitates the direct manipulation of tiles is introduced. HTA operations are overloaded array operations. We argue that the implementation of HTAs in sequential OO languages transforms these languages into powerful tools for the development of high-performance parallel codes and codes with high degree of locality. To support this claim, we discuss our experiences with the implementation of HTAs for MATLAB and C++ and the rewriting of the NAS benchmarks and a few other programs into HTA-based parallel form.

show abstract

Probabilistic miss equations: evaluating memory hierarchy performance

Fraguela

Doallo

Zapata

2003

IEEE Trans. Comput.

View full text Add to dashboard Cite

The increasing gap between processor and main memory speeds makes the role of the memory hierarchy behavior in the system performance essential. Both hardware and software techniques to improve this behavior require good analysis tools that help predict and understand such behavior. Analytical modeling arises as a good choice in this field due to its high speed if its traditional limited precision is overcome. We present a modular analytical modeling strategy for arbitrary set-associative caches with LRU replacement policy. The model differs from all the previous related works in its probabilistic approach. Both perfectly and nonperfectly nested loops as well as reuse between different nests are considered by this model, so it makes the analysis of complete programs with regular computations feasible. Moreover, the model achieves good levels of accuracy while being extremely fast and flexible enough to allow its extension. Our approach has been extensively validated using well-known benchmarks. Finally, the model has also proven its ability to drive code optimizations even more successfully than current production compilers.

show abstract

Exploiting heterogeneous parallelism with the Heterogeneous Programming Library

Viñas

Bozkuş

Fraguela

2013

Journal of Parallel and Distributed Computing

View full text Add to dashboard Cite

Adaptive line placement with the set balancing cache

Rolan

Fraguela

Doallo

2009

View full text Add to dashboard Cite

Efficient memory hierarchy design is critical due to the increasing gap between the speed of the processors and the memory. One of the sources of inefficiency in current caches is the non-uniform distribution of the memory accesses on the cache sets. Its consequence is that while some cache sets may have working sets that are far from fitting in them, other sets may be underutilized because their working set has fewer lines than the set. In this paper we present a technique that aims to balance the pressure on the cache sets by detecting when it may be beneficial to associate sets, displacing lines from stressed sets to underutilized ones. This new technique, called Set Balancing Cache or SBC, achieved an average reduction of 13% in the miss rate of ten benchmarks from the SPEC CPU2006 suite, resulting in an average IPC improvement of 5%.

show abstract

Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures

Mallón

Taboada

Teijeiro

et al. 2009

View full text Add to dashboard Cite

The current trend to multicore architectures underscores the need of parallelism. While new languages and alternatives for supporting more efficiently these systems are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and OpenMP on multicore architectures. From the analysis of the results, it can be concluded that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it takes the highest advantage of data locality, the key factor for performance in these systems. Regarding UPC, although it exploits efficiently the data layout in memory, it suffers from remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.