Hardware transactional memory implementations are becoming increasingly available. For instance, the Intel Core i7-4770 implements Restricted Transactional Memory (RTM) support for Intel Transactional Synchronization Extensions (TSX). In this paper, we present a detailed evaluation of RTM performance and energy expenditure. We compare RTM behavior to that of the TinySTM software transactional memory system, first by running microbenchmarks and then by running the STAMP benchmark suite. We find that which system performs better depends heavily on workload characteristics. We then conduct a case study of two STAMP applications to assess the impact of programming style on RTM performance and to investigate what kinds of software optimizations can help overcome RTM's hardware limitations.
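As a point of reference for how RTM is exposed to programmers, the sketch below wraps a shared-counter update in an RTM transaction with a conventional lock fallback, using the TSX intrinsics from immintrin.h. It is only a minimal illustration under assumed names (shared_counter, fallback_lock); it is not the microbenchmark or STAMP code evaluated in the paper.

/* Minimal sketch of an RTM critical section with a lock fallback.
 * Assumes a TSX-capable CPU and compilation with -mrtm; the
 * shared_counter and fallback_lock names are hypothetical. */
#include <immintrin.h>
#include <stdatomic.h>

static atomic_int fallback_lock = 0;   /* 0 = free, 1 = held */
static long shared_counter = 0;

void increment_counter(void)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Reading the fallback lock adds it to the transaction's read
         * set, so a concurrent lock acquisition aborts the transaction
         * and keeps the two paths mutually exclusive. */
        if (atomic_load(&fallback_lock) != 0)
            _xabort(0xff);
        shared_counter++;              /* speculative update */
        _xend();                       /* commit */
    } else {
        /* Aborted (conflict, capacity overflow, unsupported
         * instruction, ...): take the non-transactional path. */
        int expected = 0;
        while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
            expected = 0;              /* spin until the lock is free */
        shared_counter++;
        atomic_store(&fallback_lock, 0);
    }
}

In practice, RTM code typically retries a transaction a bounded number of times before falling back to the lock, since many aborts are transient.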
Transactional Memory is currently being advocated as a promising alternative to lock-based synchronization because it simplifies multithreaded programming. Consequently, future many-core CMP architectures may need to provide hardware support for transactional memory. At the same time, power dissipation constitutes a first-class consideration in multicore processor design. In this work, we characterize the performance and energy consumption of two well-known Hardware Transactional Memory systems that employ opposite policies for data versioning and conflict management. More specifically, we compare the LogTM-SE Eager-Eager system and a version of the Scalable TCC Lazy-Lazy system that enables parallel commits. To the best of our knowledge, this is the first characterization of hardware transactional memory systems in terms of energy consumption. To this end, we extended the GEMS simulator to estimate the energy consumed in the on-chip caches according to CACTI, and used the interconnection-network energy model provided by Orion 2. Results show that the energy consumption of the Eager-Eager system is 60% higher on average than in the Lazy-Lazy case, whereas the performance difference between the two systems is 42% on average. Finally, we found that although Lazy-Lazy beats Eager-Eager on average, there are considerable deviations in performance depending on the particular characteristics of each application.
Transactional Memory (TM) potentially simplifies parallel programming by providing atomicity and isolation for executed transactions. One of the key mechanisms needed to provide these properties is version management, which defines where and how transactional updates (new values) are stored. Version management can be implemented either eagerly or lazily. In Hardware Transactional Memory (HTM) implementations, eager version management puts new values in place and keeps old values in a software log, while lazy version management stores new values in hardware buffers and keeps old values in place. Current HTM implementations, for both eager and lazy version management schemes, suffer performance penalties because they cannot handle two versions of the same logical data efficiently. In this paper, we introduce a reconfigurable L1 data cache architecture with two execution modes: a 64KB general-purpose mode and a 32KB TM mode that can manage two versions of the same logical data. The latter allows old and new transactional values to be kept in the cache simultaneously when executing transactional workloads. We explain in detail the architectural design and internals of this Reconfigurable Data Cache (RDC), as well as the supported operations that allow existing version management problems to be solved efficiently. We describe how the RDC can support both eager and lazy HTM systems, and we present two RDC-HTM designs. Our evaluation shows that the Eager-RDC-HTM and Lazy-RDC-HTM systems achieve 1.36× and 1.18× speedup, respectively, over state-of-the-art proposals. We also evaluate the area and energy effects of our proposal and find that the RDC designs are 1.92× and 1.38× more energy-delay efficient than baseline HTM systems, with less than 0.3% area impact on modern processors.
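To make the eager/lazy contrast concrete, the following simplified software analogy (all names hypothetical) writes new values in place and keeps old values in an undo log for the eager policy, and buffers new values while leaving old values in place until commit for the lazy policy. The mechanisms described in the paper live in hardware (caches and buffers), so this sketch only illustrates why each policy must track two versions of the same logical data.

/* Simplified software analogy of eager vs. lazy version management.
 * All names are illustrative; real HTM systems implement these
 * policies in hardware structures, not in software tables. */
#include <stddef.h>

#define MAX_ENTRIES 64

typedef struct {
    long *addr;       /* location touched by the transaction */
    long  value;      /* old value (eager) or new value (lazy) */
} entry_t;

/* Eager: new values go in place, old values go to an undo log,
 * so an abort must roll memory back. */
typedef struct { entry_t undo[MAX_ENTRIES]; size_t n; } undo_log_t;

static void eager_tx_write(undo_log_t *log, long *addr, long new_val)
{
    log->undo[log->n].addr  = addr;    /* save old value for rollback */
    log->undo[log->n].value = *addr;
    log->n++;
    *addr = new_val;                   /* update memory immediately */
}

static void eager_tx_abort(undo_log_t *log)
{
    while (log->n > 0) {               /* restore old values in reverse */
        log->n--;
        *log->undo[log->n].addr = log->undo[log->n].value;
    }
}

/* Lazy: new values stay in a private buffer, old values stay in place,
 * so a commit must publish the buffer (an abort simply discards it). */
typedef struct { entry_t redo[MAX_ENTRIES]; size_t n; } write_buffer_t;

static void lazy_tx_write(write_buffer_t *buf, long *addr, long new_val)
{
    buf->redo[buf->n].addr  = addr;    /* memory is left untouched */
    buf->redo[buf->n].value = new_val;
    buf->n++;
}

static void lazy_tx_commit(write_buffer_t *buf)
{
    for (size_t i = 0; i < buf->n; i++)    /* drain buffer into memory */
        *buf->redo[i].addr = buf->redo[i].value;
    buf->n = 0;
}

Note that under the lazy policy a transactional read must also consult the buffer before memory, which hints at the two-version lookup problem that the RDC's TM mode is meant to handle in hardware.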
Hardware transactional memory (HTM) systems have been studied extensively along the dimensions of speculative versioning and contention management policies. The relative performance of several design policies has been discussed at length in prior work within the framework of scalable chip-multiprocessing systems. Yet the impact of simple structural optimizations such as write-buffering has not been investigated, and the performance deviations due to the presence or absence of these optimizations remain unclear. This lack of insight into the effective use and impact of these interfacial structures between the processor core and the coherent memory hierarchy forms the crux of the problem we study in this paper. Through detailed modeling of various write-buffering configurations, we show that they play a major role in determining the overall performance of a practical HTM system. Our study of both eager and lazy conflict resolution mechanisms in a scalable parallel architecture notes a remarkable convergence in the performance of these two diametrically opposite design points when write buffers are introduced and used well to support the common case. Mitigation of redundant actions, fewer invalidations on abort, latency hiding, and prefetch effects all contribute to reducing transaction execution times. Shorter transaction durations also imply a lower contention probability, amplifying the gains even further. The insights in this paper into the interplay between buffering mechanisms, system policies, and workload characteristics clearly distinguish the performance gains attributable to write-buffering from those that can be ascribed to HTM policy. We believe this information will facilitate sound design decisions when incorporating HTMs into parallel architectures.