Historically, advances in integrated circuit technology have driven improvements in processor microarchitecture and led to today's microprocessors, with their sophisticated pipelines operating at very high clock frequencies. However, the performance gains achievable by high-frequency microprocessors are now seriously limited by main-memory access latencies, because main-memory speeds have improved at a much slower pace than microprocessor speeds. Dealing with this performance disparity, commonly known as the memory wall,1 is crucial if future high-frequency microprocessors are to achieve their performance potential.

To overcome the memory wall, we propose kilo-instruction processors: superscalar processors that can maintain a thousand or more simultaneous in-flight instructions. Doing so means designing key hardware structures so that the processor can satisfy these high resource requirements without significantly decreasing processor efficiency or increasing energy consumption.
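A back-of-the-envelope, Little's-law-style estimate shows where the "kilo-instruction" figure comes from: to keep issuing at full width while a main-memory access is outstanding, a processor must track roughly issue width times miss latency instructions in flight. The sketch below assumes a 4-wide issue core and a 300-cycle miss latency; both numbers are illustrative, not measurements from this article.

```cpp
#include <cstdio>

int main() {
    // Illustrative parameters (assumptions, not figures from the article):
    const int issue_width  = 4;    // instructions issued per cycle
    const int miss_latency = 300;  // main-memory access latency, in cycles

    // Little's-law-style estimate: to sustain full issue width across an
    // outstanding main-memory access, the processor must track roughly
    // issue_width * miss_latency simultaneous in-flight instructions.
    std::printf("in-flight instructions needed ~ %d\n",
                issue_width * miss_latency);   // prints ~1200
    return 0;
}
```

At these rates the machine must hold on the order of a thousand in-flight instructions, which is the kilo-instruction target.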
Nature of the memory wall

One of the first approaches to the memory wall problem was the development of cache memory hierarchies. Cache memories exploit program locality and can dramatically reduce the number of long-latency accesses to main memory. The first level, or L1 cache, is built into the processor core and typically takes one to three processor clock cycles to access. On a miss in the L1 cache, the on-chip L2 cache takes on the order of 10 processor cycles. Accessing main memory, on the other hand, takes at least an order of magnitude longer, and in the future it will take two orders of magnitude longer, that is, several hundred clock cycles. (In general, the cache hierarchy can have more than two levels, but to simplify our discussion here, we assume two levels, with the understanding that the same principles apply to systems with deeper cache hierarchies.)

Modern superscalar processors employ out-of-order execution as a way of smoothing out disruptions caused by data cache misses (see the "Hiding latency in superscalar processors" sidebar). If a load instruction experiences a data cache miss, the instructions that depend on the miss data must wait in the issue queue(s). Meanwhile, independent instructions are free to execute; they issue from the issue queue(s) and essentially "pass" the blocked load instruction and its dependent instructions. For an L1 cache miss, these out-of-order instructions can often completely hide the L2 access latency, so the miss causes little or no performance loss; the sketch below illustrates this, and shows why the same trick falls short for a full L2 miss.
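A minimal sketch of this bypassing effect, using a simple dependence-driven model (the instruction names, latencies, and four-instruction program are our illustration, not from the article): each instruction becomes ready as soon as its producer finishes, so independent work completes under a load's shadow while dependent work waits.

```cpp
#include <cstdio>
#include <vector>

// Toy dependence model (an illustration, not the article's simulator).
// Each instruction records an execution latency and the index of the
// instruction producing its input (-1 if independent). An out-of-order
// core lets an instruction issue as soon as its producer finishes, so
// independent work "passes" a blocked load; dependent work must wait.
struct Instr {
    const char* name;
    int latency;  // execution latency in cycles
    int dep;      // producer index, or -1 if independent
};

static void run(const char* title, const std::vector<Instr>& prog) {
    std::printf("%s\n", title);
    std::vector<int> ready(prog.size());
    for (size_t i = 0; i < prog.size(); ++i) {
        // Start when the producer's result is available (cycle 0 if none).
        int start = (prog[i].dep >= 0) ? ready[prog[i].dep] : 0;
        ready[i] = start + prog[i].latency;
        std::printf("  %-26s ready at cycle %3d\n", prog[i].name, ready[i]);
    }
}

int main() {
    // L1 miss that hits in L2 (~10 cycles): independent work covers it.
    run("L1 miss, L2 hit:", {
        {"load r1 (10-cycle L2 hit)", 10, -1},
        {"add  r2 = r1 + 1",           1,  0},  // dependent on the load
        {"mul  r3 (independent)",      3, -1},
        {"sub  r4 (independent)",      1, -1},
    });
    // L2 miss (~300 cycles): the same independent work barely dents it.
    run("L2 miss:", {
        {"load r1 (300-cycle miss)", 300, -1},
        {"add  r2 = r1 + 1",           1,  0},
        {"mul  r3 (independent)",      3, -1},
        {"sub  r4 (independent)",      1, -1},
    });
    return 0;
}
```

In the first run, the independent mul and sub finish well inside the load's 10-cycle L2-hit window; in the second, the same few instructions leave almost all of the 300-cycle miss exposed.

This approach is much less effective for the long L2 cache misses, however. For example, along the top of Figure 1 is a sequence of instructions in program order. Following a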