Reducing branch misprediction penalties via dynamic control independence detection

Chou, Yuan; Fung, Jason M.; Shen, John Paul

doi:10.1145/305138.305175

Cited by 34 publications

(29 citation statements)

References 7 publications

“…Figure 6 shows the performance improvement of dynamic-hammock-predication, dual-path, multipath, and DMP over the baseline processor. The average IPC improvement over all benchmarks is 3.5% for dynamic-hammock-predication, 4.8% for dual-path, 8.8% for multipath, and 19.3% for DMP. DMP improves the IPC by more than 20% on vpr (58%), mcf (47%), parser (26%), twolf (31%), compress (23%), and ijpeg (25%).…”

Section: Compiler Support For Diverge Branch and CFM Point Selection (mentioning)

confidence: 97%

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Kim, Joao, Mutlu, et al. 2006

2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)

“…Several hardware mechanisms were proposed to exploit control flow independence [35,36,11,8,16]. These techniques aim to avoid flushing the processor pipeline if the processor is known to be at a control-independent point in the program when a mispredicted branch is resolved.…”

Section: Related Work On Control Flow Independence (mentioning)

confidence: 99%

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Kim, Joao, Mutlu, et al. 2006

2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)

“…Finally, techniques have been proposed to salvage some of the work performed on the incorrect control path via squash reuse [17], Control Independence [11,12,6], and Register Integration [13].…”

Section: Related Work (mentioning)

confidence: 99%

“…However, target prediction for indirect jumps is typically less accurate than conditional branch prediction, and may contribute a significant fraction of total control misspeculations. This heightens the importance of identifying reconvergence points for these instructions if it provides additional opportunity for techniques which mitigate misprediction costs [11,12,6,5].…”

Section: Survey Of Program Behavior (mentioning)

confidence: 99%

Control Flow Optimization Via Dynamic Reconvergence Prediction

Collins, Tullsen, Wang 2004

37th International Symposium on Microarchitecture (MICRO-37'04)

“…While such selective replay recovery mechanisms have been proposed [6], [10], most aggressive processors simply squash all instructions following a misspeculated branch due to the complexity of identifying control re-convergence and re-inserting instructions into the middle of the scheduling window.…”

Section: Fig 3 Comparison Of Parallelism Exploitable By Different L (mentioning)

confidence: 99%

Chip multi-processor scalability for single-threaded applications

Vachharajani, Iyer, Ashok, et al. 2005

SIGARCH Comput. Archit. News

The exponential increase in uniprocessor performance has begun to slow. Designers have been unable to scale performance while managing thermal, power, and electrical effects. Furthermore, design complexity limits the size of monolithic processors that can be designed while keeping costs reasonable. Industry has responded by moving toward chip multi-processor architectures (CMP). These architectures are composed from replicated processors utilizing the die area afforded by newer design processes. While this approach mitigates the issues with design complexity, power, and electrical effects, it does nothing to directly improve the performance of contemporary or future single-threaded applications. This paper examines the scalability potential for exploiting the parallelism in single-threaded applications on these CMP platforms. The paper explores the total available parallelism in unmodified sequential applications and then examines the viability of exploiting this parallelism on CMP machines. Using the results from this analysis, the paper forecasts that CMPs, using the "intrinsic" parallelism in a program, can sustain the performance improvement users have come to expect from new processors for only 6-8 years provided many successful parallelization efforts emerge. Given this outlook, the paper advocates exploring methodologies which achieve parallelism beyond this "intrinsic" limit of programs.
