Reducing misprediction penalty in the Branch Target Buffer

Abdelhak, Sherine; Sil, Abhijit; Wang, Yi; Tzeng, Nian-Feng; Bayoumi, Magdy

doi:10.1109/mwscas.2007.4488750

Cited by 3 publications (3 citation statements); references 13 publications

“…They are divided into two categories: (1) those which save the target instruction(s) without the target address, and (2) those which save the target instruction(s) in addition to the target address. In the first case, the target address must eventually be calculated before fetching can start, which introduces an unnecessary delay [7]. These two categories, however, assumed that the cached instruction could be fed directly into the pipeline, skipping the fetch stage.…”

Section: Related Work (mentioning, confidence: 99%)
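The two categories in the statement above differ only in whether a BTB entry stores the target address alongside the cached target instruction(s). A minimal Python sketch of that difference follows; the `BTBEntry` type, the `extra_fetch_delay` function, and the one-cycle `ADDR_CALC_CYCLES` value are hypothetical illustrations, not taken from the cited papers.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cost, in cycles, of computing a branch target address after
# a BTB hit; an illustrative value, not one reported in the cited papers.
ADDR_CALC_CYCLES = 1


@dataclass
class BTBEntry:
    branch_pc: int
    target_instrs: List[str]           # cached instruction(s) at the branch target
    target_addr: Optional[int] = None  # None models a category (1) entry


def extra_fetch_delay(entry: BTBEntry) -> int:
    """Cycles spent before fetching can start after a BTB hit."""
    if entry.target_addr is None:
        # Category (1): the target address must still be calculated first
        return ADDR_CALC_CYCLES
    # Category (2): the stored target address lets fetching start immediately
    return 0


cat1 = BTBEntry(0x100, ["add r1, r2, r3"])
cat2 = BTBEntry(0x100, ["add r1, r2, r3"], target_addr=0x200)
print(extra_fetch_delay(cat1), extra_fetch_delay(cat2))  # 1 0
```

Storing the address costs extra bits per entry; the sketch only captures the timing side of that trade-off.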

Implementation of branch delay in Superscalar processors by reducing branch penalties

Khanna; Verma; Biswas; et al. (2010)

2010 IEEE 2nd International Advance Computing Conference (IACC)

Branch prediction is crucial to maintaining high performance in modern superscalar processors. Today's superscalar processors achieve high performance by executing multiple independent instructions in parallel. One of the most serious impediments to the performance of wide-issue superscalar processors is the presence of conditional branches. Conditional branches can occur as frequently as once every 5 or 6 instructions, leading to heavy misprediction penalties in superscalar architectures. The ideal speed-up of a superscalar processor is seldom achieved because of stalls and breaks in the execution stream. These interruptions are caused by data and control hazards, which degrade superscalar processor performance. A branch target buffer (BTB) can reduce the performance penalty of branches in a superscalar processor by predicting the path of each branch and caching information used by the branch. No stall occurs if the branch entry is found in the BTB and the prediction is correct; otherwise, the penalty is at least 2 cycles. This paper proposes an algorithm for superscalar processors based on changing the BTB structure to eliminate the misprediction penalty. It also highlights a problem in the previous BTB algorithm (the nested branches problem) and proposes a solution to it.
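The cost model stated in the abstract (no stall on a BTB hit with a correct prediction, at least 2 cycles otherwise) can be sketched in Python as below. This is a hypothetical illustration, not the authors' algorithm: `SimpleBTB` keeps a per-branch 2-bit saturating counter (a common textbook predictor, assumed here), and `stall_cycles` charges the abstract's minimum of 2 cycles whenever the branch is missing from the BTB or the prediction is wrong.

```python
from typing import Dict, List, Optional, Tuple

# Minimum penalty in cycles, taken from the abstract ("at least 2 cycles");
# a real pipeline may pay more.
MIN_PENALTY = 2


class SimpleBTB:
    """Hypothetical BTB mapping branch PC -> (target address, 2-bit counter)."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[int, int]] = {}

    def predict(self, pc: int) -> Tuple[bool, bool, Optional[int]]:
        """Return (hit, predicted_taken, predicted_target) for the branch at pc."""
        if pc not in self.entries:
            return False, False, None
        target, counter = self.entries[pc]
        return True, counter >= 2, target

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Allocate or update the entry; a missing entry starts at 1 (weakly not taken)."""
        _, counter = self.entries.get(pc, (target, 1))
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.entries[pc] = (target, counter)


def stall_cycles(btb: SimpleBTB, trace: List[Tuple[int, bool, int]]) -> int:
    """Sum branch stalls over a trace of (pc, taken, target) records."""
    total = 0
    for pc, taken, target in trace:
        hit, pred_taken, pred_target = btb.predict(pc)
        # No stall only if the entry is present and direction and target are right
        correct = hit and pred_taken == taken and (not taken or pred_target == target)
        if not correct:
            total += MIN_PENALTY
        btb.update(pc, taken, target)
    return total


# A loop-closing branch at 0x40 that is taken three times, then falls through
loop_trace = [(0x40, True, 0x10)] * 3 + [(0x40, False, 0x10)]
print(stall_cycles(SimpleBTB(), loop_trace))  # 4
```

For this loop branch the first encounter misses in the BTB and the loop exit is mispredicted, so the sketch reports two penalties of 2 cycles each.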

“…It is necessary to consider both performance improvement and power consumption during processor design. Although most processors use a delayed branch for normal pipeline operation, the pipeline delay caused by a delayed branch has become a serious obstacle to improving performance [1][2][3][4]. Caches have become the most basic and most important elements determining the performance of high-performance microprocessors, given the unsatisfyingly limited memory bandwidth provided by the systems. In addition, most of the dynamic power of a processor is dissipated in the clock and data-path circuits.…”

(mentioning, confidence: 99%)

Performance Improvement and Power Consumption Reduction of an Embedded RISC Core

Jung; Jin; Ryoo (2012)

Journal of Information and Communication Convergence Engineering

As the densities of very-large-scale integration (VLSI) circuits and process technologies have developed quickly in the past several years, the performance of embedded processors has greatly improved, and these processors are used in many embedded systems, including network systems, communication systems, and household appliances. Various technical approaches to performance have been applied in implementing these systems; this is attributed to the development of embedded processors. It is necessary to consider both performance improvement and power consumption during processor design. Although most processors use a delayed branch for normal pipeline operation, the pipeline delay caused by a delayed branch has become a serious obstacle to improving performance [1][2][3][4]. Caches have become the most basic and most important elements determining the performance of high-performance microprocessors, given the unsatisfyingly limited memory bandwidth provided by the systems. In addition, most of the dynamic power of a processor is dissipated in the clock and data-path circuits, so reducing the dynamic power dissipated on high-activity lines is central to low-power design [5,6]. In this paper, we propose architectures for improving the power and performance of an embedded processor: a branch predictor and a 4-way set-associative cache architecture for performance improvement, and clock-gating logic using observability don't care (ODC) conditions for a low-power embedded processor [7]. The rest of this paper is organized as follows. Section II briefly reviews the OpenRISC core. Section III presents the architecture of the 4-way set-associative cache. Section IV shows the dynamic branch prediction algorithm that uses a branch target buffer (BTB). Section V presents the …

Abstract: This paper presents a branch prediction algorithm and a 4-way set-associative cache for performance improvement of an embedded RISC core, and a clock-gating algorithm with observability don't care (ODC) operation to reduce the power consumption of the core. The branch prediction algorithm uses a branch target buffer (BTB), and the 4-way set-associative cache has a lower miss rate than a direct-mapped cache. A pseudo-least-recently-used (pseudo-LRU) policy is used to reduce the number of LRU bits. The clock-gating algorithm reduces dynamic power consumption. Based on estimates of performance and dynamic power, the OpenRISC core with the proposed architecture shows a performance improvement of about 29%, and the dynamic power of the core with the Chartered 0.18 μm technology library is reduced by 16%.
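The abstract says a pseudo-LRU policy is used to reduce the number of LRU bits but does not say which variant. One common variant for a 4-way set is tree pseudo-LRU, sketched below in Python as an assumption: three bits per set form a binary tree whose nodes point toward the half to evict next. Exact LRU over 4 ways must distinguish 4! = 24 orderings and so needs at least 5 bits per set. The `TreePLRU4` class and its methods are hypothetical names.

```python
class TreePLRU4:
    """Hypothetical 3-bit tree pseudo-LRU state for one 4-way cache set."""

    def __init__(self) -> None:
        # bits[0] (root): 0 -> next victim in ways 0/1, 1 -> in ways 2/3
        # bits[1]: picks way 0 or 1; bits[2]: picks way 2 or 3 (0 = lower way)
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """Record a hit or fill: point every node on the way's path away from it."""
        if not 0 <= way < 4:
            raise ValueError("way must be 0..3")
        if way < 2:
            self.bits[0] = 1          # steer eviction to the other half
            self.bits[1] = 1 - way    # point at the sibling way
        else:
            self.bits[0] = 0
            self.bits[2] = 3 - way    # sibling of 2 is 3 (bit 1), of 3 is 2 (bit 0)

    def victim(self) -> int:
        """Follow the bits from the root to the way to replace next."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]


s = TreePLRU4()
for w in (0, 1, 2, 3):
    s.touch(w)
print(s.victim())  # 0
s.touch(0)
print(s.victim())  # 2 (true LRU would pick way 1)
```

After accesses to ways 0, 1, 2, 3 the tree picks way 0, as true LRU would; after one more access to way 0 it picks way 2 rather than the true LRU way 1, which is the approximation traded for two fewer bits per set.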

An energy-efficient 32-bit RISC processor for sensor platform in 90nm technology

Sil; Balusu; Yalamanchili; et al. (2012)

2012 International Conference on Energy Aware Computing
