



## Location of the branch prediction buffer

- "Special cache"
  - Accessed during IF (with the PC)
  - Prediction bits used during ID if the instruction is decoded as a branch
- Instruction cache
  - Requires more space (the instruction cache is usually much larger than the "special cache")
  - Reduces the likelihood that "conflicts" occur between different branches
- Accuracy of branch prediction
  - Misprediction rates range from 1% to 18% (using a 4K entry branch prediction buffer)
  - Static prediction misprediction rates are around **30%** for many programs
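
As a rough illustration of the "special cache" option, here is a minimal sketch of a 4K-entry buffer of 2-bit saturating counters indexed by the PC. The names `bpb_predict` and `bpb_update`, the 32-bit word-aligned PC, and the choice of index bits are assumptions made for this example, not details taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define BPB_ENTRIES 4096u   /* 4K entries, matching the buffer size quoted above */

/* Each entry is a 2-bit saturating counter:
   0 or 1 = predict not taken, 2 or 3 = predict taken. */
static uint8_t bpb[BPB_ENTRIES];

/* Assumed index function: drop the 2 byte-offset bits of a word-aligned PC
   and keep the next 12 bits. Two branches whose PCs differ by a multiple of
   BPB_ENTRIES * 4 bytes share an entry, which is the "conflict" case. */
static unsigned bpb_index(uint32_t pc)
{
    return (pc >> 2) & (BPB_ENTRIES - 1u);
}

/* Looked up during IF with the PC; returns 1 for "predict taken". */
int bpb_predict(uint32_t pc)
{
    return bpb[bpb_index(pc)] >= 2;
}

/* Called once the branch resolves; taken must be 0 or 1. */
void bpb_update(uint32_t pc, int taken)
{
    uint8_t *ctr = &bpb[bpb_index(pc)];
    assert(taken == 0 || taken == 1);
    if (taken) {
        if (*ctr < 3) (*ctr)++;
    } else {
        if (*ctr > 0) (*ctr)--;
    }
}
```

In this sketch the buffer stores no tags, so a branch at `0x5000` silently reuses the counter trained by a branch at `0x1000`. A larger structure, such as the instruction cache itself, makes that kind of sharing less likely.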

## Improving accuracy: correlated predictions

- The accuracy of our predictor is critical to exploiting more ILP
- How can we improve accuracy?
  - Increasing the size of the cache does not help (much)
  - Increasing the number of bits beyond 2 does not help (much)
- Consider the behavior of "surrounding" branches?
  - Works particularly well if there are common "paths" through code that require several branches, as in the following code:
    - `if (aa == 2) aa = 0;    // B1`
    - `if (bb == 2) bb = 0;    // B2`
    - `if (aa != bb) ...       // B3`
  - B3 is correlated with B1 and B2
    - $\Rightarrow$ If the conditions of B1 and B2 are both **TRUE**, then aa and bb are both set to 0, so (aa != bb) is **FALSE**
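
The correlation between B3 and the two preceding branches can be sketched as follows. This is an illustrative model only: the outcomes of the last two branches select one of four 2-bit counters kept for B3. The names `corr_entry`, `record_outcome`, `corr_predict`, `corr_update` and `run_fragment` are hypothetical, and an "outcome" here means "the condition evaluated true", not necessarily "the compiled branch was taken".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical predictor state for one branch: four 2-bit saturating
   counters, one per pattern of the two most recent branch outcomes. */
typedef struct {
    uint8_t counter[4];
} corr_entry;

/* Global 2-bit history: bit 0 is the most recent outcome. */
static unsigned history;

void record_outcome(int outcome)
{
    assert(outcome == 0 || outcome == 1);
    history = ((history << 1) | (unsigned)outcome) & 3u;
}

/* Predict using the counter selected by the current history. */
int corr_predict(const corr_entry *e)
{
    return e->counter[history] >= 2;
}

/* Train the counter selected by the current history. */
void corr_update(corr_entry *e, int outcome)
{
    uint8_t *ctr = &e->counter[history];
    if (outcome) {
        if (*ctr < 3) (*ctr)++;
    } else {
        if (*ctr > 0) (*ctr)--;
    }
}

/* Runs the three-branch fragment once. Returns the prediction made for B3
   and stores B3's real outcome in *actual. */
int run_fragment(corr_entry *b3, int aa, int bb, int *actual)
{
    int b1, b2, predicted;

    b1 = (aa == 2);              /* B1 */
    if (b1) aa = 0;
    record_outcome(b1);

    b2 = (bb == 2);              /* B2 */
    if (b2) bb = 0;
    record_outcome(b2);

    predicted = corr_predict(b3);
    *actual = (aa != bb);        /* B3 */
    corr_update(b3, *actual);    /* train before B3 enters the history */
    record_outcome(*actual);
    return predicted;
}
```

When B1 and B2 are both true the history is `11`, and the counter it selects only ever sees B3 evaluate false, so it is never disturbed by runs where B3 behaves differently under another history.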

Chapter 4

6-Mar-00







UMBC

CMSC 611 (Advanced Computer Architecture), Spring 2000






## Dynamic superscalar CPUs today

- Modern CPUs may have
  - 2+ integer ALUs
  - Load/store (memory) unit
  - Branch unit
- CPU attempts to keep each functional unit busy
  - Extensive dynamic scheduling to work around many RAW hazards
    - Integer instructions can now have RAW hazards!
    - Lots of dynamic reordering to keep the units busy
  - FP/integer conflicts often less of an issue: not much FP computation
- Branch delays are a huge problem
  - A 2-cycle delay costs up to 11 lost instructions for 4-way issue (3 in the same cycle, 4 in each of the following cycles)
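
The arithmetic behind the 11-instruction figure can be written as a small helper. The function name `lost_issue_slots` is made up for this sketch, and it gives an upper bound: it assumes the branch occupies the first slot of its issue cycle and that none of the later slots can be filled with useful work.

```c
#include <assert.h>

/* Upper bound on issue slots lost to a branch delay:
   the (width - 1) remaining slots in the branch's own cycle,
   plus all `width` slots in each of the delay cycles. */
int lost_issue_slots(int width, int delay_cycles)
{
    assert(width >= 1 && delay_cycles >= 0);
    return (width - 1) + width * delay_cycles;
}
```

For 4-way issue and a 2-cycle delay this gives 3 + 4 × 2 = 11, while a single-issue pipeline with the same delay loses only 2 slots, which is why branch delays hurt wide-issue machines so much more.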
