





## Split vs. unified caches

- Should the system have a single cache or two?
- Unified cache: all memory requests go through a single cache
  - + Requires less hardware
  - − Has lower bandwidth
  - − More opportunity for collisions
- Split I & D caches: instructions & data are stored in separate caches
  - − Uses additional hardware
    - Some simplifications (I-cache is read-only)
  - + Higher bandwidth (an instruction fetch and a data access can proceed in the same cycle)
  - + No collisions between data & instructions

## Cache performance

- Average memory access time (AMAT) is a useful measure to evaluate the performance of a memory-hierarchy configuration
  - AMAT = hit time + miss rate \* miss penalty
- AMAT shows how much penalty the memory system imposes on each access (on average)
  - $\Rightarrow$  It can easily be converted into clock cycles for a particular CPU
- Leaving the penalty in nanoseconds allows two systems with different clock cycle times to be compared on the same memory system
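
The AMAT formula and the nanoseconds-to-cycles conversion can be sketched in a few lines of Python. This is only an illustration: the function names `amat` and `ns_to_cycles`, the 5% miss rate, and the two CPU cycle times are hypothetical values chosen for the example, not figures from these notes.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty.

    hit_time and miss_penalty must use the same unit (ns or cycles);
    miss_rate is a fraction between 0 and 1.
    """
    return hit_time + miss_rate * miss_penalty


def ns_to_cycles(time_ns, cycle_time_ns):
    """Convert a time in nanoseconds into clock cycles for one CPU."""
    return time_ns / cycle_time_ns


# Hypothetical memory system: 2 ns hit time, 5% miss rate, 100 ns miss penalty.
amat_ns = amat(hit_time=2.0, miss_rate=0.05, miss_penalty=100.0)

# The same memory system as seen by two hypothetical CPUs.
cycles_on_2ns_cpu = ns_to_cycles(amat_ns, cycle_time_ns=2.0)
cycles_on_1ns_cpu = ns_to_cycles(amat_ns, cycle_time_ns=1.0)

print(f"AMAT: {amat_ns:.1f} ns")
print(f"2 ns clock: {cycles_on_2ns_cpu:.1f} cycles per access")
print(f"1 ns clock: {cycles_on_1ns_cpu:.1f} cycles per access")
```

Keeping `amat_ns` in nanoseconds and converting only at the end is what lets the same memory system be compared across the two clocks.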

Chapter 5

UMBC

6-Apr-00



## Cache performance example

- Compute the CPI penalty separately for instructions and data
- First, figure out the miss penalty in terms of clock cycles: 100 ns/2 ns = 50 cycles
- Unified cache
  - Instruction access penalty is (0 + 1.35% \* 50) = 0.675 cycles
  - Data access penalty is (1 + 1.35% \* 50) = 1.675 cycles
  - Overall penalty is 0.675 + (1/3) \* 1.675 = 1.23 cycles per instruction
- Split cache
  - Instruction access penalty is (0 + 0.64% \* 50) = 0.32 cycles
  - Data access penalty is (0 + 4.82% \* 50) = 2.41 cycles
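  - Overall penalty is 0.32 + (1/3) \* 2.41 = 1.12 cycles per instruction
- Split cache performs better despite its higher data miss rate $\Rightarrow$ data accesses do not incur the 1-cycle stall seen in the unified cache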
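
The arithmetic in this example can be reproduced with a short Python sketch. The miss rates, the 100 ns miss penalty, and the 2 ns cycle time are taken from the bullets above. The names `access_penalty` and `cpi_penalty` are hypothetical helpers, and the reading of the (1/3) factor as "one data access per three instructions" is an assumption, since the setup is not restated here.

```python
CYCLE_TIME_NS = 2.0
MISS_PENALTY_NS = 100.0
# Assumption: the (1/3) factor means one data access per three instructions.
DATA_ACCESSES_PER_INSTRUCTION = 1 / 3

miss_penalty_cycles = MISS_PENALTY_NS / CYCLE_TIME_NS  # 100 ns / 2 ns = 50


def access_penalty(extra_stall_cycles, miss_rate):
    """Cycles lost per access: fixed stall plus expected miss cost."""
    return extra_stall_cycles + miss_rate * miss_penalty_cycles


def cpi_penalty(instruction_penalty, data_penalty):
    """Cycles lost per instruction: one fetch plus a share of data accesses."""
    return instruction_penalty + DATA_ACCESSES_PER_INSTRUCTION * data_penalty


# Unified cache: same miss rate for both access types,
# plus a 1-cycle stall on each data access.
unified = cpi_penalty(
    access_penalty(0, 0.0135),
    access_penalty(1, 0.0135),
)

# Split caches: separate miss rates, no extra stall on data accesses.
split = cpi_penalty(
    access_penalty(0, 0.0064),
    access_penalty(0, 0.0482),
)

print(f"unified: {unified:.2f} cycles per instruction")  # unified: 1.23 ...
print(f"split:   {split:.2f} cycles per instruction")    # split:   1.12 ...
```

Setting the unified cache's extra stall to 0 in this sketch makes the unified configuration come out ahead, which shows that the 1-cycle data stall is what tips the comparison.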


## Effects of cache on CPU performance

- Low-CPI machines suffer more, in relative terms, from a fixed memory penalty (measured in CPI)
  - A machine with a CPI of 5 suffers little from a 1 CPI penalty (execution time grows by 20%).
  - A processor with a CPI of 0.5 has its execution time tripled!
- Cache miss penalties are measured in cycles, not nanoseconds ⇒ a faster machine will stall for more cycles on the same memory system
- Amdahl's Law rears its ugly head again
  - Fast machines with low CPI are affected significantly by memory access penalties
  - Fast machines spend most of their time accessing memory!
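
A small Python sketch makes the relative effect concrete. It assumes that instruction count and clock cycle time stay fixed, so execution time is proportional to CPI. The helper names `slowdown` and `memory_fraction` are hypothetical, and the 100 ns memory with 2 ns and 1 ns clocks at the end is an illustrative choice.

```python
def slowdown(base_cpi, memory_penalty_cpi):
    """Factor by which execution time grows once memory stalls are added."""
    return (base_cpi + memory_penalty_cpi) / base_cpi


def memory_fraction(base_cpi, memory_penalty_cpi):
    """Share of total execution time spent stalled on memory."""
    return memory_penalty_cpi / (base_cpi + memory_penalty_cpi)


# The two machines from the bullets above, each with a 1 CPI memory penalty.
for base_cpi in (5.0, 0.5):
    print(
        f"base CPI {base_cpi}: "
        f"{slowdown(base_cpi, 1.0):.1f}x execution time, "
        f"{memory_fraction(base_cpi, 1.0):.0%} of time in memory stalls"
    )

# The same memory latency costs more cycles on a faster clock.
# Illustrative values: 100 ns memory, 2 ns vs. 1 ns cycle time.
stall_cycles_2ns_clock = 100.0 / 2.0
stall_cycles_1ns_clock = 100.0 / 1.0
```

With these inputs the CPI 5 machine slows down by a factor of 1.2, while the CPI 0.5 machine slows down by a factor of 3 and spends about two thirds of its time stalled on memory.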

