Homework #5
CMSC 611, Spring 2000
Assigned: 18 Apr 2000
Due: 25 Apr 2000 at 5:45 PM
- Problem 5.1 from the text.
- You just purchased a new computer and want to know whether there is enough
spare main-memory bandwidth to add a new peripheral. Your measurements of the
computer have found the following information:
- There are separate on-chip I & D caches. The I-cache has a 96% hit rate and
8-word blocks, and the D-cache has a 90% hit rate and 2-word blocks. The
D-cache is write-through. Hits are handled with no penalty, and writes are
handled via a write buffer. Additionally, reads are given priority over
buffered writes, so writes never delay accesses to the L2 cache.
- There's a level 2 cache off-chip with a global hit rate of 99.8% (assume
the same hit rate for data & instructions). Block size is 8 words,
and 50% of cache blocks are dirty when replaced. The L2 access time is
10 ns.
- The bus supports multiple-word operations (i.e., pay the memory latency once
and transfer multiple words).
- The main memory bus runs at 100 MHz, and the main memory has an access
latency of 50 ns (this isn't overlapped with bus operations). A bus cycle
may transfer an address to memory, data to or from memory, or both address
and data. The bus is 64 data bits wide.
- The processor runs at 500 MHz and has a base CPI of 0.8, not counting memory
accesses.
- 15% of all instructions are loads, and 8% are stores.
- What are the local hit rates of the L2 cache for data &
instructions?
- What is the utilization of the main memory?
- How much faster would the system run if the width of the main memory bus
were doubled to 128 bits?
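Hint: the first two sub-questions need only the standard definitions, written
here in the usual notation with no numbers filled in:

    local miss rate(L2) = global miss rate(L2) / miss rate(L1)
    local hit rate(L2)  = 1 - local miss rate(L2)
    memory utilization  = (time main memory and its bus are busy) / (total time)

Main memory is busy whenever it services L2 misses: fetching an 8-word block,
plus writing back the replaced block when it is dirty. For the last
sub-question, count the bus cycles needed per block transfer at each bus width.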
- Problem 5.5 from the text.
- How does the use of a TLB affect memory-system performance? To answer this
question, assume that the CPU has split I & D caches, with miss rates of 1%
for the I-cache and 4% for the D-cache, and a miss penalty of 50 ns for either
cache. Also assume that the TLB has a miss rate of 0.01% (i.e., 1 miss for
every 10,000 instructions) and that the TLB is filled by a software trap
handler of exactly 12 CPU instructions; these instructions always hit in the
I-cache, but the data they fetch (4 words) never hits in the D-cache (it
always misses and is fetched from main memory). The CPU has a base CPI of 0.8
(not including any memory stalls) and runs at 1 GHz.
How does the TLB penalty compare to the penalty imposed by regular cache misses?
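Hint: the standard decomposition, in symbols only (plug in your own numbers):

    effective CPI = base CPI + cache stall cycles per instruction
                             + TLB stall cycles per instruction
    cache stalls  = I-cache miss rate x I-miss penalty (cycles)
                    + memory references per instruction
                      x D-cache miss rate x D-miss penalty (cycles)
    TLB stalls    = TLB miss rate x TLB miss penalty (cycles)

The TLB miss penalty covers executing the 12-instruction handler plus the
main-memory time for the 4 words of data it fetches; how many D-cache misses
that fetch causes depends on the block size, which is not given, so state the
assumption you make.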
- Smith and Goodman [1983] found that, for a small instruction cache, a cache
using direct mapping could consistently outperform one using fully associative
mapping with LRU replacement. Explain why this is possible (Hint: the three
C's model won't work here because it ignores replacement policy). Describe a
scenario in which the fully associative cache experiences a miss but the
direct-mapped cache does not.
- Problem 5.20 from the text.
- Problem 5.21 from the text.