CMSC 611 (Spring 2000) : Homework #1

Homework #1

CMSC 611, Spring 2000

Assigned: 8 Feb 2000
Due: 15 Feb 2000 at 5:45 PM (in class)

We are considering the addition of a vector processing unit to a CPU. The vector unit speeds up vectorizable floating point computations but doesn't affect the speed of integer computation or non-vectorizable FP. Assume that the vector unit provides a speedup of 12 over normal floating point computations. Your measurements of scientific programs have shown that 90% of the time is spent doing floating point.
1. Assume that the FP computations are 100% vectorizable. How much speedup would be gained?
2. What fraction of FP computations must be vectorized to get a speedup of 4 over the unvectorized CPU?
3. You are given a choice between increasing the vector speedup by a factor of 2 (to 24x) or increasing the overall clock rate by a factor of 1.5, thus speeding up integer, floating point, and vector computations. Which results in a faster CPU, assuming that the FP code is 80% vectorizable?
You've built a new computer with multimedia (MM) instructions, and discover that MM instructions account for 30% of the run time of a set of benchmarks. You know that the MM instructions are 4 times faster than the "normal" instructions that they replaced.
1. How much faster does the CPU run with MM instructions relative to its original (without MM) speed?
2. What percentage of the original execution time was converted to MM instructions?
You have measured the "ideal" CPI of a program to be 0.5 (superscalar issue allows CPIs to be below 1). However, this doesn't include the memory system. Assume that cache hits cost nothing and that cache misses cost 50 cycles. If instruction references miss 1% of the time and data references miss 4% of the time, what is the overall average CPI? Assume the instruction frequencies shown in Figure 1.17 in the text.
There are three possible enhancements for a computer system. The enhancements (A, B, and C) speed up the processor by a factor of 20x, 10x, and 5x respectively. However, the enhancements conflict with one another, so at most one can be active at any time.
1. Suppose enhancements A and B each run for 25% of the original execution time. What fraction of the original exeuction time must enhancement C run to get an overall speedup of 5?
2. If enhancements B and C each apply to 25% of the original execution time and enhancement A applies to 35% of the original execution time, what fraction of the new execution time (running with enhancements turned on) is unenhanced?
3. The results of running a benchmark suite have shown that 20% of the execution time could be improved by enhancement A, 30% could be improved by enhancement B, and 40% could be improved by enhancement C. If only one enhancement could be implemented, which should it be? If only two could be implemented, which two provide the most improvement?

Consider the following two programs, P1 & P2 (all times in seconds). Each program executes 500 million floating point operations.

	Computer A	Computer B	Computer C
P1	20	200	50
P2	300	5	50
Total time	320	205	100

Calculate the MFLOPS rating of each program.
What is the arithmetic, harmonic, and geometric mean of MFLOPS for each machine?
Which mean most accurately reflects total performance for the two programs? Assume the benchmark runs each program exactly once.

What are the risks of optimizing a CPU by benchmarks alone? Explain why CPU designers often optimize their programs for common benchmarks (SPEC, etc.), and list reasons why this approach to building faster computers may not result in the best performance for end users.