CMSC 611 (Spring 2000) : Homework #2

Assigned: 17 Feb 2000
Due: 24 Feb 2000 at 5:45 PM

Problem 2.1 from the text.
The PowerPC chip (used in the Macintosh) includes an addressing mode that updates the index register for a load or store with the memory address computed by the instruction. This allows the instruction sequence
lwz r4, 8(r8)
addic r8, r8, #8
to be replaced with
lwzu r4, 8(r8)
which will update r8 to r8+8 as well as performing the load (lwz and lwzu are loads; this addressing mode is available for stores as well). This new addressing mode does not affect the CPI, but might affect the instruction count and/or the clock speed. Assume the instruction frequencies are those for espresso in Figure 2.26. Also, assume the CPI values are those listed in Problem 4.
Including this instruction decreases the clock speed from 600 MHz to 550 MHz. What fraction of load and store instructions must use the updating addressing mode for CPU performance (that is, execution time) to remain the same? Remember that converting a load or store to use updating addresses reduces the instruction count by eliminating an add or subtract.
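The break-even condition can be set up as follows. This is only a minimal sketch: execution time is total cycles divided by clock rate, and each converted load or store is assumed to remove exactly one ALU instruction. The function name, parameter names, and the sample numbers in the usage line are hypothetical; the real instruction mix comes from Figure 2.26 and the real CPI values from the table in this homework.

```python
def break_even_fraction(avg_cpi, ls_fraction, alu_cpi,
                        old_mhz=600.0, new_mhz=550.0):
    """Fraction x of loads/stores that must use update addressing to break even.

    All arguments are supplied by the reader (hypothetical parameter names):
      avg_cpi     - effective CPI of the original instruction mix
      ls_fraction - fraction of original instructions that are loads or stores
      alu_cpi     - CPI of the add/subtract that each conversion eliminates

    Per original instruction:
      old time = avg_cpi / old_mhz
      new time = (avg_cpi - x * ls_fraction * alu_cpi) / new_mhz
    Setting the two equal and solving for x gives the expression returned here.
    """
    return avg_cpi * (1.0 - new_mhz / old_mhz) / (ls_fraction * alu_cpi)


if __name__ == "__main__":
    # Made-up inputs purely to exercise the formula; not data from the text.
    print(break_even_fraction(avg_cpi=1.2, ls_fraction=0.25, alu_cpi=0.8))
```

A result greater than 1 would mean that, under these assumptions, even converting every load and store could not recover the lost clock speed.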
Do problem 2.3 from the text, but use the following code sequence instead of the one listed in the book:
A = X[B] + Y[B]
E = Y[B] + Z[B]

Compute the effective CPI for DLX. You've measured the following average CPI for each instruction class:

Instruction class	Clock cycles
ALU instructions & FP move	0.8
Loads (integer or FP)	1.7
Stores (integer or FP)	1.0
Conditional branches (taken)	2.2
Conditional branches (not taken)	0.6
Jumps	1.6
FP add, subtract, compare	2.1
FP multiply	2.5
FP divide	4.8
Other FP	5.5

Assume that branches are taken 60% of the time and that miscellaneous integer instructions take the same time as ALU instructions.

What is the effective CPI for compress, using the instruction frequencies in Figure 2.26?
What is the effective CPI for hydro2d, using the instruction frequencies in Figure 2.27?
If the FP unit were removed from DLX and each FP operation were replaced with one or more integer instructions, how would this affect the CPI for hydro2d? How would it affect execution time for hydro2d? Don't compute an exact answer; instead, comment on what would be likely to happen.
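The effective CPI asked for above is a frequency-weighted average of the per-class CPI values. The sketch below uses the CPI table from this problem and the 60% taken-branch assumption; the dictionary keys, the function name, and the sample mix are hypothetical, and the sample mix is not taken from Figure 2.26 or Figure 2.27.

```python
# Average CPI per instruction class, copied from the table in this problem.
CLASS_CPI = {
    "alu": 0.8,          # ALU instructions & FP move (misc. integer folded in)
    "load": 1.7,
    "store": 1.0,
    "branch_taken": 2.2,
    "branch_not_taken": 0.6,
    "jump": 1.6,
    "fp_add": 2.1,       # FP add, subtract, compare
    "fp_mul": 2.5,
    "fp_div": 4.8,
    "fp_other": 5.5,
}
TAKEN_FRACTION = 0.6     # branches are taken 60% of the time


def effective_cpi(mix):
    """Weighted-average CPI for a mix of {class name: frequency}.

    The key "branch" stands for all conditional branches and is split into
    taken / not taken using TAKEN_FRACTION. Frequencies are normalized by
    their sum, so percentages or fractions both work.
    """
    branch_cpi = (TAKEN_FRACTION * CLASS_CPI["branch_taken"]
                  + (1.0 - TAKEN_FRACTION) * CLASS_CPI["branch_not_taken"])
    total = sum(mix.values())
    cpi = 0.0
    for name, freq in mix.items():
        class_cpi = branch_cpi if name == "branch" else CLASS_CPI[name]
        cpi += (freq / total) * class_cpi
    return cpi


if __name__ == "__main__":
    # Made-up mix for illustration only; substitute the figures from the text.
    sample_mix = {"alu": 50, "load": 20, "store": 10, "branch": 20}
    print(effective_cpi(sample_mix))
```

Normalizing by the sum is a design choice here: published frequency tables often omit rarely used instructions, so the listed percentages may not add up to exactly 100.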

Compile the Dhrystone benchmark (available on the Web at netlib) on an SGI in the cs.umbc.edu domain using maximum optimization (gcc -O3) and produce an assembly language program (the -S option). Count the number of instructions of each type that the benchmark executes (you can do this by having the compiler produce assembly code and tracing the loops in the original C code). Compare these counts to the instruction frequencies in Figure 2.26 from the text. How do the instruction frequencies compare? Is Dhrystone a good match for any particular program? If so, which one(s)?
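As a starting point for the counting step, the sketch below tallies the mnemonics that appear in an assembly listing. It produces a static count only: it does not know how many times each loop body runs, so the dynamic counts still require tracing the loops in the C source. It assumes GNU-assembler-style syntax ('#' comments, labels ending in ':', directives beginning with '.'); the function name and the sample listing are hypothetical.

```python
import re
from collections import Counter

# Optional leading labels ("main:", "$L2:"), then a mnemonic that starts with
# a letter, so directives such as ".text" are skipped.
_INSTRUCTION = re.compile(r"^\s*(?:[\w.$]+:\s*)*([A-Za-z][\w.]*)(?=\s|$)")


def count_mnemonics(asm_text):
    """Return a Counter mapping each instruction mnemonic to its static count."""
    counts = Counter()
    for raw_line in asm_text.splitlines():
        line = raw_line.split("#", 1)[0]      # drop trailing comments
        match = _INSTRUCTION.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Tiny hand-written listing for illustration; not real compiler output.
    sample = "\n".join([
        "\t.text",
        "main:",
        "\tlw\t$2,0($4)\t# load a word",
        "\taddu\t$2,$2,1",
        "$L2:\tlw\t$3,4($4)",
        "\tsw\t$2,0($4)",
        "\tj\t$31",
    ])
    for mnemonic, n in sorted(count_mnemonics(sample).items()):
        print(mnemonic, n)
```

To use it on real output, read the .s file produced by the -S option and pass its contents to count_mnemonics, then group the mnemonics into the instruction classes used in Figure 2.26 and weight each count by how often its enclosing loop executes.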