Homework #4

CMSC 611, Spring 2000

Assigned: 16 Mar 2000
Due: 30 Mar 2000 at 5:45 PM

  1. [10 points] Problem 4.2 in the text.
  2. [10 points] Problem 4.5 in the text. In addition, calculate the speedup gained from unrolling and scheduling the loop.
  3. [20 points] For the following code, show the state of a scoreboard and Tomasulo's algorithm immediately after issuing the last instruction. Indicate whether the CPU needed to stall between issuing the first instruction and the last one. In this CPU, loads have a 1 cycle latency, ADDD/SUBD has a 4 cycle latency, MULTD has a 5 cycle latency, and DIVD has a 16 cycle latency. Both scoreboard and Tomasulo's have one integer unit, two add/subtract units, two multipliers, and one divide unit and issue at most one instruction per cycle. The code is:
    LD    F8,0(R3)
    ADDD  F6,F4,F2
    SUBD  F4,F8,F0
    MULTD F10,F4,F0
    DIVD  F12,F8,F2
    LD    F14,0(R4)
    MULTD F0,F6,F8
  4. [15 points] Consider the following code. (The .... marks indicate instructions that are ignored in this example)
           .......
           ADDI R8, R0, #7
    LOOP1: ADDI R4, R0, #5
           .......
    LOOP2: SUBI R4, R4, #1 ....... BNEZ R4, LOOP2 ....... BNEZ R8, LOOP1 .......
    1. Analyze the branch behavior for the branch to the LOOP2 label. What percentage of the time is the branch instruction taken and not taken?
    2. Choose the best static branch prediction scheme for this example (including both LOOP1 and LOOP2). What percentage of the time will this static branch prediction be correct for LOOP2?
    3. The designers are going to try and use dynamic branch prediction and want to know about performance implications. Draw the state machine for a one-bit branch predictor. Be sure to clearly identify or define the meaning of each state. For the inner loop (LOOP2), what will be the misprediction rate of the one-bit branch predictor?
    4. Draw the state diagram for a 2-bit dynamic branch predictor. Again, clearly label all states. What will be the misprediction rate of the 2-bit branch predictor for LOOP2?
  5. [10 points] You are trying to improve the speed of a CPU with branch prediction by using a branch prediction buffer. The misprediction penalty for the buffer is 5 cycles, and the miss penalty for the buffer is 3 cycles. Branches successfully predicted do not stall the processor. If branches hit in the buffer 80% of the time and the buffer is 90% accurate, how much faster (or slower) is this scheme than a CPU that has a fixed penalty of 2 cycles for taken branches and 0 cycles for non-taken branches? Assume that branches make up 15% of the instructions, branches are taken 60% of the time, and that the base CPI is 1.2.
  6. [15 points] Problem 4.21 from the text.
  7. [20 points] Convert the following loop into DLXV assembly language and calculcate T200, R200, N1/2, and Rinfinity for the code run on a 400 Mhz DLXV processor with chaining, 2 memory pipelines, and latencies of 15 cycles, 8 cycles, and 6 cycles for vector loads/stores, multiplies, and adds respectively. Assume that the base addresses for arrays a, b and c are in Ra, Rb, and Rc respectively. You may also assume that N, x and y are contained in RN, Fx, and Fy respectively. Remember that you may need to handle the first odd lot of elements using separate code.
    for (i = 0; i < N; i++) {
      c[i] = (a[i] + x) * b[i] * y;
    }