Solution to HW4 of CMSC611


 
 

Problem 1

Dependency:

    1. S2 uses a value computed by S1 in the same iteration;
    2. S3 uses a value computed by S1 in the same iteration;
Actually the value computed by S3 will be replaced by the value computed by next iteration. So S3 is useless.
for (i = 1; i<100; i = i+1)

{ a[i] = b[i] + c[i];

b[i] = a[i] + d[i];

}

a[100] = a[99] + e[99];

Problem 2

MULTD F0, F0, F4

LD F1, -8(R1)

LD F5, - 8(R2)

LD F3, -16(R1)

MULTD F1, F1, F5

ADDD F2, F0, F2

LD F6, -16(R2)

SUB R1, R1, 24

MULTD F3, F3, F6]

ADDD F2, F1, F2

BNEQZ R1, loop

SUB R2, R2, 24

ADDD F2, F2, F3

Speedup = 7/(13/3) = 1.62

Problem 3

State of scoreboard:

Instruction status
Instruction Issue Read Execution Write
LD F8,0(R3) Yes Yes Yes Yes
ADDD F6, F4, F2 Yes Yes Yes Yes
SUBD F4, F8, F0 Yes Yes Yes Yes
MULTD F10, F4, F0 Yes      
DIVD F12, F8, F2 Yes Yes    
LD F14, 0(R4) Yes Yes Yes Yes
MULTD F0, F6, F8 Yes      
Function units status
  BUSY OP Fi Fj Fk Qj Qk Rj Rk
Integer NO                
ADD1 NO                
ADD2 NO                
MUL1 Yes MUL F10 F4 F0 ADD2   Yes  Yes
MUL2 Yes MUL F0 F6 F8     Yes  Yes
DIV Yes DIV F12 F8 F2      No No
Register status
  F0 F2 F4 F6 F8 F10 F12
FU MUL         MUL DIV

 

State of Tomasulo's algorithm:

Instruction status
Instruction Issue Execute Write Result
LD F8, 0(R3) Yes Yes Yes
ADDD F6, F4, F2 Yes Yes Yes
SUBD F4, F8, F0 Yes Yes Yes
MULTD F10, F4, F0 Yes    
DIVD F12, F8, F2 Yes Yes Yes
LD F14, 0(R4) Yes Yes  
MULTD F0, F6, F8 Yes    

 
Reservation stations
Name Busy Op Vj Vk Oj Qk
Integer No          
ADD1 No          
ADD 2 No          
MUL1 Yes Mul   Regs[F0] ADD2  
MUL2 No          
DIV Yes DIV Regs[F8] Regs[F2]    
Register status
Field F0 F2 F4 F6 F8 F10 F12
            MUL DIV

 

Problem 4

a. For loop2, the percentage of the branch instruction taken is: 4/5 = 80%

the percentage of the branch instruction not taken is: 1/5 = 20%
b. For both loop1 and loop2, the best static branch prediction scheme is predict taken. Percentage of correct prediction is 80%.

c. Predictor was initialized to taken "T":
 
R4 =  Loop2

Prediction

Loop2

Action

New Loop2

Prediction

4

3

2

1

0

T

T

T

T

T

T

T

T

T

NT -x-->

T

T

T

T

NT

4

3

2

1

0

NT

T

T

T

T

T

T

T

T

NT -x-->

T

T

T

T

NT


 

One-bit predictor for Loop1:
R8= Loop1

Prediction

Loop1

Action

New Loop1

prediction

7 T T T
7 T T T

When outer loop is executed n times, the inner loop will executed 5n times. Every iteration, the predictor will predict wrongly 2 times except for the first time. At the first time the inner loop execute, it will predict wrong for 1 time. All the following, it will predict wrong 2 times out of 5 predictions. So the misprediction rate is (2n-1)/5n.The misprediction rate is about 40%.

d. Predictor was initialized to taken "T"(11):

The bits represent:( 00-NT,01-NT,10-T,11-T)

Two-bit predictor for Loop2:
R4= Loop2

Prediction

Loop2

Action

New Loop2

Prediction

 
4

3

2

1

0

T(11)

T(11)

T(11)

T(11)

T(11)

T

T

T

T

NT

T(11)

T(11)

T(11)

T(11)

T(10)

 
4

3

2

1

0

T(10)

T(11)

T(11)

T(11)

T(11)

T

T

T

T

NT

T(11)

T(11)

T(11)

T(11)

T(10)

 
         
Two-bit predictor for Loop1:
R8= Loop1

Prediction

Loop1

Action

New Loop1

Prediction

7

7

T(11)

T(11)

T

T

T(11)

T(11)


 

If we initialize the predictor to be "11", the misprediction rate is 0.

Problem 5

Branch penalty a:

80% * (1-90%) * 5 + 20% * 3 = 0.4 + 0.6 = 1

Branch penalty b:

60% * 2 + (1-60%) * 0 = 1.2

speedup = (1.2+1.2*15%)/(1.2+1 * 15%) = 1..022

Problem 6

1.Issue:

If there is an empty entry both in reservation buffer and in history buffer, we issue the instruction, otherwise stall it. Sending operands to the reservation station are available in the register file.
2.Execute:
If one or two of the operands is not yet available, monitoring CDB while waiting for register to be computed. This step checks the RAW hazard.
3.Write result:
when result available, write in on the CDB, from there into the reservation station for this result, and write back to register file, before doing it, make a original of this register in the "history buffer" and fill in history buffer fields. If it is a store instruction, put it in the store buffer, don't write the result memory for now.
Reservation stations
Name Busy Op Vj Vk Qj Qk Dest
Nult1 No Nult Mem[Reg[R1]] [Reg[F2]     #2
Nult2 No Nult Mem[Reg[R1]] Reg[F2]     #7

 
  History buffer
Entry Busy Instruction Register Value Speculative
1

2

3

4

5

6

7

8

9

10

No

No

Yes

Yes

Yes

Yes

Yes

Yes

Yes

Yes

LD F0, 0(R1)

MULT F4, F0, F2

SD 0(R1), F4

SUBI F1, F1, 8

BNEZ R1, loop

LD F0, 0(R1)

MULT F4, F0,F2

SD 0(R1), F4

SUBI R1, R1, 8

BNEZ R1, loop

F0

F4

R1

F0

F4

R1

Mem[Reg[R1]]

Reg[F0]xReg[F4]

#2

R1-8

Mem[Reg[R1]]

Reg[F0]xReg[F4]

#7

R1-8

No

No

No

No

Yes

Yes

Yes

Yes

Yes

Yes

Register status (FP)
Field  F0  F2  F4  F6  F8  F10  F12  ... F30 
History  #6    #7            

The changes we need to make to figure 4.37:
 
Instruction status Wait until Action or bookkeeping
Issue Station and history buffer both available Change all the Recorder[] to History []
Execute Same  
Write Result After filling in the entry for history buffer Results go directly to destination reg. Keep original copy of reg in history buffer

 

Problem 7

SLL R2, RN, 3

ADD R2, R2, Ra

SLL R1, RN, 26

SRL R1, R2, 26

ADDI R3, R0, 64

MOVI2S VLR, R3

BEQZ R1, LOOP

MOVI2S VLR,R!

SLL R1, R1, 3

LOOP:LV V1, Ra

DDVS V1, V1, FX

LV V2, Rb

MULTV V1, V1, V2

MULTVS V3, V1, Fy

SV Rc, V

ADD Ra, Ra, R1

ADD Rb, Rb, R1

ADD Rc, Rc, R1

ADDI R1, R0, 512

MOVI2S VLR, R3

SUB R4, R2, Ra

BNZ R4, LOOP

There are 2 memory pipelines, for the 6 vector operations, we can arrange them in 2 convoys.

Tchime = 2 We assume Tloop = 15 Rbase = 0

Tstart = 15 + 6 + 15 + 8 + 8 + 15 = 67

T200 = Tbase + ceiling(n/MVL) * (Tloop + Tstart) + n * Tchime

= 0 + ceiling(200/64) * (15 + 67) + 200 * 2 = 728

R200 = 3 * 200 * 400 / 728 = 329.67

Cycles/element = lim 2n + n * (15 + 67) / 64 = +82/64 = 3.28

n->infinite

Rinfinite = 3 * 400 / 3.28 = 365.85

For N1/2: MFLOPS for N1/2 = 365.85/2 = 182.93 = 3 * N1/2/T1/2 =>T1/2 = 6.56 * N1/2

Assuming N1/2 <= 64, so Tn<=64 = 1 * 82 + 2n

-->82 + 2 * N1/2 = 6.56 * N1/2

-->N1/2 = 17.98 = 18