1. a. Fraction enhanced = 100% * 90% =0.9
Speedup enhanced
= 12
So the overall
speedup = 1 / ( 0.1 + 0.9/12) = 5.714
b. Suppose f stands for the fraction enhanced,
then we have
1/ ( 1-f + f/12)
= 4 , then we get f = 9 / 11
And since 90 % of
the time is spent doing floating point computations, then the
fraction of FP computations
must be vectorizble is f / 90 % = ( 9/ 11 ) / 0.9 = 10 / 11 = 0.909
c. Fraction enhanced = 90% * 80% = 0.72
In the first choice,
speedup enhanced = 24 ,
so the overall Speedup
= 1/ (1-0.72 + 0.72 / 24) = 3.226
In the second choice,
speedup enhanced = 12 ,
the overall clock
rate is increased by a factor of 1.5,
the overall speedup
= ( 1/ ( 0.28+ 0.72/12) ) * 1.5 = 4.412
So
the second choice is faster.
2. b. Let x be the percentage, and the original
execution time is 1,
we have
: (x /4) / (1-x+x/4) = 30% , x = 12 / 19 = 0.632
a. Since the Fraction enhanced
= 0.632 and Speedup enhanced = 4
the overall
Speedup = 1 / ( 1-0.632 + 0.632 / 4 ) = 1.901
3. Data reference per instruction = loads + stores = 21 % + 12 % = 33%
The overall average CPI = CPU execution cycles
per instruction + Memory stall cycles per instruction
= 0.5 + ( instruction references per instruction * instruction
reference miss rate +
data references per instruction * data reference miss rate) * miss penalty
= 0.5 + ( 1 * 1% + 33 % * 4%) * 50
= 0.5 + 1.16 = 1.66
4. a. Let x be the needed fraction,
we have:
25 % / 20 + 25% / 10 + x / 5 + (1-25% *2 -x) = 1/5
x = 0.422
b. The fraction unenhanced of the
original execution time = 1- 35% - 25% *2 = 0.15
Suppose the original execution time is 1, then
the enhanced new execution time = 25% / 10 + 25% /5 + 35 % / 20 + 0.15
= 0.2425
The fraction of the new execution time unenhanced = 0.15 / 0.2425 = 0.619
c. If only one enhancement could
be implemented, then we compute the speedup of each one,
Speedup A = 1 / ( 1- 20% + 20% / 20 ) = 1.235
Speedup B = 1 / ( 1- 30% + 30% / 10 ) = 1.370
Speedup C = 1 / ( 1- 40% + 40% / 5 ) = 1.471
So we should choose the enhancement of C.
If two
enhancement could be implemented, we compute the speedup with every combination
of two,
Speedup AB = 1 / ( 20% /20 + 30% /10 + 50% ) = 1.852
Speedup BC = 1 / ( 30% /10 + 40% /5 + 30% ) = 2.439
Speedup AC = 1 / ( 20% /20 + 40% /5 + 40% ) = 2.041
So we should choose the enhancement of B and C.
5. a. The MFLOPS rating of program P1 in computer A = 500 / 20
= 25
The MFLOPS rating
of program P2 in computer A = 500 / 300 = 5 / 3 = 1.667
The MFLOPS rating of
program P1 in computer B = 500 / 200 = 2.5
The MFLOPS rating
of program P2 in computer B = 500 / 5 = 100
The MFLOPS rating of
program P1 in computer C = 500 / 50 = 10
The MFLOPS rating
of program P2 in computer C = 500 / 50 = 10
b. The arithmetic mean of MFLOPS for machine A =
( 25 + 5/3 ) / 2 = 13.333
The arithmetic mean of MFLOPS
for machine B = ( 2.5 + 100) / 2 = 51.25
The arithmetic mean of MFLOPS
for machine C = ( 10 + 10 ) / 2 = 10
The harmonic mean of MFLOPS
for machine A = 2 / ( 1/25 + 3/5 ) = 3.125
The harmonic mean of MFLOPS
for machine B = 2 / ( 1/2.5 + 1/100 ) = 4.878
The harmonic mean of MFLOPS
for machine C = 2 / ( 1/10 + 1/10 ) = 10
If normalized to C, we have:
A | B | C | |
Program P1 | 2.500 | 0.250 | 1.000 |
Program P2 | 0.167 | 10.000 | 1.000 |
Geometric mean | 0.646 | 1.581 | 1.000 |
c. Since the benchmark runs each program exactly
once, the strong drawback
of geometric mean is avoided
in this case. Therefore, the geometric mean reflects total
performance for the two
programs most accurately. Because the the Geometric mean of
normalized execution times
is independent of the running times of the individual programs,
also, it doesn't matter
which machine is used to normalize. So the geometric mean will be
less
misleading than the arithmetic
mean.
6. Need to address the following three aspects
a. Benchmarks may not represent real user programs.
b. Performance improvement measured by benchmarks
may not be the case for real user programs
c. advantages of using benchmarks.