Homework #1
CMSC 611, Spring 2000
Assigned: 8 Feb 2000
  Due: 15 Feb 2000 at 5:45 PM (in class)
  - We are considering the addition of a vector processing unit to a CPU. The 
    vector unit speeds up vectorizable floating point computations but doesn't 
    affect the speed of integer computation or non-vectorizable FP. Assume that 
    the vector unit provides a speedup of 12 over normal floating point computations. 
    Your measurements of scientific programs have shown that 90% of the time is 
    spent doing floating point. 
    
      - Assume that the FP computations are 100% vectorizable. How 
        much speedup would be gained? 
      
- What fraction of FP computations must be vectorized to get 
        a speedup of 4 over the unvectorized CPU? 
      
- You are given a choice between increasing the vector speedup 
        by a factor of 2 (to 24x) or increasing the overall clock rate by a factor 
        of 1.5, thus speeding up integer, floating point, and vector computations. 
        Which results in a faster CPU, assuming that the FP code is 80% vectorizable? 
    
 
- You've built a new computer with multimedia (MM) instructions, and discover 
    that MM instructions account for 30% of the run time of a set of benchmarks. 
    You know that the MM instructions are 4 times faster than the "normal" 
    instructions that they replaced. 
    
      - How much faster does the CPU run with MM instructions relative 
        to its original (without MM) speed? 
      
- What percentage of the original execution time was converted 
        to MM instructions? 
    
 
- You have measured the "ideal" CPI of a program to be 0.5 (superscalar 
    issue allows CPIs to be below 1). However, this doesn't include the memory 
    system. Assume that cache hits cost nothing and that cache misses cost 50 
    cycles. If instruction references miss 1% of the time and data references 
    miss 4% of the time, what is the overall average CPI? Assume the instruction 
    frequencies shown in Figure 1.17 in the text. 
  
- There are three possible enhancements for a computer system. The enhancements 
    (A, B, and C) speed up the processor by a factor of 20x, 10x, and 5x respectively. 
    However, the enhancements conflict with one another, so at most one can be 
    active at any time. 
    
      - Suppose enhancements A and B each run for 25% of the original execution 
        time. What fraction of the original exeuction time must enhancement C 
        run to get an overall speedup of 5?
- If enhancements B and C each apply to 25% of the original execution 
        time and enhancement A applies to 35% of the original execution time, 
        what fraction of the new execution time (running with enhancements turned 
        on) is unenhanced?
- The results of running a benchmark suite have shown that 20% of the 
        execution time could be improved by enhancement A, 30% could be improved 
        by enhancement B, and 40% could be improved by enhancement C. If only 
        one enhancement could be implemented, which should it be? If only two 
        could be implemented, which two provide the most improvement?
 
- Consider the following two programs, P1 & P2 (all times in seconds). 
    Each program executes 500 million floating point operations. 
    
       
        |  | Computer A | Computer B | Computer C |   
        | P1 | 20 | 200 | 50 |   
        | P2 | 300 | 5 | 50 |   
        | Total time | 320 | 205 | 100 |  
 
      - Calculate the MFLOPS rating of each program.
- What is the arithmetic, harmonic, and geometric mean of MFLOPS for each 
        machine?
- Which mean most accurately reflects total performance for the two programs? 
        Assume the benchmark runs each program exactly once.
 
 
- What are the risks of optimizing a CPU by benchmarks alone? Explain why 
    CPU designers often optimize their programs for common benchmarks (SPEC, etc.), 
    and list reasons why this approach to building faster computers may not result 
    in the best performance for end users.