CMSC 201
Measures of Central Tendency
Out: Wednesday 10/1/03
The design document for this project, design2.txt, is due: Before Midnight, Wednesday 10/8/03
The objective of this assignment is to give you some more practice writing functions and using separate compilation, and to give you some experience with using arrays, passing arrays to functions, and sorting an array of values. It will also test your ability to use top-down design.
There are three common means that can be associated with a group of values. You should already be familiar with the arithmetic mean, a.k.a. the average. The arithmetic mean is calculated by dividing the sum of all the values by the number of values.
$$\mathit{arithMean} = \frac{val_1 + val_2 + val_3 + \cdots + val_n}{n}$$
where each $val_i$ is an individual value.
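The arithmetic mean formula translates directly into a loop that accumulates the sum and then divides by n. The sketch below assumes the values are stored in an array of double; the function name ArithMean is hypothetical, not a required name. It is written in C89 style so that it compiles under gcc -ansi.

```c
/* Hypothetical helper: returns the arithmetic mean of the first n
   elements of values.  Assumes n > 0; the caller must check this. */
double ArithMean(const double values[], int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        sum += values[i];   /* accumulate val1 + val2 + ... + valn */
    }
    return sum / n;         /* divide the sum by the number of values */
}
```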
The second mean we'll work with is the geometric mean. It is calculated by taking the nth root of the product of all n values. The geometric mean is useful as a measure of central tendency because it is less affected by extremely large values than the arithmetic mean is. Here is the formula for the geometric mean:
$$\mathit{geomMean} = \left(val_1 \cdot val_2 \cdot val_3 \cdot \cdots \cdot val_n\right)^{1/n}$$
The last mean is the harmonic mean, defined as the reciprocal of the arithmetic mean of the reciprocals of the values. The harmonic mean is given by the formula:
$$\mathit{harmonMean} = \frac{1}{\left(\frac{1}{val_1} + \frac{1}{val_2} + \frac{1}{val_3} + \cdots + \frac{1}{val_n}\right) / n}$$
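The harmonic mean formula can be read from the inside out: sum the reciprocals, divide by n, then take the reciprocal of that result. The sketch below assumes an array of double containing no zero values; HarmonMean is a hypothetical name.

```c
/* Hypothetical helper: harmonic mean of the first n values, the
   reciprocal of the arithmetic mean of the reciprocals.
   Assumes n > 0 and that no value equals zero. */
double HarmonMean(const double values[], int n)
{
    double recipSum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        recipSum += 1.0 / values[i];   /* 1/val1 + 1/val2 + ... + 1/valn */
    }
    return 1.0 / (recipSum / n);       /* reciprocal of the mean of reciprocals */
}
```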
Informally, the median is the middle value of a set of sorted values. If the number of values in the set is odd, then there is a middle value and that value is the median. If the number of values in the set is even, then the median is the sum of the two middle values divided by two. For example, if there are 25 values in the set of values and the values are sorted in ascending order, then the 13th value is the median, having 12 values below it and 12 values above it. If there are 10 values in the set and the values are sorted in ascending order, then the median can be calculated by adding the 5th and 6th values and then dividing the sum by two.
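Once the array is sorted, the odd and even cases above map onto C's zero-based indexing. The 13th of 25 values is at index 12, and the 5th and 6th of 10 values are at indices 4 and 5. This is a sketch assuming a sorted array of double; the name Median is hypothetical.

```c
/* Hypothetical helper: median of the first n values, which must already
   be sorted in ascending order.  Assumes n > 0. */
double Median(const double sorted[], int n)
{
    if (n % 2 == 1) {
        /* odd count: the single middle element (index 12 when n is 25) */
        return sorted[n / 2];
    }
    /* even count: average the two middle elements
       (indices 4 and 5, the 5th and 6th values, when n is 10) */
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```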
The mode is the value that occurs most frequently in a set of values; in a histogram, it corresponds to the most crowded bin. A set of values may be multimodal (have more than one mode), so I will guarantee that every set of values in the data files I give you has only one mode.
You are to write a program that calculates each of the three means, the median and the mode for any of the data files I provide.
Since calculating the geometric mean requires taking the nth root of the product of the values, you will need to use the pow() function from the math library. To do this, you must #include <math.h> so that the compiler sees the prototype for pow() before your program calls it. The math library is not stored in the same location as the other standard libraries, so you must also tell the compiler to link it in by adding the -lm option when linking. Here is an example:
linux1 [101] % gcc -Wall -ansi proj2.o means.o -lm
The pow() function has the following prototype:
double pow(double x, double y);
where the value returned is $x^y$, x raised to the power y.
Since determining the median requires a sorted set of values, you'll have to sort the array of values. You may use the Selection Sort code found in lecture 9; it will require some minor modification.
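For reference, a generic selection sort over an array of double looks like the sketch below. It is not the lecture 9 code, only an illustration of the same technique under the assumption that the values are stored as double; SelectionSort is a hypothetical name. On each pass it finds the smallest remaining value and swaps it to the front of the unsorted part.

```c
/* Hypothetical helper: sorts the first n elements of values into
   ascending order using selection sort. */
void SelectionSort(double values[], int n)
{
    int i, j, minIndex;
    double temp;

    for (i = 0; i < n - 1; i++) {
        /* find the smallest value in the unsorted part values[i..n-1] */
        minIndex = i;
        for (j = i + 1; j < n; j++) {
            if (values[j] < values[minIndex]) {
                minIndex = j;
            }
        }
        /* swap it into position i, growing the sorted prefix by one */
        temp = values[i];
        values[i] = values[minIndex];
        values[minIndex] = temp;
    }
}
```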
Hint: A sorted set of values may also be helpful in determining the mode.
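To see why sorting helps with the mode: in a sorted array, equal values sit next to each other, so the mode is the value with the longest run of repeats, found in a single pass. This sketch assumes an array of double already sorted in ascending order (for example with the Selection Sort described above) and relies on the single-mode guarantee; Mode is a hypothetical name. Comparing with == is reasonable here because identical values read from a data file produce identical doubles.

```c
/* Hypothetical helper: mode of the first n values, which must already be
   sorted in ascending order.  Assumes n > 0 and a single mode. */
double Mode(const double sorted[], int n)
{
    double mode = sorted[0];
    int bestCount = 1;   /* length of the longest run seen so far */
    int runCount = 1;    /* length of the current run of equal values */
    int i;

    for (i = 1; i < n; i++) {
        if (sorted[i] == sorted[i - 1]) {
            runCount++;              /* same value: extend the current run */
        } else {
            runCount = 1;            /* new value: start a new run */
        }
        if (runCount > bestCount) {  /* longest run so far becomes the mode */
            bestCount = runCount;
            mode = sorted[i];
        }
    }
    return mode;
}
```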
To submit your files, which must be named proj2.c, means.c and means.h, type the following at the Unix prompt:
submit cs201 Proj2 proj2.c means.c means.h (followed by any additional .c and .h files needed)
As always, you can check your submissions using the submitls command:
submitls cs201 Proj2