CMSC 201
Programming Project Two

Measures of Central Tendency

Out: Wednesday 10/1/03
Due: Midnight Wednesday 10/15/03

The design document for this project, design2.txt, is due before midnight, Wednesday 10/8/03.

The Objective

The objective of this assignment is to give you some more practice writing functions and using separate compilation, and to give you some experience with using arrays, passing arrays to functions, and sorting an array of values. It will also test your ability to use top-down design.

The Background

There are three common means that can be associated with a group of values. You should already be familiar with the arithmetic mean, a.k.a. the average. The arithmetic mean is calculated by dividing the sum of all the values by the number of values.

arithMean = (val1 + val2 + val3 + . . . + valn) / n

where each vali is an individual value.
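The formula above can be sketched as a small C function. This is only an illustration: the name ArithMean and its parameter list are hypothetical, not part of the assignment, and the sketch assumes n is greater than zero.

```c
/* Hypothetical helper (name and signature are assumptions).
 * Returns the arithmetic mean of the first n elements of values.
 * Assumes n > 0. */
double ArithMean(const double values[], int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
    {
        sum += values[i];
    }

    /* sum is a double, so this is floating-point division */
    return sum / n;
}
```

Because sum is a double, dividing by the int n does not truncate the result.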

The second mean we'll work with is the geometric mean. It is calculated by taking the nth root of the product of all the values. The geometric mean is useful as a measure of central tendency, because it is less affected by extreme values than is the arithmetic mean. Here is the formula for the geometric mean.

geomMean = (val1 * val2 * val3 * . . . * valn)^(1/n)

The last mean is known as the harmonic mean. It is defined as the reciprocal of the arithmetic mean of the reciprocals. The harmonic mean is given by the formula

harmonMean = 1 / ((1/val1 + 1/val2 + 1/val3 + . . . + 1/valn) / n )
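One possible way to code this formula is to sum the reciprocals, average them, and then take the reciprocal of that average. The name HarmonMean is hypothetical, and the sketch assumes n is greater than zero and that none of the values is zero.

```c
/* Hypothetical helper (name and signature are assumptions).
 * Returns the harmonic mean of the first n elements of values.
 * Assumes n > 0 and that no value is zero. */
double HarmonMean(const double values[], int n)
{
    double reciprocalSum = 0.0;
    int i;

    for (i = 0; i < n; i++)
    {
        reciprocalSum += 1.0 / values[i];
    }

    /* Reciprocal of the arithmetic mean of the reciprocals */
    return 1.0 / (reciprocalSum / n);
}
```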

Two other measures of central tendency that we will investigate are the median and the mode.

Informally, the median is the middle value of a set of sorted values. If the number of values in the set is odd, then there is a middle value and that value is the median. If the number of values in the set is even, then the median is the sum of the two middle values divided by two. For example, if there are 25 values in the set of values and the values are sorted in ascending order, then the 13th value is the median, having 12 values below it and 12 values above it. If there are 10 values in the set and the values are sorted in ascending order, then the median can be calculated by adding the 5th and 6th values and then dividing the sum by two.
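The two cases described above might be sketched as follows. The name Median is hypothetical, and the function assumes the array has already been sorted in ascending order and that n is greater than zero. Keep in mind that C arrays start at index 0, so the 13th of 25 values is at index 12, and the 5th and 6th of 10 values are at indices 4 and 5.

```c
/* Hypothetical helper (name and signature are assumptions).
 * Returns the median of the first n elements of sorted.
 * Assumes n > 0 and that sorted is in ascending order. */
double Median(const double sorted[], int n)
{
    if (n % 2 == 1)
    {
        /* Odd count: the single middle element */
        return sorted[n / 2];
    }

    /* Even count: average of the two middle elements */
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```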

The mode is the value that occurs most frequently in a set of values, the most crowded bin of a histogram. It is possible that a set of values may be multimodal (have more than one mode). I will therefore guarantee that any sets of values given to you in the data files will have only one mode.
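One way to find the mode, sketched here under the assumption that the values have already been sorted in ascending order, is to look for the longest run of equal adjacent values. The name Mode is hypothetical. The sketch assumes n is greater than zero and relies on the guarantee of a single mode; if there were a tie, it would return the smallest of the tied values.

```c
/* Hypothetical helper (name and signature are assumptions).
 * Returns the most frequent value among the first n elements.
 * Assumes n > 0 and that sorted is in ascending order, so that
 * equal values sit next to each other. */
double Mode(const double sorted[], int n)
{
    double mode = sorted[0];
    int bestCount = 1;   /* length of the longest run seen so far */
    int count = 1;       /* length of the current run */
    int i;

    for (i = 1; i < n; i++)
    {
        if (sorted[i] == sorted[i - 1])
        {
            count++;
        }
        else
        {
            count = 1;
        }

        if (count > bestCount)
        {
            bestCount = count;
            mode = sorted[i];
        }
    }

    return mode;
}
```

Comparing doubles with == is usually risky, but it is reasonable here because identical text in the data file is converted to identical double values.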

The Task

You are to write a program that calculates each of the three means, the median and the mode for any of the data files I provide.

Because calculating the geometric mean involves taking the nth root of the product of the values, you will need the pow() function from the math library. To use it, you must #include <math.h> so that the prototype for pow() is seen before any call is made to it within your program. The math library is not found in the same location as the other standard libraries, so you must also tell the compiler to link it in. This is done by linking with the -lm option. Here is an example:

linux1 [101] % gcc -Wall -ansi proj2.o means.o -lm

The pow() function has the following prototype:

double pow(double x, double y);

where the value returned is x raised to the power y (x^y).

Since determining the median requires that the set of values be sorted, you'll have to sort the array of values. You may use the Selection Sort code found in lecture 9, but it will require some minor modification.
Hint: A sorted set of values may also be helpful in determining the mode.
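For reference, here is a generic selection sort for an array of doubles in ascending order. It is not the lecture 9 code, only a sketch of the same technique, and the name SelectionSort and its parameters are assumptions.

```c
/* Generic selection sort sketch (not the lecture 9 code).
 * Sorts the first n elements of values into ascending order. */
void SelectionSort(double values[], int n)
{
    int i;
    int j;
    int minIndex;
    double temp;

    for (i = 0; i < n - 1; i++)
    {
        /* Find the smallest value in the unsorted part */
        minIndex = i;
        for (j = i + 1; j < n; j++)
        {
            if (values[j] < values[minIndex])
            {
                minIndex = j;
            }
        }

        /* Swap it into position i */
        if (minIndex != i)
        {
            temp = values[i];
            values[i] = values[minIndex];
            values[minIndex] = temp;
        }
    }
}
```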

Specific Requirements

Sample Data File

Here are the contents of one of the data files. They are included here so that you can see the format of all of the data files. You should copy this file into your proj2 directory from my directory using the following command:

cp /afs/umbc.edu/users/s/b/sbogar1/pub/meansdata1.dat .

Do NOT cut and paste the file from this page, since you may inadvertently capture newline characters where they shouldn't be. You should also copy the other data files that are available for this project using similar copy commands.

12.50 13.28 14.94 10.27 12.50 29.67 35.73 9.72 5.84 6.03 12.50 10.27 15.62 11.55 22.69 -1.00
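The data file ends with -1.00, a value that is absent from the values printed in the sample run, which suggests that it marks the end of the data rather than being a data value. Under that assumption, reading the file into an array might look like the sketch below. The name ReadValues, the SENTINEL constant, and the maxValues parameter are all hypothetical.

```c
#include <stdio.h>

/* Assumption: -1.00 marks the end of the data and is not a value */
#define SENTINEL (-1.00)

/* Hypothetical helper (name and signature are assumptions).
 * Reads doubles from ifp into values until the sentinel is read,
 * the input runs out, or maxValues values have been stored.
 * Returns the number of values stored. */
int ReadValues(FILE *ifp, double values[], int maxValues)
{
    int count = 0;
    double value;

    while (count < maxValues &&
           fscanf(ifp, "%lf", &value) == 1 &&
           value != SENTINEL)
    {
        values[count] = value;
        count++;
    }

    return count;
}
```

The %lf conversion skips spaces and newlines, so the same loop works whether the values are on one line or spread over several.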

Sample Run

The data values are :
12.500 13.280 14.940 10.270 12.500 29.670 35.730 9.720 5.840 6.030 12.500 10.270 15.620 11.550 22.690

The sorted data values are :
5.840 6.030 9.720 10.270 10.270 11.550 12.500 12.500 12.500 13.280 14.940 15.620 22.690 29.670 35.730

Arithmetic mean : 14.874
Geometric mean : 13.149
Harmonic mean : 11.780
Median : 12.500
Mode : 12.500

Submitting the Program

To submit your files, which are to be called proj2.c, means.c and means.h, you would type the following at the unix prompt:

submit cs201 Proj2 proj2.c means.c means.h (followed by any additional .c and .h files needed)

As always, you can check your submissions using the submitls command:

submitls cs201 Proj2