Goal: Get some practice improving the performance of an application
First, create a local copy of the exercise files.
cd
tar xf ~jake/CSC/exercise6.tar
cd exercise6
source setup.sh
The directory contains a Stats class, which computes various statistical quantities from an array of integers. For examples of how it’s used, see the tests in the TestStats class.
Our goal is to calculate the standard deviation of the median of N numbers chosen at random. (You probably already know how to calculate the standard deviation of the mean, so instead we’re asking about the properties of the median) For example, if we take the median of 11 random numbers that are each between 0 and 1, the median will fluctuate a little because the random numbers vary. Our goal is to write a program to measure the standard deviation of that fluctuation.
To run the test suite,
./build
./test
Note that the tests pass! The Stats class is correct, but it is much too slow.
The Stats program takes two arguments:
time ./run 200 1001
The first argument is the number of runs (e.g. number of sets of separate sets of random numbers) to use when calculating the standard deviation. The second is the number of random numbers in a sample, e.g. we’re calculating the median of 1001 numbers each time. You can see how this works in the main() routine in Stats,java.
To get a profile after running the program:
./analysis
The goal is to reduce the total time needed to find the mean of 200 iterations of 50001 numbers, e.g. to print the correct stddev-median value while using the smallest possible CPU time as measured by:
time ./run 200 50001
I’ll buy the beers for the team with a working program, and the lowest time.
The program will be considered working if:
1) The Stats class continues to pass all the tests
2) The main() routine prints the correct value (within statistical fluctuations) for any input values, especially including our test case of 200 and 50001.
3) You are still using the standard random number generators, etc.
Don’t forget to recompile with ./build after changing the program!
Note that this is a test of how the Stats class does over 50001 entries, a factor of 50 more numbers than you initially ran. This is a particularly difficult test when you consider that the time taken goes as a power of the problem size! You need to make some big improvements. But you’ll also have to make sure that the output of the program is correct…