Revisions to PA14 for the final report
Report 1st draft due: Monday 12/08, final draft due: Tuesday 12/16.
- Verify the MPI process scheduling policy on the OS cluster and the HPC cluster. Here is
  a program, hello.c, that prints the hostname of the node where each MPI process resides
  and checks the difference between scheduling policies with a pair-wise ping-pong
  message test. (The instructor wrote the program and tested it on the OS cluster. Repeating
  each node name in the machinefile does appear to assign neighboring MPI processes to the
  same node, and in that case the ping-pong throughput is considerably higher than when
  they are scheduled on different nodes. Please also run it on the HPC cluster to confirm.)
- Use the peak performance, rather than the average performance, from the benchmarking
  results.
- In the error calculation, report the average and the 95% confidence interval.
- Compute and report the prediction error for the Matrix Multiplication program (third bullet in PA14 below, posted on Nov 28).
- Report the prediction error with point selection (second bullet in PA14 below, posted on Nov 28) for the "mixed" I/O configuration (16 data points) and the MM computation (14 data points).
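The average and 95% confidence interval requested above can be computed along the lines of the following minimal sketch. It assumes the prediction errors are independent samples and uses Student t critical values for small sample sizes. The function name `mean_ci95`, the sample values, and the abbreviated lookup table are illustrative and not part of the assignment.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by degrees of freedom.
# Illustrative subset: it covers sample sizes up to 16 points.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
         11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131}


def mean_ci95(errors):
    """Return (mean, half_width) of the 95% confidence interval."""
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(errors)
    sem = statistics.stdev(errors) / math.sqrt(n)  # standard error of the mean
    # Beyond the table, fall back to the normal quantile (about 1.96).
    t = T_975.get(n - 1, statistics.NormalDist().inv_cdf(0.975))
    return mean, t * sem


if __name__ == "__main__":
    sample_errors = [0.12, 0.08, 0.15, 0.10, 0.09]  # made-up relative errors
    mean, half = mean_ci95(sample_errors)
    print(f"average error = {mean:.3f} +/- {half:.3f} (95% CI)")
```

The interval is reported as the mean plus or minus the returned half-width.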
Project Assignment 14
Attention: all slides and drafts are due at noon, Wed, Dec 3rd.
Continue prediction evaluation
- Improve the error evaluation by excluding the known points from the average error calculation.
- For the existing MPI-IO prediction with 16 data points, select 2, 3, 4, and 5
  evenly spaced points on the target curve as known points. Perform the prediction and
  compare the error with the average error from the random-point predictions (with the
  corresponding number of known points).
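The two bullets above can be sketched as follows. This is only one possible reading: it assumes "evenly spread" includes both endpoints of the curve and that the error is a relative error, |predicted - measured| / |measured|. The function names are hypothetical.

```python
def evenly_spread_indices(n_points, n_known):
    """Pick n_known indices spread evenly over 0..n_points-1, endpoints included."""
    if not 2 <= n_known <= n_points:
        raise ValueError("need 2 <= n_known <= n_points")
    step = (n_points - 1) / (n_known - 1)
    # Round half up to the nearest integer index.
    return [int(i * step + 0.5) for i in range(n_known)]


def mean_error_excluding_known(measured, predicted, known):
    """Average relative error over the points that were NOT given as known."""
    known = set(known)
    errs = [abs(p - m) / abs(m)
            for i, (m, p) in enumerate(zip(measured, predicted))
            if i not in known]
    if not errs:
        raise ValueError("no unknown points left to evaluate")
    return sum(errs) / len(errs)


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, evenly_spread_indices(16, k))
```

Known points are predicted exactly by construction, so leaving them in the average would make the error look smaller than it is.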
Extend experiments to computational modules
- Benchmark and perform predictions using a parallel matrix multiplication code. When doing
  the prediction, fix the number of processors (say 16) and the array distribution style
  (say BLOCK style). The purpose is to simulate simply feeding matrices to a black-box
  matrix computation library. The x-axis is formed by the input matrix sizes and shapes:
  C = A * B, where C is m*n, A is m*k, and B is k*n.
  Let s be a relatively large number (for example, 8096, depending on how large the
  matrices can be). Benchmark the following 14 input size combinations on both clusters:
| m   | n   | k   |
| s   | s   | s   |
| s   | s   | 1   |
| s   | 1   | s   |
| 1   | s   | s   |
| 1   | s   | 1   |
| s   | 1   | 1   |
| 1   | 1   | s   |
| s   | s   | s/2 |
| s/2 | s   | s   |
| s   | s/2 | s   |
| s/2 | s   | s/2 |
| s/2 | s/2 | s   |
| s   | s/2 | s/2 |
| s/2 | s/2 | s/2 |
Each matrix element should be a double-precision floating-point number.
Perform predictions with the interpolation, translation, and scaling methods.
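To help choose s, the 14 size combinations and the memory they need can be enumerated as in the sketch below. It assumes s is even and that a double occupies 8 bytes, which holds on common platforms but should be checked on the clusters. The function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: 8-byte IEEE 754 double


def size_combinations(s):
    """Return the 14 (m, n, k) triples from the table for a given even s."""
    h = s // 2
    return [
        (s, s, s), (s, s, 1), (s, 1, s), (1, s, s),
        (1, s, 1), (s, 1, 1), (1, 1, s),
        (s, s, h), (h, s, s), (s, h, s), (h, s, h),
        (h, h, s), (s, h, h), (h, h, h),
    ]


def footprint_bytes(m, n, k):
    """Total bytes for A (m*k), B (k*n), and C (m*n) stored as doubles."""
    return BYTES_PER_DOUBLE * (m * k + k * n + m * n)


if __name__ == "__main__":
    for m, n, k in size_combinations(8096):
        mib = footprint_bytes(m, n, k) / 2**20
        print(f"m={m:5d} n={n:5d} k={k:5d}  {mib:10.2f} MiB")
```

The figure is the total across all processes. The per-process share depends on the array distribution.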
Draft the term project report.
- Write a preliminary version of the term report using the given LaTeX template. Turn in a PDF or PS file.
Requirements
- For all benchmarks, schedule processors on the same chassis on the HPC cluster.
- Use the machinefile to schedule processors on the OS cluster in the same way the HPC cluster does.
- Plot the standard deviation on the result curves.
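A machinefile that repeats each node name, as described in the revisions above, could be generated with a sketch like the one below. The host names and the policy labels "block" and "cyclic" are hypothetical, and the sketch assumes a plain one-hostname-per-line machinefile. How a given MPI launcher maps ranks to those lines should still be checked with hello.c.

```python
def machinefile_lines(nodes, procs_per_node, policy="block"):
    """Return machinefile lines, one host name per MPI process.

    "block"  repeats each node name, so neighboring ranks share a node.
    "cyclic" lists the nodes round-robin, so neighboring ranks are on
             different nodes.
    """
    if policy == "block":
        return [node for node in nodes for _ in range(procs_per_node)]
    if policy == "cyclic":
        return [node for _ in range(procs_per_node) for node in nodes]
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    hosts = ["node01", "node02"]  # hypothetical host names
    print("\n".join(machinefile_lines(hosts, 2, "block")))
```

Running the ping-pong test with both orderings gives the same-node and different-node throughput figures to compare.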