
Parallelize the program using OpenMP


Question

Parallelize the program using OpenMP. To do this compile with

When compiling and running you must NOT use gollum itself but one of the nodes node1 to node8. Present the results as a table of run time and speedup (relative to OpenMP on 1 thread). Speedup is defined as T1/Tp, where T1 is the time on 1 thread and Tp is the time on p threads. Comment on the results.
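The speedup definition above can be turned into a small table printer. This is only a sketch: the function names `speedup` and `print_table` and the array layout (entry k holds the time on k+1 threads) are assumptions made for illustration, and the times must come from your own measurements.

```c
#include <stdio.h>

/* Speedup on p threads as defined in the question: T1 / Tp. */
double speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Print one row per thread count.
   times[k] is the measured run time on k+1 threads (hypothetical layout). */
void print_table(const double *times, int max_threads)
{
    printf("%8s %12s %8s\n", "threads", "time (s)", "speedup");
    for (int p = 1; p <= max_threads; p++)
        printf("%8d %12.4f %8.2f\n",
               p, times[p - 1], speedup(times[0], times[p - 1]));
}
```

Call `print_table` only after all timings are finished, so that printing never overlaps a timed section.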

The preferred way to time the code is to use omp_set_num_threads(number) to set the number of threads and omp_get_wtime() to time the relevant part, e.g.
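A minimal sketch of that timing pattern follows. The workload `work()` and the wrapper `time_work()` are hypothetical names invented for this example, and the `#ifdef _OPENMP` fallback is an added assumption so that the file still builds when OpenMP is not enabled at compile time; it is not part of the assignment.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption: serial stand-ins used only when OpenMP is not enabled. */
#include <time.h>
static void omp_set_num_threads(int n) { (void)n; }
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
#endif

/* Hypothetical workload: sums 1/(i+1) for i = 0 .. n-1 in a parallel loop. */
double work(long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);
    return sum;
}

/* Run work(n) on `threads` threads, store its result in *out,
   and return the elapsed wall-clock time in seconds. */
double time_work(int threads, long n, double *out)
{
    omp_set_num_threads(threads);   /* team size for later parallel regions */
    double t0 = omp_get_wtime();    /* wall-clock start */
    *out = work(n);                 /* timed section: no printing in here */
    double t1 = omp_get_wtime();    /* wall-clock end */
    return t1 - t0;
}
```

Returning the elapsed time instead of printing it keeps all output outside the timed section.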

Implement and time a matrix-matrix product of a 1500x900 matrix A with a 900x1200 matrix B of doubles using 1, 2, 3, 4, 5, 6, 7 and 8 threads on a gollum node.
Define Aij = (i+1)*(j+1) and Bij = 1/((double)(i+1)*(double)(j+1)).
The result matrix C = A*B should be Cij = 900*(double)(i+1)/(double)(j+1). You should check that the result is correct in each case by comparing A*B with a matrix C with these values.
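One possible shape for the product and the correctness check is sketched below. Each term of the sum over k is (i+1)(k+1) * 1/((k+1)(j+1)) = (i+1)/(j+1), so with 900 terms the expected entry is 900*(i+1)/(j+1). The function names, the flat row-major layout, the heap allocation (chosen here so the sketch does not depend on the stack size) and the relative tolerance are all assumptions of this example, not requirements of the assignment.

```c
#include <stdlib.h>

/* Fill A (n x m) and B (m x p) with the values defined in the question. */
void fill(double *A, double *B, int n, int m, int p)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            A[i * m + j] = (double)(i + 1) * (double)(j + 1);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < p; j++)
            B[i * p + j] = 1.0 / ((double)(i + 1) * (double)(j + 1));
}

/* C = A * B, with the rows of C divided among `threads` threads. */
void matmul(const double *A, const double *B, double *C,
            int n, int m, int p, int threads)
{
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++) {
            double sum = 0.0;
            for (int k = 0; k < m; k++)
                sum += A[i * m + k] * B[k * p + j];
            C[i * p + j] = sum;
        }
}

/* Count entries that differ from m*(i+1)/(j+1) by more than a
   relative tolerance; rounding makes an exact comparison unreliable. */
int count_errors(const double *C, int n, int m, int p, double tol)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++) {
            double expected = (double)m * (double)(i + 1) / (double)(j + 1);
            double diff = C[i * p + j] - expected;
            if (diff < 0.0)
                diff = -diff;
            if (diff > tol * expected)
                bad++;
        }
    return bad;
}

/* Allocate, fill, multiply and check; returns the number of wrong
   entries, or -1 if an allocation failed. */
int run_check(int n, int m, int p, int threads)
{
    double *A = malloc(sizeof *A * (size_t)n * m);
    double *B = malloc(sizeof *B * (size_t)m * p);
    double *C = malloc(sizeof *C * (size_t)n * p);
    int bad = -1;
    if (A && B && C) {
        fill(A, B, n, m, p);
        matmul(A, B, C, n, m, p, threads);
        bad = count_errors(C, n, m, p, 1e-12);
    }
    free(A);
    free(B);
    free(C);
    return bad;
}
```

For the sizes in the question the call would be `run_check(1500, 900, 1200, threads)`, with a return value of 0 indicating that every entry matched within the tolerance.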

Modify the code to run similar tests using static scheduling, dynamic scheduling with chunk size 1, and guided scheduling with chunk size 10.
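The three schedules could be selected as in the sketch below. The enum `mm_sched`, the helper `row_product` and the function `matmul_sched` are hypothetical names for this illustration; only the `schedule(...)` clauses themselves are standard OpenMP.

```c
/* Hypothetical labels for the three schedules named in the question. */
enum mm_sched { MM_STATIC, MM_DYNAMIC_1, MM_GUIDED_10 };

/* Compute row i of C = A * B (A is n x m, B is m x p, row-major). */
static void row_product(const double *A, const double *B, double *C,
                        int i, int m, int p)
{
    for (int j = 0; j < p; j++) {
        double sum = 0.0;
        for (int k = 0; k < m; k++)
            sum += A[i * m + k] * B[k * p + j];
        C[i * p + j] = sum;
    }
}

/* Same product for each schedule; only the way rows are handed
   out to the threads differs. */
void matmul_sched(const double *A, const double *B, double *C,
                  int n, int m, int p, int threads, enum mm_sched kind)
{
    if (kind == MM_STATIC) {
        /* Rows are split into near-equal contiguous blocks up front. */
        #pragma omp parallel for num_threads(threads) schedule(static)
        for (int i = 0; i < n; i++)
            row_product(A, B, C, i, m, p);
    } else if (kind == MM_DYNAMIC_1) {
        /* Each thread takes one row at a time as it becomes free. */
        #pragma omp parallel for num_threads(threads) schedule(dynamic, 1)
        for (int i = 0; i < n; i++)
            row_product(A, B, C, i, m, p);
    } else {
        /* Chunks start large and shrink, down to a minimum of 10 rows. */
        #pragma omp parallel for num_threads(threads) schedule(guided, 10)
        for (int i = 0; i < n; i++)
            row_product(A, B, C, i, m, p);
    }
}
```

Every row of this product costs the same amount of work, so the schedules mainly differ in how much bookkeeping they add; that is worth keeping in mind when commenting on the measured times.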

List the run times and speedups in separate tables for each type of scheduling, comment on the speedup of each scheduling type, and draw conclusions as to which is preferable (and in what circumstances).

Note that to implement this program you will need to increase the default stack size for both the original thread and the OMP threads. If you do not, you will get a segmentation fault. To do this you need to execute shell commands similar to:


Note: when timing, remember not to print anything inside the timed section.

The language used is C, with OpenMP as its parallel extension.

Explanation / Answer

For example:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NRA 62   /* number of rows in matrix A */
#define NCA 15   /* number of columns in matrix A */
#define NCB 7    /* number of columns in matrix B */

int main (int argc, char *argv[])
{
    int tid, nthreads, i, j, k, chunk;
    double a[NRA][NCA],   /* matrix A to be multiplied */
           b[NCA][NCB],   /* matrix B to be multiplied */
           c[NRA][NCB];   /* result matrix C */

    chunk = 10;           /* set loop iteration chunk size */

    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
    {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Starting matrix multiple example with %d threads\n",nthreads);
            printf("Initializing matrices...\n");
        }

        /*** Initialize matrices ***/
        #pragma omp for schedule (static, chunk)
        for (i=0; i