Introduction to Parallel Computer Architecture Gaussian Elimination using OpenMP

Introduction to Parallel Computer Architecture

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper

Gaussian Elimination using OpenMP

Instructor: Prof. Naga Kandasamy, ECE Department, Drexel University

January

2

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper

4, 20

1

8

The assignment is due on 9 February by 11:59 pm. You may work on this assignment in a team of

up to two people.

Consider the problem of solving a system of linear equations of the form

a0,0x0 + a0,1×1 + · · · + a0,n−1xn−1 = b0,
a1,0x0 + a1,1×1 + · · · + a1,n−1xn−1 = b1,

. .

. .

. . . .

an−1,0×0 + an−1,1×1 + · · · + an−1,n−1xn−1 = bn−1.

In matrix notation, the above system is written as Ax = b where A is a dense n × n matrix of
coefficients such that A[i, j] = ai,j, b is an n × 1 vector [b0, b1, . . . , bn−1]

T , and x is the desired
solution vector [x0, x1, . . . , xn−1]

T . From here on, we will denote the matrix elements ai,j and xi
by A[i, j] and x[i], respectively. A system of equations Ax = b is usually solved in two stages.
First, through a set of algebraic manipulations, the original system of equations is reduced to an

upper triangular system of the form

x0 + u0,1×1 + u0,2×2 + · · · + u0,n−1xn−1 = y0,
x1 + u1,2×2 + · · · + u1,n−1xn−1 = y1,

. .
. .

xn−1 = yn−1.

We write the above system as Ux = y, where U is a unit upper-triangular matrix, that is, one where
the subdiagonal entries are zero and all principal diagonal entries are equal to one. More formally,

U[i, j] = 0 if i > j, otherwise U[i, j] = ui,j, and furthermore, U[i, i] = 1 for 0 ≤ i < n. In the second stage of solving a system of linear equations, the upper-triangular system is solved for the

variables in reverse order, from x[n − 1] to x[0] using a procedure called back-substitution.

1

1: procedure GAUSS ELIMINATE(A, b, y)
2: int i, j, k;
3: for k := 0 to n − 1 do
4: for j := k + 1 to n − 1 do
5: A[k, j] := A[k, j]/A[k, k]; /* Division step. */
6: end for

7: y[k] := b[k]/A[k, k];
8: A[k, k] := 1;
9: for i := k + 1 to n − 1 do

10: for j := k + 1 to n − 1 do
11: A[i, j] := A[i, j] – A[i, k] × A[k, j]; /* Elimination step. */
12: end for

13: b[i] := b[i] − A[i, k] × y[k];
14: A[i, k] := 0;
15: end for

16: end for

A serial implementation of a simple Gaussian elimination algorithm is shown above. The algorithm

converts the system of linear equations Ax = b into a unit upper-triangular system Ux = y. We
assume that the matrix U shares storage with A and overwrites the upper-triangular portion of A.
So, the element A[k, j] computed in line 5 of the code is actually U[k, j]. Similarly, the element
A[k, k] that is equated to 1 in line 8 is U[k, k]. Also, our program assumes that A[k, k] 6= 0 when
it is used as a divisor in lines 5 and 7. So, our implementation is numerically unstable, though

it should not be a concern for this assignment. For k ranging from 0 to n − 1, the Gaussian
elimination procedure systematically eliminates the variable x[k] from equations k + 1 to n − 1 so
that the matrix of coefficients becomes upper-triangular. In the kth iteration of the outer loop (line
3), an appropriate multiple of the kth equation is subtracted from each of the equations k + 1 to
n − 1.

This problem asks you to develop a parallel formulation of GAUSS ELIMINATE using OpenMP.

The program given to you accepts no arguments. The CPU computes the reference (or sin-

gle threaded) solution which is compared with the result provided by the multi-threaded imple-

mentation. If the solutions match within the specified tolerance, the application will print out

“Test PASSED” to the screen before exiting. Edit the gauss eliminate using openmp()

function in gauss eliminate.c to complete the functionality of Gaussian elimination using

OpenMP. Build the code as follows:

Submit all of the files needed to run your code as a single zip file via BBLearn. Also include

a brief report describing how you designed your multi-threaded code, using code or pseudocode

to help the discussion, and the amount of speedup obtained over the serial version for 2, 4, 8,

and 16 threads, for the following matrix sizes: 1024 × 1024, 2048 × 2048, 4096 × 4096, and
8192 × 8192.

2

#ifndef _MATRIXMUL_H_
#define _MATRIXMUL_H_
#define MATRIX_SIZE 1024
#define NUM_COLUMNS MATRIX_SIZE /* Number of columns in Matrix A. */
#define NUM_ROWS MATRIX_SIZE /* Number of rows in Matrix A. */
/* Matrix Structure declaration. */
typedef struct {
unsigned int num_columns; /* Width of the matrix represented. */
unsigned int num_rows; /* Height of the matrix represented. */

/* Mumber of elements between the beginnings of adjacent rows in the memory layout.
* This is useful for representing sub-matrices.
* */
unsigned int pitch;
float* elements; /* Pointer to the first element of the matrix represented. */
} Matrix;
#endif // _MATRIXMUL_H_

/* Gaussian elimination code.
* Author: Naga Kandasamy, 10/24/2015
*
* Compile as follows:
* gcc -o gauss_eliminate gauss_eliminate.c compute_gold.c -fopenmp -std=c99 -O3 -lm
*/
#include
#include
#include
#include
#include
#include
#include
#include “gauss_eliminate.h”
#define MIN_NUMBER 2
#define MAX_NUMBER 50
extern int compute_gold(float*, const float*, unsigned int);
Matrix allocate_matrix(int num_rows, int num_columns, int init);
void gauss_eliminate_using_openmp(const Matrix, Matrix);
int perform_simple_check(const Matrix);
void print_matrix(const Matrix);
float get_random_number(int, int);
int check_results(float *, float *, unsigned int, float);

int
main(int argc, char** argv) {
if(argc > 1){
printf(“Error. This program accepts no arguments. \n”);
exit(0);
}
/* Allocate and initialize the matrices. */
Matrix A; /* The N x N input matrix. */
Matrix U; /* The upper triangular matrix to be computed. */

srand(time(NULL));

A = allocate_matrix(MATRIX_SIZE, MATRIX_SIZE, 1); /* Create a random N x N matrix. */
U = allocate_matrix(MATRIX_SIZE, MATRIX_SIZE, 0); /* Create a random N x 1 vector. */

/* Gaussian elimination using the reference code. */
Matrix reference = allocate_matrix(MATRIX_SIZE, MATRIX_SIZE, 0);
struct timeval start, stop;
gettimeofday(&start, NULL);
printf(“Performing gaussian elimination using the reference code. \n”);
int status = compute_gold(reference.elements, A.elements, A.num_rows);
gettimeofday(&stop, NULL);
printf(“CPU run time = %0.2f s. \n”, (float)(stop.tv_sec – start.tv_sec + (stop.tv_usec – start.tv_usec)/(float)1000000));
if(status == 0){
printf(“Failed to convert given matrix to upper triangular. Try again. Exiting. \n”);
exit(0);
}
status = perform_simple_check(reference); // Check that the principal diagonal elements are 1
if(status == 0){
printf(“The upper triangular matrix is incorrect. Exiting. \n”);
exit(0);
}
printf(“Gaussian elimination using the reference code was successful. \n”);
/* WRITE THIS CODE: Perform the Gaussian elimination using the multi-threaded OpenMP version.
* The resulting upper triangular matrix should be returned in U
* */
gauss_eliminate_using_openmp(A, U);
/* check if the OpenMP result is equivalent to the expected solution. */
int size = MATRIX_SIZE*MATRIX_SIZE;
int res = check_results(reference.elements, U.elements, size, 0.001f);
printf(“Test %s\n”, (1 == res) ? “PASSED” : “FAILED”);
free(A.elements); A.elements = NULL;
free(U.elements); U.elements = NULL;
free(reference.elements); reference.elements = NULL;
return 0;
}

void
gauss_eliminate_using_openmp(const Matrix A, Matrix U) /* Write code to perform gaussian elimination using OpenMP. */
{
}

int
check_results(float *A, float *B, unsigned int size, float tolerance) /* Check if refernce results match multi threaded results. */
{
for(int i = 0; i < size; i++) if(fabsf(A[i] - B[i]) > tolerance)
return 0;

return 1;
}

/* Allocate a matrix of dimensions height*width.
* If init == 0, initialize to all zeroes.
* If init == 1, perform random initialization.
* */
Matrix
allocate_matrix(int num_rows, int num_columns, int init){
Matrix M;
M.num_columns = M.pitch = num_columns;
M.num_rows = num_rows;
int size = M.num_rows * M.num_columns;
M.elements = (float*) malloc(size*sizeof(float));

for(unsigned int i = 0; i < size; i++){ if(init == 0) M.elements[i] = 0; else M.elements[i] = get_random_number(MIN_NUMBER, MAX_NUMBER); } return M; } float get_random_number(int min, int max){ /* Returns a random FP number between min and max values. */ return (float)floor((double)(min + (max - min + 1)*((float)rand()/(float)RAND_MAX))); } int perform_simple_check(const Matrix M){ /* Check for upper triangular matrix, that is, the principal diagonal elements are 1. */ for(unsigned int i = 0; i < M.num_rows; i++) if((fabs(M.elements[M.num_rows*i + i] - 1.0)) > 0.001) return 0;

return 1;
}

#include
#include
extern int compute_gold( float*, const float*, unsigned int);
int
compute_gold(float* U, const float* A, unsigned int num_elements){ /* Compute reference data set. */
unsigned int i, j, k;

for (i = 0; i < num_elements; i ++) /* Copy the contents of the A matrix into the U matrix. */ for(j = 0; j < num_elements; j++) U[num_elements * i + j] = A[num_elements*i + j]; for (k = 0; k < num_elements; k++){ /* Perform Gaussian elimination in place on the U matrix. */ for (j = (k + 1); j < num_elements; j++){ /* Reduce the current row. */ if (U[num_elements*k + k] == 0){ printf("Numerical instability detected. The principal diagonal element is zero. \n"); return 0; } /* Division step. */ U[num_elements * k + j] = (float)(U[num_elements * k + j] / U[num_elements * k + k]); } U[num_elements * k + k] = 1; /* Set the principal diagonal entry in U to be 1. */ for (i = (k+1); i < num_elements; i++){ for (j = (k+1); j < num_elements; j++) /* Elimnation step. */ U[num_elements * i + j] = U[num_elements * i + j] -\ (U[num_elements * i + k] * U[num_elements * k + j]); U[num_elements * i + k] = 0; } } return 1; }

Still stressed with your coursework?
Get quality coursework help from an expert!