## Introduction

Parallel programming is a technique used to improve the performance of applications by splitting the work into multiple threads or processes that can run simultaneously on different CPUs or CPU cores. OpenMP is a popular API for parallel programming in C++, which provides a set of compiler directives, runtime library routines, and environment variables for shared memory parallel programming. This article will demonstrate how to use OpenMP for parallel matrix multiplication in C++.

Matrix multiplication is a fundamental operation in linear algebra, and it involves multiplying two matrices to produce a third matrix. The algorithm for matrix multiplication is computationally intensive and can benefit from parallelization. In our example, we will perform matrix multiplication on two square matrices of size 1000x1000, using serial and parallel approaches, and compare their performance.

## Implementation

To begin with, we define three 2D vectors, A, B, and C, of size N x N, where N = 1000. These vectors will represent the two input matrices and the output matrix, respectively. We initialize matrices A and B with random values using nested loops.

```
#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>   // for rand()
#include <omp.h>
using namespace std;

const int N = 1000;

int main()
{
    vector<vector<int>> A(N, vector<int>(N));
    vector<vector<int>> B(N, vector<int>(N));
    vector<vector<int>> C(N, vector<int>(N));

    // Initialize matrices A and B with random values
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = rand() % 100;
            B[i][j] = rand() % 100;
        }
    }
```

Next, we perform matrix multiplication in serial using three nested loops. The outer loops iterate over the rows and columns of the output matrix C, and the innermost loop computes the dot product of the corresponding row of matrix A and column of matrix B.

```
    // Perform matrix multiplication in serial
    auto start_serial = chrono::high_resolution_clock::now();
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
    auto end_serial = chrono::high_resolution_clock::now();
    auto duration_serial = chrono::duration_cast<chrono::milliseconds>(end_serial - start_serial);
```

In the above code, we measure the time taken by the serial matrix multiplication using the C++11 `chrono` library. We record a timestamp before the nested loops and another after they complete; the difference between the two timestamps gives the elapsed time, which we convert to milliseconds and store in the variable `duration_serial`.

Now, we will use OpenMP to parallelize the matrix multiplication. We add a `#pragma omp parallel for` directive before the outer loop to indicate that its iterations can be executed in parallel. The OpenMP runtime will automatically distribute the iterations among the available threads. Because each iteration of the outer loop writes to a distinct row of C, the threads never touch the same element and no synchronization is needed.

```
    // Perform matrix multiplication in parallel using OpenMP
    auto start_parallel = chrono::high_resolution_clock::now();
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
    auto end_parallel = chrono::high_resolution_clock::now();
    auto duration_parallel = chrono::duration_cast<chrono::milliseconds>(end_parallel - start_parallel);
```

Finally, we display the time taken for each approach using the chrono library, which provides high-resolution timers for measuring time intervals.

```
    // Display the time taken for each approach
    cout << "Time taken for serial matrix multiplication: " << duration_serial.count() << " milliseconds" << endl;
    cout << "Time taken for parallel matrix multiplication: " << duration_parallel.count() << " milliseconds" << endl;

    return 0;
}
```

**Note**

Enabling OpenMP in Visual Studio takes the following steps:

- Open your C++ project in Visual Studio.
- Right-click on your project in the Solution Explorer window and select "Properties".
- In the Properties window, navigate to Configuration Properties -> C/C++ -> Language.
- Set the "OpenMP Support" option to "Yes (/openmp)".
- Click "Apply" and "OK" to save the changes.

After enabling OpenMP, you can use OpenMP directives in your code to parallelize loops and other tasks. With GCC or Clang, OpenMP is enabled with the `-fopenmp` compiler flag instead of a project setting. Remember that not all compilers support OpenMP, so it's essential to check that yours does before relying on it in your code.

## Conclusion

It is important to note that the speedup achieved by parallelization depends on several factors, such as the number of available cores, the size of the matrices, and the efficiency of the parallelization. In this program, we use a matrix size of 1000x1000, but the benefits of parallelization become even more apparent for larger matrices.

In conclusion, parallel programming can significantly improve the performance of computationally intensive tasks such as matrix multiplication. OpenMP provides a simple and effective way to parallelize code, and with careful tuning, even greater speedups can be achieved.