How to perform parallel computing of C++ code?

WBOY
2023-11-03 10:15:33

With the continuous improvement of computer hardware performance, parallel computing for multi-core processors has become an important topic in programming. As an efficient programming language, C++ naturally offers several ways to implement parallel computing. This article introduces three commonly used approaches (OpenMP, MPI, and TBB) and shows sample code and typical usage scenarios for each.

  1. OpenMP

OpenMP is a shared-memory parallel computing API that makes it easy to add parallelism to C++ programs. It uses #pragma directives to mark the code regions to parallelize and provides a set of library functions for controlling parallel execution. The following is a simple OpenMP sample program:

#include <iostream>
#include <omp.h>

using namespace std;

int main() {
    int data[1000], i, sum = 0;
    for (i=0;i<1000;i++){
        data[i] = i+1;
    }

    #pragma omp parallel for reduction(+:sum)
    for (i=0;i<1000;i++){
        sum += data[i];
    }
    cout << "Sum: " << sum << endl;
    return 0;
}

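In this example, the #pragma omp parallel for directive parallelizes the for loop, and the reduction(+:sum) clause tells OpenMP to give each thread a private copy of sum and add the copies together when the loop finishes. The program must be compiled with OpenMP enabled (for example, -fopenmp with GCC). On a 4-core machine, a loop like this can approach a 3-4x speedup over the single-threaded version when the workload is large enough; with only 1000 elements, the cost of starting threads is likely to outweigh the gain.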

  2. MPI

MPI (Message Passing Interface) is a standard that enables distributed parallel computing across multiple computers. The basic unit of an MPI program is the process, and each process runs in its own memory space. MPI programs can run on a single computer or on multiple computers. The following is a basic MPI sample program:

#include <iostream>
#include <mpi.h>

using namespace std;

int main(int argc, char** argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cout << "Hello world from rank " << rank << " of " << size << endl;

    MPI_Finalize();
    return 0;
}

In this example, MPI_Init() initializes the MPI environment, and MPI_Comm_rank() and MPI_Comm_size() obtain the rank (process number) of the current process and the total number of processes. Each process simply prints one line, and MPI_Finalize() shuts the environment down before the program exits. Launching the program with the mpirun -np 4 command runs it as 4 processes.

  3. TBB

Intel Threading Building Blocks (TBB) is a C++ library that provides tools to simplify parallel computing. Its central concept is the task: work is split into tasks that the library's scheduler distributes across threads. The following is a TBB sample program:

#include <iostream>
#include <tbb/tbb.h>

using namespace std;

class Sum {
public:
    Sum() : sum(0) {}
    Sum(Sum& s, tbb::split) : sum(0) {}
    void operator()(const tbb::blocked_range<int>& r) {
        for (int i=r.begin();i!=r.end();i++){
            sum += i;
        }
    }
    void join(Sum&s) {
        sum += s.sum;
    }
    int getSum() const {
        return sum;
    }

private:
    int sum;
};

int main() {
    Sum s;
    tbb::parallel_reduce(tbb::blocked_range<int>(0, 1000), s);
    cout << "Sum: " << s.getSum() << endl;
    return 0;
}

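In this example, the Sum class implements the reduction body: operator() accumulates the values of one subrange, the splitting constructor creates a fresh accumulator for another thread, and join() merges partial results. tbb::blocked_range describes the range to be split, and the tbb::parallel_reduce() function runs the reduction in parallel. On a 4-core machine, a reduction like this can approach a 3-4x speedup over the single-threaded version when the range is large enough; for only 1000 elements, scheduling overhead is likely to dominate.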

These three methods each have their own advantages and disadvantages, and the choice mainly depends on the application scenario. OpenMP suits a single shared-memory machine and makes it easy to add parallelism to existing C++ programs. MPI suits distributed computing clusters, where parallelism is achieved by passing messages between processes on multiple computers. TBB is a cross-platform C++ library that provides efficient tools to simplify parallel computing.

In summary, C++ offers a variety of options for efficient parallelization. Developers can choose one or more of these methods based on their needs and application scenarios to improve program performance.
