#include <iostream>
#include <mpi.h>
#include <chrono>
#include <cmath>
#include <cstdlib> // atol

void compute(long iterations) {
    double sum = 0.0;
    for (long i = 0; i < iterations; ++i) {
        sum += std::sin(i) * std::cos(i);
    }
    // Print the result so the compiler cannot optimize the loop away
    std::cout << "Computation result: " << sum << std::endl;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv); // Initialize MPI
    int rank, nr_proc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nr_proc);
    long N = atol(argv[1]); // Number of iterations for computation
    auto start = std::chrono::high_resolution_clock::now();
    compute(N); // Perform the computation
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Process " << rank << " took " << duration.count() << " seconds." << std::endl;
    MPI_Finalize(); // Finalize MPI
    return 0;
}

Running this with `mpirun --oversubscribe -n 1 mpi_test.exe 1000000000` prints a runtime of 28 s.
But running it with `-n 5` yields:
Computation result: -0.133023
Process 3 took 36.1116 seconds.
Computation result: -0.133023
Process 4 took 36.3062 seconds.
Computation result: -0.133023
Process 1 took 55.0856 seconds.
Computation result: -0.133023
Process 0 took 55.2955 seconds.
Computation result: -0.133023
Process 2 took 64.1076 seconds.
Why do some processes need so much more time to do the same work? My CPU has 12 logical cores.
Apparently, it has 2 performance cores (with hyperthreading, so 4 logical cores) and 8 efficiency cores without hyperthreading.
But then I would expect four of the processes to have the same runtime, namely the ones running on the hyperthreads of the 2 performance cores, and only the fifth process to run slower. Instead, the runtimes fall into three groups (about 36 s, 55 s, and 64 s).
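
One way to check the assumption about which core each process lands on would be to record the logical CPU before and after the timed loop. The following is only a sketch under stated assumptions: it relies on `sched_getcpu()`, which is a Linux/glibc extension (on other platforms the helper just returns -1), and the names `current_cpu`, `RunInfo`, and `timed_compute` are hypothetical helpers that are not part of the program above. The scheduler may migrate a process at any time, so the two samples are a hint rather than proof of where the work ran.

```cpp
#include <chrono>
#include <cmath>
#if defined(__linux__)
#include <sched.h> // sched_getcpu (glibc extension, assumed available on Linux)
#endif

// Hypothetical helper: logical CPU the calling thread is on right now,
// or -1 when the platform offers no way to ask.
inline int current_cpu() {
#if defined(__linux__)
    return sched_getcpu();
#else
    return -1;
#endif
}

// Hypothetical record of one timed run.
struct RunInfo {
    int cpu_before;  // logical CPU sampled just before the loop
    int cpu_after;   // logical CPU sampled just after the loop
    double seconds;  // wall-clock time spent in the loop
    double sum;      // result of the computation, kept so it is not optimized away
};

// Same loop as compute() in the question, with CPU sampling around it.
inline RunInfo timed_compute(long iterations) {
    RunInfo info{};
    info.cpu_before = current_cpu();
    auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (long i = 0; i < iterations; ++i) {
        sum += std::sin(i) * std::cos(i);
    }
    auto end = std::chrono::steady_clock::now();
    info.cpu_after = current_cpu();
    info.seconds = std::chrono::duration<double>(end - start).count();
    info.sum = sum;
    return info;
}
```

Calling `timed_compute(N)` from each rank and printing `cpu_before` and `cpu_after` next to the rank number would show whether the slow ranks sit on different logical CPUs than the fast ones, or whether they were moved during the run.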