Parfor with threads parpool: Why don't I see the expected scaling?

Thomas Witkowski on 13 Nov 2023
Answered: Karan Singh on 13 Dec 2023
I'm having trouble understanding the scaling of thread-based parallelization with parfor. Here is a simple example:
function testParData()
    % Start a thread-based pool with 4 workers
    parpool("threads", 4);
    runComp(3000, 30000);    % scenario 1: few long rows
    runComp(300000, 300);    % scenario 2: many short rows
    delete(gcp('nocreate'));
end

function runComp(n, m)
    mat = rand(n, m);

    % Serial reference loop over the rows
    tic;
    for i = 1:n
        islocal = islocalmax(mat(i,:));
    end
    t = toc;
    fprintf("Single thread run with n = %d: %.1f s\n", n, t);

    % Same loop distributed across the pool workers
    tic;
    parfor i = 1:n
        islocal = islocalmax(mat(i,:));
    end
    t = toc;
    fprintf("Parallel thread run with n = %d: %.1f s\n", n, t);
end
My expectation is that the scaling should be more or less independent of n and m (at least as long as they are reasonably chosen). I would even guess that the first scenario (n = 3000, m = 30000) should scale better, since the amount of computation per loop iteration is larger than in the second scenario (n = 300000, m = 300). But to my surprise, on my PC (Intel i7 CPU with 4 physical cores + HT) the results are as follows:
Starting parallel pool (parpool) using the 'threads' profile ...
Connected to parallel pool with 4 workers.
Single thread run with n = 3000: 2.9 s
Parallel thread run with n = 3000: 2.9 s
Single thread run with n = 300000: 9.8 s
Parallel thread run with n = 300000: 3.0 s
Parallel pool using the 'threads' profile is shutting down.
I do not see any benefit from using parfor in the first scenario, whereas the speedup from parallelization in the second scenario is ~3.2.
Can someone explain this effect to me?

Answers (1)

Karan Singh on 13 Dec 2023
Hi Thomas,
The behaviour you are observing with parfor can be explained by the overhead associated with parallelization and by the characteristics of the tasks being parallelized.
1. Scenario 1 (n = 3000, m = 30000):
- In this scenario, each iteration processes a row of 30000 elements, so the computation per iteration is relatively high, but there are only 3000 iterations to distribute. Your timings (2.9 s for both loops) show that the parallel loop gains nothing here: the cost of setting up and managing the parallel threads, and of handing each one its data, may cancel out whatever speedup the workers provide.
2. Scenario 2 (n = 300000, m = 300):
- Here, each iteration processes only 300 elements, but there are 300000 iterations. In your measurements the parallel loop is about 3.2 times faster (9.8 s versus 3.0 s), so the overhead of parallelization is small compared to the actual computation being performed.
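To put numbers on the two scenarios, speedup and parallel efficiency can be computed directly from the timings reported in the question. This is only a small sketch: the variable names are illustrative, and the worker count of 4 is taken from the pool output above.

```matlab
% Timings copied from the question, in seconds: [scenario 1, scenario 2]
tSerial    = [2.9, 9.8];
tParallel  = [2.9, 3.0];
numWorkers = 4;                      % size of the thread pool in the question

% Speedup: serial time divided by parallel time
speedup = tSerial ./ tParallel;

% Parallel efficiency: fraction of the ideal (numWorkers-fold) speedup achieved
efficiency = speedup ./ numWorkers;

for k = 1:numel(speedup)
    fprintf("Scenario %d: speedup %.2f, efficiency %.2f\n", ...
        k, speedup(k), efficiency(k));
end
```

An efficiency close to 1 means the workers are being used well; a value near 1/numWorkers means the parallel loop is doing no better than the serial one.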
To improve the performance in the first scenario, you can consider the following:
- Increase the amount of work per iteration in the parfor loop, for example by processing a block of several rows in each iteration, to better balance the overhead of parallelization against the actual computation.
- Test different values of n and m to find the optimal balance between the amount of work per iteration and the overhead of parallelization.
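One way to try both suggestions is to group the rows into blocks, run one block per parfor iteration, and time a few block counts. The following is a minimal sketch and not a tuned implementation: the sizes, the candidate block counts, and the variable names are assumptions chosen for illustration, and it relies on islocalmax(A, 2) operating along each row of a matrix.

```matlab
% Illustrative sizes (assumption: much smaller than in the question)
n = 200;
m = 3000;
mat = rand(n, m);

% Row-wise reference result, used to check the blocked version
reference = islocalmax(mat, 2);

% Candidate numbers of parfor iterations (assumption: each is <= n)
blockCounts = [4, 20, 200];

for numBlocks = blockCounts
    % Split the rows into numBlocks contiguous blocks
    edges = round(linspace(0, n, numBlocks + 1));
    blocks = cell(numBlocks, 1);
    for b = 1:numBlocks
        blocks{b} = mat(edges(b)+1:edges(b+1), :);
    end

    % One block per iteration, so each worker only receives its own rows.
    % If no pool is running yet, the first timing can include pool startup.
    results = cell(numBlocks, 1);
    tic;
    parfor b = 1:numBlocks
        results{b} = islocalmax(blocks{b}, 2);
    end
    t = toc;

    % Reassemble the blocks in their original row order
    blocked = vertcat(results{:});
    fprintf("%d blocks: %.3f s, matches reference: %d\n", ...
        numBlocks, t, isequal(blocked, reference));
end
```

Timings from such a small example are dominated by noise, so the sizes would need to be raised towards those in the question before drawing any conclusion about which block count works best.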
Attached below are the documentation links that you may find helpful:
Hope this helps!
Karan Singh Khati

Version

R2023a
