Issues using PARFOR with a large object array (slow execution time)

Hello,
I have been having issues with some code I am trying to parallelise. I am not sure where the problem is; I have explored different approaches, but none has given satisfactory results yet.
All my testing is done on a dual-core laptop, but I aim to run the final code on a cluster or a dedicated 12-core computer. I use MATLAB R2011a.
Here is my code:
% processedQueues: value-object array (sliced input and output); path: broadcast char array
parfor m=1:numel(processedQueues)
    processedQueues(m) = processedQueues(m).execute(path);
end
processedQueues is a value object array containing between 5000 and 200'000 objects, and path is a broadcast variable (char array). The code runs slower than its serial equivalent: 296.95 s versus 255 s for about 6000 objects.
I thought that using the same sliced variable as both input and output might be the issue, so I tried using a temporary array as the sliced output:
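% Note: Queue.empty(n,0) returns an n-by-0 empty array; it does not preallocate n objects
tmp = Queue.empty(numel(processedQueues),0);
parfor m=1:numel(processedQueues)
    tmp(m) = processedQueues(m);       % copy the input slice into the temporary output
    tmp(m) = tmp(m).execute(path);     % execute returns the modified value object
end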
But this increased the processing time (about 350 s for the same dataset).
The bottleneck is clearly the large amount of data that has to be sent to the workers. Even though saving the whole dataset to a file takes only a few seconds, the overhead imposed by PARFOR seems at least an order of magnitude larger, which puzzles me. For reference, one object is about 0.15 MB, and the array I used for the tests is about 685 MB.
I tried reducing the amount of data sent at a time by partitioning the dataset into batches of 500 objects and sending them one batch at a time, using a for loop wrapped around the PARFOR. This gives slightly better performance (~200 s).
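A minimal sketch of the batching approach described above, under stated assumptions: a cell array of placeholder numbers stands in for the Queue objects, and a hypothetical function handle processItem stands in for the execute method. The broadcast char array is renamed dataPath here (also an assumption) to avoid shadowing MATLAB's built-in path function. On each pass of the outer loop, only the current batch is copied out and sliced across the workers.

```matlab
% Hypothetical stand-ins (assumptions, not the poster's real code):
% processItem plays the role of Queue.execute, items plays the object array.
processItem = @(q, p) q * 2;
items = num2cell(1:6000);
dataPath = 'C:\data';            % broadcast char array (renamed from "path")

batchSize = 500;                 % batch size used in the question
n = numel(items);
for first = 1:batchSize:n
    last = min(first + batchSize - 1, n);
    batch = items(first:last);   % copy out only the current batch
    parfor k = 1:numel(batch)
        % batch is sliced: each worker receives only the elements it processes
        batch{k} = processItem(batch{k}, dataPath);
    end
    items(first:last) = batch;   % write the processed batch back
end
```

If no worker pool is open, parfor runs the iterations serially in the client, so the sketch can be checked for correctness before timing it on workers.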
I am still not satisfied with this result. Could my laptop be at fault, or is PARFOR fundamentally unsuitable for object arrays? Could calling array(m).method() directly in the assignment be the cause of my issues?
I am really stuck on this problem and any help is welcome!
Cheers,
Nicolas

Accepted Answer

Edric Ellis on 7 Jul. 2011
As you increase the number of workers, the amount of communication doesn't increase (for sliced inputs/outputs), but the available computational power does. So although 2 workers run slower than serial MATLAB, by the time you have 12 workers you could well be running a few times faster than serial MATLAB.
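A rough cost model makes this argument concrete. Assume (a simplification, not something stated in the answer) that sliced transfer costs a fixed time T_comm regardless of the number of workers N, and that the serial computation time T_comp divides evenly among the workers:

```latex
T_{\mathrm{par}}(N) \approx T_{\mathrm{comm}} + \frac{T_{\mathrm{comp}}}{N},
\qquad
S(N) = \frac{T_{\mathrm{comp}}}{T_{\mathrm{par}}(N)}
```

Taking T_comp ≈ 255 s (the serial run) and the batched timing T_par(2) ≈ 200 s from the question gives T_comm ≈ 200 − 127.5 = 72.5 s, so the model predicts T_par(12) ≈ 72.5 + 21.25 ≈ 94 s, roughly 2.7 times faster than serial. With the unbatched 297 s timing, the same arithmetic gives T_comm ≈ 169.5 s and only about a 1.3 times speedup at 12 workers. Under this model, reducing communication matters as much as adding workers.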
Comment from Nicolas Jaccard on 7 Jul. 2011
I understand there is a necessary trade-off due to communication with the workers, but why is it so much slower than saving and loading the objects to and from a file?
I guess I will have to experiment; I am still not sure whether it is better to send a large object array to individual workers or to break it down into individual slices.
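One way to run that experiment is to send each worker one contiguous sub-array instead of relying on per-element slicing. In this sketch, processItem, items and dataPath are hypothetical placeholders (a function handle standing in for execute, a cell array of numbers standing in for the objects, and the broadcast char array). numChunks = 12 assumes one chunk per worker on the 12-core target. Whether this beats plain slicing has to be measured, because parfor already hands iterations to the workers in groups.

```matlab
% Hypothetical stand-ins (assumptions, not the poster's real code).
processItem = @(q, p) q + 1;
items = num2cell(1:6000);
dataPath = 'C:\data';            % broadcast char array

numChunks = 12;                  % assumption: one chunk per worker
n = numel(items);
edges = round(linspace(0, n, numChunks + 1));

% Split the array into contiguous sub-arrays, one cell per chunk.
chunks = cell(1, numChunks);
for c = 1:numChunks
    chunks{c} = items(edges(c)+1:edges(c+1));
end

parfor c = 1:numChunks
    chunk = chunks{c};           % each iteration receives one whole sub-array
    for k = 1:numel(chunk)
        chunk{k} = processItem(chunk{k}, dataPath);
    end
    chunks{c} = chunk;           % sliced output: one processed sub-array
end

items = [chunks{:}];             % concatenate the chunks back into one array
```

Timing this with tic/toc against the plain sliced loop, on the same data and the same number of workers, is the fairest comparison.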

More Answers (1)

Sean de Wolski on 6 Jul. 2011
Parallelization is slower when the operation is fast and a lot of memory is involved. There is major overhead in passing that data off to the workers.