PARFOR behavior sensitive to comments???

I am finding that the speed of PARFOR depends heavily on whether certain seemingly inconsequential lines in the code are commented out. In particular, when running the code below with a pool of 12 workers, I obtain a time of 26 sec (4 times that of a normal for-loop).
However, if I comment out either the first line (thus converting the M-file to a script) or the inconsequential line
A=zeros(M,N); B=A;
the time drops sharply to less than 1 sec!!
This is under Windows 7 64-bit. The processor is an Intel Xeon X5680 @ 3.33 GHz, dual hexacore. I've tested with R2011b, R2012b, and R2013b, all with similar results.
Can anyone reproduce this? Is there something obvious that I'm not seeing?
function test1 %Comment this out
P=100;
fun=@(X)imrotate(X,30,'crop');
Xtypical=zeros(P);
M=P^2;
N=P^2;
A=zeros(M,N); B=A; %or Comment this out
A=cell(1,N);
tic;
parfor j=1:N
    input=Xtypical;
    input(j)=1;
    T=fun(input);
    A{j}=sparse(T(:));
end
A=cell2mat(A);
toc;

2 Comments

amir on 9 Feb 2014 (edited 9 Feb 2014)
On my system, if I comment out the first line, the result is generated after about 3.5 seconds. But if I run it with the first line in place (as a function), my system goes into a bad state: CPU usage is about zero, the whole system stops responding, and I must reboot.
With smaller values of P (for example 70), my results were like yours: running as a script was better than running as a function (the opposite of my results here).
Matt J on 9 Feb 2014
Thanks! Guess it's time to make a Bug Report.


Accepted Answer

Edric Ellis on 10 Feb 2014

1 vote

Unfortunately what I think you're seeing there is a limitation of the way function handles are created. If you use the FUNCTIONS function on your function handle 'fun' just before entering the PARFOR loop, you can see that in the 'workspace' entry, it contains everything in the function workspace. In particular, the large value of 'B' is there - hence why commenting out the creation of that variable 'fixes' things.
Function handles are created differently at the command-line or in scripts, and don't end up grabbing everything in this way - hence why using a script also 'fixes' things.
Perhaps the best workaround is either to use internal functions, or generate the function handles in a scope that cannot see any large variables.
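A minimal sketch of the second workaround, assuming a helper named makeRotator (a hypothetical name, not from the original post). The handle is built inside a small function whose workspace holds only the rotation angle, so no large variable is in scope when it is created. The call to functions is there only to inspect what the handle carries; what it reports may differ between MATLAB releases.

```matlab
function test1_workaround
% Same loop as in the question, but the handle is created in a helper
% function whose workspace contains no large variables.
P = 100;
Xtypical = zeros(P);
N = P^2;
B = zeros(N);                  % large variable, deliberately left in scope

fun = makeRotator(30);         % handle built where B is not visible

info = functions(fun);         % inspect the handle
disp(info.workspace{1});       % shows the variables the handle carries

A = cell(1, N);
tic;
parfor j = 1:N
    Xj = Xtypical;
    Xj(j) = 1;
    T = fun(Xj);
    A{j} = sparse(T(:));
end
A = cell2mat(A);
toc;
end

function fun = makeRotator(theta)
% makeRotator  Hypothetical helper: the only variable in scope is theta.
fun = @(X) imrotate(X, theta, 'crop');
end
```

Because theta is passed as an argument, this pattern also covers anonymous functions that depend on external parameters.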

1 Comment

Matt J on 10 Feb 2014 (edited 10 Feb 2014)
Pretty subtle pitfall, Edric! But it looks like you're right.
Could you also take a look at this one,
and see if you have more insights into what is happening than we did?
There, we also saw parfor performance change greatly depending on whether running as a script or mfile function. Initially, I thought it might have been caused by reasons related to my post here, but now I guess not.


More Answers (1)

Eric Sampson on 10 Feb 2014

1 vote

Matt, what happens if you create the anonymous function inside the parfor loop? If that helps, but if in actual code you need to define it before the parfor loop, then could you first create a string like fun_str = '@(X)imrotate(X,30,''crop'')' and then inside the parfor loop do fun = str2func(fun_str); ?
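A rough sketch of that suggestion, reusing the variable names from the question. It assumes str2func is acceptable inside a parfor body in the release being used, which has not been checked here. The function travels to the workers as plain text and the handle is rebuilt there, so no caller workspace is attached to it.

```matlab
% Pass the function as text and rebuild the handle inside the loop.
P = 100;
Xtypical = zeros(P);
N = P^2;
fun_str = '@(X)imrotate(X,30,''crop'')';   % plain char data, cheap to send

A = cell(1, N);
parfor j = 1:N
    fun = str2func(fun_str);               % handle created inside the loop body
    Xj = Xtypical;
    Xj(j) = 1;
    T = fun(Xj);
    A{j} = sparse(T(:));
end
A = cell2mat(A);
```

Rebuilding the handle on every iteration adds some overhead of its own, so this is worth timing against the original before adopting it.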

9 Comments

Matt J on 10 Feb 2014 (edited 10 Feb 2014)
Eric,
That would be a reasonable remedy for many simple cases. However, it won't work if the anonymous function contains external parameters. Compare the following for example,
>> b=1; f=@(x)x+b; f(2)
ans =
3
>> b=1; f=str2func('@(x)x+b'); f(2)
Undefined function or variable 'b'.
Error in @(x)x+b
To be honest, I don't really see the performance of PARFOR as the big problem now, in light of Edric's answer. I think the bigger problem is that anonymous functions can carry around huge amounts of hidden, unintended memory!! That seems like a major pitfall. I can't believe I'm the only one who didn't know about it!
Right. You can always do something like this I suppose:
b=1; f=str2func(strcat('@(x)x+',num2str(b,20))); f(2)
This behavior is 'known' by folks who've encountered issues with it, but probably not well-known... It 'has' to be that way to support odd constructs involving feval/eval, nested functions, etc.
http://www.mathworks.com/matlabcentral/newsreader/view_thread/240388
http://stackoverflow.com/questions/8671549/matlab-function-handle-workspace-shenanigans
http://blogs.mathworks.com/loren/2013/01/10/introduction-to-functional-programming-with-anonymous-functions-part-1/#comment-33635
Matt J on 10 Feb 2014 (edited 10 Feb 2014)
Thanks for the links, Eric. I'm not sure that the threads you've referenced quite cover it, though. I can see why the anonymous function handle might need to snapshot all variables created before the anonymous function is defined, but the links don't explain why changes to those variables made after the anonymous function is created are also stored. Nobody expects to be able to reference later versions of the variables through their anonymous functions.
Sorry, this may not be exactly on topic, but it does concern anonymous functions. I have defined an anonymous function like this:
global field1
cost_func_field1 = @(x) cost_func(x,field1);
field1 is an instance of a reference class (inherited from handle).
What happens to field1 when cost_func_field1(x1) is called? Does it use the global, or a copy of field1 as it was when the function was created?
Matt J on 11 Feb 2014 (edited 11 Feb 2014)
It is a copy. You can see this by running the following, and observing that the output doesn't change when the global variable is changed,
global b
b=1;
fun=@(x) x+b;
result1=fun(0),
b=2;
result2 = fun(0),
However, with handle objects, copies of the same instance share data, which is why you get different results in the following example
obj=myclass; obj.prop=1; %obj is type handle
fun=@(x) x+obj.prop;
result1=fun(0),
obj.prop=2;
result2 = fun(0),
The behavior has nothing to do with the variables being global or non-global, though.
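The same contrast can be reproduced without defining a class. This sketch assumes that containers.Map, a built-in handle class, is a fair stand-in for the myclass object above; the variable names are illustrative only.

```matlab
% Value variable: the handle keeps the value b had when it was created.
b = 1;
funValue = @(x) x + b;
b = 2;
r1 = funValue(0);              % still computed with b = 1

% Handle object: the captured copy refers to the same underlying data.
m = containers.Map('KeyType', 'char', 'ValueType', 'double');
m('prop') = 1;
funHandle = @(x) x + m('prop');
before = funHandle(0);
m('prop') = 2;
after = funHandle(0);          % reflects the update made through m

fprintf('value capture: %d, handle before: %d, handle after: %d\n', ...
        r1, before, after);
```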
amir on 16 Feb 2014 (edited 16 Feb 2014)
I have tested this, and it seems that a handle class behaves differently in parfor (the pool has 8 workers):
%%%%%%%%%%%%%%%%%%%%%%%%%
% file : testclass.m
classdef testclass < handle
    properties
        a = 2000;
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%
% file: main.m
t1 = testclass();
t1.a = 1000;
parfor i = 1 : 10
    task = getCurrentTask();
    if isempty(task)
        ID = 0;      % no pool: the loop is running on the client
    else
        ID = task.ID;
    end
    previous_value = t1.a;
    t = t1;
    t.a = ID;        % use ID so this line also works when task is empty
    time_check = tic();
    fprintf(1,'%d : %d , %d , %d \n', time_check , previous_value , t1.a , t.a);
end
t1.a
%%%%%%%%%%%%%%%%%%%%%%%%%
% results :
688005680481 : 1000 , 1 , 1
688005674577 : 1000 , 4 , 4
688005592817 : 1000 , 5 , 5
688005592714 : 1000 , 6 , 6
688005679832 : 1000 , 8 , 8
688005592765 : 1000 , 2 , 2
688005601756 : 1000 , 3 , 3
688005857083 : 5 , 5 , 5
688005679596 : 1000 , 7 , 7
688005856927 : 8 , 8 , 8
ans =
1000
As you can see, the workers do not share the object's properties. It seems a complete clone is created for each worker. The problem is that with a big handle class, the overhead of creating and destroying these objects is very high. Can this behavior be changed?
Matt J on 17 Feb 2014 (edited 17 Feb 2014)
You can't change it, but Edric made this tool, which lets you create data directly on the workers instead of broadcasting it, and keeps that data there across multiple calls to parfor.
amir on 17 Feb 2014
Sorry, but I cannot find where Edric does that (creating data directly on the workers and making it persist across multiple calls to parfor).
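The tool itself is not linked in this thread. As a different technique, and not Edric's tool, MATLAB releases from R2015b onward provide parallel.pool.Constant, which can build a value on each worker and keep it there across parfor loops. A minimal sketch, assuming the testclass definition from the comment above and an open parallel pool:

```matlab
% Build one testclass object per worker. The function handle runs on each
% worker, so the object itself is never broadcast from the client.
C = parallel.pool.Constant(@() testclass());

parfor i = 1:10
    obj = C.Value;             % the worker-local object
    obj.a = i;                 % modifies this worker's copy only
end

vals = zeros(1, 10);
parfor i = 1:10
    vals(i) = C.Value.a;       % the per-worker object is still available
end
```

Each worker still owns a separate object, so this avoids repeated construction and transfer but does not make the workers share state.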



Asked: 7 Feb 2014

Commented: 19 Feb 2014
