Vectorize and use parfor in a nested loop with five levels

Question

2 Stimmen

Hi all,

I'm working on an application that, for now, requires computing a nested loop with five levels. I'm trying to vectorize some parts and to use a parfor for one of the loops. However, I haven't been very successful. Any suggestions would be great. A simplified version of the code looks as follows,

clear all
clc
p1=(1:1:50)';
p2=p1;
np=max(size(p1));
[P1,P2]=meshgrid(p1,p2);
DP=unique(P1-P2);
K=max(size(DP));
ns=1000;
ne=10;
S1=randn(ns,ne);
S2=randn(ns,ne);
S3=randn(ns,ne);
S4=randn(ns,ne);
L1=rand(ns,ne);
L2=rand(ns,ne);
L3=rand(ns,ne);
L4=rand(ns,ne);
a=2;
BR=zeros(np,K,K,K,K);
P=BR;
for i=1:K,
    DP1=DP(i);
    C1=mean(a*S1*DP1>=0,2);
    e1=bsxfun(@times,L1,(1-2*C1));
    for j=1:K,
        DP2=DP(j);
        C2=mean(a*S2*DP2>=0,2);
        e2=bsxfun(@times,L2,(1-2*C2));
        for l=1:K,
            DP3=DP(l);
            C3=mean(a*S3*DP3>=0,2);
            e3=bsxfun(@times,L3,(1-2*C3));
            for m=1:K,
                DP4=DP(m);
                C4=mean(a*S4*DP4>=0,2);
                e4=bsxfun(@times,L4,(1-2*C4));
                  for pp=1:np,
                      pr=p2(pp);
                      dp1=p1-pr;
                      A=reshape(-bsxfun(@times,a.*C4,dp1'),ns,1,np);
                      N=bsxfun(@minus,A,e4);
                      f=exp(N);
                      PR=f./(1+f);
                      rev=zeros(ns,ne,np);
                      for zz=1:np,
                          rev(:,:,zz)=PR(:,:,zz)*p1(zz);
                      end
                      er=reshape(sum(mean(rev,2)),np,1);
                      [mpr,ind]=max(er);
                      BR(pp,m,l,j,i)=p1(ind);
                      P(pp,m,l,j,i)=mpr;
                  end
              end
          end
      end
  end

In this version the problems start when evaluating BR and P in the inner loop. Even though all pp, m, l, j and i are defined, it takes a lot of memory and time to evaluate BR(pp,m,l,j,i)=p1(ind), and I really don't know why. Then, after solving that, I need to improve on vectorizing what might be possible to vectorize, and choose a loop to use the parfor.

Thanks,

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Follow Question

Answer 1

Jan am 28 Jun. 2013

Bearbeitet: Jan am 30 Jun. 2013

In MATLAB Online öffnen

1 Stimme

Some small improvements:

% Original code:
rev=zeros(ns,ne,np);
for zz=1:np,
  rev(:,:,zz)=PR(:,:,zz)*p1(zz);
end
% Vectorized (is this faster?):
rev = bsxfun(@times, PR, reshape(p1, 1, 1, np));

Calculating sum(mean(rev,2)) contains 2 summations. Perhaps one sum is faster:

er = sum(reshape(rev, np, []), 2) / ne;

Avoir some multiplications by -1:

% Original:
dp1 = p1-pr;
A   = reshape(-bsxfun(@times,a.*C4,dp1'),ns,1,np);
% New
dp1 = pr - p1;
A   = reshape(bsxfun(@times, a.*C4, dp1'), ns, 1, np);

In these lines:

C1 = mean(a*S1*DP1>=0,2);

some simplifications are possible: a * S1 * DP1 >= 0 does not depend on the positivbe constant a. DP1 is a scalar and only its sign is interesting. So the meanS1 = mean(S1>=0, 2) can be calculated once and in case of a negative sign of DP1 1 - meanS1 can be calculated fast. DP1==0 must be caught also.

I'd start a parallelization after these mathematical simplifications have been performed. Avoiding calculations is simply more efficient than trying to do them faster.

[EDITED] Now I've found the time to run your code on my computer. Beside the fact that 4GB are not enough to run your code (how much RAM do you have?), a simple test revealed the bottleneck: Use the profiler to find out, which lines consumes the most time. You will see, that all my ideas are meaningless, because they improve parts of your code, which require only one percent of the total time. Only vectorizing the creation of rev matters significantly. But the rest is dominated by the very expensive calculation of:

f = exp(N);

So this is the point where improvements are required at first. N is created by BSXFUN and you can exploit, that exp(a - b) equals exp(a) / exp(b), and that exp(a * b) equals exp(a) ^ b. Then the innermost loop becomes:

   exp_e4_inv = exp(e4);
   exp_a_C4   = exp(a * C4);
   for pp = 1:np
      dp1 = p2(pp) - p1;
      Ax  = reshape(bsxfun(@power,exp_a_C4, dp1'), ns, 1, np);
      f   = bsxfun(@times, Ax, exp_e4_inv);
      PR  = f./(1+f);
      rev = bsxfun(@times, PR, reshape(p1, 1, 1, np));
      er  = reshape(sum(reshape(rev, [], 3)) / ne, np, 1);
      [mpr,ind]=max(er);  % No changes here...
      BR(pp,m,l,j,i)=p1(ind);
      P(pp,m,l,j,i)=mpr;
   end

This reduces the total number of elements the exponential function is applied to. For the much smaller problem defined by p1=(1:3)', I get 1.45 seconds runtime instead of 3.02 of the original version. The results differ by rounding errors inside the expected limits from the original. So some basic mathematical simplifications gained the double speed.

Well, double speed is not impressing by such a large problem. But I hesitate to start to parallelize code without considering this. Now this is a good point to come back to the original question: How do we PARFOR this?

5 Kommentare
3 ältere Kommentare anzeigen 3 ältere Kommentare ausblenden

Jan am 30 Jun. 2013

See [EDITED]

Fernando am 1 Jul. 2013

Thanks Jan for the edition. I had realized that the costly part was computing the exponential but I had no idea how to improve on that. Once again, thanks! And to answer your question about RAM, I'm running this on a computer cluster so as long as I don't need more than 50gb, it should be OK.

Melden Sie sich an, um zu kommentieren.

Vectorize and use parfor in a nested loop with five levels

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Akzeptierte Antwort

5 Kommentare
3 ältere Kommentare anzeigen 3 ältere Kommentare ausblenden

Weitere Antworten (0)

Kategorien

Tags

Community Treasure Hunt

Vectorize and use parfor in a nested loop with five levels

0 Kommentare -2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Akzeptierte Antwort

5 Kommentare 3 ältere Kommentare anzeigen 3 ältere Kommentare ausblenden

Weitere Antworten (0)

Kategorien

Tags

Siehe auch

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

5 Kommentare
3 ältere Kommentare anzeigen 3 ältere Kommentare ausblenden