My MEX file is slower than my original MATLAB equivalent

Hello friends,
I need to calculate some quantities of linear-algebra type, so they are merely matrix and vector products. The following is an example:
EZ=[(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
where Ds0, Ds1, Ds2, Dm0, Dm1, Dm2 are 1-by-n vectors. When I do the calculations using matlabFunction (attached), it is fast. However, I am not satisfied, since I really need to do such calculations thousands (if not millions) of times. To overcome this issue, I decided to give MEX a try. Unfortunately, the equivalent MEX file (which I generated with MATLAB Coder) is 2-3 times slower (unfortunately, I could not upload it here).
Is there any hope to create a mex file out of this function which is much faster? I hope so!
Thanks for your help in advance,
Babak
7 comments
Bruno Luong on 18 Jul 2022 (edited 18 Jul 2022)
So you don't think simplifying the expression matters? I think the contrary. Every time a gpuArray is involved in the expression, there is a whole data transfer from CPU to GPU, and you might have 200 such terms in your expression. I did not even try to count them or to understand your code, as it is such an unreadable and messy expression.
If you take a raw, unsimplified expression like yours, throw it at the computer, and ask why it does not get faster, you need to think at a much lower level about how it works.
Mohammad Shojaei Arani on 18 Jul 2022
Bruno,
Of course, simplification matters a lot. My actual expressions are far longer than this one. MATLAB cannot simplify them efficiently (in many cases it simplifies only a little), and I have already spent a lot of time on this. With MATLAB I have no hope of simplifying my expressions further (yes, you could perhaps simplify this one more because it is not extremely long, but can you do it for an expression that is 1 km long?). My expressions are in rational form, so I typically perform two operations to simplify them: 1) first I apply [n,d] = numden(EZ), and then 2) EZ = horner(n,Ds0)./horner(d,Ds0). Unfortunately, MATLAB does not support a multivariate Horner scheme, so I can only benefit from the univariate one here (I apply the Horner scheme with respect to Ds0 because it is the most frequently repeated variable; typically, that is the variable to choose). So at this point I have convinced myself that I cannot hope to simplify my expressions further using MATLAB, and I should instead find strategies to have C or C++ perform the calculations.
So my question is not about how to come up with a better simplification (as that does not work with the current capabilities of MATLAB). My question is: how can I use C/C++, or perhaps resort to tools like gpuArray, to reduce the computational burden?
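A minimal sketch of the two-step simplification described in this comment (numden followed by a univariate horner in Ds0), applied to the short third row of EZ as a stand-in. It assumes the Symbolic Math Toolbox is installed; the names expr, exprH and f are illustrative only and do not come from the original code.

```matlab
% Requires Symbolic Math Toolbox. All variable names below are illustrative.
syms Ds0 Ds1 Dm0 real

% Third row of EZ, used here as a small stand-in for a long rational expression.
expr = (Dm0*6 - Ds0*Ds1*3) / (Ds0*2);

% Step 1: split the rational expression into numerator and denominator.
[num, den] = numden(expr);

% Step 2: rewrite both parts in Horner (nested) form with respect to Ds0.
exprH = horner(num, Ds0) / horner(den, Ds0);

% Convert to a vectorized numeric function with a fixed argument order.
f = matlabFunction(exprH, 'Vars', [Ds0, Ds1, Dm0]);

% Evaluate on 1-by-n vectors, as in the question.
y = f([2 4], [3 1], [5 2]);
```

Fixing the argument order with 'Vars' avoids depending on the alphabetical default order that matlabFunction would otherwise choose.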


Accepted Answer

Jan on 18 Jul 2022
Just some experiments. You can gain some clarity, but hardly improve the speed, with these simplifications. I've also tried a loop version.
n = 1e4;
Ds0 = rand(1, n);
Ds1 = rand(1, n);
Ds2 = rand(1, n);
Dm0 = rand(1, n);
Dm1 = rand(1, n);
Dm2 = rand(1, n);
% Version 1: the original expression as generated
tic;
for rep = 1:1e4
    EZ = [(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
          (1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
          (Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
end
toc
Elapsed time is 0.794406 seconds.
% Version 2: constant factors cancelled by hand, squares computed once
tic;
for rep = 1:1e4
    Ds0_2 = Ds0 .* Ds0;
    Dm0_2 = Dm0 .* Dm0;
    EZ2 = [(1 ./ Ds0_2 .* (Ds0 .* (Dm0 * 2 + Dm0 .* Ds1 .^ 2 - ...
        Ds0 .* (Ds1 + Dm0 .* Ds2 + Dm1 .* Ds1 - Dm2 .* Ds0 ./ 2) + ...
        Dm0 .* Dm1) - Dm0_2 .* Ds1)) / 2; ...
        1 ./ Ds0_2 .* (Ds0 .* (Ds0 .* (Dm1 + Ds1 .^ 2 / 4 - Ds0 .* Ds2 / 2 + 1) - ...
        Dm0 .* Ds1 * 2) + Dm0_2);
        (Dm0 * 3 - Ds0 .* Ds1 * 1.5) ./ Ds0];
end
toc
Elapsed time is 0.775081 seconds.
% Version 3: explicit loop over the elements, using scalar operations
tic;
for rep = 1:1e4
    EZ3 = zeros(3, n);
    for k = 1:n
        a = Ds0(k);
        b = Dm0(k);
        c = Ds1(k);
        d = Dm1(k);
        e = Ds2(k);
        EZ3(1, k) = (1 / a^2 * (a * (b * 2 + b * c ^ 2 - ...
            a * (c + b * e + d * c - Dm2(k) * a / 2) + b * d) - b^2 * c)) / 2;
        EZ3(2, k) = (a * (a * (d + c ^ 2 / 4 - a * e / 2 + 1) - b * c * 2) + b^2) / a^2;
        EZ3(3, k) = b * 3 / a - c * 1.5;
    end
end
toc
Elapsed time is 1.140882 seconds.
max(abs(EZ(:) - EZ2(:)))
ans = 0
max(abs(EZ(:) - EZ3(:)))
ans = 2.3283e-10
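The tic/toc loops above time 1e4 repetitions by hand. As a hedged alternative, the sketch below uses MATLAB's timeit on two equivalent forms of the third row only and compares the results with a tolerance instead of expecting exact equality. The names fOrig, fShort and relErr are illustrative, and the + 0.5 offset on Ds0 is an assumption added here to keep the divisor away from zero.

```matlab
n = 1e4;
Ds0 = rand(1, n) + 0.5;   % assumption: offset keeps the divisor away from zero
Ds1 = rand(1, n);
Dm0 = rand(1, n);

% Two algebraically equivalent forms of the third row of EZ.
fOrig  = @() (Dm0.*6.0 - Ds0.*Ds1.*3.0) ./ (Ds0.*2.0);
fShort = @() (Dm0 * 3 - Ds0 .* Ds1 * 1.5) ./ Ds0;

% timeit runs each handle several times and returns a typical time in seconds.
tOrig  = timeit(fOrig);
tShort = timeit(fShort);
fprintf('original: %.3g s, simplified: %.3g s\n', tOrig, tShort);

% Compare the results relative to the size of the values.
yOrig  = fOrig();
yShort = fShort();
relErr = max(abs(yOrig - yShort)) / max(abs(yOrig));
```

The measured times depend on the machine and MATLAB release, so no particular speed-up should be expected from this sketch.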
5 comments
Jan on 18 Jul 2022
@Mohammad Shojaei Arani: The rules are straightforward:
  1. Avoid repeated work. If a calculation appears repeatedly, compute it once and store it in a temporary variable.
  2. Reduce the number of calls to expensive functions: exp, power, trigonometric functions, factorial, ...
  3. Combine operations, but keep in mind that the result can be influenced by rounding effects. E.g. 1/a*b takes more time than b/a, but the result can be slightly different.
Clear code reduces the time needed for debugging:
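The three rules above can be sketched on a small made-up fragment, (Dm0.^2 - 2*Dm0.*Ds0.*Ds1) ./ Ds0.^2, which resembles part of the second row of EZ but is not taken from the original code. The names r, y, yRef and maxDiff are illustrative, and the + 0.5 offset on Ds0 is an assumption to keep the divisor away from zero.

```matlab
n = 1e4;
Ds0 = rand(1, n) + 0.5;   % assumption: offset keeps the divisor away from zero
Dm0 = rand(1, n);
Ds1 = rand(1, n);

% Rule 1: the ratio Dm0 ./ Ds0 occurs in both terms, so compute it once.
% Rule 3: one division (Dm0 ./ Ds0) instead of (1 ./ Ds0) .* Dm0.
r = Dm0 ./ Ds0;

% Rule 2: r .* r avoids a call to power (r .^ 2).
y = r .* r - 2 * r .* Ds1;

% The same fragment written the long way, for comparison only.
yRef = (1 ./ Ds0 .^ 2) .* (Dm0 .^ 2 - 2 * Dm0 .* Ds0 .* Ds1);

% The two results agree up to rounding, but not necessarily bit for bit.
maxDiff = max(abs(y - yRef));
```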
  1. Spaces around operators.
  2. Compact names of variables.
  3. Be careful with parentheses that are not required.
  4. Avoid elementwise operators if the calculation does not need them. 3.0.*2.0 is harder to read than 3 * 2.
Bruno's point is important: The result of numerically unstable functions can be influenced massively by simplifications. A basic example:
1e17 + 1 - 1e17
ans = 0
1e17 - 1e17 + 1
ans = 1
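A short sketch of why the order matters in this example: near 1e17 the gap between neighbouring doubles is larger than 1, so adding 1 to 1e17 changes nothing. The variable names spacing, absorbed, a and b are illustrative.

```matlab
spacing  = eps(1e17);            % gap between 1e17 and the next larger double
absorbed = (1e17 + 1) == 1e17;   % true: 1 is less than half of that gap

a = 1e17 + 1 - 1e17;   % left to right: the 1 is lost before the subtraction
b = 1e17 - 1e17 + 1;   % the large terms cancel first, so the 1 survives

fprintf('spacing = %g, a = %g, b = %g\n', spacing, a, b);
```

A symbolic simplifier is free to reorder terms in exactly this way, which is why a simplified expression can return a different floating-point result.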
