Any update on lsqcurvefit with gpuArray?
I use gpuArray for image analysis, and I want to display an image of values fitted pixel by pixel with lsqcurvefit. That's why I'm looking for a way to handle this for a 2k x 3k image. There seems to be no further discussion of lsqcurvefit with gpuArray, and JacobianMultiplyFcn seems to be the fastest option so far. Do you have any advice? The kind of per-pixel fit I mean looks roughly like the sketch below.
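(A minimal sketch with a placeholder single-exponential model and made-up frame count; not my real code.)
t = linspace(0,1,10); %e.g. 10 samples per pixel
stack = gather(gpuArray.rand(2000,3000,10)); %lsqcurvefit needs CPU data, so gather first
model = @(p,t) p(1)*exp(-p(2)*t); %placeholder model function
fitted = zeros(2000,3000);
opts = optimoptions('lsqcurvefit','Display','none');
for r = 1:size(stack,1)
    for c = 1:size(stack,2)
        y = squeeze(stack(r,c,:)).';
        p = lsqcurvefit(model,[1 1],t,y,[],[],opts);
        fitted(r,c) = p(1); %keep e.g. the fitted amplitude
    end
end
This serial loop over 6 million pixels is exactly the bottleneck I am trying to avoid.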
0 Comments
Answers (1)
Matt J
on 24 Aug 2023
Edited: Matt J on 29 Aug 2023
I recently had a conversation with MathWorks about adding gpuArray support to the Optimization Toolbox solvers. The transcript is below. Sadly, there seems to be some skepticism from the MathWorks developers that it would be worthwhile:
GPU array support for Optimization Toolbox
Matt J
Jun 12, 2023, 10:58AM
Hi Mike,
In addition to gpuArray support for griddedInterpolant, I was also wondering about enabling gpuArray types for the Optimization Toolbox solvers. The Optimization Toolbox implements a number of iterative function minimization methods, where the function to be minimized is specified by a user-defined function handle. Currently, the toolbox solvers work only with CPU-double data and the user-supplied function is required to return its results in CPU-double form. This introduces a lot of bottlenecks. It would be good if these solvers could work entirely on the GPU. This shouldn't be too hard to enable, since all of the operations that the toolbox solvers perform on doubles are probably already implemented for gpuArrays as well.
Mike Croucher <@mathworks.com>
Jun 14, 2023, 11:30AM
to me
Hi Matt
I can confirm that there is definitely gpuArray support for griddedInterpolant in R2023b. The pre-release will be available soon so you can try it out.
Regarding gpuArray support for the optimisation solvers: there are currently no plans to do so.
One of our developers recently worked on a Proof of Concept for a customer solving a large set of nonlinear equations using GPU arrays and the speed up was marginal, at best.
Elsewhere, there is no evidence of GPU use by the usual competitors (Gurobi, CPLEX, etc.), and there seem to be similar conclusions in the open source world.
With that said, if you know of any evidence showing GPU acceleration of optimization algorithms that are relevant to your work, I’d be interested in knowing it.
With respect to your own optimization problems: given that GPU acceleration seems to be off the table, what else might we try? Do you have something concrete I could look at?
Best Wishes,
Mike
Mike Croucher
Customer Success Engineer, MathWorks
Matt J
Jun 14, 2023, 4:13PM
to Mike
Hi Mike,
Thanks a lot for pursuing this for me, but...
"With that said, if you know of any evidence showing GPU acceleration of optimization algorithms that are relevant to your work, I'd be interested in knowing it."
...one piece of evidence is that MATLAB already has a whole host of GPU-accelerated optimization algorithms (pcg, lsqr, bicg), just not in the Optimization Toolbox. You wouldn't have added gpuArray support for those commands if GPU acceleration were inapplicable to optimization in general.
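For illustration, pcg accepts gpuArray inputs directly, so something like the following runs end-to-end on the GPU (a minimal sketch with arbitrary sizes; the matrix is made symmetric positive definite just so pcg applies):
N = 4000;
A = gpuArray(rand(N));
A = A.'*A + N*eye(N,'gpuArray'); %make the system symmetric positive definite
b = gpuArray.rand(N,1);
x = pcg(A, b, 1e-6, 200); %x is returned as a gpuArray; no gather() needed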
Another piece of evidence is my sample code below, which runs a few iterations of fminunc() to solve a basic set of equations A*x=b using both the CPU and the GPU. On a GTX 1080 Ti, I see nearly a 3x speed-up. However, as you can see in the user-provided objective() code, I am forced to call gather() to send the results back to the CPU every time the objective is invoked. As you make N smaller (e.g., N=500), this becomes a bottleneck and the GPU is outperformed by the CPU by a factor of 10. If you set SpecifyObjectiveGradient=false, it is outperformed by a factor of 35.
N=8e3;
opts=optimoptions('fminunc','Display','none','MaxIterations',4,...
    'SpecifyObjectiveGradient',true,'Algorithm','quasi-newton',...
    'HessUpdate','steepdesc');

%CPU
A=rand(N);
b=A*rand(N,1);
tic;
x=fminunc(@(x)objective(x,A,b), ones(N,1), opts);
toc %Elapsed time is 3.915851 seconds.
disp ' '

%GPU
A=gpuArray(A);
b=gpuArray(b);
tic
x=fminunc(@(x)objective(x,A,b), ones(N,1), opts);
toc %Elapsed time is 1.564494 seconds.

function [fval,grad]=objective(x,A,b)
    err=A*x-b;
    fval=norm(err).^2;
    fval=gather(fval); %fminunc requires a CPU double, so gather on every call
    if nargout>1
        grad=A.'*err;
        grad=gather(grad); %same for the gradient
    end
end
0 Comments