Multistart and lsqnonlin - Parallelization doesn't seem to provide any benefit.

I'm using MultiStart and lsqnonlin to do curve fitting and obtain parameters.
A quick overview of the problem: I have a numerical DiffEq solver (in the form of a MEX file) that is operated through MATLAB as a function blackboxfunction(A,t,X). The DiffEq solver provides my 'model' data (from theory).
We perform experiments with known experimental values 'A' and for a known time from 0 to t.
X are parameters that are unknown, but can be obtained by inverse fitting the 'model' data to the 'experimental' data. Both experimental and model data are plotted in the form of curves.
This is essentially a non-convex optimization problem: what are the parameters (X) such that the residual @(X)blackboxfunction(A,t,X) - y_experimental(t) is minimized?
In other words: what parameters 'X' get the experimental and model curves to overlap perfectly?
I have a system with 16 nodes.
%% An excerpt of my code:
% Function definition
fun = @(X)blackboxfunction( ...
    A, ...
    time_start,time_stop, ...
    X) ...
    -Experimental;
% Set up lsqnonlin options
options = optimoptions(@lsqnonlin,...
    'Algorithm','trust-region-reflective', ...
    'Display', 'iter', ...
    'UseParallel', false, ...
    'StepTolerance', 5e-6, ... Step-size stopping criterion
    'FunctionTolerance', 1e-6, ... Function stopping criterion
    'TypicalX', TypicalX, ...
    'FiniteDifferenceStepSize', 1e-4, ... % This seems to work fine, and faster
    'DiffMinChange', 1e-6); %, ...
% Create optimization problem
problem = createOptimProblem('lsqnonlin', ...
    'objective', fun, ...
    'x0', initial, ...
    'lb', lowlimit, ...
    'ub', uplimit, ...
    'options', options);
% Create and run MultiStart object
ms = MultiStart('FunctionTolerance',2e-4,'XTolerance',5e-3,...
    'StartPointsToRun','bounds-ineqs', 'Display', 'iter', 'UseParallel', true);
[ms_params,ms_fval,ms_eflag,ms_output,ms_manymins] = run(ms, problem, 10)
My questions are:
  1. Should UseParallel be 'true' for MultiStart alone, or for both MultiStart and the optimoptions object?
  2. Neither option above seems to make the parallelization work. I don't have tic/toc data right now, but it's going no faster than when I ran lsqnonlin for a single starting point. Is MultiStart with 'n' start points supposed to take as much time as n lsqnonlin runs with one start point each? (I suppose that makes sense, but I thought I'd still ask.)
  3. What is the meaning of the error below?
[Error: idasErrorHandler::183] In function 'IDASolve' of module 'IDAS', error code 'IDA_ILL_INPUT':
At t = 14.3214, , mxstep steps taken before reaching tout.
[Error: integrate::1378] IDASolve returned IDA_TOO_MUCH_WORK at t = 14.3214

Accepted Answer

Alan Weiss on 19 Jun 2020
I have to ask: do you have Parallel Computing Toolbox installed? It is required for MultiStart to run in parallel.
I do not understand the error that you show, but it seems to be an error thrown from your ODE solver. Is the ODE being solved for the time interval you specify?
If you read parfor Characteristics and Caveats, you will see that you do not have to set parallel computing for your local optimizer, lsqnonlin in your case, but it doesn't matter whether you do or not, because MultiStart takes the outer parallel loop, and this disables parallel lsqnonlin.
Alan Weiss
MATLAB mathematical toolbox documentation
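To make the point about the outer parallel loop concrete, here is a minimal timing sketch. It assumes Global Optimization Toolbox and Parallel Computing Toolbox are available, and it uses a small ODE fit as a hypothetical stand-in for blackboxfunction (the toy model, data, bounds, and variable names are made up for illustration). UseParallel is set only on MultiStart, and tic/toc measures the serial and parallel runs separately.

```matlab
% Toy ODE fit standing in for the expensive black-box model (hypothetical).
t       = linspace(0, 4, 25);
yExp    = 2.0 * exp(-1.3 * t);                 % synthetic "experimental" curve
odeOpts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
% Residuals: solution of y' = -X(2)*y, y(0) = X(1), minus the data.
fun = @(X) deval(ode45(@(tt, y) -X(2)*y, [0 4], X(1), odeOpts), t) - yExp;

% Leave UseParallel off in the local solver; MultiStart owns the parallel loop.
lsqOpts = optimoptions(@lsqnonlin, 'Display', 'off', ...
    'UseParallel', false, 'FiniteDifferenceStepSize', 1e-4);
problem = createOptimProblem('lsqnonlin', 'objective', fun, ...
    'x0', [1 1], 'lb', [0.1 0.1], 'ub', [5 5], 'options', lsqOpts);

% Open the pool first so its startup time is not counted in the timing.
if isempty(gcp('nocreate'))
    parpool;
end

nStarts = 16;

% Serial MultiStart
msSerial = MultiStart('Display', 'off', 'UseParallel', false);
rng(0);                                        % same random start points in both runs
tic
[xS, fS] = run(msSerial, problem, nStarts);
tSerial = toc;

% Parallel MultiStart
msPar = MultiStart('Display', 'off', 'UseParallel', true);
rng(0);
tic
[xP, fP] = run(msPar, problem, nStarts);
tPar = toc;

fprintf('serial: %.2f s, parallel: %.2f s\n', tSerial, tPar);
```

With a model this cheap the parallel run may be no faster, or even slower, because of communication overhead. Substituting the real objective should show whether the pool actually helps.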

14 Comments

Alan,
  1. Parallel Computing Toolbox: Yes. It says 'Parallel Computing Toolbox, Version 7.1' when I check using 'ver'. The license test results in a 'true' result as well.
  2. I do not understand what you mean by the next question - pertaining to the time interval. Could you elaborate? The 'model' or 'blackbox' solver does solve for the times we specify (the time it took for the experiment to run), which is about 1-2hr.
  3. Parfor: I see! Makes sense. So I'm not doing it wrong.
Another question: Since I'm specifying 'UseParallel' = true, I do not need to put in any explicit parfor loops in the code, right?
Please enter
parpool
before running MultiStart and check whether the resulting pool is what you expect. I am no expert on setting up a parallel pool, I am simply suggesting that you check that the pool is set up correctly.
Also, and again this reflects my ignorance, does your MEX diff eq solver run in parallel? Does each worker in your parallel pool have access to their own copy of that solver? In order for the solver to run effectively in parallel, they must each have their own copy. Otherwise, this is a potential bottleneck in the code.
Parallel MultiStart simply runs a parfor loop around your local solver, passing in new initial points for the local solver. You don't need MultiStart to test whether your parallel pool and MEX solver can operate together efficiently. But I am starting to believe that they cannot, at least with your current setup.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
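The checks suggested above can be sketched roughly as follows, assuming Parallel Computing Toolbox. The solver name 'ode45' is only a placeholder for the MEX function, and the model handle is a hypothetical stand-in for one expensive evaluation. The script reports the pool size, asks each worker where it finds the solver, and times the same evaluations in a for loop and a parfor loop.

```matlab
% Start a pool if none is open, then report its size.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;              % default cluster profile
end
fprintf('Pool has %d workers\n', pool.NumWorkers);

% Ask every worker where it finds the solver. 'ode45' is a placeholder;
% substitute the name of your MEX function.
solverName = 'ode45';
f = parfevalOnAll(@which, 1, solverName);
paths = fetchOutputs(f, 'UniformOutput', false);
disp(unique(paths));             % an empty entry means a worker cannot see it

% Hypothetical stand-in for one expensive model evaluation.
tGrid   = linspace(0, 4, 25);
odeOpts = odeset('RelTol', 1e-10, 'AbsTol', 1e-12);
model   = @(X) deval(ode45(@(t, y) -X(2)*y, [0 4], X(1), odeOpts), tGrid);

N = 64;
params = [linspace(0.5, 3, N)', linspace(0.5, 3, N)'];

% Same evaluations, first serially, then on the pool.
outFor = zeros(N, 1);
tic
for j = 1:N
    outFor(j) = norm(feval(model, params(j, :)));
end
tFor = toc;

outPar = zeros(N, 1);
tic
parfor j = 1:N
    outPar(j) = norm(feval(model, params(j, :)));
end
tPar = toc;

fprintf('for: %.2f s, parfor: %.2f s\n', tFor, tPar);
```

If parfor is not clearly faster here with the real solver substituted in, parallel MultiStart is unlikely to be faster either.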
Thanks,
I think I've checked that the MEX solver successfully runs in parallel. Code and comments at the bottom.
But even before we get there, I'd like to ensure I understand the benefits of MultiStart:
Say I use lsqnonlin
  • with 'UseParallel = true' (I'm assuming this parallelizes function evaluations?)
  • without MultiStart
And perform 'n' optimization runs (we run lsqnonlin 'n' times, with different starting points), which in total takes time 't1'.
Now, say I use MultiStart, lsqnonlin, with 'UseParallel = true', from 'n' starting points, which takes time 't2' to run.
Q. : Assuming parallelization works properly, and that number of iterations/ function solves for each optimization is roughly equal, then would we expect t1 = t2 (approximately)? I mean, do we expect MultiStart to work more efficiently/ faster than manually randomizing lsqnonlin? How much more efficient is it?
It might be a trivial question, but I'm not sure of the answer.
___
Here's a parfor loop where I brute-force the parameter space
  • A parameter mesh or space is generated
  • For each point in that space, X(j), we compare the 'model' and 'experimental' curves.
  • This successfully runs in parallel - the parallel plots are produced significantly faster than sequential.
  • There's no optimization here, but this shows the MEX file is being handled properly during parallelization? To make a plot, the solver needs to run. The plots are being produced much faster in parallel, so the solvers are running in parallel?
  • Thanks :)
I'm not sure what's going wrong in the previous comment, but I can't seem to edit it properly.
Here's the code instead.
parfor j = 1:size(paramspace,1)
    % Parallelize the paramspace entry for each 'j' - PARAMSPACE_PAR = 'X', or parameter
    paramspace_par = [paramspace(j,:,:), param_4]
    % Solve the model
    protein_model = Langmuir_run(...
        CADET_protein_load, ...
        CADET_Sections(2), ...
        CADET_Sections(end), ...
        paramspace_par); %%%%%%% THIS LINE CONTAINS THE POINT IN THE PARAMETER SPACE X(j)
    %%% THERE IS NO OPTIMIZATION INVOLVED, we're just exploring the entire parameter space and
    %%% plotting an output curve %%%.
    %%%%% EVERYTHING BELOW IS SOME FORM OF OUTPUT %%%%%
    % Get sum of squares
    sum_of_sq_par = rssq(protein_model - protein_Exp);
    sum_of_sq(j,i) = sum_of_sq_par;
    % Plot the solved model
    figure
    plot(...
    % Figure Name
    figname = ....
end
Good job testing the parallel configuration. That is exactly the kind of test that shows the evaluations can occur in parallel.
FYI, when you run lsqnonlin in parallel, the thing that is parallelized is the gradient estimation, which is done by finite difference steps. Depending on the situation, this could be faster or slower when evaluating in parallel, because there can be significant communication overhead in parceling out the evaluation points and other data. However, in your case, where each evaluation takes a long time, it might be beneficial to parallelize this calculation even with no MultiStart involved.
MultiStart is usually beneficial to run in parallel because local optimizations usually take some time, and having multiple local optimizations running in parallel usually saves time. I still find it mysterious that you see no benefit to running MultiStart in parallel. Unless perhaps you were running lsqnonlin in parallel, and that saved a lot of time, and then when you switched to parallel MultiStart the lsqnonlin calculation is no longer in parallel, and the two just happened to balance out. But I really don't know. Perhaps you could test to see the benefit from running lsqnonlin in parallel and not in parallel on small cases.
For more details on what the solvers do when running in parallel, see What Is Parallel Computing in Optimization Toolbox? and MultiStart.
I hope this helps.
Alan Weiss
MATLAB mathematical toolbox documentation
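The small-case test suggested above might look something like this. It is only a sketch, assuming Optimization Toolbox and Parallel Computing Toolbox; the three-parameter ODE model, the synthetic data, and the bounds are hypothetical stand-ins for the real solver. The same lsqnonlin problem is solved once with serial finite differences and once with parallel finite differences.

```matlab
% Hypothetical three-parameter ODE model standing in for the real solver:
%   y1' = -k1*y1,  y2' = k1*y1 - k2*y2,  y(0) = [A; 0],  X = [A k1 k2].
tGrid   = linspace(0, 6, 40);
odeOpts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
rhs     = @(X) @(t, y) [-X(2)*y(1); X(2)*y(1) - X(3)*y(2)];
% [0 1]*... picks the second state (the "measured" curve).
model   = @(X) [0 1] * deval(ode45(rhs(X), [0 6], [X(1); 0], odeOpts), tGrid);
yExp    = model([2 1.5 0.4]);          % synthetic "experimental" data
fun     = @(X) model(X) - yExp;        % residual vector for lsqnonlin

x0 = [1 1 0.5];
lb = [0.1 0.1 0.1];
ub = [5 5 5];
base = optimoptions(@lsqnonlin, 'Display', 'off', ...
    'FiniteDifferenceStepSize', 1e-4);

% Open the pool first so its startup time is not counted.
if isempty(gcp('nocreate'))
    parpool;
end

% Serial finite differences
optsSerial = optimoptions(base, 'UseParallel', false);
tic
[xS, resS] = lsqnonlin(fun, x0, lb, ub, optsSerial);
tSerial = toc;

% Parallel finite differences (one perturbed evaluation per parameter)
optsPar = optimoptions(base, 'UseParallel', true);
tic
[xP, resP] = lsqnonlin(fun, x0, lb, ub, optsPar);
tPar = toc;

fprintf('lsqnonlin serial: %.2f s, parallel: %.2f s\n', tSerial, tPar);
```

With forward differences there is roughly one extra function evaluation per parameter at each iteration, so a problem with only a few parameters can keep only a few workers busy at a time.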
Thanks for the discussion!
>>Unless perhaps you were running lsqnonlin in parallel, and that saved a lot of time, and then when you switched to parallel MultiStart the lsqnonlin calculation is no longer in parallel, and the two just happened to balance out.
Indeed I was running lsqnonlin in parallel. I will check this!
Also, thank you for your other answers on Mathworks - by now I've read a good dozen of them.
In fact, I was thinking of trying out genetic algorithms when I encountered a comment of yours somewhere, advising MultiStart instead.
What I shall do is evaluate what you said in the italic text above. Then, if I see no time-benefit by Multistart (despite everything working as it should), I'll give the Genetic Algorithm a try.
Best,
SB.
Hello! Can I ask for your help, please? I see that you know lsqnonlin and MultiStart.
I have to minimize an objective function and find some parameters. I also use lsqnonlin + MultiStart. Here's my code for the lsqnonlin + MultiStart part:
par0= [6.353500000000001e-10 30 3.277800000000000e-04 63604]; % 4 initial parameters
LB = [0 0 0 0]; % lower bounds of the parameters
UB = [inf 120000 inf 200000]; % upper bounds
f=@(x)Parameters_determination(x,b,mcat,Nu,S,D,p,vecx,nbpoints,D0,Ptot,b0,Hads,T_exp,R,V,Modcin,tauxdeconversion_exp); % my objective function; it calls ode45 to solve a differential equation
opts=optimoptions(@lsqnonlin,'display','iter','FunctionTolerance',10e-9,'MaxFunctionEvaluations', 1000,'MaxIterations', 100,'FiniteDifferenceStepSize' , [1e-11 1e-6 1e-11 1e-6], 'PlotFcn','optimplotfval', 'StepTolerance', 10e-15,'FunValCheck', 'on' ); % define the options
problem = createOptimProblem ( 'lsqnonlin' , 'x0' , par0, 'objective' ,f, 'lb' , LB, 'ub' , UB, 'options',opts);
ms = MultiStart('PlotFcn', @gsplotbestf , 'Display', 'iter', 'StartPointsToRun', 'bounds');
[x,fval,exitflag,output,solutions] = run(ms,problem,5)
x
fval
exitflag
output
solutions
But when I run it, MultiStart calls the local solver once and then gets stuck; it doesn't reach the 5 local solver calls I asked for. I can't find a solution to this. Do you have an idea why MultiStart gets stuck like this?
I don't understand parallel use very well. It says that I must have a multicore processor; how can I find out whether I have one?
Also, I think one of my problems is choosing the lsqnonlin optimoptions and the MultiStart options to get better results, but I don't know which options I should still modify. Do you have an idea which options I can change to get a result?
I will be very grateful for any help,
Marylen
The plot of the objective function value looks like this:
Only one local solver run is launched, then it gets stuck, and I have to force it to stop. Sometimes I tried other initial parameters and the local solver was launched twice, but the second run gave worse results and then it got stuck.
I suggest that you use the debugger. I suspect that, in the second run, the ODE solver gets stuck. But I cannot be sure without working code.
Alan Weiss
MATLAB mathematical toolbox documentation
Hello, thank you for your answer.
Indeed, when it's stuck, I push the button 'Pause when NaN or Inf is returned', and in the Command Window there's the message: NaN/Inf breakpoint hit for ode45
I think that MultiStart runs start points outside my bounds or generates bad start points, something like this, but I really don't know how to get past it.
I believe that you will have to write code that is robust to this kind of error. Either figure out the regions of bad initial points and set bounds or some other kind of code to avoid them, or learn to use try-catch statements to enable MATLAB to continue after attempting to evaluate a "bad" point.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
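As a rough illustration of the try-catch suggestion: anonymous functions cannot contain try-catch, so the wrapper below is an ordinary function placed at the end of a script (this needs R2016b or later). safeResiduals, nRes, and penalty are hypothetical names, not part of any toolbox, and the three toy objectives exist only to exercise the wrapper.

```matlab
% Demo of the wrapper on three toy objectives (all hypothetical).
nRes    = 2;       % expected number of residuals
penalty = 1e3;     % large finite value returned for "bad" points

rGood = safeResiduals(@(X) [X(1) - 1; X(2) - 2], [1 2], nRes, penalty);
rNaN  = safeResiduals(@(X) [0 / X(1); 1],        [0 1], nRes, penalty);
rErr  = safeResiduals(@(X) X(5),                 [0 1], nRes, penalty);
disp([rGood, rNaN, rErr]);

% In a fit, wrap the real objective before handing it to createOptimProblem:
%   safeFun = @(X) safeResiduals(f, X, numel(yExp), penalty);

function r = safeResiduals(fun, X, nRes, penalty)
% Evaluate fun(X); if it errors or returns NaN/Inf or the wrong size,
% return a constant penalty vector so the optimizer can continue.
    try
        r = fun(X);
        r = r(:);
        if numel(r) ~= nRes || any(~isfinite(r))
            r = penalty * ones(nRes, 1);
        end
    catch
        r = penalty * ones(nRes, 1);
    end
end
```

One caveat with this design: a constant penalty gives a zero finite-difference gradient, so a local run that starts deep inside a bad region may stop right away. MultiStart then simply records it as a poor run and moves on to the next start point.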
Hello, thank you very much.
I found the problem myself: MultiStart is not the problem, it's lsqnonlin. It uses finite differences, so I chose FiniteDifferenceStepSize=[1e-11 1e-6 1e-11 1e-6]. That works for the first vector of starting points that I fixed, but not for the starting points generated by MultiStart, so lsqnonlin gets stuck. This is based on this documentation: https://fr.mathworks.com/help/optim/ug/optimizing-a-simulation-or-ordinary-differential-equation.html#btfb69a
I tried patternsearch, but it's not suited to my problem. I can't modify FiniteDifferenceStepSize for each starting point generated by MultiStart. I can't supply a gradient because my objective function is based on a differential equation: I solve it with ode45, then compare to experimental results to get my objective function.
So do you have other ideas to solve the FiniteDifferenceStepSize problem, please?
I will be very grateful,
Marylen
I found the problem: my differential equation is stiff, and ode45 is not well suited to stiff equations, so I changed to ode15s and MultiStart runs well.
Thanks for letting us know. Glad you were able to fix the issue.
Alan Weiss
MATLAB mathematical toolbox documentation
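For readers who hit the same symptom, the difference between the two solvers can be seen on a small stiff test equation. This is an illustrative problem chosen for the sketch, not Marylen's model; it needs only base MATLAB.

```matlab
% Stiff test problem (illustrative):  y' = -1000*(y - cos(t)),  y(0) = 0
f     = @(t, y) -1000 * (y - cos(t));
tspan = [0 10];

sol45  = ode45(f, tspan, 0);     % explicit Runge-Kutta, for nonstiff problems
sol15s = ode15s(f, tspan, 0);    % implicit variable-order method, for stiff problems

fprintf('ode45 : %d steps, %d function evaluations\n', ...
    numel(sol45.x) - 1, sol45.stats.nfevals);
fprintf('ode15s: %d steps, %d function evaluations\n', ...
    numel(sol15s.x) - 1, sol15s.stats.nfevals);
```

On a problem like this, ode45 is forced to take very small steps to stay stable, so it needs far more function evaluations than ode15s. Inside an optimization loop that cost is paid on every objective evaluation, which can look like the solver being stuck.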

Version

R2019b
