Fitting nonlinear noisy data

Question

Samuele Bolotta am 18 Mär. 2021

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/776437-fitting-nonlinear-noisy-data

Kommentiert: Bjorn Gustavsson am 22 Mär. 2021

I am fitting a function to some simulated data. The procedure works perfectly, but I would like to know if it can be made more robust to noise. When I use this amount of noise:

y = awgn(CPSC,35,'measured');

It still works very well. But if the amount of noise gets increased to:

y = awgn(CPSC,25,'measured');

In 15% of cases the fit is completely wrong.

This is the function that I use to generate the data:

function [EPSC, IPSC, CPSC, t] = generate_current(G_max_chl, G_max_glu, EGlu, EChl, Vm, tau_rise_In, tau_decay_In, tau_rise_Ex, tau_decay_Ex,tmax)
    
    dt = 0.1;                               % time step duration (ms)
    t = 0:dt:tmax-dt;
    
    % Compute compound current
    
    IPSC = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl));
    
    EPSC = ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu));
    
    CPSC = IPSC + EPSC;
    
end

And this is the fitting procedure:

% Values generated by simulation
[~,~,CPSC,t] = generate_current(60,40,0,-70,-30,0.44,15,0.73,3,120);
% Initial values
gmc = 90;
gmg = 90;
tde = 1;
tdi = 1;
tre = 1;
tri = 1;
% Apply white noise to the CPSC
y = awgn(CPSC,35,'measured');
% Alternatively, without noise
% y = CPSC;
%% Perform fit
[xData, yData] = prepareCurveData(t, y);
% Set up fittype and options.
ft = fittype( '((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu))', 'independent', 't', 'dependent', 'y' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.Lower      = [-70 0 1   1   -30 0 0 0 0];
opts.StartPoint = [-70 0 gmc gmg -30 tde tdi tre tri]; % Starting values
opts.Upper      = [-70 0 150 150 -30 20 20 5 5];
[fitresult1, gof1] = fit(xData, yData, ft, opts)
%% Plot fit with data
figure( 'Name', 'Fit' );
h = plot( fitresult1, xData, yData );
legend( h, 'CPSC at -30mV', 'Fit to CPSC', 'Location', 'NorthEast', 'Interpreter', 'none');
subtitle('Realistic values')
% Label axes
xlabel( 'time', 'Interpreter', 'none' );
ylabel( 'pA', 'Interpreter', 'none' );
grid on

How can I make it more robust to noise?

Thanks!

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

John D'Errico am 22 Mär. 2021

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/776437-fitting-nonlinear-noisy-data#answer_654732

This is something I recall reading about many years ago. Nonlinear least squares in the presence of high noise is a classically bad problem. It tends to converge poorly. It requires very good starting values, else it will likely diverge to some meaningless result.

The fix is simple. Ok, simple is not always truly simple to achieve. It often involves one or more of these ideas, possibly all three:

Get better data. Yeah, I know. Not so easy some of the time. More data will not hurt either, especially if it is good .
Provide better starting values. Also not easy.
Apply intelligent constraints on the parameters to reduce the search space.
Use a robust solver, often an iteratively re-weighted solver, that can decrease the penalty on those large residual points to allow the solver to converge.
Multi-start methods are a good choice, since they improve the chance you will get one start point in the basin of attraction of the solution.

In the end, if your data is total crapola, nothing else matters but to get better data.

2 Kommentare
Keine anzeigenKeine ausblenden

Samuele Bolotta am 22 Mär. 2021

Bearbeitet: Samuele Bolotta am 22 Mär. 2021

Thanks for the great answer!

I am working to come up with some intelligent constraints. I'm sure I can reduce the search space for the four nonlinear parameters.

As for the fourth point, do you have something specific in mind? Because in the meanwhile I have implemented a multi-start method, and suprisingly it does not give any significant advantage over the normal fit function.

Also, do you think that Splitting the Linear and Nonlinear Problems (https://it.mathworks.com/help/optim/ug/nonlinear-data-fitting-problem-based-example.html#NonlinearDataFittingProblemBasedExample-4) could potentially help? Two of my parameters are indeed linear.

Bjorn Gustavsson am 22 Mär. 2021

Here's the curves produced by the OPs function and the "very noisy" data and the best fitting weighted lsq-fit with the desired parameters as fitting variables. All the worries above are at least for the OPs example not of major importance.

Melden Sie sich an, um zu kommentieren.

Answer 2

Alan Weiss am 18 Mär. 2021

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/776437-fitting-nonlinear-noisy-data#answer_651242

Most likely the issue is that there are multiple local minima, as in this example: Nonlinear Data-Fitting Using Several Problem-Based Approaches, especially the section Split Problem is More Robust to Initial Guess.

In that example a "split problem" approach worked. In general, you might want to use multiple start points, as in MultiStart Using lsqcurvefit or lsqnonlin.

Good luck,

Alan Weiss

MATLAB mathematical toolbox documentation

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Samuele Bolotta am 22 Mär. 2021

Thank you for your reply.

In the meanwhile I have implemented a multi-start method, and suprisingly it does not give any significant advantage over the normal fit function. Do you think it is still worth trying to split the linear and the nonlinear problems?

Melden Sie sich an, um zu kommentieren.

Answer 3

Bjorn Gustavsson am 18 Mär. 2021

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/776437-fitting-nonlinear-noisy-data#answer_651192

In MATLAB Online öffnen

When I do this I typically write my own error-function or residual-function and then use fminsearch (or some of the general optimization-contributions on the file-exchange, like fminsearchbnd or optimize). These functions do the same job as fit as far as I understand, but with a more direct hands-on interface for the users. That way I can write my error-functions with the properly weighted contributions when I have measurements with varying standard deviation (I typically get away with the assumption that my noise is generaly normal-distributed, but if it isn't it's also possible to generalize to optimizing the liklelihood/log-likelihood for other noise-distributions). Therefore I'd solve your problem by turning away from fit and use lsqnonline (or fminsearch) instead:

function res = your_fit_residuals(pars,t,y_obs,y_std)
% Your residual-function for use with lsqnonlin
% that also takes the estimated standard deviations of your observations
% giving you a weighted least-square solution for your fit
if nargin == 3
  y_std = 1; % if no std given defaults to a constant and get a standard lsq-fit
end
G_max_chl = pars(1);
tau_rise_In = pars(2);
tau_decay_In = pars(3);
Vm = pars(4);
EChl = pars(5);
G_max_glu = pars(6);
tau_rise_Ex = pars(7);
tau_decay_Ex = pars(8);
EGlu = pars(9);
y_mod = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ...
        ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu))');
res = (y_mod-y_obs)./y_std;
end

Then you call lsqnonlin like this:

Upper      = [-70 0 150 150 -30 20 20 5 5];
StartPoint = [-70 0 gmc gmg -30 tde tdi tre tri]; % Starting values
Lower      = [-70 0 1   1   -30 0 0 0 0];
% You have fixed the first second and fifth parameter here, I'm also unsure
% about the ordering of parameters the FIT-function uses so that you have
% to check...
sigma_y = % You have to get estimates of the uncertainty of your measurements from somewhere
[pars_lsqnonlin = lsqnonlin(@(pars) your_fit_residuals(pars,xData, yData,sigma_y),...
                            StartPoint, Lower,Upper);
yModel = your_fit_residuals(pars_lsqnonlin,xData, 0*yData);

With information about the variation of measurement uncertainty the fit will weight the contributions from the different data-points accordingly.

HTH

7 Kommentare
5 ältere Kommentare anzeigen5 ältere Kommentare ausblenden

Samuele Bolotta am 18 Mär. 2021

In MATLAB Online öffnen

The idea is: once I'll have recorded data, I will have to use this function and fit it to the data:

((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu))

For now, I'm still using simulated data to test the goodness of the fitting procedure. And so if I decide to generate these simulated data:

% Values generated by simulation
[~,~,CPSC,t] = generate_current(60,40,0,-70,-30,0.44,15,0.73,3,120);
% There are six values that are unknown to the fitting procedure (1st, 2nd,6th,7th,8th and 9th), 
% while all the other are constrained

I want to fit the previous function to these data and correctly estimate that the values are indeed 60, 40, etc.

This is something my procedure can already do, but if I add some noise to the data it starts being less effective.

Of course when I'll use experimental data, I won't know the values: those six values could be anything between the lower and upper bounds I defined.

Thanks!

Bjorn Gustavsson am 18 Mär. 2021

In MATLAB Online öffnen

OK, I start to understand your plan. For the case where I want to keep some parameters fixed and only let a subset of them vary I use a pattern like this:

function res = your_fitsomepars_residuals(pars,idx4pars,allpars,t,y_obs,y_std)
% Your residual-function for use with lsqnonlin
% that also takes the estimated standard deviations of your observations
% giving you a weighted least-square solution for your fit
if nargin < 6 || isempty(y_std)
  y_std = 1; % if no std given defaults to a constant and get a standard lsq-fit
end
% Here the varying parameters are inserted into the allpars array. 
allpars(idx4pars) = pars;
G_max_chl = allpars(1);
tau_rise_In = allpars(2);
tau_decay_In = allpars(3);
Vm = allpars(4);
EChl = allpars(5);
G_max_glu = allpars(6);
tau_rise_Ex = allpars(7);
tau_decay_Ex = allpars(8);
EGlu = allpars(9);
y_mod = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ...
        ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu))');
res = (y_mod-y_obs)./y_std;
end

Then your call would be modified to something like this:

Upper      = [-70 0 150 150 -30 20 20 5 5];
StartPoint = [-70 0 gmc gmg -30 tde tdi tre tri]; % Starting values
FixedPars = StartPoint;
Lower      = [-70 0 1   1   -30 0 0 0 0];
idx4varpars = [2 3 4 6 7 8]; % The ordering of the parameters in this sketch is
                             % clearly messed up relative to yours...
% You have fixed the first second and fifth parameter here, I'm also unsure
% about the ordering of parameters the FIT-function uses so that you have
% to check...
sigma_y = % You have to get estimates of the uncertainty of your measurements from somewhere
[pars_lsqnonlin = lsqnonlin(@(pars) your_fitsomepars_residuals(pars,idx4varpars,FixedPars,xData, yData,sigma_y),...
                            StartPoint(idx4varpars), Lower(idx4varpars),Upper(idx4varpars));
yModel = your_fitsomepars_residuals(pars_lsqnonlin,idx4varpars,FixedPars,xData, 0*yData);

That way you completely remove the variables that's not in idx4varpars from the optimization.

Samuele Bolotta am 21 Mär. 2021

In MATLAB Online öffnen

That is great help! I think I'm almost there, but I've got a couple of doubts. First, I'll attach what I've written so far:

% Time and simulated data (y_obs)
[EPSC,IPSC,CPSC,t] = generate_current(60,40,0,-70,-30,0.44,15,0.73,3,120);
y_obs = awgn(CPSC,25,'measured');
% Known values
Vm = -30;
EChl = -70;
EGlu = 0;
% Initial values for fitting and bounds
gmc = 51; gmg = 50; tde = 1; tdi = 1; tre = 1; tri = 1;
Upper      = [150 5 20 Vm EChl 150 5 20 EGlu];
StartPoint = [gmc tri tdi Vm EChl 80 tre tde EGlu]; % Starting values
FixedPars = StartPoint;
Lower      = [1 0 0 Vm EChl 1 0 0 EGlu];
% Indexes
idx4varpars = [1 2 3 6 7 8]; 
% Standard deviation
y_std = std(y_obs);
% Lsqnonlin
[pars_lsqnonlin] = lsqnonlin(@(pars) fit_residuals(pars,idx4varpars,FixedPars,t, y_obs,y_std),...
                            StartPoint(idx4varpars), Lower(idx4varpars),Upper(idx4varpars));
                            
[IPSC_fit, EPSC_fit, yModel] = fit_residuals(pars_lsqnonlin,idx4varpars,FixedPars,t, 0*y_obs);
%% Plot fit with data
figure
subplot(3,1,1)
plot(t,y_obs)
hold on
plot(t,yModel)
title("Compound current")
legend("Recorded","Fit")
subplot(3,1,2)
plot(t,IPSC)
hold on
plot(t,IPSC_fit)
title("Inhibitory current")
legend("Recorded","Fit")
subplot(3,1,3)
plot(t,EPSC)
hold on
plot(t,EPSC_fit)
title("Excitatory current")
legend("Recorded","Fit")

And:

function [IPSC_fit, EPSC_fit, res] = fit_residuals(pars,idx4pars,allpars,t,y_obs,y_std)
if nargin < 6 || isempty(y_std)
  y_std = 1;
end
allpars(idx4pars) = pars;
G_max_chl = allpars(1);
tau_rise_In = allpars(2);
tau_decay_In = allpars(3);
Vm = allpars(4);
EChl = allpars(5);
G_max_glu = allpars(6);
tau_rise_Ex = allpars(7);
tau_decay_Ex = allpars(8);
EGlu = allpars(9);
IPSC_fit = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl));
EPSC_fit = ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu));
y_mod = ((G_max_chl) .* ((1 - exp(-t / tau_rise_In)) .* exp(-t / tau_decay_In)) * (Vm - EChl)) + ...
        ((G_max_glu) .* ((1 - exp(-t / tau_rise_Ex)) .* exp(-t / tau_decay_Ex)) * (Vm - EGlu));
    
res = (y_mod-y_obs)./y_std;
end

So, my doubts are:

A) How should I calculate the standard deviation? Perhaps this is a naive question

B) The fit is clearly inaccurate and it's because I'm doing something wrong. In particular, I am not sure how calculating y_mod helps, if it is done with the starting values.

Thanks again!

Bjorn Gustavsson am 21 Mär. 2021

Bearbeitet: Bjorn Gustavsson am 22 Mär. 2021

In MATLAB Online öffnen

Good, that looks OK at a first glance.

A: Your question about hte standard-deviation estimation might be seen as "naive" at a first glance but is anything but naive. It is typically the most tricky one to actually get right in a typical observation scenario. You should try to get estimates of the standard deviation for each individual point of your measurements. That is sometimes relatively easy, sometimes much harder or very difficult indeed. In your case I tried:

y_std = movstd(y_obs-filtfilt(ones(1,11),1,y_obs),23);

Which gives you something. What you will/should do when you get real data you'll have to learn by looking at your data, and discussing with colleagues about your measurement characteristics. Here I simply made some low-pass filtering of your data - which corresponds to the assumption that you should have a trend that's smoothly varying with time.

B: you have to make res the first output variable from the fit_residuals function - that is the variable lsqnonlin tries to fit to. When I did that and simply overplotted y_mod the fit was good.

y_mod is not calculated with the starting-values - the first thing that happens is to replace the varying parameters in the allpars variable with the values in the pars variable and those are the variable that lsqnonlin modifies.

Samuele Bolotta am 22 Mär. 2021

Thanks for the great input. Much appreciated!

Melden Sie sich an, um zu kommentieren.

Fitting nonlinear noisy data

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Akzeptierte Antwort

2 Kommentare
Keine anzeigenKeine ausblenden

Weitere Antworten (2)

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

7 Kommentare
5 ältere Kommentare anzeigen5 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

Fitting nonlinear noisy data

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Akzeptierte Antwort

2 Kommentare Keine anzeigenKeine ausblenden

Weitere Antworten (2)

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

7 Kommentare 5 ältere Kommentare anzeigen5 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

2 Kommentare
Keine anzeigenKeine ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

7 Kommentare
5 ältere Kommentare anzeigen5 ältere Kommentare ausblenden