## Optimization using lsqnonlin on very distinct data sets that depend on the same variables

### Niels

on 20 Oct 2016
Latest activity: commented on by Niels on 24 Oct 2016
I am working with some large data sets (N rows of data with one parameter varied, each row consisting of M points) for which it is assumed that there exists a function able to accurately describe each of these rows of data. This function consists of P fit parameters plus the one parameter that I vary.
Now, M is a very large number and I cannot afford to use my fitting routine on all N rows of data. Fortunately, my fitting function can be integrated, such that I can instead consider the much smaller single-row data set consisting of just N points.
Getting a nice fit through the integrated quantities goes fast and gives me physically realistic values for my P fit parameters. However, when I then plug the fit parameters into my original function to compare it to one of the N rows of M points, the result can be way off...
So what I now want to do is make a routine that considers e.g. 2 out of my N rows, as well as the integrated data. I tried to simply concatenate everything, but the values and numbers of points may differ significantly, and in the end I get results similar to those from considering just a single row of M points, at the cost of a slower routine.
How can I realize this combined fitting routine and make everything equally important, independent of the big difference in N and M?
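One possible way to put the two data sets on equal footing (a sketch, not from the thread; `f_row`, `F_int`, `y_row`, `Y_int` and the normalization choice are all hypothetical names/assumptions) is to switch to lsqnonlin and normalize each residual block by its own data norm before concatenating, so that each block contributes a comparable sum of squares regardless of M and N:

```matlab
% Hypothetical sketch: combine one raw row (M points) and the
% integrated data (N points) into a single lsqnonlin objective.
% f_row(p,t), F_int(p,k), y_row, Y_int, p_init, p_min, p_max and
% fit_opts are placeholder names, not from the original post.
function r = balancedResiduals(p, t, k, y_row, Y_int)
    r1 = f_row(p, t) - y_row;   % M residuals from the raw row
    r2 = F_int(p, k) - Y_int;   % N residuals from the integrated data
    % Dividing by each block's data norm makes the two blocks'
    % sums of squares dimensionless and of comparable size,
    % independent of M and N:
    r  = [ r1(:) / norm(y_row) ; r2(:) / norm(Y_int) ];
end

% p_out = lsqnonlin( @(p) balancedResiduals(p, t, k, y_row, Y_int), ...
%                    p_init, p_min, p_max, fit_opts );
```

Since `norm(y_row)` grows like sqrt(M) for data of a given magnitude, this normalization cancels the size imbalance between the blocks; extra per-block weights could be layered on top if one data set should count more.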

### Niels

on 21 Oct 2016
Hi Matt,
My model is basically of the form y( t, k ) = f( t, k, p_0,...,p_n ).
As t is a very large array, it is computationally very expensive to apply lsqcurvefit to all k:
```matlab
k = 1;
p_out = lsqcurvefit( @(p,t) f(p,t,k), p_init, t, y, p_min, p_max, fit_opts );
```
The above may already take up to 5 minutes to get a decent fit. Solving for all k at once is something my computer does not like at all.
So instead I am using something like
```matlab
p_out = lsqcurvefit( @F, p_init, k, Y, p_min, p_max, fit_opts );
```
where Y( k ) = F( k, p_0,...,p_n ); Y is simply y integrated (numerically) over t, and F is the integral of f over t from t_0 to infinity.
Now I want to get to some intermediate form where I fit Y( k ) vs F( k ), then plug my p_out into f for e.g. k=1 and k=20 and compare those results to y. I can do both of these separately, but I am stuck at getting my p values to converge to something that performs equally well on Y vs F, on y(t,k=1) vs f(t,k=1), and on y(t,k=20) vs f(t,k=20), because length(k) << length(t).
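If the bottleneck is purely length(t), one option (an assumption on my part, valid only if the noise in y is roughly independent across t) is to fit each raw row against a random subsample of t, so that a couple of rows become affordable alongside the integrated fit:

```matlab
% Hypothetical sketch: subsample t before the per-row fit, reusing the
% names from the snippets above. The subsample size 2000 is arbitrary.
idx   = sort(randperm(numel(t), 2000));  % keep ~2000 of the M points
k     = 1;
p_out = lsqcurvefit( @(p,ts) f(p,ts,k), p_init, t(idx), y(idx), ...
                     p_min, p_max, fit_opts );
```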
### Matt J

on 21 Oct 2016
Yes, but this is basically a restatement of your original question. What does f(t,k,p_0,...,p_n) look like, and how large are length(t) and length(k)? Without knowing that, we have no way of making informed recommendations.
### Niels

on 24 Oct 2016
My apologies. Hopefully the following will give you a better idea:
length(t) can be anything between 5e4 and 5e6, length(k) varies roughly from 10 to 50, and the y(t) I fit against can be quite noisy; the integrated quantities do not suffer from that and reproduce very well as long as I do not vary k.
In f(t,k,p_0,...,p_n) I basically take a summation of N individual contributions, in a form similar to the snippet below:
```matlab
function y = f(p,t,k)
% ... some input checking to make sure all the inputs
% are fed with the correct dimensions ...
% size(t) = [M,1]
% size(p) = [1,2*N+1]
% k is a scalar

%% Split up the input parameters
N  = (length(p)-1)/2;
N1 = 1:N;
a  = p( 0*N + N1 );
b  = p( 1*N + N1 );
c  = p( end );

%% Vectorized calculation of the output using simple matrix multiplication
y = exp( -( (t-k) * (1./(a+N1*c)) ).^2 ) * b.';
end
```
In reality, f is more complicated, with a significantly larger number of input parameters, but it is simple enough to expand and sum with just some permutations and an occasional bsxfun. F(k,p_0,...,p_n) is simply the definite integral of f with respect to t, from g(k) to infinity.
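For the simplified f above, that integral can even be written in closed form: each term b_i * exp( -((t-k)/s_i)^2 ) with width s_i = a_i + i*c integrates over t from g(k) to infinity to b_i * s_i * sqrt(pi)/2 * erfc( (g(k)-k)/s_i ). A sketch, assuming g(k) is whatever lower-limit function applies (g is a placeholder, not from the post):

```matlab
function Y = F(p, k)
% Closed-form integral of the simplified f over t, from g(k) to infinity,
% using the same parameter layout as f. g(k) is a placeholder for the
% actual lower integration limit.
N  = (length(p)-1)/2;
N1 = 1:N;
a  = p( 0*N + N1 );
b  = p( 1*N + N1 );
c  = p( end );
s  = a + N1*c;                                   % per-term widths
Y  = (sqrt(pi)/2) * sum( b .* s .* erfc( (g(k)-k) ./ s ) );
end
```

An analytic F like this keeps the integrated fit cheap and exactly consistent with the raw-row model, which helps when combining the two objectives.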