How to fit a 2D Gaussian function to noisy data in Matlab? Or data with another number of dimensions and arbitrary fit function?
It took me a while to find out how Matlab does this by the functions lsqcurvefit, fmincon or fminsearch. This project demonstrates how to use these functions to fit ND data with ND functions. Several tests are compiled in the file fit_test.m. Reading and trying out this file will explain a lot.
fit_nl.m and fit_nl_ex.m are written around the matlab functions lsqcurvefit/ fmincon/fminsearch. Additionally: parameters can be set fixed or some built-in functions (Gaussians, Lorentzians) can be used (see fit_func.m). However the set of built-in functions is pretty special.
fmincon/fminsearch can even be used to solve maximum likelihood parameter estimation problems other than for gaussian noise (least squares). An example is included for Poissonian noise in fit_nl_ex.m. Although this is often the case in imaging, this part is also quite special. If You have questions about these problems, ask here!
Jan, Your submission was very helpful for me. I am quite new to matlab and had trouble figuring out how to use lsqcurvefit for 2-d fitting. Going through your program helped me figure out the mistakes I was making.
I have just submitted a new version. There I have changed the aim of the whole package towards demonstrating the possibilities of the functions lsqcurvefit/fmincon/fminsearch. The syntax is now much more clear and is similar to the syntax of the underlying matlab functions. Much more comments, explanation, h1 lines and error checks have been included. (Indeed much more than usual for me. :)) However also some additions have been taking place that might be of interest. In particular fit_nl_ex() now maximizes the likelihood and is therefore more universal than only least squares. Treatment of Poissonian noise which often occurs e.g. in imaging is incorporated. And I have included the possibility to add penalties depending on the parameter on the likelihood. Matlab itself has only all-or-nothing penalties via the extended options of fmincon. But when you know something about the distribution of parameters instead of hard borders an additive penalty is the choice.
However, instead of the assuring results of the test routine fit_test there might still be errors, both conceptual and coming from programming. Please let me know about them. I will correct them. However, I do not plan to add any functionality further than this.
The second version only has one or two more comments. The definitions of the built-in functions are explained. I am still working on an improved version.
I hope that you fix the problems. I'm always happy to update my review to reflect problems that have been repaired, and I'm always willing to work in depth (via e-mail) with an author who wants to improve something. I see that there is a new version, so I'll take a second look today.
Jan, you should get 5 stars for your very kind way to reply to a reviewer. I have the impression that too many lack the good manners to communicate with each other here. That's a good sign.
from the author:
I really appreciate the review by John D'Errico. Its amazing completeness indeed brings out all the flaws in my code including the mediocre help. However I could learn a lot from it and I really like that.
First of all the reason why I thought a n-D fit tool was not already built in. From the Matlab documentation I extracted the information that lsqnonlin only works with a function (fun) that takes vectors as input and output which means 1D functions. Before, I was using nlinfit from the Statistics toolbox and a way to do n-D fits with nlinfit was not obvious to me. Furthermore I did not find any examples for n-D fits on the web but was desparate for a solution for one of my problems. Just today I read this line for the first time in the documentation of lsqnonlin going: "If the user-defined values for x and F are matrices, they are converted to a vector using linear indexing." Reading more thoroughly would have saved me quite some time and indeed makes the code kind of unnecessary...
The second point is the commenting which can be clearly much better before sending it to other people. About error checking I have ambivalent feelings. I rather do not include all possible error checkings but only the reasonable ones since the effort will be large, the amount of error checking will be larger than the rest of the code and the performance of the execution will be slower.
So all in all I consider either delisting or rewritting it in a way more like an example for how to use lsqnonlin with n-D functions.
There is absolutely no reason why lsqnonlin or lsqcurvefit cannot be employed to fit models with more than one independent variable. In fact, those tools already automatically do handle n-D data. Since they are already part of the optimization toolbox, I see no valid reason to have built this code. It is not friendly or complete enough to justify its existence.
Given the above comment, as long as I've downloaded it, I'll take a look at the code itself, starting with fit_nl.
The help is somewhat reasonable at first glance, with a couple of glaring flaws. First, there is no H1 line. An H1 line is the very first line of help, the very first commented line. It is used by lookfor to allow you or someone else to find this function next year when they forget the name. The H1 line should include a variety of appropriate keywords that describe the function and what it does. As well, an H1 line is used to build the contents file, used by help itself when there is no contents.m provided.
Next, while the help is fairly clean and complete, some parameters have problem in the description. The variable "fixed" is supposed to indicate which variables are to be held fixed, if any. fixed is supposed to be a logical vector. But what does the presence of a 1 indicate? Does it mean that you are to estimate the parameter, or that it be held fixed? Tell the users how to use your code! Don't make them guess.
In addition, there are three optional parameters. Whenever you have an optional parameter, be nice to your user and tell them what will happen if they do not supply that parameter at all, I.e., what are the defaults?
The worst flaw with the help is the data "stack". The variable data is supposed to contain an "n-d data stack". But that is as much as we are ever told. What/where are the independent variables? What/where is the dependent variable (i.e., the aim)? How are we to know how to use this if we must guess?
Next, fit_nl includes a nested function. F, that computes the likelihood of a 3D Gaussian. What happens if your problem is not a 3-d gaussian?
Next, look at fit_func. This offers a small variety of standard specific model types, but the help is VERY INCOMPLETE here. You are not told how to use it, or what the model types are or what they mean.
This submission really would benefit from better help. It feels like the author put it all together, but then got bored at the end, not wanting to invest the extra work to make it useable.
There is error checking in the code, although not a huge amount. Think about what the user can screw up, and give them some help! Give them a hint about what you see that has gone wrong.
Were there things I did like about this submission? Yes. The help is clean and readable, and nicely laid out. The few error checks that are in the code are good. There is pretty good internal commenting provided, so if you want to read (or debug) the code, you might be able to work through it. One thing that might be somewhat useful is the ability to fix an unknown. Every once in a while someone asks to do this. Its pretty trivial to do in any optimizer anyway, so I can't be too excited about it.
Overall, I originally considered giving this code a higher rating. The problems I mention above, along with some others suggest this code is not fully thought out. While I'm sure the author knows how to use it for their own purposes, they don't give good enough help to explain its use to anyone else, and I'm not at all comfortable asserting that it can be easily used for other model forms that are not already programmed. Again, I'm sure the author thinks it is good, but the semi-complete help is very inadequate here. Given that this solves absolutely no problems that are already not directly solved by use of lsqcurvefit or lsqnonlin, I have a hard time giving it a higher rating than a 2.
Were the many problems I've described above to be repaired, I'd consider revising my rating upwards. Even then I'd still wonder why this code exists in the first place.
Some bugs, better help, more comments, error checks, a better test routine and a little bit more features ... version 2.
Added more comments, especially on order of parameters for builtin functions.