How to calculate confidence interval of parameter estimated by global optimization--pattern search

Hi everyone,
I recently used pattern search to find a global minimum. Though the algorithm worked very well and found me a solution, I do not know how to estimate confidence intervals for the estimated parameter values. Unlike a local solver such as lsqcurvefit, which offers output that can be used to calculate confidence intervals directly, pattern search does not provide that output.
So I would like to know how people normally handle this after they get their global minimum solution. How do they quantify the uncertainty of their estimates? And do they generally perform any other statistical analysis/inference on them? Any suggestion will be highly appreciated!
Rui

Answers (2)

Matt J
Matt J on 8 Jun 2013
Edited: Matt J on 10 Jun 2013
It's usually pretty hard to get statistical information about a solution for general problems. I imagine most people just measure statistical variation of the solution by running repeated simulations.
The methods you're talking about usually assume that all of the following are true of the problem:
  1. It is unconstrained and differentiable
  2. It uses a least squares objective function
  3. The residuals are Gaussian distributed.
If that is your situation, you could compute the Jacobian at the solution yourself. Or, once you've found the solution, you can feed it to LSQCURVEFIT as an initial point and use its Jacobian output in the usual way. LSQCURVEFIT should stop in 1 iteration, since your solution is already optimal.
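Matt's second suggestion might look roughly like the sketch below. All function and variable names here (model, xdata, ydata, p0, fitFun) are placeholders, not from the thread, and it assumes the objective really is a sum of squared Gaussian residuals:

```matlab
% Placeholder least-squares objective for pattern search.
fitFun = @(p) sum((model(p, xdata) - ydata).^2);
xPS = patternsearch(fitFun, p0, [], [], [], [], lb, ub);

% Polish with LSQCURVEFIT starting at the pattern-search solution;
% it should stop almost immediately and return residuals and Jacobian.
[pFit, ~, resid, ~, ~, ~, J] = lsqcurvefit(model, xPS, xdata, ydata, lb, ub);

% 95% confidence intervals from the Jacobian (requires Statistics Toolbox).
ci = nlparci(pFit, resid, 'jacobian', J);
```

Note that nlparci's intervals rely on the unconstrained, smooth, Gaussian-residual assumptions listed above; if the constraints are active at the solution, the intervals can be misleading.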

5 Comments

Hi Matt,
Thanks! This is a constrained problem, and not smooth at all. So I tried the first method you mentioned:
Data sets are pooled together, and parameters are estimated using the Global Optimization Toolbox. Then data sets are randomly resampled for a bootstrap, and the parameters are estimated again. This is repeated a number of times to get the mean and a confidence interval.
However, my coworker said our data sets came from different studies, where the experimental conditions might not have been the same; hence it might not be proper to use the bootstrap here. Since neither of us is a statistician, we do not know whether this is the right way to do it.
So what do you think?
I'm not a statistician either... But not controlling the experimental conditions does sound bad.
These are published data from several papers...but still thanks a lot!
What I meant was, computing statistical variation based on data repetitions that are not i.i.d. sounds problematic.
Matt, that's true. We have 7 compounds. The data set (a concentration time course with several data points) for each compound is from a different paper. Our model has a few compound-specific but known parameters, and a few parameters that are the same across compounds, which we try to estimate. These parameters are estimated by minimizing the error between pooled simulation and pooled data with a global algorithm. Can we still use the bootstrap safely in this situation?


There are no analytic approximations for CIs for pattern search analogous to those for linear regression. The best alternative, as you note, is to use a resampling technique. Of course, you are further from an ideal world if you violate Matt J's points.
BOOTCI is probably your best bet.
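A minimal sketch of how BOOTCI could wrap the fit, assuming fitParams(data) re-runs the pattern-search estimation on a resampled data set and returns the parameter vector (fitParams and data are placeholder names, not from the thread):

```matlab
% BOOTCI resamples the rows of data with replacement, calls
% fitParams on each resample, and returns bias-corrected and
% accelerated (BCa) 95% confidence intervals by default.
nBoot = 1000;                            % number of bootstrap replicates
ci = bootci(nBoot, {@fitParams, data});  % one CI per estimated parameter
```

Be aware that re-running a global optimization inside every bootstrap replicate can be expensive; a smaller nBoot may be needed in practice.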

7 Comments

Hi Shashank,
Thanks a lot! Let me rephrase my problem with more details. We have 7 compounds. The data set (a concentration time course with several data points) for each compound is from a different paper. Our model has a few compound-specific but known parameters, and a few parameters that are the same across compounds, which we try to estimate. These parameters are estimated by minimizing the error between pooled simulation and pooled data with a global algorithm. To do a bootstrap, I randomly pick a compound, put it into my modeling set, and estimate the parameters. This is repeated a number of times, so each time my modeling set is a little different, and finally I get a mean and standard deviation... but as Matt mentioned, these data are not i.i.d.
If I want to do a bootstrap in whatever form, is there anything specific I need to know given my situation? Is it too different from the normal situation where people can bootstrap safely?
I randomly pick a compound, put it into my modeling set, and estimate parameters. It is repeated a couple of times, so every time my modeling set is a little bit different, and finally I get mean and standard deviation...but as Matt mentioned, these data are not iid.
Can't you keep the compound the same across the different repetitions? I imagine they would be i.i.d for a fixed compound. That way you can get the mean and std for each compound individually. Since there are only 7 compounds, it doesn't sound hard to study the statistics of each compound separately.
No, I cannot do it that way. We are trying to estimate one set of parameters by simultaneously minimizing the error between simulation and data across multiple compounds. In addition, we cannot resample individual data points within a compound, because each data set is a time-concentration curve. Hence, if I used only one compound, the result would be the same for every repetition, because I would be using the same data every time. Sorry for the confusion.
OK. Well, I don't have a clear picture of what the sources of randomness are in this application and possibly need to have a lot more knowledge of chemistry to get one.
You're saying that the time/concentration curve (whose parameters you are trying to estimate) is a deterministic function once some initial selection of compounds is made, but that that initial selection of the compounds itself (and their concentrations?) is random? If so, then as long as you simulate the selection of initial compounds according to a fixed distribution from trial to trial, you should get i.i.d. results.
Hi Matt, thanks a lot!
Let me rephrase it. We have 7 time/concentration curves from 7 different papers. Each curve has several time/data points. Our model predicts these data points when supplied with both compound-dependent parameters (known) and compound-independent parameters. The compound-independent parameters are estimated by minimizing the error between pooled data and pooled predicted data. Then time-concentration curves are randomly picked according to a uniform distribution. For the curves chosen, corresponding simulations are made from our model. Then both are pooled, and the error is calculated. Is this still i.i.d.?
Then time concentration curves are randomly picked according to a uniform distribution. According to the curves chosen, corresponding simulations are made from our model.
If each simulated trial is randomized based on the same uniform distribution, then yes, it does sound i.i.d.
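The resampling scheme discussed above could be sketched as follows. This is only an illustration under the assumptions in the thread; curves (a cell array of the 7 published data sets), fitShared (the pooled pattern-search fit of the shared parameters), and nParams are all placeholder names:

```matlab
% On each trial, draw 7 curves uniformly with replacement from the
% 7 published curves, refit the shared parameters on the pooled
% resample, and collect the estimates.
nTrials = 500;
nCurves = numel(curves);
est = zeros(nTrials, nParams);
for t = 1:nTrials
    pick = randi(nCurves, nCurves, 1);   % uniform i.i.d. curve selection
    est(t, :) = fitShared(curves(pick)); % pooled fit on the resample
end
paramMean = mean(est);
paramCI   = prctile(est, [2.5 97.5]);    % percentile 95% intervals
```

Because the same uniform distribution is used on every trial, the trials are i.i.d. in the sense Matt describes, even though the underlying curves come from different studies.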


Asked: Rui, 8 Jun 2013
