Calculating weighted confidence interval

6 Ansichten (letzte 30 Tage)
Drach
Drach am 12 Jun. 2020
Kommentiert: Drach am 13 Jun. 2020
Hi there,
I have problems generating the weighted confidence interval for a fit that I generated using the curve fitting toolbox.
Below is the result. The red line is a exponential fit (second order), the dotted line is the weighted confidence interval. It might be obvious, but I imagined the weighted confidence interval a lot narrower. I tried to reproduce this method form stackoverflow:
This is my code:
figure(1)
plot(X_data, Y_data, '.', 'MarkerEdgeColor', 'k')
grid on
%% I got these values from the curve fitting toolbox
a = 1.677;
b = 0.001851;
c = -1.594;
d = -0.01021;
X_fit = 0:0.01:400;
fit_data = a.*exp(b.*X_fit) + c*exp(d.*X_fit);
plot(X_fit, fit_data,'r')
xlim([0 400])
predint_data = predint(fittedmodel,X_data, 0.95,'observation','off');
fitY = feval(fittedmodel,X_data);
ci = (fitY-predint_data(:,1)).*X_data;
Conf_new = [fitY-ci,fitY+ci];
plot(X_data,Conf_new,'m--')
I attached the X_data, Y_data and the fittedmodel from the curve fitting toolbox.
I have similar results when using another fitting method, for example the power function. Why are my confidence intervals so huge?
It would be appreciated very much if anyone could help me on this one.
Thanks in advance!

Antworten (1)

John D'Errico
John D'Errico am 12 Jun. 2020
Bearbeitet: John D'Errico am 12 Jun. 2020
I looked at your data, but since you never provided the weights, I cannot reconstruct the fitting process.
By the way, it is a REALLY bad idea to call a variable fit, since fit is one of the most important tools in the curve fitting toolbox.
As well this:
predint = predint(fittedmodel,X_data, 0.95,'observation','off');
is an insanely bad thing to do in MATLAB. You just used the function predint, to create a variable, that you then stuffed into a variable named predint. That means you can now no longer use the FUNCTION predint again.
However, looking at your data, I see this:
I find it difficult to imagine your justification for a two term exponential model there. Regardless, the width of the confidence intervals on your data should be pretty large, given the complete crap you have for data.
Finally, you are totally misusing the prediction intervals that do come out of predint.
ci = (fitY-predint(:,1)).*X_data;
ci = (fitY-predint(:,2)).*Y_data;
Huh?
Even if I ignore the first of those two lines, since you then overwrite it, this is NOT how you would compute prediction intervals.
  5 Kommentare
John D'Errico
John D'Errico am 13 Jun. 2020
It is good that you no longer use fit or predint as variable names.
As Adam points out, this is highly heteroscedastic. And just using the first model you find from a library that seems to fit your data, but with no apparent reason to make that choice from physical principles is dangerous as hell. Yes, of course a linear model will predict poorly near zero on that data.
I still fail to see what weights you chose to employ for the fit, though. I assume you used X_data to compute the weights? Perhaps something in an inverse form, given the way you tried to compute ci?
weights = 1./X_data;
Since X_data(1) is zero, that would be problematic of course. But I see no place where you show HOW you provided the weights in that fit. And computing the confidence interval would need that information. Again however, this:
ci = (fitY-predint_data(:,1)).*X_data;
is meaningless to compute confidence intervals. That is not what a confidence interval is intended to show, though I have no idea what you are looking to produce.
A confidence interval can be used in two ways. First, to show a confidence interval around the curve. That is, after the fit has been done, where do I think the underlying curve really lies? That presumes the model is correct, AND that includes assumptions about normality of the error structure, and homoscedasticity. Those assumptions fail a bit here, since your noise is clearly not normal, but highly skewed, and is not of uniform variance.
The second way to interpret a confidence interval is to think of where would a point lie around the curve? This effectively adds back in the variance of the data around the curve. So the confidence interval for an observation is always wider than a confidence interval for the prediction of the curve itself.
I don't think I said that very well. Perhaps the help for predint does it better:
If 'INTOPT' is 'functional', the bounds measure the uncertainty
in estimating the curve. If 'INTOPT' is 'observation', the bounds
are wider to represent the addition uncertainty in predicting a
new Y value (the curve plus random noise).
'observation' is the default.
Now, you have a relatively large amount of data, so the functional intervals will be pretty tight.
pred = fittedmodel(0:400);
ci_O = predint(fittedmodel,0:400,0.95,'observation');
ci_F = predint(fittedmodel,0:400,0.95,'func');
plot(0:400,[pred,ci_O,ci_F]);
There are 5 curves plotted there.
In the very middle, there is the curve itself.
Very tightly bound to the curve are the functional confidence intervals. Again, a LOT of data, so they are very tight. In reality, the assumtions in the modeling are totally incorrect, so these confidence intervals are pretty much bogus, garbage predictions. You had no real a-prioi reason to choose a two term exponential model, and there is a huge amount of lack-of fit and noise in the problem.
Finally, the wide intervals at the top and bottom have the noise variance added back, ASSUMING A UNIFORM, HOMOSCEDASTIC NOISE STRUCTURE. Those are the 'observation' intervals. They try to tell you, if you just sampled a new data point, where do I think the point would lie?
Drach
Drach am 13 Jun. 2020
Thanks you so much for taking the effort, I finaly got the difference between 'functional' and 'observation' and I see that this makes no sens with my noisy data.
About my data: The x axis gives me measurement results, while the y axis shows the SNR for every measurement result.
My intend with the confidence interval was to show from which x value going upwards the SNR for > 95 % of the measurement values is higher than 1. I solved this now without a fit or confidence interval: I plot the ratio of measurement data with SNR > 1 to all measurment points for a small interval step by step over the x axis. This gives me an increasing curve and I can read at a second y axis at 95 % what my corresponding x value is.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Linear and Nonlinear Regression finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by