Why I am unable to recreate curve fitting equation?
I used MATLAB's built-in Curve Fitting Tool to fit the following data:
x = [5 10 15 20 25 30 35 40 45 50]
and
y = [140 88 62 49 38 31 25 20 17 12]
I used the two-term exponential model to generate the fitting curve.
The following results were obtained:
General model Exp2:
f(x) = a*exp(b*x) + c*exp(d*x)
where x is normalized by mean 27.5 and std 15.14
Coefficients (with 95% confidence bounds):
a = 0.2758 (-0.1069, 0.6585)
b = -3.521 (-4.346, -2.696)
c = 34.03 (32.91, 35.15)
d = -0.6419 (-0.6992, -0.5846)
Goodness of fit:
SSE: 3.376
R-square: 0.9998
Adjusted R-square: 0.9996
RMSE: 0.7501
I recreated the equation of the curve using the same coefficients a = 0.2758, b = -3.521, c = 34.03 and d = -0.6419 in the equation y1 = a*exp(b*x) + c*exp(d*x). When I run it in the command window I get the following output for y1:
y1 =
1.374022395352651
0.055478622562340
0.002240049060184
0.000090446005331
0.000003651919963
0.000000147452830
0.000000005953673
0.000000000240390
0.000000000009706
0.000000000000392
I am unable to understand why there is such a big mismatch between y1 and y.
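For reference, the computation described above is essentially the following (a reconstruction; the exact commands may have differed):
x = [5 10 15 20 25 30 35 40 45 50];
a = 0.2758; b = -3.521; c = 34.03; d = -0.6419;   % 4-digit values from the fit report
y1 = a*exp(b*x) + c*exp(d*x)                      % x used as-is, not normalized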
1 Comment
John D'Errico on 1 Jun 2020
Edited: John D'Errico on 1 Jun 2020
READ MY ANSWER TO THE END. It solves your problem, showing what you did, and reproducing the garbage numbers you got for y1, pretty much exactly.
Essentially, the problem you have in producing y1 is this: IF you do a fit using a normalized version of x in fit, then you need to build that normalization into your model. It is now part of your model.
The proof is that when I did the fit using the normalized version of x, it produced the same coefficients you got. So the problem is NOT in the fit itself, because I can then predict y pretty accurately, even if I use only the approximate set of coefficients, as you did.
However, when you then predict from the model, you need to use the normalization used for the fit!
The problem is NOT how you estimated the model.
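In code, a minimal sketch of what the prediction should look like (assuming mdl is the exported cfit object, and mu and S are the mean and std that were used to normalize x):
mu = 27.5;
S = 15.1382517704875;
xq = [5 10 15 20 25 30 35 40 45 50];
ypred = mdl((xq - mu)/S);    % correct: the model expects the normalized x
% ybad = mdl(xq);            % wrong: skips the normalization the model was built with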
Answers (2)
Star Strider on 31 May 2020
The Code —
% two-term exponential model: y = b(1)*exp(b(2)*x) + b(3)*exp(b(4)*x)
f = @(b,x) b(1).*exp(b(2).*x) + b(3).*exp(b(4).*x);
x = [5 10 15 20 25 30 35 40 45 50];
y = [140 88 62 49 38 31 25 20 17 12];
% minimize the norm of the residuals, starting from a random initial guess
B = fminsearch(@(b) norm(y - f(b,x)), rand(4,1));
figure
plot(x, y, 'p')            % data
hold on
plot(x, f(B,x), '-r')      % fitted curve
hold off
grid
text(27, 100, sprintf('a = %7.3f\nb = %7.3f\nc = %7.3f\nd = %7.3f',B))
The Plot —
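Because fminsearch is a local, derivative-free search, the parameters it returns can depend on the rand(4,1) starting point. A minimal sketch that keeps the best of several random restarts (reusing f, x, and y from above; the number of restarts is arbitrary):
best = inf;
for k = 1:20
    [Bk, fvalk] = fminsearch(@(b) norm(y - f(b,x)), rand(4,1));
    if fvalk < best          % keep the parameters with the smallest residual norm
        best = fvalk;
        B = Bk;
    end
end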
3 Comments
Star Strider on 31 May 2020
I do not have the Curve Fitting Toolbox, because the toolboxes I do have (Statistics and Machine Learning Toolbox, Optimization Toolbox, among others) plus my own mathematical and programming experience do everything I want.
Other than that, we know only what you said you did, not what you actually did, so it is not possible to determine the problem. (I reversed the two vectors and my function still ran without error. The fit was appropriate and the parameters were different; however, they did not closely resemble the parameters you previously reported, eliminating that as a source of the problem.)
Alex Sha on 1 Jun 2020
The results below seem to be better:
Root of Mean Square Error (RMSE): 0.581032486515321
Sum of Squared Residual: 3.37598750386176
Correlation Coef. (R): 0.999881471978295
R-Square: 0.999762958005483
Parameter Best Estimate
---------- -------------
a 165.438803583087
b -0.232608352963485
c 109.208242653762
d -0.0424020153019308
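These appear to be coefficients for the raw (unnormalized) x. A quick check of the reported fit quality, assuming the same model form y = a*exp(b*x) + c*exp(d*x):
x = [5 10 15 20 25 30 35 40 45 50];
y = [140 88 62 49 38 31 25 20 17 12];
a = 165.438803583087;   b = -0.232608352963485;
c = 109.208242653762;   d = -0.0424020153019308;
yfit = a*exp(b*x) + c*exp(d*x);
SSE  = sum((y - yfit).^2)          % compare with the Sum of Squared Residual above
RMSE = sqrt(mean((y - yfit).^2))   % compare with the RMSE above
Note that the Curve Fitting Toolbox reports RMSE as sqrt(SSE/(n - 4)), dividing by the residual degrees of freedom rather than by n, so its 0.7501 is not directly comparable to the 0.5810 quoted here even though the SSE values match.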
John D'Errico on 1 Jun 2020
Edited: John D'Errico on 1 Jun 2020
I had to play with this for a while, because my first assumption was that you were using the wrong coefficients. In fact, while that costs you some accuracy, it is not what destroyed your results. That came down to forgetting to use the normalized variable x in your computation. You CANNOT use a normalized x in the fit, but then not use the same normalization to predict y.
In fact, using 4 digit approximations is a classic problem. People think that a 4 digit approximation to a coefficient is the coefficient. It is not. Just because the number is reported to 4 significant digits does not mean it stops there.
My initial assumption was that your problem was NOT the software used to estimate the model, but nothing more than using the wrong coefficients.
format long g
mu = mean(x)
mu =
27.5
S = std(x)
S =
15.1382517704875
xhat = (x - mu)/S;
mdl = fit(xhat',y','exp2')
mdl =
General model Exp2:
mdl(x) = a*exp(b*x) + c*exp(d*x)
Coefficients (with 95% confidence bounds):
a = 0.2758 (-0.1069, 0.6585)
b = -3.521 (-4.346, -2.696)
c = 34.03 (32.91, 35.15)
d = -0.6419 (-0.6992, -0.5846)
As you should see, these are exactly the same set of coefficients you claim to have gotten.
plot(x,mdl(xhat))
hold on
plot(x,y,'ro')
Again, those 4 significant digit approximations to the coefficients are NOT the coefficients. You always need to use the true values as estimated.
mdl.a
ans =
0.275764176155343
mdl.b
ans =
-3.52133177155047
mdl.c
ans =
34.0286408362909
mdl.d
ans =
-0.641895329124188
You need to use the full precision. And make sure you use the correct values for the normalization too. Don't use a 4 digit approximation. If you do, then expect to get what is potentially random crapola.
I would have gotten the correct result also had I done this as:
ypred = mdl.a*exp(mdl.b*(x - mu)/S) + mdl.c*exp(mdl.d*(x - mu)/S);
In fact, this will give exactly the same predictions, as I claim it must. This I can verify.
norm(ypred' - mdl((x - mu)/S))
ans =
1.4210854715202e-14
To show what using only 4 digit approximations to the coefficients does, let me now try exactly that.
aappr = 0.2758;
bappr = -3.521;
cappr = 34.03;
dappr = -0.6419;
Sappr = 15.14;
muappr = 27.5;
yappr = aappr*exp(bappr*(x - muappr)/Sappr) + cappr*exp(dappr*(x - muappr)/Sappr);
When I plot that 4 digit approximation, I still get something that is not too far off. That is because I evaluated it the same way the fit was done, using the normalized version of x.
Now, let me compute the prediction, but NOT using the normalized version of x, which is what you did. Remember, the model was fit using a NORMALIZED x!
ywrong = aappr*exp(bappr*x) + cappr*exp(dappr*x);
When you computed y1, you did not use the normalized version of the vector x. Now, let me show the results. LOOK CAREFULLY AT THE COLUMNS.
format short g
[y',ypred',yappr',ywrong']
ans =
140 140.05 139.99 1.374
88 87.627 87.612 0.055479
62 62.864 62.861 0.00224
49 48.347 48.347 9.0446e-05
38 38.327 38.328 3.6519e-06
31 30.76 30.762 1.4745e-07
25 24.807 24.809 5.9537e-09
20 20.044 20.046 2.4039e-10
17 16.207 16.209 9.7062e-12
12 13.109 13.11 3.919e-13
Column 1 is the real data.
Column 2 shows my predictions using the full precision coefficients.
Column 3 shows my predictions using the 4 digit approximations to the coefficients. As you can see, while they are not exact, the differences are not nearly as large as what you reported. In fact, surprisingly, they are not that far off: relatively small errors, but not huge ones.
Column 4 is what happens if you use the UNNORMALIZED VERSION OF X.
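As a final note, if I recall correctly the Curve Fitting Toolbox can also store the normalization inside the fit object when you pass the 'Normalize' option to fit; the returned cfit can then be evaluated directly at the raw x values. A sketch, assuming that option behaves as documented:
mdl2 = fit(x', y', 'exp2', 'Normalize', 'on');   % fit centers and scales x internally
ypred2 = mdl2(x');                               % evaluate at the raw x values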
0 Comments