MATLAB Answers

Curve Fitting and Distribution Fitting

3 views (last 30 days)
hoda dn
hoda dn on 11 Sep 2020
Commented: hoda dn on 16 Sep 2020
Hello. I have three sets of data and an equation based on it. How can I fit a curve on it and get the coefficients(c1,c2,c3,c4,c5,c6,c7)?and estimate root mean square errors (RMSE) and correlation coefficient (R2)
ye = [0.166393443
0.206557377
0.25204918
0.280737705
0.306967213
0.323770492
0.352868852
0.378278689
0.412704918
0.432786885
0.44057377
0.456147541
0.475819672];
Q = [0.040755669
0.067112106
0.10495312
0.132849355
0.160791371
0.179781479
0.214383688
0.246026104
0.290251909
0.316344405
0.326446015
0.346526208
0.371457751];
z=[0
0
0
0
0
0
0
0
0
0
0
0
0];
1.6<=c7<=3
Q= ((c1-(c2*(ye+z)))*(asin(ye)^c3)+c4*((ye+z)^c5)*ye+(z^c6))^c7;

  0 Comments

Sign in to comment.

Accepted Answer

Star Strider
Star Strider on 11 Sep 2020
Try this:
yez = [ye, z];
Qf = @(c1,c2,c3,c4,c5,c6,c7,yez) ((c1-(c2.*(yez(:,1)+yez(:,2)))).*(asin(yez(:,1)).^c3)+c4.*((yez(:,1)+yez(:,2)).^c5).*yez(:,1)+(yez(:,2).^c6)).^c7;
Qfcn = @(cv,yez) Qf(cv(1),cv(2),cv(3),cv(4),cv(5),cv(6),cv(7),yez);
B = lsqcurvefit(Qfcn, [rand(6,1);2], yez, Q, [-Inf(1,6) 1.6], [Inf(1,6) 3])
The estimated parameters do not appear to be unique. This produces quite different results in different runs.

  4 Comments

Show 1 older comment
Star Strider
Star Strider on 11 Sep 2020
My pleasure!
how can i find root mean square errors (RMSE) and correlation coefficient (R2)?’
Using my previous code (and plotting the data and fitted curve):
Qest = Qfcn(B,yez);
RMSE = sqrt(mean((Q - Qest).^2)); % Compute RMSE
OLSCF = @(b) sum((Q-Qfcn(b,yez)).^2);
SStot = sum((Q - mean(Q)).^2);
SSres = OLSCF(B);
Rsq = 1 - (SSres/SStot); % Compute R-squared
figure
stem3(ye, z, Q, 'p')
hold on
plot3(ye, z, Qest, '-r')
hold off
grid on
xlabel('ye')
ylabel('z')
zlabel('Q')
Isn't there a way to not have different answers in each run?
Decreasing several of the tolerances in the options structure might help with that, although estimating 7 parameters with 13 data might contribute to that problem. See Tolerances and Stopping Criteria (and similar references) for an extended explanation.
Note that some of the parameters may not be necessary in the model. I did not check that, however using nlparci would give you that information. The Jacobian and the residuals are necesaary to do that, so use this lsqcurvefit syntax to get the necessary information:
(Clicking on that link will show the appropriate documentation section.)
It may then be necessary to re-write the model, eliminating that parameter and the term associated with it, if the confidence limits for a particular parameter demonstrate that it is not significantly different from 0 (the confidence limits for a particular parameter will have opposite signs in that situation).
.
hoda dn
hoda dn on 11 Sep 2020
thanks alot for your response.I used less data to shorten my questions. Otherwise, 51 data from all three parameters are available.
Star Strider
Star Strider on 11 Sep 2020
As always, my pleasure!
No worries! You can always save your data to a .mat file, then upload it here. Then, we have access to all of it. (There are size limitations for uploaded files, however even large data sets are compatible with MATLAB Answers.)

Sign in to comment.

More Answers (2)

John D'Errico
John D'Errico on 11 Sep 2020
You have what, 13 data points? A model with many exponents in it? A model that is surely based on no real physical meaning of the terms in the model?
Expect numerical problems. You will need good starting values.
Worse, z == 0. z is IDENTICALLY zero. Yet you then expect to estimate the power z, z^c6? SIGH.
I can confindently predict that c6 = 17. Oh, wait, I remember, all parameters have the value 42, at least when any possible value will suffice.
The point is, you cannot intelligently estimate these parameters, because there is no unique solution. Any value of c6 will work.
That means your model reduces to
Q = ((c1-c2*ye)*(asin(ye)^c3)+c4*ye^(c5+1))^c7
It is still a mess of parameters put there just to get enough flexibility to fit your data.
So now, lets plot your data, something I should have done before anything else.
plot(ye,Q,'o-')
You are kidding me, right? A simple low order polynomial model is entirely adequate. Is there sufficient information content in those 13 data points to estimate the parameters tou wish to fit? From somewhere in the distance, you hear a deep, heartfelt sigh, even a groan. Is there any physical reason why you think this model is appropriate, that those parameters have any physical meaning in context?
If you prefer, a simple power curve will do quite well.
>> mdl = fittype('a + b*abs(ye-c)^d','indep','ye')
mdl =
General model:
mdl(a,b,c,d,ye) = a + b*abs(ye-c)^d
>> fittedmodel = fit(ye,Q,mdl,'start',[.01,.1,-.1,1.3])
fittedmodel =
General model:
fittedmodel(ye) = a + b*abs(ye-c)^d
Coefficients (with 95% confidence bounds):
a = 0.04029 (0.03285, 0.04772)
b = 1.508 (1.45, 1.566)
c = 0.1636 (0.1464, 0.1808)
d = 1.297 (1.227, 1.366)
>> plot(fittedmodel)
>> hold on
>> plot(ye,Q,'o-')
And that fits quite well, with a far simpler model.

  3 Comments

Alex Sha
Alex Sha on 11 Sep 2020
Hi, as mentioned above by John, since all values of z are 0, the fit function reduced to as:
Q = ((c1-c2*ye)*(asin(ye)^c3)+c4*ye^(c5+1))^c7
For this function, the unique global solution will be
Root of Mean Square Error (RMSE): 2.43841979977219E-6
Sum of Squared Residual: 7.72965845589736E-11
Correlation Coef. (R): 0.999999999735994
R-Square: 0.999999999471987
Parameter Best Estimate
---------- -------------
c1 2.62679652185812
c2 5.53334907279052
c3 1.58348940809572
c4 3.75109432653593
c5 1.60837445158278
c7 1.60000000000427
There are some local solution, for example, one of them looks like below
Root of Mean Square Error (RMSE): 3.5571727535936E-6
Sum of Squared Residual: 1.64495213985813E-10
Correlation Coef. (R): 0.99999999943834
R-Square: 0.999999998876679
Parameter Best Estimate
---------- -------------
c1 2.6236423827441
c2 1.91860164829432
c3 1.58458040806969
c4 -1.40108824401991
c5 4.44905951896927
c7 1.60000490291755
hoda dn
hoda dn on 11 Sep 2020
hello.I used less data to shorten my questions. Otherwise, 51 data from all three parameters are available.
ye=[0.166393443
0.206557377
0.25204918
0.280737705
0.306967213
0.323770492
0.352868852
0.378278689
0.412704918
0.432786885
0.44057377
0.456147541
0.475819672
0.129098361
0.183196721
0.192213115
0.209016393
0.227868852
0.264754098
0.301639344
0.032786885
0.063114754
0.079098361
0.089344262
0.126229508
0.13647541
0.149180328
0.170491803
0.185245902
0.199590164
0.21147541
0.014691521
0.020238585
0.025862799
0.031509604
0.035940668
0.044923322
0.054482866
0.0717356
0.072919889
0.018024148
0.024373383
0.03174389
0.035398208
0.017063413
0.020637735
0.028203846
0.033120028
0.035555281
];
Q = [0.040755669
0.067112106
0.10495312
0.132849355
0.160791371
0.179781479
0.214383688
0.246026104
0.290251909
0.316344405
0.326446015
0.346526208
0.371457751
0.069843006
0.121450028
0.130852824
0.14882695
0.169537878
0.21088739
0.251709087
0.009151561
0.025042029
0.035476157
0.042806716
0.072740217
0.081892097
0.093653673
0.114235514
0.128963324
0.143531782
0.155708172
0.002992409
0.004835202
0.006977514
0.009371635
0.011403503
0.015891643
0.021148234
0.031683247
0.032449372
0.003571359
0.005565609
0.008182482
0.009580949
0.003294144
0.004359568
0.006889221
0.008701893
0.009642407
];
z=[0
0
0
0
0
0
0
0
0
0
0
0
0
0.25
0.25
0.25
0.25
0.25
0.25
0.25
0.33
0.33
0.33
0.33
0.33
0.33
0.33
0.33
0.33
0.33
0.33
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.66
0.66
0.66
0.66
0.66
0.66
0.66
0.66
0.66
];
Alex Sha
Alex Sha on 12 Sep 2020
Hi, Hoda, for your all 51 data points, with:
1.6<=c7<=3
Q= ((c1-(c2*(ye+z)))*(asin(ye)^c3)+c4*((ye+z)^c5)*ye+(z^c6))^c7;
there are many local trap solutions. An unique global solution will be:
Root of Mean Square Error (RMSE): 0.00131207824399577
Sum of Squared Residual: 8.43559165999837E-5
Correlation Coef. (R): 0.999921364930492
R-Square: 0.999842736044457
Parameter Best Estimate
---------- -------------
c1 -0.128678458232134
c2 0.849061931670286
c3 -0.0162023653012152
c4 2.15335768995412
c5 -0.0699568937237459
c6 0.739734903785799
c7 1.6

Sign in to comment.


hoda dn
hoda dn on 12 Sep 2020
thanks alot. Can you please send me the code you wrote?

  2 Comments

Alex Sha
Alex Sha on 13 Sep 2020
Hi, Hoda, your problem looks like a general curve fitting issue with one constrained condition, however, it seems to be a prety hard job for existing fitting function in Matlab, such as lsqcurvefit, fmincon, cftool, etc., those functions are all influenced heavy by the guessing initial start-values. Although there is a Global Optimization toolbox in Matlab, its performance is far away as expected. The results shown you above are obtained by 1stOpt, one of software package with global optimization function, easy for using without required to guess initial start values, the code are as below:
Parameter c(6),1.6<=c7<=3;
Variable ye,z,q;
Function Q= ((c1-(c2*(ye+z)))*(arcsin(ye)^c3)+c4*((ye+z)^c5)*ye+(z^c6))^c7;
Data;
ye=[0.166393443,0.206557377,0.25204918,0.280737705,0.306967213,0.323770492,0.352868852,0.378278689,0.412704918,0.432786885,0.44057377,0.456147541,0.475819672,0.129098361,0.183196721,0.192213115,0.209016393,0.227868852,0.264754098,0.301639344,0.032786885,0.063114754,0.079098361,0.089344262,0.126229508,0.13647541,0.149180328,0.170491803,0.185245902,0.199590164,0.21147541,0.014691521,0.020238585,0.025862799,0.031509604,0.035940668,0.044923322,0.054482866,0.0717356,0.072919889,0.018024148,0.024373383,0.03174389,0.035398208,0.017063413,0.020637735,0.028203846,0.033120028,0.035555281];
z=[0,0,0,0,0,0,0,0,0,0,0,0,0,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.33,0.33,0.33,0.33,0.33,0.33,0.33,0.33,0.33,0.33,0.33,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.66,0.66,0.66,0.66,0.66,0.66,0.66,0.66,0.66];
q=[0.040755669,0.067112106,0.10495312,0.132849355,0.160791371,0.179781479,0.214383688,0.246026104,0.290251909,0.316344405,0.326446015,0.346526208,0.371457751,0.069843006,0.121450028,0.130852824,0.14882695,0.169537878,0.21088739,0.251709087,0.009151561,0.025042029,0.035476157,0.042806716,0.072740217,0.081892097,0.093653673,0.114235514,0.128963324,0.143531782,0.155708172,0.002992409,0.004835202,0.006977514,0.009371635,0.011403503,0.015891643,0.021148234,0.031683247,0.032449372,0.003571359,0.005565609,0.008182482,0.009580949,0.003294144,0.004359568,0.006889221,0.008701893,0.009642407];
Hope there will be some similar function in Matlab in the future.
hoda dn
hoda dn on 16 Sep 2020
hi, Alex Sha, thank you for your responce.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by