I have one-dimensional data (~12500 entries) with values ranging from ~135 to ~1150, showing 3 peaks (see attachment).
Now I want to create a histogram showing the data distribution, together with a fitted curve and a chi-squared goodness-of-fit test.
So far I have the following:
load Data.mat
bins = round(sqrt(length(Data))); % Number of bins
[f, x] = hist(Data,bins); % Calculate histogram
pd = fitdist(x','Kernel'); % Calculate fit
y = pdf(pd,f); % Calculate pdf
figure(1)
dx = diff(x(1:2));
bar(x, f/sum(f*dx)); % Normalizing and plotting
hold on
plot(x,y,'Linewidth',2) % Plot fit
hold off
[h,p] = chi2gof(x,'CDF',pd,'Alpha',0.05); % Chi squared test
While my chi2gof test yields the expected result (h = 0, p = 0.9983), my plot doesn't look too good:
The scale of the fitted curve seems to be way off for all 3 peaks. Additionally, I'd expect the curve to get a lot closer to 0 for very low and very high values.
Thanks in advance for any suggestions on how to improve/fix my code!

2 Comments

Jeff Miller on 26 Feb 2019
Regarding the scaling problem, what is sum(y*dx)?
Regarding the above-0 tails of the estimated pdf, do they drop off when you compute the density over a wider range, e.g. -500 to 2000? If so, the problem may be that the kernel bandwidth is not optimal. The default is to choose a good bandwidth to estimate a normal distribution. This looks much more like a mixture of three different distributions, so MATLAB's bandwidth guess may be pretty far from optimal.
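The two checks suggested above can be sketched as follows. This is only an illustration: the original Data.mat is not reproduced here, so the three-peak sample below is a synthetic stand-in, and the variable names (xWide, areaWide, and so on) are made up for this example. It integrates the kernel density once over a wide range and once over the data range only, which shows how much probability mass sits in the tails.

```matlab
% Synthetic stand-in for Data.mat (assumption): three roughly normal peaks
rng(0);
Data = [250 + 30*randn(4000,1); 600 + 40*randn(4500,1); 950 + 50*randn(4000,1)];

pd = fitdist(Data, 'Kernel');            % kernel fit with the default bandwidth

% Evaluate the density over a range much wider than the data
xWide = linspace(-500, 2000, 2501);
yWide = pdf(pd, xWide);
areaWide = trapz(xWide, yWide);          % should be close to 1

% Evaluate the density over the observed data range only
xData = linspace(min(Data), max(Data), 1001);
areaData = trapz(xData, pdf(pd, xData)); % less than 1 if mass leaks into the tails

fprintf('Area over wide range: %.4f\n', areaWide);
fprintf('Area over data range: %.4f\n', areaData);

plot(xWide, yWide, 'LineWidth', 2)       % inspect how quickly the tails drop off
```

If the area over the data range is clearly below 1, the bandwidth is probably too large for the peaks in the sample.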
Marcel Dorer on 27 Feb 2019
Edited: Marcel Dorer on 27 Feb 2019
@Scaling problem
sum(y*dx) = 0.3353
I also attached the data file to the OP, in case it's needed :)
@ 0-tails
Thanks, I managed to achieve better tails by adjusting the bandwidth of the kernel!
pd = fitdist(x','Kernel','Width',75);
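To see how the 'Width' option changes the estimate, one can overlay kernel fits for a few bandwidths. The sketch below uses a synthetic three-peak sample as a stand-in for Data.mat, and the candidate widths are arbitrary illustration values, not recommendations. It also fits the raw sample rather than the bin centers.

```matlab
% Synthetic stand-in for Data.mat (assumption): three roughly normal peaks
rng(1);
Data = [250 + 30*randn(4000,1); 600 + 40*randn(4500,1); 950 + 50*randn(4000,1)];

xGrid  = linspace(0, 1300, 1301);   % evaluation grid covering the data and tails
widths = [5 25 75];                 % candidate bandwidths (illustrative only)
areas  = zeros(size(widths));

figure
hold on
for k = 1:numel(widths)
    pdW = fitdist(Data, 'Kernel', 'Width', widths(k));  % fit with fixed bandwidth
    yW  = pdf(pdW, xGrid);
    areas(k) = trapz(xGrid, yW);                        % sanity check: about 1
    plot(xGrid, yW, 'LineWidth', 1.5, ...
        'DisplayName', sprintf('Width = %d', widths(k)));
end
hold off
legend show
```

Small widths follow the individual peaks closely, while large widths smear them together and push density into the tails.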


Accepted Answer

Jeff Miller on 28 Feb 2019


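I think there are a couple of problems: the fit should use the raw Data rather than the bin centers x, and the pdf should be evaluated at the bin centers x rather than at the counts f. Try this: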
load Data.mat
bins = round(sqrt(length(Data))); % Number of bins
[f, x] = hist(Data,bins); % Calculate histogram
pd = fitdist(Data,'Kernel','Width',5); % Calculate fit
y = pdf(pd,x); % Calculate pdf of bin values
figure(1)
dx = diff(x(1:2));
bar(x, f/sum(f*dx)); % Normalizing and plotting
hold on
y = y / sum(y*dx);
plot(x,y,'Linewidth',2) % Plot fit
hold off
[h,p] = chi2gof(x,'CDF',pd,'Alpha',0.05); % Chi squared test
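A possible variant of the same idea, offered as a sketch rather than a replacement for the code above: histogram with 'Normalization','pdf' scales the bars to unit area, so the fitted density can be plotted on a fine grid without rescaling it by hand. Note also that x in the chi2gof call above holds the bin centers, so this variant passes the raw sample to chi2gof instead. The data below is a synthetic stand-in for Data.mat, and xFine/yFine are names made up for this example.

```matlab
% Synthetic stand-in for Data.mat (assumption): three roughly normal peaks
rng(2);
Data = [250 + 30*randn(4000,1); 600 + 40*randn(4500,1); 950 + 50*randn(4000,1)];

bins = round(sqrt(numel(Data)));                 % same bin-count rule as above
pd   = fitdist(Data, 'Kernel', 'Width', 5);      % fit the raw sample

xFine = linspace(min(Data), max(Data), 1000);    % fine grid for a smooth curve
yFine = pdf(pd, xFine);

figure
histogram(Data, bins, 'Normalization', 'pdf');   % bars scaled to unit area
hold on
plot(xFine, yFine, 'LineWidth', 2)               % density needs no rescaling
hold off

% Chi-squared test on the sample itself rather than on the bin centers
[h, p] = chi2gof(Data, 'CDF', pd, 'Alpha', 0.05);
fprintf('h = %d, p = %.4f\n', h, p);
```

Since the kernel density is estimated from the same sample that is being tested, the resulting p-value is likely to be optimistic and is best read as a rough consistency check.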

1 Comment

Marcel Dorer on 28 Feb 2019
Works perfectly! Thanks a lot!
I haven't worked much with fitdist and pdf yet, so I'm glad you were able to point out my mistakes :)


More Answers (0)
