Fit data to beta distribution

Question


I'm trying to fit beta distribution parameters to a [1x60] vector (provided below as x) using the betafit() function, but the parameters I obtain do not make sense (alpha=0.3840, beta=23.4999): the resulting distribution is far from representing the data. By manually selecting the parameters (alpha=3, beta=3.5), however, I managed to get a proper fit quite easily.

Is there an automated way to fit proper beta distribution parameters to this vector?

(I was able to simulate data from a beta distribution and fit it successfully with this function, but for some reason the function is "not working" when applied to my data.)

Thanks

The vector

x=[0.033280 0.049990 0.074000 0.082480 0.086050 0.082780 0.077200 0.067750 0.059840 0.053020 0.046540 0.041610 0.031640 0.027930 0.023980 0.021130 ...
0.018620 0.013620 0.011490 0.009930 0.008620 0.007670 0.005640 0.004970 0.004370 0.003880 0.003340 0.003230 0.002870 0.002580 0.002390 0.002180 ...
0.001490 0.001330 0.001160 0.001000 0.000920 0.000810 0.000730 0.000650 0.000570 0.000520 0.000450 0.000400 0.000370 0.000360 0.000310 0.000270 ...
0.000290 0.000280 0.000260 0.000270 0.000240 0.000200 0.000160 0.000150 0.000130 0.000160 0.000820 0.001010];

The time vector

t=[0 0.0169491525423729 0.0338983050847458 0.0508474576271187 0.0677966101694915 0.0847457627118644 0.101694915254237 0.118644067796610 0.135593220338983 0.152542372881356 0.169491525423729 0.186440677966102 0.203389830508475 0.220338983050847 0.237288135593220 0.254237288135593 0.271186440677966 0.288135593220339 0.305084745762712 0.322033898305085 0.338983050847458 0.355932203389831 0.372881355932203 0.389830508474576 0.406779661016949 0.423728813559322 0.440677966101695 0.457627118644068 0.474576271186441 0.491525423728814 0.508474576271186 0.525423728813559 0.542372881355932 0.559322033898305 0.576271186440678 0.593220338983051 0.610169491525424 0.627118644067797 0.644067796610169 0.661016949152542 0.677966101694915 0.694915254237288 0.711864406779661 0.728813559322034 0.745762711864407 0.762711864406780 0.779661016949153 0.796610169491525 0.813559322033898 0.830508474576271 0.847457627118644 0.864406779661017 0.881355932203390 0.898305084745763 0.915254237288136 0.932203389830508 0.949152542372881 0.966101694915254 0.983050847457627 1];

Code line

betafit(x)

Output

ans =

0.3840 23.4999

3 Comments

Noam Omer on 21 Feb 2021

Thanks Jeff,

Perhaps it is not needed, but I provided the time vector so that kind commenters like you can plot the data exactly as I see it (and to show that it ranges over [0, 1]). Also, I did mean 0.000290.

You are right that the mean and SD correspond to my data, but when you plot the data against the auto-fitted curve and again against the manual fit, you can see that something is wrong. Please examine this yourself using the code below and tell me what you think.

Notice that in this example I found another set of parameters (a=2 and b=15) that fits the data better but is still far from the fitted parameters. For plotting I've added a scaling factor.

Thanks again,

Code

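pd = betafit(x);                        % pd(1) = alpha, pd(2) = beta
yAutoFit = betapdf(t,pd(1),pd(2));      % use both fitted parameters (was pd(1) twice)
yManFit = betapdf(t,2,15);              % manually chosen parameters
figure;
scalingFactor=70;                       % ad hoc factor to bring the pdf to the scale of x
plot(t,x,t,yAutoFit/scalingFactor,t,yManFit/scalingFactor);
legend('Data','Auto-Fit','Manual-Fit');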

Ive J on 21 Feb 2021


I believe there is a misunderstanding here. betafit returns maximum likelihood estimates (MLE) of the parameters for the values in x, treated as the data; it does not use t.

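x_new = linspace(min(x), max(x), 100);           % evaluation grid over the range of x
pd = betafit(x);                                 % pd(1) = alpha, pd(2) = beta
yAutoFit = betapdf(x_new, pd(1), pd(2));         % use both fitted parameters (was pd(1) twice)
yManFit = betapdf(x_new, 2, 15);
histogram(x, 'Normalization', 'pdf')             % density scale, comparable to betapdf
line(x_new, yAutoFit, 'color', 'k', 'LineWidth', 1.5)
line(x_new, yManFit, 'color', 'r', 'LineWidth', 1.5)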


Answer 1

Jeff Miller on 22 Feb 2021


ECDF.jpg

There is information here about how to fit a univariate distribution to an empirical CDF with MATLAB. Unfortunately, it is a bit complicated because the beta distribution belongs to the group of "Non-Location-Scale Families" discussed starting about halfway down the page.
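The general idea of fitting to an empirical CDF can be sketched as a least-squares fit of betacdf to the cumulative bin probabilities. This is only a minimal sketch under stated assumptions, not the method from the page mentioned above: the bin edges and probabilities are synthetic (generated from a Beta(2,5)), the names edges, p, cdfObs, and sse are illustrative, and each cumulative probability is assumed to belong to the right edge of its bin.

```matlab
% Synthetic bin probabilities (assumption: replace with your own edges and p)
edges  = linspace(0, 1, 11);              % 10 equal-width bins on [0, 1]
p      = diff(betacdf(edges, 2, 5));      % probability mass in each bin
p      = p / sum(p);                      % normalize so the masses sum to 1

% Empirical CDF evaluated at the right edge of each bin
cdfObs = cumsum(p);
tRight = edges(2:end);

% Drop the last point: every beta CDF equals 1 at t = 1, so it carries no information
tFit   = tRight(1:end-1);
cdfFit = cdfObs(1:end-1);

% Sum of squared CDF errors; parameters are optimized on a log scale to keep them positive
sse   = @(q) sum((betacdf(tFit, exp(q(1)), exp(q(2))) - cdfFit).^2);
qHat  = fminsearch(sse, [0 0]);           % start from alpha = beta = 1
abHat = exp(qHat);                        % abHat(1) = alpha, abHat(2) = beta
```

Least squares on the CDF is one of several possible criteria; weighting the points or fitting on a transformed scale may behave differently for real data.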

Alternatively, you could use Cupid, with commands something like this:

x = x / sum(x);  % normalize for sum to 1
ECDF = cumsum(x);
myBeta = Beta(1,1);  % arbitrary starting parameters 
myBeta.EstPctile(t,ECDF)  % Estimate parameters from the ECDF, which produces estimates of 1.1889,7.985
myBeta.PlotDens

This produces the attached graph.

I don't think this is quite right, though, because I don't think the t values and probabilities correspond exactly. In particular, I doubt that you really have a probability of 0.033280 at exactly t=0, since that is the minimum for the beta distribution.

1 Comment

Noam Omer on 24 Feb 2021

Thanks for introducing me to Cupid. It was what I needed.

Problem solved!


Answer 2

Noam Omer on 21 Feb 2021


It seems like I did not explain myself well enough.

The time vector (t) defines time intervals, and x gives the relative number of observations that occurred during each interval. For example, between times t(1)=0 and t(2)=0.0169491525423729 the relative number of observations was 3.3280% of the total number of observations.
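Under this reading, each x value is a probability mass per interval, whereas betapdf returns a density. A rough sketch of how the two can be put on the same scale is to divide each mass by its bin width. The data here are synthetic (a Beta(2,5) on 50 equal bins) and the names edges, widths, centers, and density are illustrative.

```matlab
% Synthetic example (assumption: substitute your own bin edges and masses)
edges   = linspace(0, 1, 51);             % 50 equal-width bins on [0, 1]
widths  = diff(edges);                    % width of each bin
centers = edges(1:end-1) + widths/2;      % bin midpoints
p       = diff(betacdf(edges, 2, 5));     % probability mass in each bin

% Mass divided by width approximates the density at the bin midpoint
density = p ./ widths;

plot(centers, density, 'o', centers, betapdf(centers, 2, 5), '-');
legend('Bin probability / bin width', 'betapdf');
```

With 59 equal intervals on [0, 1], dividing by the width 1/59 is the same as multiplying by 59, which is in the same range as the hand-tuned scaling factor of 70 used in the plotting code in this thread.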

Based on your answer, I understand that betafit may not be helpful for my task. So, assuming that I don't have the raw data but only the distribution of the data, do you know of any function that might do the job?

Thanks

1 Comment

Jeff Miller on 21 Feb 2021

Sorry, I misunderstood the original question. I thought x gave data values and did not realize they gave bin probabilities, with t defining the bin boundaries. betafit would only be appropriate with the data values, not the bin probabilities.

Since you have only bins and their probabilities, your best bet is to estimate from the empirical CDF.

But if the x values represent observed bin probabilities, why don't they sum to 1?
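To illustrate the distinction between data values and bin probabilities, one rough workaround (not suggested elsewhere in this thread, so treat it as an assumption) is to rebuild approximate data values from the bins and pass those to betafit. The sketch below uses synthetic bin probabilities from a Beta(2,5); nPseudo, counts, and samples are illustrative names, and placing every pseudo-observation at its bin midpoint introduces some binning bias.

```matlab
% Synthetic bin probabilities (assumption: replace with your own edges and p)
edges   = linspace(0, 1, 51);                   % 50 equal-width bins on [0, 1]
centers = (edges(1:end-1) + edges(2:end)) / 2;  % bin midpoints, strictly inside (0, 1)
p       = diff(betacdf(edges, 2, 5));           % probability mass in each bin

% Turn masses into integer counts and repeat each midpoint that many times
nPseudo = 1e4;                                  % hypothetical total number of pseudo-observations
counts  = round(nPseudo * p);
samples = repelem(centers, counts);             % approximate "raw data" on (0, 1)

% betafit now sees data values, which is what it expects
phat = betafit(samples);                        % phat(1) = alpha, phat(2) = beta
```

Calling betafit directly on p instead would treat the 50 small masses themselves as observations, which is the situation that produced the implausible estimates in the question.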


Answer 3

Noam Omer on 22 Feb 2021


Thanks. They do not sum to 1 because I manually excluded an outlier from the measurements. To overcome this, x can be normalized: x/sum(x).

Any suggestions on how to estimate from the empirical CDF?

Thanks

