Use Kernel Density Estimation to get the probability of a new observation
5 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Tatiana Lopez
am 1 Apr. 2015
Kommentiert: Tatiana Lopez
am 1 Apr. 2015
Hi all, I would like to use KDE to fit a 1d variable and then getting the probability of a new observation given the fitted model using pdf( kde, new observation ). I used fitdist with the 'Kernel' option and then plotted the corresponding pdf. My question is, shouln't the pdf be normalized? How can I get the probability of a new observation then?
Thank you
x = [15.276,15.277,15.279,15.28,15.281,15.186,15.187,15.188,15.19,15.191,15.193,15.194,15.195,15.197,15.198,15.199,15.201,15.202,15.204,15.205,15.206,15.208,15.209,15.211,15.212,15.213,15.215,15.216,15.217,15.219,15.22,15.222,15.223,15.224,15.226,15.227,15.229,15.23,15.231,15.233,15.234,15.236,15.237,15.238,15.24,15.241,15.243,15.244,15.245,15.247,15.248,15.249,15.251,15.252,15.254,15.255,15.283,15.284,15.286,15.287,15.288,15.29,15.291,15.172,15.173,15.174,15.176,15.177,15.179,15.18,15.181,15.183,15.184,15.293,15.294,15.167,15.169,15.17,15.295,15.297,15.298,15.299,15.301,15.302,15.304,15.305,15.306,15.308,15.309,15.311,15.312,15.313,15.315,15.316,15.145,15.147,15.148,15.149,15.151,15.152,15.154,15.155,15.156,15.158,15.159,15.161,15.162,15.163,15.165,15.166,15.293,15.294,15.167,15.317,15.319,15.32,15.322,15.323,15.324,15.326,15.327,15.329,15.33,15.331,15.333,15.334,15.336,15.337,15.338,15.34,15.341,15.343,15.344,15.345,15.347,15.348,15.349,15.351,15.352,15.354,15.355,15.356,15.358,15.359,15.361,15.362,15.363,15.365,15.366,15.368]';
t = max( min(x) - 0.1, 0 ) : 0.01 : min( max(x) + 0.1, 24);
kde = fitdist(x,'Kernel','Kernel','normal','Support','positive')
pdf_kde = pdf( kde, t );
plot(t, pdf_kde, 'g-'), hold on;
0 Kommentare
Akzeptierte Antwort
Brendan Hamm
am 1 Apr. 2015
The pdf integrates to be 1, so I am not sure why you think it needs to be normalized? Furthermore, this gives you a continuous distribution and therefore P(x|distribution) = 0 for all x, as points have no mass. the pdf is NOT telling you a probability, but rather just the probability density function evaluation at this point. Probability is the area under the curve between two values (the limits of the integral). You could ask, "what is the probability a new observation is less than or equal to 15.2?" and the answer would be found from the cdf:
cdf(kde,15.2) % P(x <= 15.2)
ans =
0.2727
or, "What is the probability the sample will be less than 15.4 but greater than 15.2?":
cdf(kde,15.4) - cdf(kde,15.2)% P(15.2 < x < 15.4) = P(x < 15.4) - P(x < 15.2);
ans =
0.7105
I hope this helps clear this up.
Weitere Antworten (0)
Siehe auch
Kategorien
Mehr zu Regression finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!