Compare probability density functions of 2 vectors

Question

T.C. am 8 Nov. 2024

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/2164953-compare-probability-density-functions-of-2-vectors

Bearbeitet: Bruno Luong am 10 Nov. 2024

I have two vectors v and w of different legths and with positive elements. Each element of both vectors can be considered to be drawn according to some probability density function. I would need to check that the pdf of both is (more or less) the same, how could I do this in Matlab? I was thinking to plot the cdf of both vectors and see how much they differed, but maybe there's a better way to do that.

2 Kommentare
Keine anzeigenKeine ausblenden

Jeff Miller am 9 Nov. 2024

Did you already check whether the means/medians are the same (e.g., ttest2 or ranksum) and also check the variances (e.g., vartest2, ansaribradley)? If you can reject equality of either of those then of course you can conclude the vectors come from populations with different pdfs.

You can also use kstest2 to check the full pdfs, but keep in mind that this test has very low power unless the pdfs are very different or the vectors are very long. (That means the data will rarely allow you to conclude the pdf's are different even when they are. Power would be much higher for the tests of means and variances.)

William Rose am 10 Nov. 2024

Jeff Miller, good idea.

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Bruno Luong am 8 Nov. 2024

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/2164953-compare-probability-density-functions-of-2-vectors#answer_1542963

Use histcounts to compute pdfs then compare them

2 Kommentare
Keine anzeigenKeine ausblenden

Bruno Luong am 9 Nov. 2024

Bearbeitet: Bruno Luong am 10 Nov. 2024

In MATLAB Online öffnen

If seems cdf comparison is better than pdf. There os a better separation. Both are normalized to 1 (maximum difference).

As Jeff Miller has suggested you can also derive mean, median, standard deviation, high order moments, skewness, kurtosis, etc

Ntest = 100;

for i = 1:Ntest

x1=rand(1,500);

x2=rand(1,400);

x3=rand(1,600)*1.2;

[dpdf12(i),dcdf12(i)] = diffpdf(x1,x2);

[dpdf13(i),dcdf13(i)] = diffpdf(x1,x3);

end

i = 1:Ntest;

subplot(2,1,1)

plot(i, dpdf12, 'b', i, dpdf13, 'r')

ylabel('dpdf')

legend('x1-x2','x1-x3')

subplot(2,1,2)

plot(i, dcdf12, 'b', i, dcdf13, 'r')

ylabel('dcdf')

legend('x1-x2','x1-x3')

function [dpdf,dcdf] = diffpdf(x1,x2)

x = [x1,x2];

[a,b] = bounds(x);

edges = linspace(a,b,33);

pdf1 = histcounts(x1,edges,'Normalization','pdf');

pdf2 = histcounts(x2,edges,'Normalization','pdf');

dx = mean(diff(edges));

dpdf = norm(pdf1-pdf2,1)*dx/2; % divide by 2 so as max(dpdf) is 1

cdf1 = cumsum(pdf1)*dx; % histcounts(x1,edges,'Normalization','cdf');

cdf2 = cumsum(pdf2)*dx; % histcounts(x2,edges,'Normalization','cdf');

dcdf = norm(cdf1-cdf2, Inf);

end

Bruno Luong am 10 Nov. 2024

In MATLAB Online öffnen

Another exmple to show the dindicator when comppare RAND and RANDN

Ntest = 100;

for i = 1:Ntest

x1=rand(1,500);

x2=rand(1,400);

x3=randn(1,600);

[dpdf12(i),dcdf12(i)] = diffpdf(x1,x2);

[dpdf13(i),dcdf13(i)] = diffpdf(x1,x3);

end

i = 1:Ntest;

subplot(2,1,1)

plot(i, dpdf12, 'b', i, dpdf13, 'r')

ylabel('dpdf')

legend('x1-x2','x1-x3')

subplot(2,1,2)

plot(i, dcdf12, 'b', i, dcdf13, 'r')

ylabel('dcdf')

legend('x1-x2','x1-x3')

function [dpdf,dcdf] = diffpdf(x1,x2)

x = [x1,x2];

[a,b] = bounds(x);

edges = linspace(a,b,33);

pdf1 = histcounts(x1,edges,'Normalization','pdf');

pdf2 = histcounts(x2,edges,'Normalization','pdf');

dx = mean(diff(edges));

dpdf = norm(pdf1-pdf2,1)*dx/2; % divide by 2 so as max(dpdf) is 1

cdf1 = cumsum(pdf1)*dx; % histcounts(x1,edges,'Normalization','cdf');

cdf2 = cumsum(pdf2)*dx; % histcounts(x2,edges,'Normalization','cdf');

dcdf = norm(cdf1-cdf2, Inf);

end

Melden Sie sich an, um zu kommentieren.

Answer 2

William Rose am 8 Nov. 2024

2
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/2164953-compare-probability-density-functions-of-2-vectors#answer_1542968

Bearbeitet: William Rose am 9 Nov. 2024

In MATLAB Online öffnen

@T.C.,

[Edit: fix error in my code below.]

The 2 sample Kolmogorov-Smirnov test.

x1=rand(1,50);
x2=rand(1,40);
[h,p]=kstest2(x1,x2);
if h
    fprintf('prob(x1,x2) from same distribution=%.3f.',p);
    fprintf('Accept the null hypothesis that they are are from same distribution.');
else
    fprintf('prob(x1,x2) from same distribution=%.3f.',p);
    fprintf('Reject the null hypothesis that they are are from same distribution.');
end
prob(x1,x2) from same distribution=0.503.
Accept the null hypothesis that they are are from same distribution.

OK

10 Kommentare
8 ältere Kommentare anzeigen8 ältere Kommentare ausblenden

Paul am 9 Nov. 2024

Bearbeitet: Paul am 9 Nov. 2024

In MATLAB Online öffnen

@T.C., @William Rose,

The first part of my comment above was intended to point out that the if test in the code in the answer is inconsistent with the doc page kstest2. If I'm reading that doc correctly, that code should be:

if h
%    fprintf('prob(x1,x2) from same distribution=%.3f.',p);
%    fprintf('Accept the null hypothesis that they are are from same distribution.');
    fprintf('REJECT the null hypothesis that x1 and x2 are from same distribution.');
else
%    fprintf('prob(x1,x2) from same distribution=%.3f.',p);
%    fprintf('Accept the null hypothesis that they are are from same distribution.');
%   Maybe this should say: "Accept the alternative hypothesis ... " 
    fprintf('Cannot reject the null hypothesis that x1 and x2 are from same distribution.');
end

Disclaimer: I don't have any expertise in statistical testing, just going by what the doc page states.

@Bruno Luong

By "p seem to vary quite a bit from run to run." do you mean the following?

N = 1e2;

p = nan(1,N);

rng(100);

for ii = 1:N

x1=rand(1,500);

x2=rand(1,400);

[h,p(ii)]=kstest2(x1,x2);

end

plot(p)

If so, maybe @Jeff Miller's answer to this question regarding variability of the p-value would be of interest, albeit for a different test (maybe the same concept applies here?).

Bruno Luong am 9 Nov. 2024

@Paul "By "p seem to vary quite a bit from run to run." do you mean the following?"

Yes your question seems arrive at the samme time than I edit my post.

Paul am 10 Nov. 2024

In MATLAB Online öffnen

Hi William,

After the edit, I still think the logic in the code is not correct. According to kstest2, if h = 1, the null hypothesis should be rejected. Consider what happens if we change the distribution for x2

x1=rand(1,50);
%x2=rand(1,40);
x2=randn(1,40);
[h,p]=kstest2(x1,x2);
if h
    fprintf('prob(x1,x2) from same distribution=%.3f.',p);
    fprintf('Accept the null hypothesis that they are are from same distribution.');
else
    fprintf('prob(x1,x2) from same distribution=%.3f.',p);
    fprintf('Reject the null hypothesis that they are are from same distribution.');
end
prob(x1,x2) from same distribution=0.000.
Accept the null hypothesis that they are are from same distribution.

Melden Sie sich an, um zu kommentieren.

Compare probability density functions of 2 vectors

2 Kommentare
Keine anzeigenKeine ausblenden

Akzeptierte Antwort

2 Kommentare
Keine anzeigenKeine ausblenden

Weitere Antworten (1)

10 Kommentare
8 ältere Kommentare anzeigen8 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Version

Community Treasure Hunt

Compare probability density functions of 2 vectors

2 Kommentare Keine anzeigenKeine ausblenden

Akzeptierte Antwort

2 Kommentare Keine anzeigenKeine ausblenden

Weitere Antworten (1)

10 Kommentare 8 ältere Kommentare anzeigen8 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Version

Community Treasure Hunt

2 Kommentare
Keine anzeigenKeine ausblenden

2 Kommentare
Keine anzeigenKeine ausblenden

10 Kommentare
8 ältere Kommentare anzeigen8 ältere Kommentare ausblenden