# What is the kstest2 in MATLAB doing to compute the empirical distribution function?

Darcy Cordell on 22 Nov 2018
Edited: Darcy Cordell on 22 Nov 2018
According to Wikipedia, to compute the two-sample Kolmogorov-Smirnov test, you first compute the empirical cumulative distribution function (ECDF) for both samples and then find the maximum difference between the ECDFs. MATLAB includes a built-in function called "ecdf", but the built-in kstest2 does not use it. Instead it uses this:
%
% Calculate F1(x) and F2(x), the empirical (i.e., sample) CDFs.
%
binEdges = [-inf ; sort([x1;x2]) ; inf];
binCounts1 = histc (x1 , binEdges, 1);
binCounts2 = histc (x2 , binEdges, 1);
sumCounts1 = cumsum(binCounts1)./sum(binCounts1);
sumCounts2 = cumsum(binCounts2)./sum(binCounts2);
sampleCDF1 = sumCounts1(1:end-1);
sampleCDF2 = sumCounts2(1:end-1);
where x1 and x2 are your two samples.
Note that this does not give the same result as ecdf(x1) or ecdf(x2). For example, if x1 and x2 both have length n, then sampleCDF1 and sampleCDF2 will have a length of 2n+1, whereas ecdf(x1) and ecdf(x2) both have length n+1.
It seems really strange to me that the variables used to compute the ECDF are "binCounts1" and "binCounts2", which are just vectors of 1s and 0s. What happened to the data?
Can someone explain what MATLAB is doing here and why they don't use the ecdf() function?
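To make the question concrete, here is a minimal sketch that traces the quoted excerpt on a tiny example. The sample values and the names pooledGrid and directCDF1 are illustrative assumptions, not part of kstest2. It suggests that the 0/1 bin counts mark where each x1 value sits in the pooled, sorted data, so that the cumulative sum gives the ECDF of x1 evaluated at every pooled point rather than only at its own points:

```matlab
% Illustrative samples with distinct values (assumed data, not from kstest2).
x1 = [1; 3];
x2 = [2; 4];

% Same steps as the quoted excerpt, shown for x1 only.
binEdges   = [-inf; sort([x1; x2]); inf];   % [-Inf; 1; 2; 3; 4; Inf]
binCounts1 = histc(x1, binEdges, 1);        % [0; 1; 0; 1; 0; 0]
sumCounts1 = cumsum(binCounts1) ./ sum(binCounts1);
sampleCDF1 = sumCounts1(1:end-1);           % [0; 0.5; 0.5; 1; 1]

% Direct check: fraction of x1 values <= t at each pooled grid point t.
% pooledGrid and directCDF1 are hypothetical names used only for this check.
pooledGrid = binEdges(1:end-1);             % [-Inf; 1; 2; 3; 4]
directCDF1 = arrayfun(@(t) mean(x1 <= t), pooledGrid);

disp(isequal(sampleCDF1, directCDF1));      % the two vectors agree here
disp(numel(sampleCDF1));                    % 2n+1 = 5 for n = 2
```

Read this way, the data are not lost: they determine which bins receive a 1. Both sampleCDF1 and sampleCDF2 end up on the same 2n+1 grid, so they can be subtracted element by element, which the separate outputs of ecdf(x1) and ecdf(x2) would not allow without extra alignment. Whether that is the reason kstest2 avoids ecdf is an assumption on my part.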