Automatically select the right number of bins (or combine the bins) for the expected frequencies in crosstab, in order to guarantee at least 5 elements per bin

Sim on 23 Aug 2024

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/2147474-automatically-select-the-right-number-of-bins-or-combine-the-bins-for-the-expected-frequencies-in

Commented: Star Strider on 23 Aug 2024

I have two observed datasets, "x" and "y", representing "future stock prices", and I want to compare their binned observed frequencies against each other using crosstab. To do so, I first place the elements of "x" and "y" into bins with the histcounts function. The resulting count arrays, "cx" and "cy", are then compared with a chi-square test, performed by crosstab. The chi-square test of independence determines whether there is a significant association between the frequencies of "x" and "y" across the bins.

However, the chi-square test "is not valid for small samples, and if some of the counts (in the expected frequency) are less than five, you may need to combine some bins in the tails." In the following example, several bins of the observed frequencies "cx" and "cy" have zero elements, and I do not know whether they affect the expected frequencies that crosstab calculates.

Therefore, is there a way in crosstab to automatically select the right number of bins for the expected frequencies, or to combine them if some are empty, in order to guarantee at least 5 elements per bin?

rng default;  % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normally distributed random datasets, "x" and "y"
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% Counts/frequency of "x" and "y"
cx = histcounts(x,'NumBins',nb);
cy = histcounts(y,'NumBins',nb);
[~,chi2,p] = crosstab(cx,cy)
chi2 = 476.6926
p = 2.9412e-28
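
A minimal sketch of one way to combine sparse bins, offered as an illustration and not as a built-in crosstab feature. It assumes that both samples are binned on shared edges and that adjacent bins are merged greedily from the left until every expected count is at least 5. Because crosstab expects one grouping label per observation and not per-bin counts, the sketch builds the 2-by-K table directly and computes the Pearson chi-square statistic by hand. The names O, merged, acc, and minColTotal are hypothetical.

```matlab
rng default;                                   % for reproducibility
x = 100 .* round(lognrnd(1,    1,   1000, 1));
y = 100 .* round(lognrnd(0.88, 1.1, 1000, 1));

% Shared edges, so that bin k covers the same interval for both samples
nb = 50;
edges = linspace(min([x; y]), max([x; y]), nb + 1);
O = [histcounts(x, edges); histcounts(y, edges)];   % 2-by-nb observed counts

% Expected count in cell (i,j) is R(i)*C(j)/N, so a column is acceptable
% once its total C(j) reaches 5*N/min(R)
R = sum(O, 2);
N = sum(R);
minColTotal = 5 * N / min(R);

% Greedy merge of adjacent bins, left to right
merged = zeros(2, 0);
acc = zeros(2, 1);
for k = 1:nb
    acc = acc + O(:, k);
    if sum(acc) >= minColTotal
        merged(:, end + 1) = acc; %#ok<AGROW>
        acc = zeros(2, 1);
    end
end
if any(acc)                                    % fold the leftover tail into the last bin
    if isempty(merged)
        merged = acc;
    else
        merged(:, end) = merged(:, end) + acc;
    end
end

% Pearson chi-square statistic on the merged 2-by-K table
E = sum(merged, 2) * sum(merged, 1) / N;       % expected counts
chi2 = sum((merged - E).^2 ./ E, 'all');
df = size(merged, 2) - 1;                      % (rows-1)*(cols-1) with 2 rows
p = chi2cdf(chi2, df, 'upper');
```

The greedy left-to-right pass is only one possible merging rule; merging inward from both tails would be an equally reasonable choice. If everything collapses into a single bin, df is 0 and the test is not meaningful.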

Answers (1)

Star Strider on 23 Aug 2024

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/2147474-automatically-select-the-right-number-of-bins-or-combine-the-bins-for-the-expected-frequencies-in#answer_1504139

One option for small samples is to use the fishertest function.
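
A short sketch of how fishertest might be called here, under the assumption that the data are first reduced to a 2-by-2 table, since fishertest accepts only 2-by-2 contingency tables. The split at the pooled median is a hypothetical choice made purely for illustration, as are the names m and tbl.

```matlab
rng default;                         % for reproducibility
x = 100 .* round(lognrnd(1,    1,   1000, 1));
y = 100 .* round(lognrnd(0.88, 1.1, 1000, 1));

m = median([x; y]);                  % illustrative cut point (an assumption)
tbl = [sum(x <= m), sum(x > m);      % row 1: sample x, at or below / above m
       sum(y <= m), sum(y > m)];     % row 2: sample y, at or below / above m

[h, p, stats] = fishertest(tbl);     % h = 1 rejects the null at the 5% level
```

Collapsing 50 bins into two categories discards most of the distributional detail, so this only tests whether the two samples differ in the share of values above the cut point.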

6 Comments

Sim on 23 Aug 2024

Thanks @Star Strider :-)

Star Strider on 23 Aug 2024

My pleasure!

Since your data are not normally distributed, friedman may be the most appropriate. Like other nonparametric tests (explore them as well, for example ranksum), it only requires that the values being compared share the same distribution, regardless of what that distribution is. I usually use it or other nonparametric analysis functions to compare lognormally distributed data, since most of what I deal with (physiological data) is lognormally distributed.
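
As a rough illustration of the nonparametric route, the sketch below applies ranksum to the raw samples from the question, with no binning at all. It assumes "x" and "y" are independent samples. friedman is not shown because it compares columns within matched rows (blocks), and the question does not say whether the observations are paired.

```matlab
rng default;                         % for reproducibility
x = 100 .* round(lognrnd(1,    1,   1000, 1));
y = 100 .* round(lognrnd(0.88, 1.1, 1000, 1));

% Wilcoxon rank sum test on the unbinned data
[p, h, stats] = ranksum(x, y);       % h = 1 rejects equal medians at the 5% level
```

Working on the raw values sidesteps the minimum expected count requirement entirely, at the cost of testing a different hypothesis (a shift in location) than the chi-square comparison of binned frequencies.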
