Problem with p-value calculation for Spearman's rho correlation analysis

Manuel Eichenlaub on 18 Sep 2018
Commented: Brendan Hamm on 23 Oct 2018
Hello,
I am using the corr function to calculate correlation coefficients between variables of interest. Specifically, I am using the Spearman's rho method for correlation analysis. While doing this, I noticed something odd when looking at the calculated p-values:
Here is an example that illustrates the problem:
x = [2 7 9 5 4 1 3 8 10 6 11]';
y = [1 2 3 4 5 6 7 8 9 10 11;
1 1 3 4 5 6 7 8 9 10 11]';
[r,p]=corr(x,y,'Type','Spearman')
[r,p]=corr(x,y(:,1),'Type','Spearman')
The output is the following:
r =
0.4455 0.4237
p =
0.1697 0.1941
r =
0.4455
p =
0.1728
One can clearly see that the calculated p-values for the correlation between x and the first column of y differ (0.1697 vs. 0.1728), depending on whether y is passed as the full matrix or just its first column.
I found out that this is specific to Spearman's correlation, because the method works on ranks. The p-values are calculated differently depending on whether there are rank ties in the data. In the first function call, the method for tied data is applied to both columns of y, even though only the second column contains ties. In the second function call, the first column of y has no ties, so a different method is used, yielding a different p-value (and, from my understanding, the correct one) for the correlation between x and the first column of y.
Would it be possible for anyone to resolve this issue?
Best wishes,
Manuel

Answers (1)

Brendan Hamm on 18 Sep 2018
I will notify the development team of your request. In the meantime, if anyone is experiencing the same issue, it may be worth explaining what is happening.
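The corr function chooses one p-value method per call, based on whether ties are present in any column of the inputs. If there are ties anywhere, it uses a t-approximation for the p-values of all columns, as described in the Wikipedia article. If there are no ties at all, MATLAB instead computes the tail probabilities with an Edgeworth series expansion, documented as Algorithm AS 89. In the example above, 0.1697 is the value the t-approximation gives for rho = 0.4455 with n = 11, so 0.1728 is the result of the no-ties method.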
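One way to see which of the two p-values in the question comes from the t-approximation is to compute that approximation directly. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available (it is needed for corr anyway); the variable names y1 and pT are made up for illustration, and this is not a copy of what corr does internally.

```matlab
% Data from the question; only the first column of y, which has no ties
x  = [2 7 9 5 4 1 3 8 10 6 11]';
y1 = (1:11)';
n  = numel(x);

% Spearman's rho is Pearson's correlation computed on the ranks
rho = corr(tiedrank(x), tiedrank(y1));

% t-approximation: under the null hypothesis, t has n-2 degrees of freedom
t  = rho * sqrt((n - 2) / (1 - rho^2));
pT = 2 * tcdf(-abs(t), n - 2);   % two-sided p-value

fprintf('rho = %.4f, t-approximation p = %.4f\n', rho, pT);
```

The p-value computed this way is about 0.1697, which matches the value corr returned for the first column when the full matrix y was passed, not the 0.1728 returned for the single-column call.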
Manuel Eichenlaub on 10 Oct 2018
Is there any news regarding this issue? It has been 3 weeks.
Brendan Hamm on 23 Oct 2018
I have notified the development team, pointing them to this question, and they will review it and decide accordingly. I typically do not receive any notification about the outcome. If you would like to hear back, I suggest you open a technical support case.
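In the meantime, a possible stopgap is to call corr once per column of y, so that ties in one column cannot affect the p-value method used for another. This is only a sketch built from the two calls in the question; the loop and the names nCols and k are illustrative, and it trades some speed for consistency.

```matlab
% Data from the question
x = [2 7 9 5 4 1 3 8 10 6 11]';
y = [1 2 3 4 5 6 7 8 9 10 11;
     1 1 3 4 5 6 7 8 9 10 11]';

nCols = size(y, 2);
r = zeros(1, nCols);
p = zeros(1, nCols);

for k = 1:nCols
    % Each call sees only one column of y, so ties in a different
    % column cannot influence which p-value method is chosen here.
    [r(k), p(k)] = corr(x, y(:, k), 'Type', 'Spearman');
end
```

Going by the outputs quoted in the question, p(1) should then come out as 0.1728 rather than 0.1697, while r is unchanged.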

Release: R2017a