Pearson correlation coefficient determination

Chanaka Navarathna on 11 Feb 2019

Hi,
I am trying to compute the Pearson correlation coefficient for A1 vs. A2, A1 vs. C, and A2 vs. C. For all of these I get 1.000, which seems quite unrealistic. What am I doing wrong here? You can use these numbers to execute the function: Pearson_correlation(100000,10000,25000,1)
function [ Absorbance1,Absorbance2 ] = Pearson_correlation(I0,E1,E2,l)
% Beer-Lambert law: A = E*C*l
% A = -log T = -log(It/I0) = log(I0) - log(It)
% A  = Absorbance
% It = Intensity of transmitted photons
% I0 = Initial intensity of photons
C=[0 10 20 30 40 50]; %Concentrations
C1=C*1E-6; %Concentration unit conversion
SD=sqrt(I0); %Standard deviation of excitation photons
%This part of the code generates numbers for A1, A2 and the Pearson
%correlation coefficient for A1 vs. A2, A1 vs. C and A2 vs. C
I0f=I0+randn(1,1).*SD; %excitation photon fluctuation
A1=E1*C1*l; %Calculation of absorbance
It11=I0f.*10.^(-A1); %Calculation of transmittance
SD21=sqrt(It11);
It12=It11+SD21.*randn(1,1); %transmittance fluctuation
Absorbance1=log(I0f)-log(It12) %calculation of absorbance accounting for the fluctuations
I0f=I0+randn(1,1).*SD; %excitation photon fluctuation
A2=E2*C1*l;
It21=I0f.*10.^(-A2);
SD21=sqrt(It21);
It22=It21+SD21.*randn(1,1);
Absorbance2=log(I0f)-log(It22)
r=corrcoef(Absorbance1, Absorbance2);
Absorbance_correlation=r(2)
s=corrcoef(C,Absorbance1);
Concentration_correlation1=s(2)
t=corrcoef(C,Absorbance2);
Concentration_correlation2=t(2)
% Pearson_correlation(100000,10000,25000,1)

John D'Errico on 11 Feb 2019

What do you expect? It seems you keep on doing these computations, but then fail to think about the result, or about why you got what you did. For example...
corrcoef(Absorbance1, Absorbance2)
ans =
1 0.999999780102625
0.999999780102625 1
The correlation coefficient is NOT 1. It is very near 1, but not exactly so.
p1 = polyfit(Absorbance1',Absorbance2',1)
p1 =
2.50345381465648 -0.0079603101543761
[Absorbance1'*p1(1) + p1(2), Absorbance2']
ans =
0.00319199569957401 0.00396857486191138
0.580997733409916 0.580941962999683
1.15897081349257 1.15836071715381
1.73713188545425 1.73637553044848
2.31550417871278 2.31518905653607
2.89411383324176 2.8950745980109
So, if we transform Absorbance1 by a linear transformation, we get something virtually identical to Absorbance2.
Likewise, C is a perfectly linear sequence.
C
C =
0 10 20 30 40 50
diff(Absorbance1)
ans =
0.230803434170655 0.230870278772036 0.230945371780708 0.231029743737411 0.231124557258262
As you should see from the differences there, Absorbance1 is very nearly linear too.
Can you possibly expect to not see nearly unit correlations for each of those comparisons? Two perfectly linear (non-constant) sequences will have a correlation coefficient that is either 1 or -1.
Look at what you get. Don't just compute a number and assume it has any meaning. Think about what you have done. Does what you did make sense?
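The claim that two exactly linear, non-constant sequences have a correlation of +1 or -1 follows from the definition r = cov(x,y)/(std(x)*std(y)): if y = a*x + b, then cov(x,y) = a*var(x) and std(y) = abs(a)*std(x), so r = sign(a). A minimal sketch to check this numerically is below. The slope 2.5, the intercept 1, and the names yUp, yDown and rScaled are hypothetical values chosen for illustration; they do not come from the code in the question.

```matlab
% Exactly linear sequences: r is +1 or -1, set only by the sign of the slope.
x = [0 10 20 30 40 50];      % same spacing as C in the question
yUp   =  2.5*x + 1;          % hypothetical increasing linear sequence
yDown = -2.5*x + 1;          % hypothetical decreasing linear sequence

rUp   = corrcoef(x, yUp);    % 2x2 matrix; the off-diagonal entry is r
rDown = corrcoef(x, yDown);
fprintf('slope > 0: r = %.15f\n', rUp(1,2));
fprintf('slope < 0: r = %.15f\n', rDown(1,2));

% Rescaling or shifting x (for example a unit conversion such as C*1E-6)
% does not change r either.
rScaled = corrcoef(1e-6*x + 3, yUp);
fprintf('rescaled x: r = %.15f\n', rScaled(1,2));
```

This is why the unit conversion from C to C1 has no effect on any of the three correlations.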

Chanaka Navarathna on 13 Feb 2019
Thank you. Can you suggest a way to fix this?

John D'Errico on 13 Feb 2019
To fix what? You got some numbers. For some reason unknown to me, you got a result that you did not expect. But you don't give any reason that the result should be any different, only your claim that it was unexpected.
Were I to be surprised by a computational result, I would then go back and check my computations. Verify every line. Does that intermediate result make sense? Did I do something improper? Then go to the next line. Think carefully about what I am doing in every step. If it still seems strange, think carefully about the mathematics.
For example, here you seem to be expecting a correlation that is not near 1, yet it is, and very nearly so. That implies you were expecting nonlinear behavior. Yet one should know that over sufficiently small regions, any differentiable nonlinear process will still appear linear. And that would explain a near unit correlation coefficient, as any two linear sequences will have a unit correlation coefficient, either +1 or -1.
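The point about differentiable functions looking linear over small regions can be checked with a short sketch. The function exp(x), the two intervals, and the names f, xNarrow, xWide, rNarrow and rWide are all hypothetical choices made for illustration; none of them come from the original code.

```matlab
% A smooth nonlinear function sampled over a narrow and a wide interval.
f = @(x) exp(x);                 % hypothetical nonlinear process

xNarrow = linspace(0, 0.1, 6);   % small region: f is almost a straight line here
xWide   = linspace(0, 10, 6);    % large region: the curvature is obvious

rNarrow = corrcoef(xNarrow, f(xNarrow));
rWide   = corrcoef(xWide,   f(xWide));

fprintf('narrow interval: r = %.8f\n', rNarrow(1,2));
fprintf('wide interval:   r = %.8f\n', rWide(1,2));
```

Over the narrow interval the correlation is indistinguishable from 1 at a glance, while over the wide interval it drops well below 1, even though the same function generated both sequences.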
So I've told you what I would do. You can do as you wish, because I cannot divine what it is you really wanted to do here, or know why you think you should have gotten something different. I don't know if you made a mistake in your computations, or if you have mistaken expectations.
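One thing worth checking, line by line as suggested above: in the posted code each randn(1,1) call returns a single scalar, so the same fluctuation is applied to all six concentrations. If the intent was for every measurement to get its own noise draw (an assumption; the thread does not confirm it), a sketch along these lines shows how the correlation responds to the noise level. The seed, the low-light intensity of 200, the names I0list and rAll, and the switch to log10 (to match 10.^(-A1)) are all assumptions introduced here, not part of the original function.

```matlab
rng(0);                          % fixed seed so the run is repeatable (assumption)
C  = [0 10 20 30 40 50];         % concentrations from the question
C1 = C*1E-6;                     % unit conversion from the question
E1 = 10000;  l = 1;              % values from the example call
A1 = E1*C1*l;                    % noise-free absorbance

I0list = [1e5 200];              % 1e5 is from the question; 200 is a hypothetical low-light case
rAll   = zeros(size(I0list));
for k = 1:numel(I0list)
    I0  = I0list(k);
    I0f = I0 + sqrt(I0).*randn(size(C1));   % one source fluctuation per measurement
    It  = I0f.*10.^(-A1);                   % transmitted intensity
    It  = It + sqrt(It).*randn(size(C1));   % one detector fluctuation per measurement
    Absorbance = log10(I0f) - log10(It);    % log10 assumed, to match 10.^(-A1)
    r = corrcoef(C, Absorbance);
    rAll(k) = r(1,2);
    fprintf('I0 = %g: r = %.6f\n', I0, rAll(k));
end
```

With 1e5 photons the shot noise is a fraction of a percent of the signal, so the correlation stays extremely close to 1 even with independent draws. Only when the photon count is small enough for the noise to be comparable to the spread in absorbance should the correlation be expected to fall visibly below 1.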