Why does my mex-function for linear regression return r-squared values above 1?

In a mex-file I have a function called ols() that is supposed to calculate a linear regression for two vectors, 'x' and 'y'. Unfortunately, every now and then ols() returns r-squared values that are higher than 1, so at some point there must be an error in my calculations...but I can't figure out where. Any help would be be greatly appreciated.
std::vector<float> ols(const std::vector<float>& x, const std::vector<float>& y){
if(x.size() != y.size()){
mexErrMsgTxt("Error: The length of vector 'x' does not equal the length of vector 'y'.");
}
float sumx = 0.0;
float sumx2 = 0.0;
float sumxy = 0.0;
float sumy = 0.0;
float sumy2 = 0.0;
float n = x.size();
for (int i=0;i<n;i++){
sumx += x[i];
sumx2 += pow(x[i], 2);
sumxy += x[i] * y[i];
sumy += y[i];
sumy2 += pow(y[i], 2);
}
float denominator = (n * sumx2 - pow(sumx, 2));
float slope = (n * sumxy - sumx * sumy) / denominator;
float intercept = (sumy * sumx2 - sumx * sumxy) / denominator;
float r = (sumxy - sumx * sumy / n) /
sqrt((sumx2 - pow(sumx, 2)/n) *
(sumy2 - pow(sumy, 2)/n));
float rsquared = pow(r, 2);
std::vector<float> stats;
stats.push_back(slope);
stats.push_back(intercept);
stats.push_back(rsquared);
return stats;
}

 Akzeptierte Antwort

James Tursa
James Tursa am 2 Okt. 2018
Bearbeitet: James Tursa am 2 Okt. 2018
I haven't run your code, but just looking at it you are using the worst algorithm numerically with single precision numbers, which is not a good combination. At the very least you should be using a better algorithm, and maybe doing the calculations in double precision. E.g., see this link:

5 Kommentare

The reason for why I use floats instead of doubles is that I am dealing with very large 2D matrices (which are being passed to the mex Routine). If these matrices were stored as doubles I would run out of memory constantly.
James Tursa
James Tursa am 2 Okt. 2018
Bearbeitet: James Tursa am 2 Okt. 2018
So you've got good reasons to use single precision for your data, but you should still use a better algorithm and use double precision variables to do the accumulation etc calculations.
I will do that. Thanks for your suggestions.
James Tursa
James Tursa am 2 Okt. 2018
Bearbeitet: James Tursa am 2 Okt. 2018
As a start, maybe try the two pass algorithm since it is much better than the one pass algorithm you are currently using.
Thanks - I will give the two pass algorithm a go.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Kategorien

Mehr zu Descriptive Statistics finden Sie in Hilfe-Center und File Exchange

Gefragt:

am 1 Okt. 2018

Kommentiert:

am 2 Okt. 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by