Residual values for a linear regression fit
NA
on 16 Oct 2020
Commented: Star Strider
on 17 Oct 2020
I have these points
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
I want to remove the one point from the set above that contributes the largest residual.
This is the code I use:
d = zeros(length(x),1);
for i = 1:length(x)
    x_bk = x;
    y_bk = y;
    x(i) = [];
    y(i) = [];
    X = [ones(length(x),1) x];
    b = X\y;
    yhat = X*b;
    d(i) = abs(sum(y - yhat));
    x = x_bk;
    y = y_bk;
end
index = find(min(d)==d);
x(index) = [];
y(index) = [];
X = [ones(length(x),1) x];
b = X\y;
yhat_r = X*b;
plot(x,y,'o')
hold on
plot(x,yhat_r,'--')
I think the result should be the black line (see the attached file), but I get the red dashed line.
Accepted Answer
Star Strider
on 16 Oct 2020
I would do something like this:
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
N = numel(x);
rsdn = zeros(1,N);                      % residual norm with point k left out
bmtx = zeros(2,N);                      % [slope; intercept] with point k left out
for k = 1:N
    xv = x;
    yv = y;
    xv(k) = [];                         % remove point k before fitting
    yv(k) = [];
    X = [xv, ones(size(xv))];
    b = X \ yv;
    bmtx(:,k) = b;                      % keep the coefficients for the plots below
    rsdn(k) = norm(yv - X*b);
end
figure
plot(1:N, rsdn)
grid
[rsdnmin,lowest] = min(rsdn);
[rsdnmax,highest] = max(rsdn);
lowest
highest
idxv = [lowest; highest];
figure
for k = 1:2
    subplot(2,1,k)
    xv = x;
    yv = y;
    xv(idxv(k)) = [];
    yv(idxv(k)) = [];
    plot(xv,yv,'ob')
    yhat = [xv, ones(size(xv))]*bmtx(:,idxv(k));
    hold on
    plot(xv, yhat, '--r')
    hold off
    title(sprintf('Eliminating Point %d', idxv(k)))
end
Here, the norm of residuals (the usual metric) is least (about 2.73) when eliminating point 1, which is (1,8), and greatest (about 5.97) when eliminating point 5, which is (3,2). Point 1 is therefore the one that contributes most to the residual.
Experiment to get the result you want.
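One likely reason the code in the question does not single out a point: for a least-squares line that includes an intercept term, the residuals sum to zero (up to round-off), so abs(sum(y - yhat)) is essentially noise whichever point is removed. The sketch below, using the data from the question, compares that metric with the norm of the residuals. The names keep, sumres, normres and kbest are introduced here purely for illustration.

```matlab
x = [1,1,2,2,3,4,4,6]';
y = [8,1,1,2,2,3,4,1]';
N = numel(x);
sumres  = zeros(N,1);            % abs(sum(residuals)) with point k left out
normres = zeros(N,1);            % norm(residuals) with point k left out
for k = 1:N
    keep = true(N,1);
    keep(k) = false;             % logical mask that leaves out point k
    X = [ones(N-1,1), x(keep)];
    r = y(keep) - X*(X\y(keep)); % residuals of the leave-one-out fit
    sumres(k)  = abs(sum(r));
    normres(k) = norm(r);
end
disp([sumres, normres])          % first column is round-off only
[~, kbest] = min(normres);       % point whose removal gives the best fit
```

The first column of the displayed matrix should be at round-off level in every row, while the second column varies from point to point, so min(normres) is a usable selection criterion where min(d) in the question is not.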
6 Comments
Star Strider
on 17 Oct 2020
In that simulation, you are defining a particular slope and intercept and adding a normally-distributed random vector to it. The slopes and intercepts of the fitted lines will not change much.
You can see that most easily if you add this text call to each plot (in the loop):
text(1.1*min(xlim),0.9*max(ylim), sprintf('Y = %.3f\\cdotX%+.3f',bmtx(:,k)), 'HorizontalAlignment','left')
That will print the regression equation in the upper-left corner of each one. You can then compare them.
Note that the residual norms do not change much, either. In the original data set, they varied between 2.73 and 5.97. In this data set, they are within about ±0.5 of each other.
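The simulation being discussed is not reproduced here, so the following is only a sketch of the same idea with assumed values: the slope m, intercept c, noise level sigma, number of points N and the x grid are all hypothetical. It repeats the leave-one-out fit on data generated from a known line plus normally distributed noise and reports how far the fitted coefficients and residual norms spread.

```matlab
% Hypothetical simulation: m, c, sigma, N and the x grid are assumed values
m = 2;  c = 1;  sigma = 0.5;  N = 50;
x = linspace(0, 10, N)';
y = m*x + c + sigma*randn(N,1);
bmtx = zeros(2,N);               % [slope; intercept] with point k left out
rsdn = zeros(1,N);               % residual norm with point k left out
for k = 1:N
    keep = true(N,1);
    keep(k) = false;             % leave out point k
    X = [x(keep), ones(N-1,1)];
    b = X \ y(keep);
    bmtx(:,k) = b;
    rsdn(k) = norm(y(keep) - X*b);
end
slopeRange     = max(bmtx(1,:)) - min(bmtx(1,:));
interceptRange = max(bmtx(2,:)) - min(bmtx(2,:));
rsdnRange      = max(rsdn) - min(rsdn);
fprintf('slope range %.4f, intercept range %.4f, residual-norm range %.4f\n', ...
        slopeRange, interceptRange, rsdnRange)
```

With no gross outlier in the data, no single point dominates the fit, so all three ranges should come out small compared with the spread seen in the original eight-point data set.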