I have data that should resemble a parabola when plotted. However, near the center, a couple of the values are too "high".
x = [-9.0000 -8.0000 -7.0000 -6.0000 -5.0000 -4.0000 -3.0000 -2.0000 -1.0000 0 1.0000 ...
2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 8.0000 9.0000 10.0000];
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185];
The values I would like MATLAB to treat as outliers are x = 2 --> y = 0.0177 and x = 3 --> y = 0.0189, because I would not expect the parabola to rise in the middle and then drop again. However, MATLAB does not count these points as outliers because, of course, it does not know that I am expecting a parabola-like shape. How could I do this? Thank you!

6 Comments

A picture is sometimes worth a thousand words.
x = [-9.0000 -8.0000 -7.0000 -6.0000 -5.0000 -4.0000 -3.0000 -2.0000 -1.0000 0 1.0000 ...
2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 8.0000 9.0000 10.0000];
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185];
plot(x,y,'o')
Of course, sometimes even a picture is insufficient. ;-)
So which outliers do you hope to identify? Is the bump down at x==-4 really an outlier? It is not very far out, so perhaps difficult to identify.
And when you have a pair of points close to each other that are both apparently outliers, they are harder to identify, since they tend to mask each other. And when you have only a few data points, things get tougher still. At least your data does not appear to be very noisy outside of the outliers.
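The rudimentary stepwise use of
% X, Y are working copies of the x, y data above
b=polyfit(X,Y,2);          % fit a quadratic to the current data
res=Y-polyval(b,X);        % residuals about the fit
z=res/std(res);            % residuals scaled by their standard deviation
iz=find(abs(z)>=3);        % points at least 3 standard deviations out
X(iz)=[]; Y(iz)=[];        % drop them before refitting
and repeating a second time isolated the two interior points around X==2 in this case. That's no guarantee it will work for general cases, of course, and visualization is always better than just throwing code at a problem... I'd probably give something like that a first crack, though, if I had to do something.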
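The stepwise idea above can be sketched as a loop. This is only an illustration: the cap of 5 passes and the names maxPasses, zLimit, and removedX are assumptions, not part of the thread. Because the loop keeps going until no scaled residual reaches the cutoff, it may remove more points than two manual passes would.

```matlab
x = -9:10;
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
     0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185];

X = x; Y = y;        % working copies that shrink as points are removed
maxPasses = 5;       % assumed cap on the number of fit/remove passes
zLimit = 3;          % same 3-sigma cutoff as in the snippet above

for pass = 1:maxPasses
    b = polyfit(X, Y, 2);           % quadratic fit to the remaining data
    res = Y - polyval(b, X);        % residuals about the fit
    z = res / std(res);             % scaled residuals
    iz = find(abs(z) >= zLimit);    % candidates for removal
    if isempty(iz)
        break                       % nothing left beyond the cutoff
    end
    X(iz) = []; Y(iz) = [];
end

removedX = setdiff(x, X);           % x-locations that were dropped
```

It is worth checking removedX against a plot of the residuals before trusting the result.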
The CurveFitting TB has some residual analysis tools, but I've not used it enough "in anger" to be able to comment on them.
The results of the above were shown in a couple of graphs (figures not included).
John D'Errico on 20 Nov. 2021
Edited: John D'Errico on 20 Nov. 2021
What DPB suggests will work well here for the two large residual points. But it will surely run out of steam if you hope to use it to find that little bump at x==-4, because a low-order polynomial will not fit that data well enough. There is simply so much lack of fit that that one point will get lost in it.
Classical methods that I recall do local, moving polynomial fits. I don't really know enough about the methods in tools like isoutlier or rmoutliers to be able to offer good advice there.
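One way to sketch the local, moving-fit idea is to predict each point from a straight line fitted to its neighbors, leaving the point itself out, and to flag points whose prediction error is large relative to a robust scale. The half-width of 3, the local order of 1, and the cutoff of 3 scaled MADs are all assumptions chosen for illustration, and the sketch is not meant to reproduce what isoutlier or rmoutliers do.

```matlab
x = -9:10;
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
     0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185];

halfWidth = 3;    % assumed: neighbors taken from up to 3 points on each side
order = 1;        % assumed: local straight-line fit
n = numel(x);
pred = zeros(1, n);

for k = 1:n
    idx = max(1, k - halfWidth):min(n, k + halfWidth);
    idx(idx == k) = [];                      % leave the point itself out
    p = polyfit(x(idx), y(idx), order);      % local fit to the neighbors
    pred(k) = polyval(p, x(k));              % prediction at the left-out point
end

res = y - pred;                              % leave-one-out prediction errors
dev = abs(res - median(res));
scale = 1.4826 * median(dev);                % scaled median absolute deviation
isOut = dev > 3 * scale;                     % logical mask of flagged points
```

Adjacent outliers still contaminate each other's local fits, so the masking problem mentioned earlier does not go away; a wider window or a robust local fit may help.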
John, you are totally right! I should have included a picture of the results for the sake of clarity. I was actually trying to identify the outliers located at x=2 and x=3 without doing it "manually", but I was having a hard time with the outlier methods MATLAB has. Thank you!
dpb, your answer also helps me a lot, and I think it is a pretty good idea that I can also include a plot of the residuals.
Thank you so much to both of you!!
What DPB has suggested is a variation of what is often called iteratively reweighted least squares. It is the basis for many of the robust fitting tools you will find. Points with large residuals are downweighted, then a weighted fit is redone. In the case of an outlier scheme, you can simply decide to remove them if you wish.
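A minimal sketch of iteratively reweighted least squares for a quadratic, assuming bisquare weights with the conventional tuning constant 4.685 and a scale estimate of median(|residual|)/0.6745. The fixed count of 20 iterations is an assumption, and the leverage adjustment that full robust-fitting tools usually apply is left out.

```matlab
x = (-9:10)';
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
     0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185]';

A = [x.^2, x, ones(size(x))];     % design matrix for a quadratic
w = ones(size(x));                % start from an ordinary least-squares fit
tune = 4.685;                     % conventional bisquare tuning constant
nIter = 20;                       % assumed fixed number of reweighting passes

for iter = 1:nIter
    sw = sqrt(w);
    b = (diag(sw) * A) \ (sw .* y);          % weighted least-squares fit
    res = y - A * b;                         % residuals of all points
    s = max(median(abs(res)) / 0.6745, eps); % robust scale, guarded against 0
    u = res / (tune * s);
    w = (1 - u.^2).^2;                       % bisquare weights
    w(abs(u) >= 1) = 0;                      % far-out points get zero weight
end

yRobust = A * b;                  % robust quadratic evaluated at every x
```

Points that end with a weight at or near zero are the natural outlier candidates, and they can be removed or replaced afterwards.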
Thank you both! By the way, John, your function inpaint_nans is a life saver!!! (Do not worry, I have properly cited it every time I used it :) )


Accepted Answer

Linear interpolation might be good to use here, e.g.:
x = [-9.0000 -8.0000 -7.0000 -6.0000 -5.0000 -4.0000 -3.0000 -2.0000 -1.0000 0 1.0000 ...
2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 8.0000 9.0000 10.0000];
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185];
plot(x,y, 'linewidth', 2), shg
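% Linear interpolation across the two suspect points
x1 = 2; y1 = 0.0177;   % first outlier (y1 is listed for reference only)
x2 = 3; y2 = 0.0189;   % second outlier (y2 is listed for reference only)
Idx = find(x==x1 | x==x2);   % indices of the two adjacent outliers
% Replace both values by interpolating between the neighbors on either side
y(Idx) = interp1([x(Idx(1)-1),x(Idx(2)+1)], [y(Idx(1)-1),y(Idx(2)+1)], x(Idx));
hold on
plot(x, y, 'r--', 'linewidth', 2), grid on; legend('Raw: x vs. y', 'Fixed: x vs. y')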
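The same replacement can be written against a logical mask, which avoids hard-coding neighbor indices and also works when the flagged points are not adjacent. In this sketch the mask bad is set by hand to match the answer above; in practice it could come from any detection step. The names bad and yFixed are illustrative.

```matlab
x = -9:10;
y = [0.0173 0.0169 0.0168 0.0166 0.0166 0.0167 0.0165 0.0165 0.0166 0.0167 0.0168 ...
     0.0177 0.0189 0.0173 0.0176 0.0178 0.0180 0.0181 0.0182 0.0185];

bad = (x == 2) | (x == 3);       % flagged points, set by hand here

yFixed = y;
% Interpolate the flagged values from the remaining good points only
yFixed(bad) = interp1(x(~bad), y(~bad), x(bad), 'linear');
```

With the default settings interp1 returns NaN outside the range of the good points, so a flagged first or last sample would need separate handling.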

2 Comments

This was extremely nice. Thank you so much!
Most Welcome! All the Best!

