This is a MATLAB version of Jerome Friedman's 1984 supersmoother algorithm. The original was written in Fortran; this is a vectorized translation. The algorithm is a variable span smoother which uses cross validation to pick the best span for each predicted point.
Inspired: ReSpect
Create scripts with code, output, and formatted text in a single executable document.
Andrea Libri (view profile)
Januka (view profile)
I tried out a couple of smoothing programs on a dataset where I had low amplitude sharp scattering but none gave me a smooth line as supsmu.m did. The overlap of the original curve was great even with low power of smoothing.
Douglas Schwarz (view profile)
Yuri,
Sure, it will smooth any vector of at least 5 values. You'll have to decide if you like the result. Just give it a try.
Doug
Yuri K (view profile)
Hi Doug,
Interesting algorithm. Can it be used to smooth unperiodic step function?
send paper
(some version was published recently in FEX by Eilers).
Sorry this is not quite correct. Eilers has published a paper on "perfect smoother", I have written his code in Matlab and suggested him to publish it in FEX, but this was not done.
Dear Yi Cao,
Thanks for your suggestion. Unfortunately, the anonymous function approach is (a little) slower than what I did. I don't want to trade off execution speed for a cleaner mlint report. If I want that I can add %#ok to the appropriate lines, but really, it's just not that important.
Doug
I appreciate what John d'Great does extremely high. What is a perfect filter is an eternal question (some version was published recently in FEX by Eilers). I suggest a test for SUPSMU, based on data example used by the author. The test compares errors of SUPSMU and SAVFILT. With 4 inputs, you can see the residuals. Just type
>> testsmu
function [ER,errors]=testsmu(NITER,W,REP,SHOW)
%Input
%NITER=Number of filtering ITERations
%W = filtering Window for SAVFILT
%REP = number of test repetitions, def. 100
%SHOW = if set, SHOWs the residuals of SUPSMU and SAVFILT
%
%Output
%errors = matrix or errors in every REPetition
% ER = 3 mean errors: [noise, supsmu, savfilt]
NAR=nargin;
if NAR<3
REP=100;%REPetitions
end
if NAR<2
W=21;
end
if NAR<1
NITER=2;
end
errors=zeros(REP,3);
x = linspace(0,1,201);
signal=sin(2.5*x);
ind=W+1:201-W; %Index for inner points
sigind=signal(ind);
for j=1:REP
noise=0.05*randn(size(x));
NAMP=std(noise);
y = signal + noise;
smo = supsmu(x,y);
yy=y;
for i=1:NITER
yy=savfilt(yy,W);
end
errors(j,:)=[NAMP std(smo(ind)-sigind) std(yy(ind)-sigind)];
if NAR==4
plot(x,noise,'o',x,smo-signal,x,yy-signal,'linewidth',1)
legend('data','supsmu','savfilt')
shg
pause
end
end
ER=sqrt(mean(errors.^2));
Dear Douglas,
One way to get rid of these warning without making the code more complicated is to use anonymous functions. It seems that among 5 inputs, 3 of them (1,3 and 5) are the same for all algorithms. Hence, you can change function handle, smooth to anonymous function, for example in case 1, you can use
smooth = @(y,s)(smooth_wt_aper(x,y,prop.weights,s))
so that in smooth_wt_aper definition you can remove the input argument, period.
To use the anonymous function, line 188, for example, can be replaced as:
smo = smooth(y,prop.span);
hth.
Thanks, John and Urs, for your comments.
John, there are four subfunctions, but for any given run of the function only one is used as selected by the specific conditions (whether weights are supplied and whether the data is periodic). So, it is necessary for all four subfunctions have identical arguments, even though some of them may not be used -- hence, the mlint report. Could I get around that? Sure, but it's only an mlint report -- anything I might do would just make the code more complicated.
Doug
simply said: an excellent adaptation of a well-known FORTRAN code to a smooth, professional ML program... as had to be expected - of course - from this seasoned author...
us
Excellent help. Well documented, well written. I found an H1 line, error checking, a nice interface. All excellent.
A minor point is that mlint flagged a few things. Some of the subfunctions had input arguments that are never used. Always check with mlint. Its surprisingly smart sometimes. These flags here look like they have uncovered some deadwood in the algorithm, not true flaws.
This will works nicely for most data. Of course, you can trip it up if you try. Here is a tough curve for the default parameters:
x = sort(randn(1000,1));
y = sin(x*3) + randn(size(x))/10;
tic,yss = supsmu(x,y);toc
Elapsed time is 0.036050 seconds.
plot(x,y,'b.',x,yss,'r-')
This data was a problem because it varies widely in its density in x. What this should teach those who do numerical modeling is to always know your data, and always understand the tool that you will use to fit your data. Poor results are possible otherwise.
A very nice feature of supsmu is how fast it was on the test curve with 1000 points above.