MATLAB Answers

Cross-validating the output of "scatteredInterpolant" to choose the best method (linear, nearest, or natural)

Dear all,
I have precipitation values at 93 scattered stations. I used "scatteredInterpolant" to interpolate these 93 scattered values onto a regular grid:
[new_lons,new_lats] = ndgrid(44.25:.5:63.75,24.25:.5:39.75); %make the grid for my new lats/lons
This gave me 1280 gridded values. None of the 93 original station coordinates coincides with any of the 1280 grid points, so all of these are new values at new coordinates.
Now I want to compute R2 and RMSE for the different methods available in scatteredInterpolant (linear, natural, and nearest) to find out which interpolation method works best for my data set.
I think I should interpolate a second time, from the 1280 gridded values back onto the 93 original station coordinates, and then compute R2 and RMSE between the 93 original station values and the 93 back-interpolated values.
So am I right?
Is there any better approach available?
I appreciate any suggestions.
Thank you



Accepted Answer

Bjorn Gustavsson on 5 May 2020
To me that sounds somewhat sensible, but it would primarily test the regular-grid interpolation method, not the scatteredInterpolant methods. My first idea would be to try a leave-one-out approach instead. If you leave one point out of your 93 you can still build the scatteredInterpolant from the remaining 92; that gives you one test point where you can compare the interpolated value with an actual observation. Then you repeat, leaving out a different point each time (preferably not one from the perimeter, I'd guess), to build up some statistics.
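A minimal sketch of that leave-one-out loop, in case it helps. The station coordinates and precipitation values here are synthetic stand-ins (an assumption, not the asker's data), and the variable names lon, lat, precip and pred are made up for illustration. The extrapolation method is set to 'none' so that a held-out point falling outside the convex hull of the remaining points returns NaN instead of an extrapolated value.

```matlab
% Synthetic stand-in data (assumption): 93 random stations, smooth field.
rng(0);
lon    = 44 + 20*rand(93,1);
lat    = 24 + 16*rand(93,1);
precip = 50 + 10*sin(lon/5) + 5*cos(lat/3);

methods = {'linear','nearest','natural'};
n    = numel(precip);
pred = nan(n, numel(methods));      % one column of predictions per method

for m = 1:numel(methods)
    for k = 1:n
        keep    = true(n,1);
        keep(k) = false;            % leave station k out
        % 'none' gives NaN outside the convex hull of the kept stations
        F = scatteredInterpolant(lon(keep), lat(keep), precip(keep), ...
                                 methods{m}, 'none');
        pred(k,m) = F(lon(k), lat(k));   % estimate at the held-out station
    end
end

% RMSE per method, skipping held-out points that ended up outside the hull
err  = pred - precip;
rmse = sqrt(mean(err.^2, 1, 'omitnan'));
```

Each entry of rmse corresponds to the method at the same position in methods, so the smallest entry points at the method that reproduced the held-out stations best on this data.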


Behzad Navidi on 5 May 2020
Yes, I agree this is the best method. Thank you for mentioning it.
So this is the Leave-One-Out Cross-Validation (LOOCV) method you mentioned.
1. Since you know this topic: I have monthly values from 1988 to 2018 (360 months) for each of the 92 points and for the omitted point. After interpolating to the 1280 points and then to the omitted point, I have 360 months of interpolated values for that point too. Is it good to check the difference for all 360 months?
For example, I have 360 original values and 360 interpolated values. Should I take the mean of all 360 in each set and compare those two numbers, or should I calculate R2 and RMSE for each row one by one and then average the results to get an R2 and RMSE for the method?
2. Another question: is it necessary to do this for all 93 points, or can I choose just some of them for the evaluation, for example only 10 points?
Thank you so much! It was very helpful.
Bjorn Gustavsson on 5 May 2020
1. Yes, it should be good to do this for all months. It seems to be possible to modify the interpolant F, meaning you should be able to change its values (your precipitation data) without rebuilding it, which would make it very efficient to loop over all months.
I would calculate all sorts of statistics once the core functionality is running: RMSE, R2, correlation, etc. That part will cost you almost nothing extra.
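Roughly, looping over the months for one held-out station could look like the sketch below. It assumes the monthly data sit in a stations-by-months matrix P (a made-up name, filled here with synthetic values) and relies on the Values property of scatteredInterpolant being assignable, so the interpolant is built once and only its data are swapped. R2 is taken here as the squared Pearson correlation, which is one common reading of "R2" but not the only one.

```matlab
% Synthetic stand-in data (assumption): 93 stations, 360 months.
rng(1);
nSta = 93;  nMon = 360;
lon  = 44 + 20*rand(nSta,1);
lat  = 24 + 16*rand(nSta,1);
base = 50 + 10*sin(lon/5) + 5*cos(lat/3);           % spatial pattern
seas = 1 + 0.5*sin(2*pi*(1:nMon)/12);               % seasonal cycle
P    = base .* seas + 2*randn(nSta, nMon);          % stations x months

% Hold out the station closest to the centroid (very likely interior)
[~, k]  = min(hypot(lon - mean(lon), lat - mean(lat)));
keep    = true(nSta,1);
keep(k) = false;

% Build the interpolant once, then replace only its values per month
F   = scatteredInterpolant(lon(keep), lat(keep), P(keep,1), 'natural', 'none');
est = nan(1, nMon);
for t = 1:nMon
    F.Values = P(keep,t);           % same station geometry, new month
    est(t)   = F(lon(k), lat(k));
end

% Compare the 360 estimates with the 360 observations at station k
obs  = P(k,:);
rmse = sqrt(mean((est - obs).^2));
R    = corrcoef(obs, est);
r2   = R(1,2)^2;                    % squared Pearson correlation
```

This compares the two 360-value series directly rather than comparing their means, since two series can share a mean and still disagree month by month. Repeating it for every held-out station gives one RMSE and one R2 per station, which can then be summarised per method.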
2. I'd try to do it for all points in the interior. Think about what happens when we exclude a point on the convex hull: that point now lies outside the convex hull of the remaining points, so the value we get there is extrapolated by F, not interpolated. And extrapolation is a dicey thing to do. So I'd separate your points into a group of interior points and a group of perimeter points. Then I'd run the LOOCV on all the interior points (scatteredInterpolant built with 92 points each time, but only excluding points from the interior set). If this takes too long, sure, use a smaller set.
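One way to do that split is with convhull, sketched below on synthetic coordinates (an assumption; isPerimeter and interiorIdx are made-up names). convhull returns the indices of the hull vertices, and every other station counts as interior.

```matlab
% Synthetic stand-in coordinates (assumption): 93 random stations.
rng(2);
lon = 44 + 20*rand(93,1);
lat = 24 + 16*rand(93,1);

% Indices of the stations on the convex hull (first index repeated at the end)
hullIdx = convhull(lon, lat);

isPerimeter          = false(size(lon));
isPerimeter(hullIdx) = true;

% Stations that are safe to leave out one at a time
interiorIdx = find(~isPerimeter);
```

Restricting the leave-one-out loop to k = interiorIdx keeps every hull vertex in the 92 remaining points, so the hull itself does not change and each held-out station is always interpolated rather than extrapolated.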
