How to replace values above a given percentile by nans
5 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Hi,
I am trying to replace values above the 99th percentile in a table by nans by using Star Strider excellent little function (https://www.mathworks.com/matlabcentral/answers/127944-how-to-pick-the-j-th-percentile-of-a-vector). An example of my code looks like so:
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
The problem of course is that there are nan values scattered across multiple variables in the table and I therefore receive the following error message: Warning: Columns of data containing NaN values have been ignored during interpolation.
Does anyone has an idea as to how I could replace values above the 99th percentile in a table by nans in a table where there are already multiple nans ? Please note that I do not have the Statistic toolbox.
0 Kommentare
Akzeptierte Antwort
Adam Danz
am 15 Aug. 2019
Bearbeitet: Adam Danz
am 15 Aug. 2019
Here is an alternative function that ignores NaN values. It's not a nice one-liner like Star Striders but it does produce the same output as the prctile function in the stats toolbox.
function pctl = percentile(v,p)
% v is a vector of data; example v = [8 9 nan 13 15 11 nan 3 5 7];
% p is a percentile; example p = 75;
% pctl is the exact p_th percentile of v.
vsort = reshape(sort(v(~isnan(v))),1,[]); %row vector
pd = p/100 * numel(vsort);
idx = floor(pd + 0.5) + [0,1];
md = pd - idx(1);
idx(idx<1) = 1;
idx(idx>numel(vsort)) = numel(vsort);
pctl = sum(vsort(idx) .* (0.5+[-md,md]));
end % <-- only needed if this function is within a script
Comparison
v = [8 9 nan 13 15 11 nan 3 5 7];
p = 82;
percentile(v,p) % = 13.120000 this function
prctile(v,p) % = 13.120000 stat's toolbox
pct(v(~isnan(v)),p) % = 13.115995 the one-liner approximation
To apply it to your data,
t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
t.Var1(t.Var1 > percentile(t.Var1, 99)) = nan;
t.Var2(t.Var2 > percentile(t.Var2, 99)) = nan;
Weitere Antworten (1)
per isakson
am 15 Aug. 2019
Bearbeitet: per isakson
am 15 Aug. 2019
Ignore the NaNs explicitely when calculating the percentile. Try
%%
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
function z = pct(v,p)
v( isnan(v) ) = [];
z = interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))' ...
, sort(v), p*0.01, 'spline');
end
1 Kommentar
Adam Danz
am 15 Aug. 2019
Bearbeitet: Adam Danz
am 15 Aug. 2019
If you're going to use the one-liner appoximation, you can keep the function handle and explicitly ignore the NaNs from within the call to the function.
t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1(~isnan(t.Var1)), 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2(~isnan(t.Var2)), 99)) = nan;
% {_____here____}
Siehe auch
Kategorien
Mehr zu Descriptive Statistics finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!