Get indices of any quantile of a column
Hello everybody,
As of now I'm trying to sort a large (101x1168) matrix. I always sort by the first column, on which the following three columns depend. I want to be able to get the indices of, for example, the top 10 percent of the values, or of the values between the 0.3 and 0.4 quantiles of the first column, so that I can address those rows with a function. So far I have used several sortrows() calls, but this takes a long time to run. It is important to know that the length of the columns may vary (some of the columns have more NaNs than others), so it would be amazing if it were a function that ignores NaNs (maybe a combination of quantile() and find()?).
Here is an example of what I need:
Col. 1   Col. 2   Col. 3   Col. 4
15       18       12       32
14       23       19       12
10       7        18       12
9        34       12       13
11       19       3        17
I now want to know the index and the value of the top 20% of the values in the first column. In this case they would be 1 and 15. If implemented correctly, I would be able to get a vector output with all the data.
Any help is truly appreciated! Many thanks and kind regards, A.Goe
Answers (1)
Image Analyst
on 26 Aug 2016
If you have the Statistics and Machine Learning Toolbox, there is prctile(). Would that help?
Y = prctile(X,p) returns percentiles of the values in a data vector or matrix X for the percentages p in the interval [0,100]. If X is a vector, then Y is a scalar or a vector with the same length as the number of percentiles required (length(p)). Y(i) contains the p(i) percentile.
If X is a matrix, then Y is a row vector or a matrix, where the number of rows of Y is equal to the number of percentiles required (length(p)). The ith row of Y contains the p(i) percentiles of each column of X.
For multidimensional arrays, prctile operates along the first nonsingleton dimension of X.
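As a minimal sketch of how prctile() could be combined with find() for the example in the question: the variable names (data, col1, threshold, idx, vals, topRows) are illustrative assumptions, and the strict comparison `>` is one possible choice for where the top 20% starts. It requires the Statistics and Machine Learning Toolbox.

```matlab
% Example data from the question (rows = observations, columns = variables)
data = [15 18 12 32;
        14 23 19 12;
        10  7 18 12;
         9 34 12 13;
        11 19  3 17];

col1 = data(:, 1);

% 80th percentile of column 1; prctile() treats NaNs as missing values
threshold = prctile(col1, 80);

% Row indices of the top 20 % and the corresponding values.
% Comparisons with NaN are false, so NaN rows are never selected.
idx  = find(col1 > threshold);
vals = col1(idx);

% The dependent columns can be addressed with the same row indices
topRows = data(idx, :);
```

For this data the threshold lies between 14 and 15, so idx is 1 and vals is 15, matching the result expected in the question. No sorting of the full matrix is needed, because the indices refer to the original row order.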
2 Comments
Image Analyst
on 27 Aug 2016
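If the values must be in your data, then you can use cumsum() to create the empirical cdf from the sorted column, then select the entries above the threshold. Note that the cdf has to be built from the count of valid values, not from the values themselves, and that the sort order is needed to map back to the original rows. Untested code:
[col1, sortOrder] = sort(data(:, 1), 'ascend'); % sort() places NaNs last
cdf = cumsum(~isnan(col1)); % Compute cdf from the count of valid (non-NaN) values
cdf = cdf/cdf(end); % Normalize
% Find the entries in the top 20 %, excluding the NaNs at the end
inTop = cdf > 0.8 & ~isnan(col1);
index = sortOrder(inTop); % Row indices in the original data
dataValue = col1(inTop);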
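For a band between two quantiles, such as the values between the 0.3 and 0.4 quantile mentioned in the question, a rank-based sketch that needs no toolbox could look like the following. The sample column, the variable names, and the choice of a half-open band (lo, hi] are assumptions made for illustration only.

```matlab
col1 = [15; 14; NaN; 10; 9; 11];   % hypothetical column containing one NaN
lo = 0.3;                          % lower quantile of the band (exclusive)
hi = 0.4;                          % upper quantile of the band (inclusive)

% Sort ascending; sort() places NaNs at the end
[sortedVals, sortOrder] = sort(col1, 'ascend');

% Keep only the valid (non-NaN) entries
n = nnz(~isnan(col1));
sortedVals = sortedVals(1:n);
sortOrder  = sortOrder(1:n);

% Rank-based empirical cdf: the k-th smallest value gets k/n
frac = (1:n)' / n;

% Select the band and map back to the original row indices
inBand = frac > lo & frac <= hi;
idx  = sortOrder(inBand);
vals = sortedVals(inBand);
```

With this sample column the band contains a single entry, the value 10 in row 4. Since idx refers to rows of the original column, the dependent columns can be read with something like data(idx, 2:4) without calling sortrows().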