ranking (ordering) values with repeats

Hello Community,
Im hoping some of you have a clever solution to this problem. Im looking for fast and efficient way to rank (order)a vector of numbers in a particular way when repeated values arise.
To make it simple, suppose I have a row vector:
data = [-1 2 0 -2 0]
I know I can rank them using the 3rd output of "unique":
>> [~,~,rnk] = unique(data)
rnk =
2 4 3 1 3
What I like about this is that it assigns the same rank to the repeated zeros. What I don't like about this is that the top rank is now "4" even though I have 5 values. I would prefer this:
>> rnk = myrank(data)
rnk =
2 5 3 1 3
Ive also played around with the second output of "sort" quite a bit, but since this output produces indicies of the sorted values within the original array, there is no simple way (that I've found) to associate the same rank with repeated values.
Im just wondering if there is something simple that Im missing.
Thanks!

 Akzeptierte Antwort

Oleg Komarov
Oleg Komarov am 23 Mär. 2012

3 Stimmen

If you have the Statistics Toolbox, but it's kinda an overkill:
floor(tiedrank([-1 2 0 -2 0]))
ans =
2 5 3 1 3
Otherwise:
data = [-1 2 0 -2 0];
% Sort data
[srt, idxSrt] = sort(data);
% Find where are the repetitions
idxRepeat = [false diff(srt) == 0];
% Rank with tieds but w/o skipping
rnkNoSkip = cumsum(~idxRepeat);
% Preallocate rank
rnk = 1:numel(data);
% Adjust for tieds (and skip)
rnk(idxRepeat) = rnkNoSkip(idxRepeat);
% Sort back
rnk(idxSrt) = rnk
rnk =
2 5 3 1 3

4 Kommentare

owr
owr am 24 Mär. 2012
This is very nice Oleg, thank you for taking the time. I do have the Statistics Toolbox and had forgotten about "tiedrank". I tried that when I first worked on this problem about a year ago, however the "floor" trick falls apart when you have more than 2 ties.
Regardless, your other solution appears to be working nicely and seems to be a good deal faster than using "unique". Very nice! I'll have to look into this more on Monday morning when I have a clear head, its quitting time now.
Thanks again!
Oleg Komarov
Oleg Komarov am 24 Mär. 2012
floor(tiedrank... was dirty trick :)
There is something wrong here; for
data = [1 -3 -3 2 23 23];
The result is
rnk =
3 1 1 4 5 4
Tommaso Fornaciari
Tommaso Fornaciari am 4 Dez. 2016
Hi is there a way to assign equal observations two different subsequent ranks? following on the original question, the output i would need is 2 5 4 1 3 or 2 5 3 1 4
Thank you

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (4)

Raph
Raph am 4 Mai 2015

11 Stimmen

It should also work with sort() and ismember()
data_sorted = sort(data);
[~, rnk] = ismember(data,data_sorted)

1 Kommentar

Bradley Stiritz
Bradley Stiritz am 28 Mai 2016
Very impressive, Raph! Thanks for your contribution. Excellent use of built-in vectorized functions.

Melden Sie sich an, um zu kommentieren.

sunbeam
sunbeam am 6 Mär. 2013

0 Stimmen

This should work. I couldn't figure out how to do it without a loop, but at least this only loops over the duplicate entries. Someone let me know if you come up with a better way.
function outrank = rankWithDuplicates(data,mode)
% R = rankWithDuplicates(data,mode) ranks the values in the data variable
% according to size, allowing for duplicates. Whereas sort actually
% rearranges the input, and therefore duplicates get assigned different
% indices, rankWithDuplicates will simply output the rank order allowing
% ties for duplicate entries. For example,
%
% rankWithDuplicates([1 1 5 8 8 10])
%
% will output [1 1 3 4 4 6]; and if these entries are shuffled like
%
% rankWithDuplicates([8 1 5 1 10 8])
%
% the output will be [4 1 3 1 6 4].
%
% INPUT: data, a vector of real numbers.
% mode, an optional input which can be 'ascend' or 'descend'
%
% OUTPUT: the rank order of the input data.
%
if nargin==1
mode='ascend';
end
[~,b]=size(data);
if b==1
data=data';
end
% Sort data
[srt, idxSrt] = sort(data,mode);
% Find where are the repetitions and negate
idxRepeat = [false diff(srt) == 0];
% Loop through where there are duplicates and maintain the rank.
% I'm not sure if this is necessary but it's the only way I could get it
% done.
rnk = 1:numel(data);
loopidx=find(idxRepeat>0);
for i=loopidx
rnk(i)=rnk(i-1);
end
% Return order according to original sort
outrank(idxSrt)=rnk;
Jeyamugan T
Jeyamugan T am 7 Apr. 2017

0 Stimmen

I wrote this code for some other purpose but it may useful for this problem.
function [rkList]=arrayRankEx(O)
cO=sort(O);
n=size(O,2);
rkList=zeros(1,n);
in=1;
while(in<=n)
out=1;
co=0;
while(out<=n && in<=n)
if(O(out)==cO(in))
rkList(out)=in;
co=co+1;
end
out=out+1;
end
in=in+co;
end
end
>>[5 7 -2 1 -1 0 0 1 5 3]
ans =
5 7 -2 1 -1 0 0 1 5 3
>> arrayRankEx([5 7 -2 1 -1 0 0 1 5 3])
ans =
8 10 1 5 2 3 3 5 8 7
Benjamin Levy
Benjamin Levy am 16 Nov. 2017

0 Stimmen

Not sure if this is still a 'live' thread, but the code should report these ranks for ascending order: 2.0000 5.0000 3.5000 1.0000 3.5000.
Now, suppose your data set is data = [ 11 20 2 14 15 11 13 20 7 9 1 5 17... 7 5 16 3 5 20 ]; Your answer for ascending order (correcting for ties), using sortrows([ data' ranks ],2), should provide column 1 = data, column 2 = ranks:
1.0000 1.0000
2.0000 2.0000
3.0000 3.0000
5.0000 5.0000
5.0000 5.0000
5.0000 5.0000
7.0000 7.5000
7.0000 7.5000
9.0000 9.0000
11.0000 11.0000
11.0000 11.0000
11.0000 11.0000
13.0000 13.0000
14.0000 14.0000
15.0000 15.0000
16.0000 16.0000
17.0000 17.0000
20.0000 19.0000
20.0000 19.0000
20.0000 19.0000
Note that there are several sections in the sorted data wherein there are consecutive runs of same integers (e.g., ...5 5 7 7 ).
Using your code and my data set, and the same final sort, I have (column 1 data, column 2 ranks):
1 1
2 2
3 3
5 4
5 4
5 4
7 5
11 7
7 7
11 7
9 9
11 10
13 13
20 13
20 13
14 14
15 15
16 16
17 17
20 18

1 Kommentar

Nataraja M
Nataraja M am 26 Mär. 2018
Hello Sir I used above command sortrows([ data' ranks ],2) for ranking vectors from maximum to lowest, but facing error like Not enough input arguments. Can you please help me to solve this error Thank you

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Descriptive Statistics finden Sie in Hilfe-Center und File Exchange

Gefragt:

owr
am 23 Mär. 2012

Kommentiert:

am 26 Mär. 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by