Matlab find unique column-combinations in matrix and respective index
Benvaulter
on 22 Mar 2017
Edited: Jan
on 23 Mar 2017
I have a large matrix with many rows and a limited (but greater than 1) number of columns, containing values between 0 and 9. I would like an efficient way to identify the unique row-wise combinations and their indices, in order to then build sums (somewhat like a pivot table). Here is an example of what I am trying to achieve:
a =
1 2 3
2 2 3
3 2 1
1 2 3
3 2 1
uniqueCombs =
1 2 3
2 2 3
3 2 1
numOccurrences =
2
1
2
indices =
[1;4]
[2]
[3;5]
From matrix a, I want to first identify the unique row-wise combinations, then count the number of occurrences of each one and identify the row indices of the respective combination.
I have achieved this by generating strings with num2str and strcat, but this method appears to be very slow. Along the same lines, I have tried to form a new unique number by concatenating the values horizontally (e.g. build 123 from [1 2 3]), but MATLAB does not seem to support this. Plain row sums won't work because different combinations can have the same sum (e.g. [1 2 3] and [3 2 1]), so the combinations could no longer be told apart. Any suggestions on how best to achieve this? Thanks!
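Because the question states that every entry is a single digit from 0 to 9, the "concatenate the digits into one number" idea can be done arithmetically instead of with strings: weight each column by a power of ten and add. A minimal sketch under that assumption (the names `weights`, `key`, `rowIdx` and `group` are illustrative, not from the thread); it stays exact only while the key fits in the integer range of a double, i.e. roughly 15 columns.

```matlab
% Sample data from the question; every entry is assumed to be a digit 0..9.
a = [1 2 3
     2 2 3
     3 2 1
     1 2 3
     3 2 1];

% Build one base-10 number per row, e.g. [1 2 3] -> 123.
nCols   = size(a, 2);
weights = 10 .^ (nCols-1:-1:0).';   % [100; 10; 1] for 3 columns
key     = a * weights;              % one scalar key per row

% Unique keys: rowIdx picks a representative row of a for each key,
% group maps every row of a to the index of its key.
[~, rowIdx, group] = unique(key);
uniqueCombs    = a(rowIdx, :);
numOccurrences = accumarray(group, 1);
```

With fixed-width digits the sorted keys follow the same lexicographic order as `unique(a, 'rows')`. If entries can exceed 9, the same trick works with base `max(a(:)) + 1` instead of 10.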
Accepted Answer
Guillaume
on 22 Mar 2017
More or less the same as Jan's, using accumarray instead of splitapply (I'm still old school!):
A = [ 1 2 3
2 2 3
3 2 1
1 2 3
3 2 1];
[B, ~, ib] = unique(A, 'rows');
numoccurences = accumarray(ib, 1);
indices = accumarray(ib, find(ib), [], @(rows){rows}); % find(ib) simply generates (1:size(A,1))', since every element of ib is nonzero
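The question's end goal is per-combination sums, "somewhat like a pivot table". With `ib` from `unique`, `accumarray` can sum any per-row value column exactly the way it counts. A sketch assuming a hypothetical value vector `vals` with one entry per row of `A` (the thread never says what is being summed):

```matlab
A = [1 2 3
     2 2 3
     3 2 1
     1 2 3
     3 2 1];
vals = [10; 20; 30; 40; 50];         % hypothetical per-row values to aggregate

[B, ~, ib] = unique(A, 'rows');      % ib(k) = row of B that matches A(k,:)
ib = ib(:);                          % make sure ib is a column
counts     = accumarray(ib, 1);      % occurrences per combination
groupSums  = accumarray(ib, vals);   % sum of vals per combination
groupMeans = groupSums ./ counts;    % e.g. an average per combination
```

Row k of `groupSums` belongs to the combination `B(k,:)`, so `[B, groupSums]` reads like a small pivot table.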
4 Comments
Guillaume
on 23 Mar 2017
Edited: Guillaume
on 23 Mar 2017
I suspect that accumarray will be faster, as it is built-in compiled code whereas splitapply is M-code, but I haven't run any tests.
Note: for the indices,
indices = accumarray(ib, (1:numel(ib))', [], @(rows){rows});
is probably slightly faster, just not as concise.
Jan
on 23 Mar 2017
Edited: Jan
on 23 Mar 2017
@Guillaume: Compare this with cellfun. Older MATLAB versions shipped the C source of this MEX function, and there, calling a function handle is very expensive because control has to go back to the MATLAB interpreter. That is why the built-in operations selected by a string name ('length', 'isclass', etc.) are much faster.
So a compiled MEX function is not automatically a real benefit when it has to call back into MATLAB, because mexCallMATLAB has some overhead. This might apply to accumarray as well. I guess that your accumarray approach is faster than the loop, but I know that it looks very cryptic ;-)
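The cellfun effect described above can be checked with a quick timing of a legacy string option against the equivalent function handle. A rough sketch only: `C` is made-up test data, tic/toc is a crude timer, and no particular speed ratio is claimed since it depends on the MATLAB release.

```matlab
C = num2cell(rand(1, 1e5));            % hypothetical test data: 1e5 scalar cells

tic; n1 = cellfun('length', C); tStr = toc;   % built-in string option
tic; n2 = cellfun(@length, C);  tFh  = toc;   % function handle, calls back into MATLAB

assert(isequal(n1, n2));               % both forms give the same result
fprintf('string: %.4f s, handle: %.4f s\n', tStr, tFh);
```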
But rather than speculate, I can run a test. With
A = randi([1, 100], 1e5, 3); % Test data
my loop takes 14.75 seconds and your accumarray approach 0.44 seconds. The results differ in the order of the indices within each group, so perhaps this is what is wanted:
[B, iB, iA] = unique(A, 'rows');
indices = accumarray(iA, (1:numel(iA)).', [], @(r){sort(r)});
The result is clear: @Benvaulter, please unaccept my answer and select Guillaume's, and of course use it also to save time and energy.
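A sketch of how the loop-versus-accumarray comparison above could be reproduced. It uses the same kind of `randi` data but fewer rows (`nRows` is an illustrative choice, not Jan's setting) so the loop finishes quickly; timings will differ by machine and release.

```matlab
nRows = 2e4;                            % smaller than the 1e5 rows used above
A = randi([1, 100], nRows, 3);
[B, ~, iA] = unique(A, 'rows');
n = size(B, 1);

% Loop approach: one find per unique row.
tic;
idxLoop = cell(n, 1);
for k = 1:n
    idxLoop{k} = find(iA == k);         % row numbers of group k, ascending
end
tLoop = toc;

% accumarray approach, sorted so each group lists rows in ascending order.
tic;
idxAcc = accumarray(iA, (1:numel(iA)).', [], @(r){sort(r)});
tAcc = toc;

assert(isequal(idxLoop, idxAcc));       % same groups, same order
fprintf('loop: %.3f s, accumarray: %.3f s\n', tLoop, tAcc);
```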
More Answers (1)
Jan
on 22 Mar 2017
Edited: Jan
on 23 Mar 2017
A = [ 1 2 3; ...
2 2 3; ...
3 2 1; ...
1 2 3; ...
3 2 1];
[B, iB, iA] = unique(A, 'rows');
G = unique(iA);
numOccurrences = splitapply(@numel, iA, iA); % count the rows in each group (data and grouping vector must have the same length)
I cannot test a way to obtain the index lists in the requested form; I assume splitapply can do this too. At least, a simple loop approach works:
n = length(G);
indices = cell(1, n);
for k = 1:n
indices{k} = find(iA == G(k));
end
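A sketch of the splitapply route guessed at above, assuming MATLAB R2015b or later (where `splitapply` exists). Wrapping the output in a cell with `@(r){...}` lets each group return a vector of a different length; the `sort` is added as a precaution, mirroring the sorted accumarray version in the other answer.

```matlab
A = [1 2 3; 2 2 3; 3 2 1; 1 2 3; 3 2 1];
[B, iA, iA] = deal([]);                 % clear names reused below
[B, ~, iA] = unique(A, 'rows');

rowNums = (1:size(A, 1)).';             % row numbers to split into groups

% Count per group: the number of row numbers falling into each group.
numOccurrences = splitapply(@numel, rowNums, iA);

% Index list per group, wrapped in a cell so group sizes may differ.
indices = splitapply(@(r){sort(r)}, rowNums, iA);
```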