Unique elements in matrix efficiently

Question

George Aipal am 26 Feb. 2016

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/270204-unique-elements-in-matrix-efficiently

Kommentiert: George Aipal am 26 Feb. 2016

Given a matrix A, what is an efficient way to obtain a matrix B that consists of only the unique elements in A (where by unique, I mean row-wise, therefore the 'ismember' function is not suitable).

For example:

A = [1 2 3 1; 4 1 6 7; 8 9 8 11]
A =
      1    2    3    1
      4    1    6    7
      8    9    8   11

Since each row contains different number of unique elements than other rows, the matrix of unique elements would have rows each of possibly different dimensions, which is not possible. Therefore, I can fill the repeated/non-unique elements with some dumb filler values (since I am working with only positives I can replace them with a negative number (eg -1), or if it had negative numbers too, they could be replaced by NaN values.

The result therefore can be:

B =
  2    3   -1
  1    6    7
  9   -1   11

(where the negative -1s could be replaced by NaNs alternatively).

Notice that although A(2, 2) has a value (1) that already exists in the previous column (A(1, 1)), it is still unique in its own row, therefore the 'ismember' function cannot be applied.

I have created a solution, but I can imagine there are more elegant and more efficient solutions using vectorization and avoiding for loops, for when matrix A is very large which happens to be the case:

B = A(:, 1); 
for i = 2:size(A, 2)
  NEWMEMBERS = !sum(bsxfun(@eq, A(:, i), B), 2);
  NEWCOL = NEWMEMBERS .* A(:, i);
  FILLER = -1 * ~NEWMEMBERS;
  NEWCOL = NEWCOL + FILLER;
  B = [B NEWCOL]; 
end

(FILLER can be more generally replaced by a vector of 0 and NaNs instead of 0s and -1s)

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

John D'Errico am 26 Feb. 2016

Bearbeitet: John D'Errico am 26 Feb. 2016

Note that this question does not actually contain validly executable MATLAB code, having constructs like +=, and ! in the code.

George Aipal am 26 Feb. 2016

Edited

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Titus Edelhofer am 26 Feb. 2016

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/270204-unique-elements-in-matrix-efficiently#answer_211327

In MATLAB Online öffnen

Hi,

a rather simple version would be this:

B = -ones(size(A));
for row=1:size(A, 1)
  val = unique(A(row,:));
  [~,idx] = ismember(val, A(row,:));
  B(row,idx) = val;
end

I haven't tried though what happens if A is large ...

Titus

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Answer 2

Stephen23 am 26 Feb. 2016

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/270204-unique-elements-in-matrix-efficiently#answer_211328

Bearbeitet: Stephen23 am 26 Feb. 2016

In MATLAB Online öffnen

Without any loops (and can be easily adapted to use a tolerance):

A = [1 2 3 1; 4 1 6 7; 8 9 8 11]
S = size(A);
[B,C] = sort(A,2);
D = [false(S(1),1),diff(B,1,2)==0];
R = (1:S(1))'*ones(1,S(2));
X = sub2ind(S,R(D),C(D));
A(X) = NaN

displays this in the command window:

A =
   2     3   NaN
   1     6     7
   9   NaN    11

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

George Aipal am 26 Feb. 2016

Thanks for your great solution! It replies to my statement indeed! I have followed up in a comment below with a question regarding performance, and how to eliminate redundant columns of only NaNs. I will rephrase the problem to make it look more like a common case, so I'll now refer to rows instead of columns, and X instead of A:

You have a large dataset Xdata (NxM), containing many samples of dimension M (M parameters), each sample on each row. N is very large (many samples), and the task is to reduce the memory size by creating an Xreduced that does not contain unnecessary rows. In this particular case, I want to store only unique parameters, so each column of Xreduced should have only unique values, or fillers (NaN or -1), but rows containing only NaN should not appear in Xreduced.

I suppose I can use your solution, sort it, and then use a for loop to find the first column of A (or first) that contains at least 1 element different than NaN, and then copy from there ownards the rest of A or X, to B or Xreduced, but this requires loops, and duplicating the original large size of the dataset A (Xdata). Is there a better way to do this? Or given this specific problem, perhaps my initial solution in my original question is as optimal as it gets?

Melden Sie sich an, um zu kommentieren.

Answer 3

George Aipal am 26 Feb. 2016

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/270204-unique-elements-in-matrix-efficiently#answer_211333

Thank you all for your answers. It is correct the comment that A(2, 2) should not be NaN because it is the only 1 in that row. Regarding the solutions from Titus and Stephen, they both work, and Stephen's solution does respond to my question of avoiding for loops as I was using in my own solution, aiming to increase speed. However, interestingly enough, I have just found out that some of these solutions without the for loop are performing slower than with the for loop, or am I doing something wrong? I will investigate about performances and let you know (it seems that if I run solution 1 followed by solution 2 and then 3rd, somehow affects their individual performance than if I invert the order in which I run them, so I should quit the environment and run 1 solution at a time, perhaps with bigger matrices, and I shall report on the findings.

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Titus Edelhofer am 26 Feb. 2016

In MATLAB Online öffnen

Hi George,

the filler comes before by setting

B = nan(size(A));

or

B = -ones(size(B));

or whatever.

My advice to customers is usually: write the code in a way that is simple to write and simple to read. When it's progressed and you identify bottlenecks, then start investigating by tic/toc or profiler. Don't get me wrong, a good deal of my work is teaching vectorization (one of my favorite underused functions is bsxfun). But writing unreadable vectorized code without need I try to avoid. And if I do, I add as comment the simpler/loop version so that someone else (or myself) understand what's happening.

Titus

George Aipal am 26 Feb. 2016

Bearbeitet: George Aipal am 26 Feb. 2016

Great, I'll keep your advice and tips in mind. Regarding the pre-filling, that would not eliminate the columns of only NaNs or -1s, so I guess I would then have to use a for loop where I somehow find the first column containing at least one element different than NaN or -1, and then copy the matrix from then onwards, but still, I would initially require to use the very large matrix?

That may be the only way to do it, the problem is that in this case, each column belongs to a sample from a dataset, and I am extracting only the unique values per dimensions in all this large dataset. Depending on the size of the dataset, it could be too big to store an initial B with so many columns at first (plus all the samples already stored in dataset A). Having said that, is my initial solution in my original question the best solution? A vectorized implementation like Stephen's also looks great, but I would again need to sort out this problem I suppose?

A summary of my problem would be, given a large dataset A (M x N), extract (efficiently, in the fastest possible way since this is a large matrix) the unique values on each dimension/parameter, since the repeated values are redundant and of no use, simply occupying much memory. Usually the samples tend to be on each row, so I could simply do X=permute(A, [2 1]) to make it look more like a common case, then produce an Xreduced that should have fewer rows than X, containing only unique values per parameter, and the rest with fillers, but without needing to have (rows in this case of X) of only fillers, as the idea is to reduce the memory used. I hope that makes any sense?

Melden Sie sich an, um zu kommentieren.

Unique elements in matrix efficiently

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Antworten (3)

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

Unique elements in matrix efficiently

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Antworten (3)

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

5 Kommentare 3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden