I have a matrix with many (1e5+) rows and I want to remove both copies of all duplicate rows. Is there a fast way to do this? (This function needs to be run many times.)

4 Comments

Azzi Abdelmalek on 4 May 2016
Do you mean a faster way than using the unique function?
Michael Siebold on 4 May 2016
Correct me if I'm wrong, but won't unique always leave one copy of each duplicated row? I want to remove all copies of the duplicated rows.
You can use the other output arguments of unique to get replicate counts.
a = [1 2; 1 2; 2 3; 2 4; 2 5; 4 2; 4 2; 1 3; 1 3; 4 5];
[C,ia,ic] = unique(a,'rows');      % C = unique rows, and a equals C(ic,:)
[count,key] = hist(ic,unique(ic)); % count(k) = number of copies of C(k,:)
Then you can just select the keys with non-unit counts and drop them.
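A minimal sketch of that last step, continuing with the same example matrix. The names dropKeys and a_reduced are my own (hypothetical), and mapping the dropped keys back to rows of a with ismember is one way to do it, not necessarily what the commenter had in mind:

```matlab
% Example matrix from the comment above
a = [1 2; 1 2; 2 3; 2 4; 2 5; 4 2; 4 2; 1 3; 1 3; 4 5];
[C, ia, ic] = unique(a, 'rows');      % a equals C(ic,:)
[count, key] = hist(ic, unique(ic));  % count(k): copies of the k-th unique row
dropKeys = key(count > 1);            % unique rows that occur more than once
% Keep only the rows of a whose unique row occurs exactly once
a_reduced = a(~ismember(ic, dropKeys), :);
disp(a_reduced)
```

For this matrix the surviving rows should be [2 3; 2 4; 2 5; 4 5], still in their original order.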
Michael Siebold on 4 May 2016
Perfect, and thanks a million! I kept messing with ia and ic but just wasn't thinking of a histogram... Would you mind submitting this as an answer so I can accept it?


Accepted Answer

Roger Stafford on 5 May 2016
Edited: Roger Stafford on 5 May 2016

1 vote

Let A be your matrix.
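[B,ix] = sortrows(A); % sort so that identical rows become adjacent
f = find(diff([false;all(diff(B,1,1)==0,2);false])~=0); % first and last index of each run of identical rows
s = ones(length(f)/2,1);
f1 = f(1:2:end-1); f2 = f(2:2:end); % run starts and run ends
t = cumsum(accumarray([f1;f2+1],[s;-s],[size(B,1)+1,1])); % t(k) > 0 inside a run
A(ix(t(1:end-1)>0),:) = []; % <-- Corrected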
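The accumarray/cumsum pair above only flags every sorted row that lies inside a run of two or more identical rows. Assuming that is its whole purpose, the same flag can be built from two shifted comparisons. This is my simplification of the idea, not Roger's code, and I have not timed it against his version:

```matlab
% Sample matrix (taken from the earlier comment in this thread)
A = [1 2; 1 2; 2 3; 2 4; 2 5; 4 2; 4 2; 1 3; 1 3; 4 5];

[B, ix] = sortrows(A);              % identical rows become adjacent in B
same = all(diff(B, 1, 1) == 0, 2);  % same(k) is true when B(k,:) equals B(k+1,:)

% A sorted row is a duplicate if it matches its predecessor or its successor
dup = [false; same] | [same; false];

A(ix(dup), :) = [];                 % delete every copy; survivors keep their original order
disp(A)
```

On this matrix it should leave [2 3; 2 4; 2 5; 4 5]. Deleting through ix is what keeps the remaining rows in their original order.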

6 Comments

Roger Stafford on 5 May 2016
In your request to remove "duplicated rows" I have assumed you meant removing any duplicate rows, however far apart they occur in the matrix, not just adjacent ones. That is why the first operation above is 'sortrows': it temporarily groups any duplicate rows together. After the duplicates are eliminated, the remaining rows are returned to their original order. If that is not what you meant, this code will not be correct for you.
There is an error in the last line of my code. This last line should read:
A(ix(t(1:end-1)>0),:) = [];
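If only adjacent duplicates were meant, a hedged sketch of that variant simply drops the sortrows step (and the ix mapping) and compares neighbouring rows of A directly. The small test matrix is made up for illustration:

```matlab
% Only consecutive identical rows count as duplicates here (no sortrows)
A = [1 2; 1 2; 2 3; 1 2];
same = all(diff(A, 1, 1) == 0, 2);  % same(k): row k equals row k+1
dup = [false; same] | [same; false];
A(dup, :) = [];
disp(A)   % the last [1 2] survives because it is not adjacent to the others
```

With sortrows (as in the answer above) all three [1 2] rows would be removed, leaving only [2 3]; without it the result is [2 3; 1 2].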
Jan on 5 May 2016
@Roger: Why don't you fix the line directly in your code?
Roger Stafford on 5 May 2016
Good idea, Jan!
Michael Siebold on 5 May 2016
Edited: Michael Siebold on 5 May 2016
And this solution is even faster than the first suggestion in the comments! Thanks for all the help!
saad sulaiman on 5 Nov 2022
Greetings.
How could we apply this code to a mesh where we have the coordinate points of each triangle, so that we remove the internal edges, i.e. the edges shared by two triangles?
Thanks in advance.
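One possible approach, sketched under an assumption: the mesh is given as a connectivity list T with one row of three vertex indices per triangle, not raw coordinates. Each triangle contributes three edges; sorting the two indices of each edge makes (i,j) and (j,i) identical, so an interior edge appears twice and removing every copy of the duplicated rows leaves only the boundary edges. T and the two-triangle example are hypothetical. If you only have coordinates per triangle, you would first need to map them to shared vertex indices (for example with unique(...,'rows')), which only works if shared vertices have exactly equal coordinates. MATLAB's triangulation class also has a freeBoundary method for this task.

```matlab
% Hypothetical connectivity list: two triangles sharing the edge 1-3
T = [1 2 3;
     1 3 4];

% Every triangle contributes three edges
E = [T(:, [1 2]); T(:, [2 3]); T(:, [3 1])];
E = sort(E, 2);                     % (i,j) and (j,i) describe the same edge

% Remove all copies of any edge that appears more than once (interior edges)
[B, ix] = sortrows(E);
same = all(diff(B, 1, 1) == 0, 2);
dup = [false; same] | [same; false];
E(ix(dup), :) = [];

disp(E)   % remaining rows are the boundary edges
```

For these two triangles the shared edge [1 3] is removed and the four outer edges [1 2; 2 3; 3 4; 1 4] remain.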


More Answers (2)

Azzi Abdelmalek on 4 May 2016
Edited: Azzi Abdelmalek on 4 May 2016

1 vote

A=randi(5,10^5,3);
tic
A=unique(A,'rows');
toc
The result
Elapsed time is 0.171778 seconds.

3 Comments

Michael Siebold on 4 May 2016
Not at all what I asked about.
Azzi Abdelmalek on 4 May 2016
Edited: Azzi Abdelmalek on 4 May 2016
You said that the unique function will leave a copy of duplicate rows. This example shows that no duplicate rows are stored, and it doesn't take much time either.
I reckon your answer does not address the OP's question, because running the following:
A=[1 1 1;1 1 1;1 1 0];
tic
A=unique(A,'rows');
toc
Will yield:
A =
     1     1     0
     1     1     1
Therefore, A still contains one instance of each duplicated row. I believe Michael wanted every instance of a row that appears multiple times to be removed.
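For the same example, the third output of unique already carries enough information to do what the OP asked. A minimal sketch that counts with accumarray (my choice; hist, as suggested in the question's comments, would work too):

```matlab
A = [1 1 1; 1 1 1; 1 1 0];
[~, ~, ic] = unique(A, 'rows');        % A equals its unique rows indexed by ic
counts = accumarray(ic(:), 1);         % counts(k): how many rows equal the k-th unique row
A_reduced = A(counts(ic(:)) == 1, :);  % keep only rows that occur exactly once
disp(A_reduced)
```

This should leave only the row [1 1 0].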


GeeTwo on 16 Aug 2022

0 votes

% Here's a much cleaner way to do it in R2019a or later!
[B,BG]=groupcounts(A);
A_reduced=BG(B==1); % or just A if you want the results in the same variable.
