Delete rows based on repeated values in the first column

Hi, I have a 20000x60 cell array and I want to delete the rows that repeat a value in the first column.
s = {
1 '2013-10-01' '05:35:00' '01:05:00'
2 '2013-10-01' '10:20:00' '00:00:00'
2 '2013-10-01' '10:20:00' '00:00:00'
3 '2013-10-01' '11:00:00' '00:40:00'
3 '2013-10-01' '11:00:00' '00:40:00'
3 '2013-10-01' '11:00:00' '00:40:00'
3 '2013-10-01' '11:00:00' '00:40:00'
5 '2013-10-01' '14:50:00' '00:50:00'
5 '2013-10-01' '14:50:00' '00:50:00'
6 '2013-10-01' '15:15:00' '01:15:00'
7 '2013-10-01' '15:55:00' '00:05:00'
7 '2013-10-01' '15:55:00' '00:05:00'
7 '2013-10-01' '15:55:00' '00:05:00'
7 '2013-10-01' '15:55:00' '00:05:00'}
The result should look like:
s1 =
1 '2013-10-01' '05:35:00' '01:05:00'
2 '2013-10-01' '10:20:00' '00:00:00'
3 '2013-10-01' '11:00:00' '00:40:00'
5 '2013-10-01' '14:50:00' '00:50:00'
6 '2013-10-01' '15:15:00' '01:15:00'
7 '2013-10-01' '15:55:00' '00:05:00'
8 '2013-10-01' '19:00:00' '00:05:00'
9 '2013-10-01' '19:10:00' '00:10:00'
I tried unique for this, but because the columns hold different data types I get this error: "Error using cell/unique (line 85) Input A must be a cell array of strings."

Accepted Answer

Star Strider on 16 Feb 2016

2 votes

If the number in the first column is the same for every row whose remaining columns repeat, you can use that column alone to identify the duplicates.
Using your example data:
s = { 1 '2013-10-01' '05:35:00' '01:05:00'
2 '2013-10-01' '10:20:00' '00:00:00'
2 '2013-10-01' '10:20:00' '00:00:00'
3 '2013-10-01' '11:00:00' '00:40:00'
3 '2013-10-01' '11:00:00' '00:40:00'
3 '2013-10-01' '11:00:00' '00:40:00'
3 '2013-10-01' '11:00:00' '00:40:00'
5 '2013-10-01' '14:50:00' '00:50:00'
5 '2013-10-01' '14:50:00' '00:50:00'
6 '2013-10-01' '15:15:00' '01:15:00'
7 '2013-10-01' '15:55:00' '00:05:00'
7 '2013-10-01' '15:55:00' '00:05:00'
7 '2013-10-01' '15:55:00' '00:05:00'
7 '2013-10-01' '15:55:00' '00:05:00'};
sc1 = cellfun(@(x)num2str(x, '%.0f'), s(:,1));
[s1u, ia] = unique(sc1);
s1 = s(ia,:)
s1 =
[1.0000e+000] '2013-10-01' '05:35:00' '01:05:00'
[2.0000e+000] '2013-10-01' '10:20:00' '00:00:00'
[3.0000e+000] '2013-10-01' '11:00:00' '00:40:00'
[5.0000e+000] '2013-10-01' '14:50:00' '00:50:00'
[6.0000e+000] '2013-10-01' '15:15:00' '01:15:00'
[7.0000e+000] '2013-10-01' '15:55:00' '00:05:00'
You may have to modify this slightly if your actual cell array is different, but it works with the array you posted, and as I interpreted it.
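As an alternative sketch, if every entry in the first column is a numeric scalar (an assumption about your data, not something the attached file confirms), the string conversion can be skipped and unique applied to the numbers directly. The variable names ids and ia are only illustrative:

```matlab
% Small stand-in for the posted data (first column numeric, rest text)
s = {1 '2013-10-01' '05:35:00'
     2 '2013-10-01' '10:20:00'
     2 '2013-10-01' '10:20:00'
     3 '2013-10-01' '11:00:00'
     3 '2013-10-01' '11:00:00'};

ids = cell2mat(s(:,1));            % assumes every entry is a numeric scalar
[~, ia] = unique(ids, 'stable');   % index of the first occurrence of each id
s1 = s(ia,:);                      % keep one row per id, in original order
```

If the first column can contain empty cells or text, cell2mat will fail or change the row count, and the string-based approach above is the safer choice.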

8 Comments

bero on 17 Feb 2016
As far as I can see, it works for the sample data you tested, but with my data from the attached file there is an error. Could you please check the error?
The only change necessary is to set ( 'UniformOutput',false ) in the ‘sc1’ assignment and it will work with your data:
sc1 = cellfun(@(x)num2str(x, '%.0f'), s(:,1), 'Uni',0);
Here, ( 'Uni',0 ) is an abbreviated form of ( 'UniformOutput',false ) and does the same thing.
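A likely reason the original line fails on the full data: with the default uniform output, cellfun requires every call to return a scalar, and num2str returns a multi-character array as soon as an id has more than one digit. A minimal sketch with made-up ids (the values 1, 23, 456 are illustrative only):

```matlab
ids = {1; 23; 456};   % hypothetical ids, some with more than one digit

% Default (uniform) output: each result must be a scalar, so '23' breaks it
try
    cellfun(@(x) num2str(x, '%.0f'), ids);
    failed = false;
catch
    failed = true;    % reached here: non-scalar output is not allowed
end

% Non-uniform output: results are collected in a cell array of strings
sc = cellfun(@(x) num2str(x, '%.0f'), ids, 'UniformOutput', false);
```

The sample data in the question only has single-digit ids, which is why the first version happened to work there.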
bero on 17 Feb 2016 (edited 17 Feb 2016)
OK, I need to sort the results by the first column (1, 2, 3, ...). Can I use sortrows(s1,1)? Does it sort all the results based on the first column? Thanks.
With that change, the full code is now:
sc1 = cellfun(@(x)num2str(x, '%.0f'), s(:,1), 'UniformOutput',false);
[s1u, ia] = unique(sc1, 'stable');
s1 = s(ia,:);
That should do what you want.
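If the rows also need to be in ascending numeric order of the first column, one option is to sort the numeric ids and reuse the resulting index on the whole cell array. This is a sketch that assumes the first column holds numeric scalars; depending on the release, sortrows may not accept a cell array with mixed types, and sorting the string ids would order them as text ('10' before '2'). The names order and s1sorted are illustrative:

```matlab
% Hypothetical de-duplicated result that is not in numeric order
s1 = {10 'c'
       2 'a'
       7 'b'};

[~, order] = sort(cell2mat(s1(:,1)));   % numeric sort of the first column
s1sorted = s1(order,:);                 % apply the same order to every column
```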
bero on 17 Feb 2016
Great, done. Could you please explain '%.0f'? A short comment about the code would help. Thanks.
Stephen23 on 17 Feb 2016
@bero: MATLAB's great documentation tells us exactly what '%.0f' does:
The '%.0f' is one of a number of format descriptors (see the link Stephen provided for a full discussion of all of them). This one tells MATLAB to produce a string representing a floating-point number with no digits to the right of the decimal point. With that format descriptor, pi = 3.1415... prints as 3, with no trailing decimal point. When you read the documentation, I leave it to you to determine why I chose this format rather than, for example, '%d'.
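A short sketch of how the two descriptors behave, as one plausible illustration of the difference (the test values are arbitrary):

```matlab
a = num2str(pi,  '%.0f');   % '3': no decimal point, no fractional digits
b = num2str(7,   '%.0f');   % '7': integers pass through unchanged
c = num2str(2.7, '%.0f');   % '3': non-integers are rounded

% With '%d', MATLAB falls back to exponential notation for non-integer
% values, so the result is not a plain integer string
d = sprintf('%d', 2.7);
```

So '%.0f' always yields a plain integer-like string, even if a value in the first column turns out not to be an exact integer.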
bero commented "good"


More Answers (1)

MHN on 16 Feb 2016 (edited 16 Feb 2016)

0 votes

Consider this example (it is neither the fastest nor the easiest way, but it solves your problem):
s = {3 '2013-10-01' '11:00:00' '00:40:00'; 3 '2013-10-01' '11:00:00' '00:40:00'; 3 '2013-10-01' '11:00:00' '00:40:00'; 5 '2013-10-01' '14:50:00' '00:50:00'; 5 '2013-10-01' '14:50:00' '00:50:00'; 6 '2013-10-01' '15:15:00' '01:15:00'; 7 '2013-10-01' '15:55:00' '00:05:00'};
col1 = cell2mat(s(:,1));
uniqcol1 = union(col1,col1);
% change all the repeated rows to '0' (or some number which is unused in the first column) except the first one and then remove them.
for i = 1:length(uniqcol1)
    for j = 1:size(s,1)
        if uniqcol1(i) == s{j,1}
            for k = j+1:size(s,1)
                if uniqcol1(i) == s{k,1}
                    s{k,1} = 0;
                end
            end
        end
    end
end
col1 = cell2mat(s(:,1));
s(col1==0,:)=[];

1 Comment

bero on 17 Feb 2016
It is too slow for my big matrix. I need a more efficient and faster approach.
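For a large array, a loop-free sketch is possible if two assumptions hold: the first column contains only numeric scalars, and repeated ids always sit in adjacent rows, as they do in the posted sample. Under those assumptions a row is kept whenever its id differs from the id of the row above it (the names ids and keep are illustrative):

```matlab
s = {3 '2013-10-01' '11:00:00'
     3 '2013-10-01' '11:00:00'
     5 '2013-10-01' '14:50:00'
     5 '2013-10-01' '14:50:00'
     6 '2013-10-01' '15:15:00'
     7 '2013-10-01' '15:55:00'
     7 '2013-10-01' '15:55:00'};

ids  = cell2mat(s(:,1));          % assumes numeric scalars in column 1
keep = [true; diff(ids) ~= 0];    % true where the id changes from the row above
s1   = s(keep,:);                 % first row of each run of equal ids
```

This makes a single pass over the column instead of three nested loops. If equal ids can appear in non-adjacent rows, it will keep more than one row per id, and a unique-based approach is the better fit.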
