Removing rows duplicates based on some conditions
    5 Ansichten (letzte 30 Tage)
  
       Ältere Kommentare anzeigen
    
    pink flower
 am 24 Sep. 2020
  
    
    
    
    
    Kommentiert: pink flower
 am 30 Sep. 2020
            Hello. I already asked a very similar question here, but I didn't understand the commands well, so I decided to ask for help again. I now have a file with 7 columns. I need to exclude the lines where the values in the third column are the same, but based on the conditions: the date (column 1), the time (column2) and the index (column3) cannot be repeated together and you must keep only the line where the value of the seventh column is the largest. The matrix looks like this:
20140101 2100 1762 -5.983 -37.483 26.8 11000
20140101 2100 1762 -5.983 -37.483 36.8 15000
20140109 2040 1762 -5.983 -37.483 25.0 3000
20140109 2040 1762 -5.983 -37.483 28.0 7000
20140130 2250 1762 -5.983 -37.483 22.0 3000
20140109 2040 1763 -5.992 -37.483 25.0 4000
20140109 2040 1763 -5.992 -37.483 23.2 6000
20140114 1940 1763 -5.992 -37.483 34.0 6000
And I want this result:
20140101 2100 1762 -5.983 -37.483 36.8 15000
20140109 2040 1762 -5.983 -37.483 28.0 7000
20140130 2250 1762 -5.983 -37.483 22.0 3000
20140109 2040 1763 -5.992 -37.483 23.2 6000
20140114 1940 1763 -5.992 -37.483 34.0 6000
In other words, the conditions are: if you have more than one row with the same date (for example, 2040101) in column 1, the same time (for example, 2100) in column 2 and the same index (for example, 1762 ) in the third column, keep only the row where the value in the last column is greater. In the case of the example I gave, just keep:
20140101 2100 1762 -5.983 -37.483 36.8 15000
20140109 2040 1762 -5.983 -37.483 28.0 7000
20140130 2250 1762 -5.983 -37.483 22.0 3000
20140109 2040 1763 -5.992 -37.483 23.2 6000
20140114 1940 1763 -5.992 -37.483 34.0 6000
Thank you so much for your help!
0 Kommentare
Akzeptierte Antwort
  Adam Danz
    
      
 am 25 Sep. 2020
        
      Bearbeitet: Adam Danz
    
      
 am 25 Sep. 2020
  
      Method 1
Here's a simple solution that assumes matching rows in columns 1:3 are next to each other.  The assert command tests this assumption and will throw an error if the assumption is not met. 
data = [20140101 2100 1762 -5.983 -37.483 26.8 11000
    20140101 2100 1762 -5.983 -37.483 36.8 15000
    20140109 2040 1762 -5.983 -37.483 25.0 3000
    20140109 2040 1762 -5.983 -37.483 28.0 7000
    20140130 2250 1762 -5.983 -37.483 22.0 3000
    20140109 2040 1763 -5.992 -37.483 25.0 4000
    20140109 2040 1763 -5.992 -37.483 23.2 6000
    20140114 1940 1763 -5.992 -37.483 34.0 6000];
[~,~,groupID] = unique(data(:,1:3),'rows','stable');
assert(all(diff(groupID)>=0),'Assumption violation: Matching rows in cols 1:3 are not juxtaposed.')
[~,groupMax] = splitapply(@max,data(:,7),groupID);
m = data(cumsum(groupMax),:)
Result
m =
    2.014e+07         2100         1762       -5.983      -37.483         36.8        15000
    2.014e+07         2040         1762       -5.983      -37.483           28         7000
    2.014e+07         2250         1762       -5.983      -37.483           22         3000
    2.014e+07         2040         1763       -5.992      -37.483         23.2         6000
    2.014e+07         1940         1763       -5.992      -37.483           34         6000
Method 2
This method does not make any assumptions about row order. 
[~,~,groupID] = unique(data(:,1:3),'rows','stable');
[~,groupMax] = splitapply(@max,data(:,7),groupID);
groups = unique(groupID);
finalRowNums = zeros(numel(groups),1); 
for i = 1:numel(groups)
    rowNums = find(groupID==i);
    finalRowNums(i) = rowNums(groupMax(i)); 
end
m = data(finalRowNums, :); 
7 Kommentare
  Adam Danz
    
      
 am 25 Sep. 2020
				
      Bearbeitet: Adam Danz
    
      
 am 28 Sep. 2020
  
			It works perfectly fine for me on the sample you provided.   Run the code below.  If it results in an error, tell us what Matlab release you're using and the full error message.  If Matlab crahses, something unrelated is wrong on your system. 
If you attach the mat file so I can simply load the matrix into Matlab, I can test the full dataset.
data =  [20140101 0 69760 -5.965 -36.250 26.0 2000
20140101 0 69761 -5.974 -36.250 23.5 2000
20140101 0 73180 -5.247 -36.187 23.5 2000
20140101 0 73678 -5.229 -36.178 26.5 2000
20140101 0 74178 -5.229 -36.169 26.5 2000
20140101 0 128828 -6.576 -35.181 22.6 2000
20140101 0 138373 -6.980 -35.009 20.8 2000
20140101 0 139404 -7.259 -34.991 22.0 2000
20140101 0 139904 -7.259 -34.982 23.0 2000
20140101 0 140375 -6.998 -34.973 22.2 2000
20140101 0 140404 -7.259 -34.973 24.5 2000
20140101 0 140903 -7.250 -34.964 22.2 2000
20140101 0 140904 -7.259 -34.964 27.2 2000
20140101 0 141403 -7.250 -34.955 29.2 2000
20140101 0 141404 -7.259 -34.955 28.5 2000
20140101 0 141874 -6.989 -34.946 23.4 2000
20140101 0 141902 -7.241 -34.946 20.8 2000
20140101 0 141903 -7.250 -34.946 32.0 2000
20140101 0 141904 -7.259 -34.946 27.0 2000
20140101 0 142403 -7.250 -34.937 22.5 2000
20140101 0 142404 -7.259 -34.937 23.9 2000
20140101 0 142903 -7.250 -34.928 23.2 2000
20140101 0 142904 -7.259 -34.928 24.5 2000
20140101 0 143403 -7.250 -34.919 24.5 2000
20140101 0 146874 -6.989 -34.855 21.5 2000
20140101 0 146934 -7.528 -34.855 21.5 2000
20140101 0 148907 -7.286 -34.819 21.2 2000
20140101 0 149406 -7.277 -34.810 26.0 2000]; 
[~,~,groupID] = unique(data(:,1:3),'rows','stable');
[~,groupMax] = splitapply(@max,data(:,7),groupID);
groups = unique(groupID);
finalRowNums = zeros(numel(groups),1); 
for i = 1:numel(groups)
    rowNums = find(groupID==i);
    finalRowNums(i) = rowNums(groupMax(i)); 
end
m = data(finalRowNums, :); 
Result
m =
    2.014e+07            0        69760       -5.965       -36.25           26         2000
    2.014e+07            0        69761       -5.974       -36.25         23.5         2000
    2.014e+07            0        73180       -5.247      -36.187         23.5         2000
    2.014e+07            0        73678       -5.229      -36.178         26.5         2000
    2.014e+07            0        74178       -5.229      -36.169         26.5         2000
    2.014e+07            0   1.2883e+05       -6.576      -35.181         22.6         2000
    2.014e+07            0   1.3837e+05        -6.98      -35.009         20.8         2000
    2.014e+07            0    1.394e+05       -7.259      -34.991           22         2000
    2.014e+07            0    1.399e+05       -7.259      -34.982           23         2000
    2.014e+07            0   1.4038e+05       -6.998      -34.973         22.2         2000
    2.014e+07            0    1.404e+05       -7.259      -34.973         24.5         2000
    2.014e+07            0    1.409e+05        -7.25      -34.964         22.2         2000
    2.014e+07            0    1.409e+05       -7.259      -34.964         27.2         2000
    2.014e+07            0    1.414e+05        -7.25      -34.955         29.2         2000
    2.014e+07            0    1.414e+05       -7.259      -34.955         28.5         2000
    2.014e+07            0   1.4187e+05       -6.989      -34.946         23.4         2000
    2.014e+07            0    1.419e+05       -7.241      -34.946         20.8         2000
    2.014e+07            0    1.419e+05        -7.25      -34.946           32         2000
    2.014e+07            0    1.419e+05       -7.259      -34.946           27         2000
    2.014e+07            0    1.424e+05        -7.25      -34.937         22.5         2000
    2.014e+07            0    1.424e+05       -7.259      -34.937         23.9         2000
    2.014e+07            0    1.429e+05        -7.25      -34.928         23.2         2000
    2.014e+07            0    1.429e+05       -7.259      -34.928         24.5         2000
    2.014e+07            0    1.434e+05        -7.25      -34.919         24.5         2000
    2.014e+07            0   1.4687e+05       -6.989      -34.855         21.5         2000
    2.014e+07            0   1.4693e+05       -7.528      -34.855         21.5         2000
    2.014e+07            0   1.4891e+05       -7.286      -34.819         21.2         2000
    2.014e+07            0   1.4941e+05       -7.277       -34.81           26         2000
Weitere Antworten (0)
Siehe auch
Kategorien
				Mehr zu Structures finden Sie in Help Center und File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

