How can I find matching rows from three out of four colomns

I have a matrix with 4 columns, x y z and D, and a lot of rows. I want the program to find the rows where x, y and z are the same (so the first three colomns match) and add the corresponding D value of these matching rows. When the matching rows are found only one stays in the matrix with the new D value. To make it a little more complicated I like to attach a tolerance for what the matching values can diverge.
For example:
x y z D
1 2 3 4
2 4 6 8
1 2 3 5
New one:
x y z D
1 2 3 9
2 4 6 8

 Akzeptierte Antwort

Henric Rydén
Henric Rydén am 15 Mai 2014
Bearbeitet: Henric Rydén am 15 Mai 2014
This solution might not be as simple and elegant as the one Andrei Bobrov provided, but it allows for a tolerance level.
% Example data
A=[ 1 2 3 ; ...
2 4 6 ; ...
1.05 2 3 ; ...
1 52 3 ; ...
2 4 6 ; ...
1.05 2 3 ; ...
5 2 3 ; ...
1 52 3 ; ...
5 2 3 ; ...
];
D = [4; 8; 5; 1; 2; 3; 5; 7; 8];
% Vector to track which have been processed and how they are grouped
processed = zeros(size(A,1),1);
% Tolerance level
tol = .1;
row = 1;
while ~all(processed)
% Only check rows that havent been matched yet
if ~processed(row)
% Take out the row you want to compare to the others
curRow = A(row,:);
% Subtract that row from all rows in A
subtractedA = A - repmat(curRow,size(A-1,1), 1);
% Check to see if they are within the tolerance level
matchingRows = all((subtractedA <= tol & subtractedA >= -tol),2);
% Mark the rows as processed
processed(matchingRows) = row;
% Update A to the mean of all matches
A(matchingRows,:) = repmat(mean(A(matchingRows,:),1),sum(matchingRows),1);
% Sum the D values that matches
D(matchingRows) = sum(D(matchingRows));
end
% Check the next row
row = row + 1;
end
% Results
[A(unique(processed),:) D(unique(processed))]

4 Kommentare

Thanks! I have a little problem with the code and that is when I change all values last column of A to 3 so:
A=[ 1 2 3 ; ... 2 4 3; ... 1.05 2 3 ; ... 1 52 3 ; ... 2 4 3 ; ... 1.05 2 3 ; ... 5 2 3 ; ... 1 52 3 ; ... 5 2 3 ; ... ];
and I change the tolerance for example to 0.01 there is a change in the values in the last column altough they don't change so the average should be the same, 3.
Henric Rydén
Henric Rydén am 15 Mai 2014
Bearbeitet: Henric Rydén am 15 Mai 2014
Ah! That's because the mean was taken along columns instead of rows when there was no match. This is fixed by explicitly stating that the mean should be along rows. I've updated the code
It works thank you!!
You're welcome

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

Andrei Bobrov
Andrei Bobrov am 15 Mai 2014
Bearbeitet: Andrei Bobrov am 17 Mai 2014
d = [1 2 3 4
2 4 6 8
1 2 3 5];
[a,b,c] = unique(d(:,1:end-1),'rows');
out = [a, accumarray(c,d(:,end))];
add
d0 =[[ 1.1 2 3.2
2 4 6
1.2 2 3.3
2.09 3.96 6.05
1.05 1.99 3.25],randi(25,5,1)];
eps1 = .1;
d = d0(:,1:end-1);
l = bsxfun(@(x,y)abs(x-y)<=eps1,permute(d,[1 3 2]),permute(d,[3 1 2]));
[ii,jj] = find(triu(all(l,3),1));
a = num2cell(unique([[ii;ii],[ii;jj]],'rows'),1);
out = [cell2mat(accumarray(a{:},[],@(x){mean(d0(x,1:end-1))})),...
accumarray(a{1},d0(a{2},end))];

2 Kommentare

This works.
But what if the values diverge a little so I can add a tolerance and take the average of the row:
d = [1.1 2 3.2 4; 2 4 6 8; 1.2 2 3.3 5]
and the tolerance is 0.1 so the outcome will be:
dout = [1.15 2 3.25 9; 2 4 6 8]
see after add

Melden Sie sich an, um zu kommentieren.

Kategorien

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by