
Find and reduce a numeric array with identical columns

John Smith on 30 Dec 2018
Edited: John Smith on 3 Jan 2019
Dear Sir/Madam,
I would like to ask you the following question:
I have a data file like this
tmp = [...
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438]
I would like to regroup (or reduce) it under the following condition:
If column 1 AND column 3 of a row are identical to column 1 AND column 3 of any other row, those rows should be reduced to a single row. In the new row, column 2 is the sum of the original column 2 values. Columns 1 and 3 are kept the same. Column 4 is not important.
So, for above data, I expect to have the answer:
119 5 6832 0.20313 % 5+4-4=5
122 -48 6853 0.29688
125 -8 6853 0.4375 % -12+4=-8
121 12 6914 0.5625
119 48 6914 0.17188
Which MATLAB command should I use? I would greatly appreciate it if you included your code and its output.
I am using MATLAB R2014a.
Thank you very much
3 Comments
Image Analyst on 30 Dec 2018
I was wondering the same thing. Hopefully the order doesn't matter. I'm sure you could write the code afterwards in such a way that it didn't matter.
John Smith on 30 Dec 2018
Edited: John Smith on 30 Dec 2018
In tmp (the data file), the rows were entered by hand in random order; there is no ordering at all.


Accepted Answer

Stephen23 on 30 Dec 2018
>> [~,X,Y] = unique(tmp(:,[1,3]),'rows');
>> out = tmp(X,:);
>> out(:,2) = accumarray(Y,tmp(:,2),[],@sum)
out =
119.000000 5.000000 6832.000000 0.023438
119.000000 48.000000 6914.000000 0.171880
121.000000 12.000000 6914.000000 0.562500
122.000000 -48.000000 6853.000000 0.296880
125.000000 -8.000000 6853.000000 0.437500
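For anyone puzzling over the index outputs, here is an annotated sketch of the same idea. It is only an illustration, not a replacement for the answer above: the variable names keys and col2sum are my own, and the output is assembled from the unique keys and the sums, so column 4 is dropped rather than copied from one of the original rows.

```matlab
tmp = [ ...
    121  12 6914 0.5625
    122 -48 6853 0.29688
    119  48 6914 0.17188
    125 -12 6853 0.078125
    125   4 6853 0.4375
    119   5 6832 0.20313
    119   4 6832 0.039063
    119  -4 6832 0.023438];

% keys holds each distinct (column 1, column 3) pair once, sorted by rows.
% Y(k) is the group number of row k: rows sharing columns 1 and 3 share a group.
[keys, ~, Y] = unique(tmp(:, [1, 3]), 'rows');

% accumarray adds up the column-2 values that fall into each group
% (summation is its default, so no function handle is needed).
col2sum = accumarray(Y, tmp(:, 2));

% Reassemble as [column 1, summed column 2, column 3].
out = [keys(:, 1), col2sum, keys(:, 2)];
disp(out)
```

The first three columns agree with the output shown above; only the bookkeeping is spelled out.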
7 Comments
Stephen23 on 1 Jan 2019
Edited: Stephen23 on 1 Jan 2019
Replace the line with accumarray with these three lines:
S = max([X1,X3]);
C = cell(S);
C(sub2ind(S,X1,X3)) = baz;
and a Happy New Year!
John Smith on 2 Jan 2019
Edited: John Smith on 3 Jan 2019
Dear Stephen,
After changing those three lines, your code works.
Back to the original question: I said "Column 4 is not important", but now I need to treat column 4 the same way as column 2:
tmp =
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438
out =
119 5 6832 0.265631 (%=0.20313+0.039063+0.023438)
119 48 6914 0.17188
121 12 6914 0.5625
122 -48 6853 0.29688
125 -8 6853 0.515625 (%=0.078125+0.4375)
How would you change your three-line code to do this?
[~,X,Y] = unique(tmp(:,[1,3]),'rows');
out = tmp(X,:);
out(:,2) = accumarray(Y,tmp(:,2),[],@sum)
I tried to modify it in the following way, but it did not work:
out(:,2) = accumarray(Y,tmp(:,2), tmp(:,4), [],@sum)
However, when I call accumarray twice, once per column, it works:
out(:,2) = accumarray(Y,tmp(:,2),[],@sum)
out(:,4) = accumarray(Y,tmp(:,4),[],@sum)
Thank you very much.
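A note on why the single-call attempt fails, as far as I can tell: the third positional argument of accumarray is the output size (sz), so tmp(:,4) gets read as a size instead of as more data, and the values argument must be a vector or scalar. Calling it once per column is therefore the natural approach. A minimal sketch that wraps the two calls in a loop (colsToSum is a name I made up for illustration):

```matlab
tmp = [ ...
    121  12 6914 0.5625
    122 -48 6853 0.29688
    119  48 6914 0.17188
    125 -12 6853 0.078125
    125   4 6853 0.4375
    119   5 6832 0.20313
    119   4 6832 0.039063
    119  -4 6832 0.023438];

[~, X, Y] = unique(tmp(:, [1, 3]), 'rows');
out = tmp(X, :);          % one representative row per group

colsToSum = [2, 4];       % hypothetical name: the columns to be summed per group
for c = colsToSum
    % One accumarray call per column; sum is the default function.
    out(:, c) = accumarray(Y, tmp(:, c));
end
disp(out)
```

Every column listed in colsToSum is overwritten with its group sum, so adding another column later only means extending that list.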


More Answers (1)

Image Analyst on 30 Dec 2018
Edited: Image Analyst on 30 Dec 2018
What about using grpstats(), if you have the Statistics and Machine Learning Toolbox?
tmp = [...
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438]
col5 = 10000*tmp(:, 1) + tmp(:, 3)
tmp = [tmp, col5];
% No sum in grpstats, so have to do it twice.
% Once to get the mean and once to get the count.
outputMean = grpstats(tmp, tmp(:, 5), 'mean')
outputNumel = grpstats(tmp, tmp(:, 5), 'numel')
% Crop off temporary 5th column
output = outputMean(:, 1:4) % Initialize
% Column 2 is the sum = mean * count
output(:, 2) = outputMean(:, 2) .* outputNumel(:, 2)
The output seems to be sorted by the first column though:
output =
119 5 6832 0.088544
119 48 6914 0.17188
121 12 6914 0.5625
122 -48 6853 0.29688
125 -8 6853 0.25781
That might be a problem for you. I'm not sure. Of course column 4 can be cropped off or ignored since you say it's not important.
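If the row order does matter, or the toolbox is not available, here is a rough sketch that reproduces the row order of the expected answer in the question. It rests on two assumptions of mine: that the wanted order is "ascending column 3, ties in order of first appearance" (this fits the expected answer, but the question never states it), and that the 'stable' option of unique is available in your release. The names keys and order are illustrative only.

```matlab
tmp = [ ...
    121  12 6914 0.5625
    122 -48 6853 0.29688
    119  48 6914 0.17188
    125 -12 6853 0.078125
    125   4 6853 0.4375
    119   5 6832 0.20313
    119   4 6832 0.039063
    119  -4 6832 0.023438];

% 'stable' keeps the groups in order of first appearance instead of sorting them.
[keys, ~, Y] = unique(tmp(:, [1, 3]), 'rows', 'stable');

% Sum column 2 within each group (sum is accumarray's default).
out = [keys(:, 1), accumarray(Y, tmp(:, 2)), keys(:, 2)];

% Assumed ordering: ascending column 3. sort keeps equal elements in their
% original relative order, so ties stay in order of first appearance.
[~, order] = sort(out(:, 3));
out = out(order, :);
disp(out)
```

Column 4 is left out here, since the question says it is not important.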


Version

R2014a
