Find duplicate entries and sum up their associated values then put everything back in a table
9 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Hello everyone,
I have an excel sheet with two columns. The first column is a list of names, and the second column is the score of each person. Some names have multiple score and I want to find duplicate entries and sum up their associated values then put everything back in a table. Basically clean up the duplicate but want the sum of the scores. The excel file is attached as well
I would really appreciate if you have a script or a function can achieve the results.
Here where I’m at
Data = readtable("Data.xlsx")
unique(Data.Names)
[C,ia,idx] = unique(Data(:,1),'stable');
val = accumarray(idx,Data(:,2),[],@sum);
mat = [C val]
0 Kommentare
Akzeptierte Antwort
dpb
am 8 Jul. 2021
tData = readtable("Data.xlsx");
tData.Properties.VariableNames={'Name','Score'}; % set a known set of variable names
tSum=rowfun(@sum,tData,'InputVariables','Score','GrouptingVariables','Name');
writetable(tSum,'PickAnOutputFileName.xls')
Weitere Antworten (1)
Cris LaPierre
am 8 Jul. 2021
It gets a bit trickly since you don't want to change the order of the names, but I think you could do this with groupsummary.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
Data.Order = (1:height(Data))'
% Use groupsummary to find sum and min of scores and order for each unique name
sumTbl = groupsummary(Data,"Names",["sum","min"],["Scores" "Order"])
% sort the results, which are alphabetical, so they are back in the original order
sumTbl = sortrows(sumTbl,"min_Order");
% create a new table with just Names and total scores
finalData = sumTbl(:,["Names","sum_Scores"]);
% rename 'sum_Scores' back to 'Scores'
finalData.Properties.VariableNames(2) = "Scores"
3 Kommentare
dpb
am 8 Jul. 2021
I didn't think of groupsummary, Chris. Good thinking. Was keeping the order an input requirement? I missed it if it was. I've done same as you before several times; have one humongous spreadsheet that has to get updated piecemeal -- it has a (hidden) column that contains =ROW() precisely for the purpose.
Cris LaPierre
am 8 Jul. 2021
I had interpreted it as having to be in the same order, but on a re-read, that is not stated. If not, it gets much simpler.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
% Use groupsummary to find sum scores for each unique name
finalData = groupsummary(Data,"Names","sum")
This keeps the GroupCount information, but that's not bad info to have either.
Siehe auch
Kategorien
Mehr zu Logical finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!