Efficient Way To Split Dataset Into Subsets
Hello,
I need to split a large dataset (a DxN numeric array) into multiple subsets. I can use the code below, where groupIDs is an Nx1 vector of integer IDs giving the group to which each data point (column) belongs.
groups = unique(groupIDs);
for i = 1:numel(groups)
    tempData = data(:, groupIDs == groups(i));
    % do work on tempData
end
However, about 90% of the run time of this code is spent just creating tempData, which amounts to over a minute every time I do this. Is there a more efficient way to split data by groupIDs? I tried splitapply(), but it doesn't seem to be any faster.
Are there any MATLAB gurus out there who know a trick? Thanks!
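One common way to avoid scanning all N group IDs once per group is to sort the columns by group a single time; after that, each group is a contiguous block of columns that can be sliced with a range instead of a logical mask. The following is a minimal sketch under stated assumptions: the small `data` and `groupIDs` values are hypothetical stand-ins, `groupSizes` and the `fprintf` call are placeholders for the real work, and no timing against the original loop has been done. Note that `data(:, order)` creates a second full copy of the data, which matters if memory is already tight.

```matlab
% Hypothetical small example; replace with your own DxN data and Nx1 groupIDs.
data = reshape(1:30, 3, 10);                  % D = 3, N = 10
groupIDs = [3; 1; 2; 3; 1; 1; 2; 3; 3; 2];

% Sort the columns by group once, so every group becomes a contiguous block.
[sortedIDs, order] = sort(groupIDs(:));
sortedData = data(:, order);                  % note: a full copy of data

% Last column of each group = positions where the sorted ID changes.
edges = [0; find(diff(sortedIDs)); numel(sortedIDs)];
numGroups = numel(edges) - 1;
groupSizes = zeros(numGroups, 1);
for k = 1:numGroups
    cols = edges(k)+1 : edges(k+1);           % contiguous column range of group k
    tempData = sortedData(:, cols);
    % placeholder for "do work on tempData"
    groupSizes(k) = size(tempData, 2);
    fprintf('group %d: %d columns\n', sortedIDs(edges(k+1)), groupSizes(k));
end
```

Because MATLAB stores arrays column-major, each `sortedData(:, cols)` reads one contiguous chunk of memory, which is the main reason this layout may copy faster than a scattered logical index; whether it helps enough on the full dataset would need to be measured.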
5 Kommentare
Jos (10584)
on 24 Nov 2017
How large is "large"?
E
on 24 Nov 2017
Use the second (or third? I always have to guess and check between the two) output of unique(groupIDs).
Edit: This likely isn't faster; you still need a comparison check inside the loop. I always forget that part about the third output of unique.
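The third output of `unique` (here `ic`, for which `groups(ic)` reproduces `groupIDs`) can remove the per-group comparison if it is combined with `accumarray` to gather every group's column indices in a single pass. A minimal sketch, again with hypothetical example values; it has not been timed against the original loop, and each `data(:, colsPerGroup{k})` still copies that group's columns, so any gain comes only from dropping the repeated `groupIDs == groups(i)` scans.

```matlab
% Hypothetical small example; replace with your own DxN data and Nx1 groupIDs.
data = reshape(1:30, 3, 10);
groupIDs = [3; 1; 2; 3; 1; 1; 2; 3; 3; 2];

% ic maps every column to its group index 1..G (groups(ic) equals groupIDs).
[groups, ~, ic] = unique(groupIDs(:));
N = numel(ic);

% Collect the column indices of each group in one pass, one cell per group.
% sort() restores the original column order, which accumarray does not guarantee.
colsPerGroup = accumarray(ic, (1:N)', [numel(groups), 1], @(c) {sort(c)});

for k = 1:numel(groups)
    tempData = data(:, colsPerGroup{k});
    % do work on tempData
end
```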
Jos (10584)
on 24 Nov 2017
12 GB? That is quite a lot. If the data don't fit in memory, swapping to disk is the likely bottleneck.
E
on 26 Nov 2017
Answers (0)