Fast access/concatenation of large array structure
I have a large structure array (500K+ items), and I wish to access certain fields of that array and concatenate the results. Below is a placeholder example.
A{1}.time = 1500;
A{1}.data.temp = 70;
A{1}.data.humidity = 20;
A{2}.time = 1501;
A{2}.data.temp = 73;
A{2}.data.humidity = 19;
And so on, until there are 500,000 of these. (I made it a cell array because the actual entries differ in my data, and I have other code that goes through and grabs just the cells we want.)
Now, I want to access, for example, all of the 'data' entries and concatenate them so that I have simple vectors I can plot. Currently this is done using a loop, but that is very slow. Is there a faster way to do this than some version of the below?
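fieldNames = fieldnames(A{1}.data);   % names of the subfields of 'data'
for ii = 1:numel(fieldNames)
    % cellfun returns one scalar per cell; (:) forces a column vector
    vals = cellfun(@(x) getField(x, 'data', fieldNames{ii}), A);
    out.(fieldNames{ii}) = vals(:);
end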
where
function out = getField(in, fieldname1, fieldname2)
    % Return the nested field in.(fieldname1).(fieldname2) via dynamic field names
    out = in.(fieldname1).(fieldname2);
end
Again, this certainly works, but for extremely large datasets with many fields it becomes very slow. I suspect there is a much more efficient way of gathering all of the data contained in the fields and subfields of a large data set like the one above. Any help is appreciated.
Thanks, -Dan
An additional discovery: MATLAB appears to store the field names for each sub-structure individually. In the above example, it has memory allocated for the field names data.temp and data.humidity twice (once for each copy). This is why it is so slow. A 50 MB set of data has grown to 3 GB because of this organization scheme. I am going to make a separate post about this: is the memory issue resolved if each entry is an instance of a known class, so that the field names aren't stored once for each copy?
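The growth described above can be checked on a small scale with whos. The following is only a sketch: the record count N and the names flat, infoA and infoFlat are assumptions made for illustration, and the exact byte counts depend on the MATLAB version.

```matlab
% Hypothetical small-scale reproduction; N is an assumed size, not from the post.
N = 1000;
A = cell(N, 1);
for k = 1:N
    A{k} = struct('time', 1500 + k, ...
                  'data', struct('temp', 70, 'humidity', 20));
end

% The same numbers stored as three plain numeric vectors in one scalar struct.
flat.time     = 1500 + (1:N).';
flat.temp     = 70 * ones(N, 1);
flat.humidity = 20 * ones(N, 1);

% whos reports the memory used by each variable in the 'bytes' field.
infoA    = whos('A');
infoFlat = whos('flat');
fprintf('cell of structs: %d bytes\n', infoA.bytes);
fprintf('flat vectors:    %d bytes\n', infoFlat.bytes);
```

Comparing the two numbers gives a rough idea of how much of the footprint is per-element overhead rather than actual data.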
5 Comments
Walter Roberson
on 13 Apr 2017
Your use of fieldnames as a variable is confusing as that is the MATLAB function normally used to extract field names.
D. Plotnick
on 13 Apr 2017
Edited: D. Plotnick
on 13 Apr 2017
Stephen23
on 14 Apr 2017
@Daniel Plotnick: storing your data in a cell array of structures is less efficient than storing it in a simple structure array (aka a non-scalar structure). If you stored the data in a structure array, then accessing that data would be much simpler, as you can use constructs like:
[S.f1]
{S(idx).f2}
and other neat and simple ways to access the data.
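Applied to the placeholder data from the question, the constructs above might look like the following. This is a minimal sketch that assumes every selected cell holds a struct with the same field names; the three sample records are made up for illustration.

```matlab
% Assumption: all cells contain structs with identical field names.
A = cell(1, 3);
for k = 1:3
    A{k} = struct('time', 1499 + k, ...
                  'data', struct('temp', 69 + k, 'humidity', 21 - k));
end

S = [A{:}];        % 1x3 struct array; errors if the field names differ
D = [S.data];      % 1x3 struct array of the nested 'data' structs

% Comma-separated list expansion gathers each field without cellfun.
out.time     = [S.time].';
out.temp     = [D.temp].';
out.humidity = [D.humidity].';
```

If the entries really do differ in their fields, the conversion S = [A{:}] fails, so this only applies to the subset of cells that share a layout.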
But then you should also keep in mind that, in general, you should keep data together as much as possible, not split it into lots of separate scalar values on their own. When data is stored in simple numeric vectors/matrices/arrays, working with that data is faster and easier. Then you can unleash the real power of MATLAB!
After all, the name MATLAB comes from MATrix LABoratory, and not from "let's split the data up into millions of scalar values".
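As a rough illustration of keeping the data together, the records could be held as one numeric vector per quantity in a single scalar structure. The name rec and the sample values below are hypothetical.

```matlab
% One vector per quantity instead of one struct per record (values made up).
rec.time     = (1500:1504).';
rec.temp     = [70; 73; 71; 69; 72];
rec.humidity = [20; 19; 21; 22; 18];

% A logical mask selects records, replacing the per-cell filtering step.
idx = rec.temp > 70;
selTime     = rec.time(idx);
selHumidity = rec.humidity(idx);
```

With this layout the vectors are already in a form that can be passed straight to plot, for example plot(rec.time, rec.temp).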
D. Plotnick
on 14 Apr 2017
Walter Roberson
on 14 Apr 2017
memmapfile is convenient but not mandatory: you can use a bunch of fread() instead.
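A small sketch of the two options, assuming a file that contains nothing but raw doubles. The file name, its contents and the variable names are all made up here, since the actual file layout is not given in the thread.

```matlab
% Hypothetical file of raw doubles written to a temporary location.
fname = [tempname '.bin'];
vals = (1:10).';
fid = fopen(fname, 'w');
fwrite(fid, vals, 'double');
fclose(fid);

% Option 1: memmapfile exposes the file contents through the Data property.
m = memmapfile(fname, 'Format', 'double');
viaMap = m.Data;

% Option 2: fread pulls the same values with explicit file I/O.
fid = fopen(fname, 'r');
viaRead = fread(fid, Inf, 'double');
fclose(fid);

clear m            % release the mapping before deleting the file
delete(fname);
```

For a file with several interleaved quantities, the fread route would need one call (or one reshape) per quantity, which is the "bunch of fread()" mentioned above.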