matfile and half inefficient storage
Dear MATLAB users,
I have encountered the following inefficient storage problem:
delete('myfile.mat')
handle = matfile('myfile.mat')
handle.X = half(X); % X is big
handle.Y = half(Y); % Y is big
handle.a = a;
handle.b = b;
%%% the size of myfile.mat is 2.4 GB %%%
data = load('myfile');
save('mynewfile1.mat', '-v7.3', '-struct', 'data')
%%% the size of mynewfile1.mat is 1.2 GB %%%
data = load('myfile');
save('mynewfile2.mat', '-struct', 'data')
%%% the size of mynewfile2.mat is 1.2 GB %%%
What could be causing this doubling of storage, and how can I avoid it without loading and resaving the file?
Update: the problem does not seem to be caused by the -v7.3 flag. I updated the code above to show this.
Thank you for your help.
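For anyone who wants to reproduce the comparison without the original data, here is a minimal sketch. It assumes a stand-in single array instead of half(X) (half needs an extra toolbox) and temporary file names that are not part of the question. With such stand-in data the two sizes may well come out similar, so treat it only as a harness to drop the real variables into.

```matlab
% Sketch: write the same variable once through matfile and once through save,
% then compare the resulting file sizes on disk.
X = single(rand(1000, 1000));   % stand-in data; swap in half(X) if available

fileA = [tempname '.mat'];      % hypothetical name, written through matfile
fileB = [tempname '.mat'];      % hypothetical name, written through save

m = matfile(fileA, 'Writable', true);   % new files are created as v7.3
m.X = X;
clear m                                 % release the matfile object

save(fileB, 'X', '-v7.3');

infoA = dir(fileA);
infoB = dir(fileB);
fprintf('matfile: %d bytes\nsave:    %d bytes\n', infoA.bytes, infoB.bytes);
```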
30 Comments
Image Analyst
on 24 Jul 2021
Why are you saving it in the 7.3 (old) format?
Mika
on 24 Jul 2021
dpb
on 24 Jul 2021
-v7.3 is the latest version; -v7 is the default (https://www.mathworks.com/help/matlab/ref/save.html), set by TMW in the preferences, apparently for compatibility.
There's a note in the doc under the 'version' named parameter that says--
"Version 7.3 MAT-files use an HDF5 based format that requires some overhead storage to describe the contents of the file. For cell arrays, structure arrays, or other containers that can store heterogeneous data types, Version 7.3 MAT-files are sometimes larger than Version 7 MAT-files."
The blowup is something I've noted in some other Q&As over the last few months. There was another conversation just the other day, it seems, where a file saved with the -v7.3 flag was something like 2X the size of the same data saved without the flag. Turned out the -v7 flag is set by default in the preferences on initial install.
Seems as though this needs some attention from TMW; the huge blow-up in size indicates something's not kosher/as intended in the implementation.
Quite possible; there's got to be overhead with the matfile object in order to be able to access pieces-parts.
Alternatively, what does half actually do? Does it create some object or what? I don't have any of the TBs that have it so not sure.
Just for checking, what are the settings in Preferences > General > MAT-Files? Just so we know for sure which version is used with no explicit flag on the command line.
Mika
on 24 Jul 2021
dpb
on 24 Jul 2021
OK, that the default is -v7 and that both
save('mynewfile1.mat', '-v7.3', '-struct', 'data')
save('mynewfile2.mat', '-struct', 'data')
returned the same size file shows the different file size is not related to the version for whatever data actually is.
Now, what we (at least me, since I can't test) don't know yet is what half actually returns -- the doc above was unclear.
What does
x=half(X);
whos x X
return?
Mika
on 24 Jul 2021
dpb
on 24 Jul 2021
Well, then, it would seem it is the matfile overhead that's the killer -- if you just store X and x, nothing untoward happens, does it?
Mika
on 24 Jul 2021
Walter Roberson
on 25 Jul 2021
But I need matfile to save in a parfor loop.
I have not seen any guarantee that two different processes writing to the same matfile() will not interfere with each other.
The file structure designed for simultaneous access is memmapfile().
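As a rough illustration of the memmapfile() route, here is a minimal sketch. It assumes a flat binary file with a hypothetical name and a small single matrix as a stand-in; as far as I know memmapfile formats cover only the built-in numeric types, so half data would need some other on-disk representation. It shows the mechanics of writing in place, not a tested multi-worker setup.

```matlab
% Sketch: preallocate a flat binary file, then write into it through memmapfile.
fname = [tempname '.bin'];      % hypothetical scratch file
nRows = 4;
nCols = 3;

fid = fopen(fname, 'w');
fwrite(fid, zeros(nRows, nCols, 'single'), 'single');   % reserve space on disk
fclose(fid);

% Map the whole file as one nRows-by-nCols single matrix named X.
mm = memmapfile(fname, 'Format', {'single', [nRows nCols], 'X'}, 'Writable', true);
mm.Data(1).X(:, 2) = single([1; 2; 3; 4]);   % write one column in place
clear mm                                      % unmap the file

% Read the file back the ordinary way to check what landed on disk.
fid = fopen(fname, 'r');
check = fread(fid, [nRows nCols], 'single=>single');
fclose(fid);
disp(check(:, 2).');
```

In a parfor setting each worker would map the same file and write only its own columns; whether that is safe for a given access pattern would still need to be verified.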
Mika
on 25 Jul 2021
Walter Roberson
on 25 Jul 2021
Please explain more about why using parfor requires you to use matfile? As opposed to just saving (possibly using 7.3 if you have big objects)?
We've eliminated everything on the size conundrum except matfile, with the exception that we haven't seen the explicit result of a save statement for the half object (which I can't test). We got as far as showing that whos didn't report extra memory, but that doesn't prove save didn't need some extra info to go with it. One presumes not, but it hasn't been proven.
If performance is a question, as I would presume it is when using parfor anyway, the matfile solution may seem "elegant" in minimizing source code, but I think it would still be a sizable time hit, even without the file size issue, compared to the suggested workaround.
Mika
on 26 Jul 2021
I had presumed that would be the result, but since I couldn't/can't test, just for the record... :)
I agree; I think it's well worth bringing to their explicit attention (although I would presume they're already aware of it), as it appears they may need to re-examine just what is causing such a huge blowup and rethink what they're doing going forward.
While they probably won't classify it as a bug since it seems to still work to provide the documented functionality, certainly from a performance and quality of implementation POV it deserves to be flagged.
Walter Roberson
on 26 Jul 2021
Using a small auxiliary function to do the save() is what is recommended.
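A minimal sketch of that auxiliary-function pattern follows. The helper name parsave, the output folder, and the per-iteration file names are all hypothetical, and single data stands in for half. It is written as a script with a local function, which needs R2016b or later; in older releases the helper would go in its own file.

```matlab
% Sketch: each parfor iteration writes its own MAT-file through a helper,
% so no two workers ever touch the same file.
outDir = tempname;              % hypothetical scratch folder
mkdir(outDir);

parfor k = 1:4
    X = single(rand(100) * k);  % stand-in for the real (half) data
    Y = X.';
    parsave(fullfile(outDir, sprintf('result_%03d.mat', k)), X, Y);
end

files = dir(fullfile(outDir, 'result_*.mat'));
fprintf('%d files written to %s\n', numel(files), outDir);

function parsave(filename, X, Y)
% PARSAVE  Hypothetical helper; save() cannot be called directly in a parfor body.
    save(filename, 'X', 'Y', '-v7.3');
end
```

Writing one file per iteration sidesteps the question of concurrent writes to a single file; the pieces can be merged afterwards in a serial pass if a single file is needed.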
dpb
on 26 Jul 2021
That avoids it, but doesn't resolve that storage requirements blow up remarkably with matfile which seems to me at least to be a problem even if one can get around it in some instances by not using it. If never going to use it, isn't much point in having it in the language... :)
James Tursa
on 27 Jul 2021
For the record, half data types are stored as opaque classdef objects. They are fundamentally different from the other native numeric types such as double and single. Whether this has anything to do with the behavior I don't know.
Walter Roberson
on 27 Jul 2021
Good point, James. The representation of classdef objects can end up being quite different in HDF5.
dpb
on 27 Jul 2021
But the testing didn't show any difference w/ save of the raw type; only w/ matfile...
Eike Blechschmidt
on 28 Jul 2021
You could do the following and see if there is a difference in how the files are stored as hdf5 files:
h5disp('myfile.mat');
h5disp('mynewfile1.mat');
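If the h5disp output is too long to compare by eye, the same information is available programmatically through h5info. The sketch below is only an illustration: it writes its own small v7.3 file with a stand-in single array and lists chunking and filters for the top-level datasets. That a difference in chunking or compression explains the size gap is an assumption here, not something established in this thread.

```matlab
% Sketch: write a small v7.3 MAT-file, then list each top-level dataset's
% size, chunk size, and filters (e.g. compression) via h5info.
fname = [tempname '.mat'];      % hypothetical scratch file
X = single(rand(200, 200));     % stand-in data
save(fname, 'X', '-v7.3');

info = h5info(fname);
for k = 1:numel(info.Datasets)
    ds = info.Datasets(k);
    if isempty(ds.Filters)
        filt = 'none';
    else
        filt = strjoin({ds.Filters.Name}, ', ');
    end
    fprintf('%-12s size=[%s] chunk=[%s] filters=%s\n', ds.Name, ...
        num2str(ds.Dataspace.Size), num2str(ds.ChunkSize), filt);
end
```

Pointing fname at myfile.mat and mynewfile1.mat in turn would allow a side-by-side comparison. Objects such as half may be stored under groups rather than as plain top-level datasets, in which case info.Groups would need to be walked as well.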
Mika
on 29 Jul 2021
Q490
on 16 Aug 2021
As a side note, and not sure if this is directly related to an answer to your question, a function I've found very useful that can be a good substitute for using matfile is "savefast", written by Tim Holy (https://www.mathworks.com/matlabcentral/profile/authors/1337381) and which can be downloaded at:
For the file sizes you are talking about it saves it extremely quickly and in the smallest possible file size. I highly recommend it.
Mika
on 18 Aug 2021
Pavithra Jayachandran
on 21 Aug 2021
Thank you I will try
xingxingcui
on 24 Aug 2021
Edited: xingxingcui
on 24 Aug 2021
Similar question here; TMW should provide an effective solution.
S Priyadharshini
on 30 Aug 2021
Myfile and mynewfile2.mat
Answers (0)