Big problem/bug with the new matfile command for partial MAT-file reads/writes: creates massively bloated files.

Please look at this minimal example:
%create a 1mb "incompressible" array
one_meg = uint8(rand(1,1000,1000)*256);
%choose a file, clear it and open it with write access
testfile = 'D:\Data\PGRtest\testfile.mat';
system(['del "' testfile '"'] );
matObj = matfile(testfile,'Writable',true);
%keep a copy of what we write to the file in memory for verification
memcpy = zeros(50,1000,1000,'uint8');
%write the array 50 times to this file
for i = 1:50
    tic
    %store in file and memory in same format - pages of 1000x1000
    matObj.RawDat(i,1:1000,1:1000) = one_meg;
    memcpy(i,1:1000,1:1000) = one_meg;
    tm = toc;
    %time increases from 45ms to 250ms at last iteration
    %(round so that %i receives an integer value)
    fprintf('Iteration %i, time taken: %ims\n',i,round(tm*1000));
end
%check file size - should be 50mb, or smaller thanks to compression
%the file size is 1200mb....?
s = dir(testfile);
fprintf('file size: %i mb\n', round(s.bytes/1024/1024));
%load the mat file
load(testfile)
%the data inside is 50mb as expected, nowhere near 1200mb
whos('RawDat')
%verify
%the read data is equal to the memory copy.. where did all that extra space go?
%(isequal avoids uint8 subtraction, which saturates at 0 and could hide differences)
isequal(memcpy, RawDat)
This is on Windows 7 64-bit with MATLAB R2011b 64-bit.
The problem is mostly described in the comments. Essentially: why does 50 MB of data produce a 1200 MB MAT-file when it is written through the matfile object?
I have tried storing the data with 2 dimensions instead of 3, and I have tried using doubles instead of uint8. I have also tried changing the default MAT-file format from 7.3, although 7.3 is the only version that supports matfile.
I can't understand why each write takes longer and longer. It is as if each write to the file rewrites all the existing data a second time, so the first write is 1 MB, then 2 MB, then 3 MB, and so on, instead of 1 MB each time.
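As a rough check of that suspicion, the sketch below adds up what would be written if the i-th write rewrote all i pages stored so far. It is only arithmetic on the numbers above, not a statement about how matfile works internally; nWrites, pageMB and totalMB are names made up for the illustration.

```matlab
% Hypothetical model: write number i rewrites all i pages stored so far.
nWrites = 50;                        % number of partial writes in the example
pageMB  = 1;                         % approximate size of one page in MB
totalMB = pageMB * sum(1:nWrites);   % 1 + 2 + ... + 50 = 1275
fprintf('Total written under this model: %d MB\n', totalMB);
```

A total of 1275 MB under this model is close to the roughly 1200 MB seen on disk.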
I expect 'testfile' to be a <50mb mat file containing a 50x1000x1000 array. What I see is a 1.2GB file containing that array - clearly incorrect.
If the array is saved directly from the workspace using 'save', the resulting MAT-file is 2 MB and contains the same data.
Looks like this is a bug.
Any ideas? Do you get the same results? Thanks, Tom.
  3 Comments
Thomas Osgood on 28 Nov 2011
Thanks, I already submitted it as a ticket to MathWorks with ID 1-FWBIMT, if it helps you.
Tom.
Jiri Hajek on 20 Apr 2021
Ten years later and the same problem is still around... My data saved into a v7 MAT-file are around 1 MB, compared to almost 100 MB in a v7.3 file. Loading and saving times are unfortunately proportionally longer as well. Note, however, that the data contained in the file take up only around 20 MB, so there is roughly a 5-fold increase in size from saving to a v7.3 file. Also note that I don't use the -nocompression flag when saving the file.
Any ideas or suggestions would be highly appreciated...

Accepted Answer

Philip Borghesani on 28 Nov 2011
For the same reasons that growing an array in memory is a bad idea, growing an array in a MAT-file is not good programming practice. Your file has been horribly fragmented because of the matrix growth. The full 3-D matrix must occupy one linear segment of the file.
If you preallocate the file variable by adding the line:
matObj.RawDat=memcpy; %preallocate
after creating the memcpy variable then your file size will be reasonable.
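The preallocation suggested above could look roughly like this. It is a minimal sketch, assuming a much smaller array than in the question so that it runs quickly; the file name under tempdir and the names nPages, n and page are made up for the illustration.

```matlab
% Sketch: preallocate the variable in the file, then fill it page by page.
testfile = fullfile(tempdir, 'prealloc_sketch.mat');   % hypothetical path
if exist(testfile, 'file'), delete(testfile); end

nPages = 5;                                  % 50 in the original example
n      = 200;                                % 1000 in the original example
page   = uint8(rand(1, n, n) * 256);         % one page of random data

matObj = matfile(testfile, 'Writable', true);
matObj.RawDat = zeros(nPages, n, n, 'uint8');   % preallocate the full size once

for i = 1:nPages
    matObj.RawDat(i, 1:n, 1:n) = page;       % overwrite part of the existing variable
end

s = dir(testfile);
fprintf('file size: %.2f MB\n', s.bytes / 2^20);
```

Because the variable already has its final size before the loop starts, the partial writes no longer have to grow it.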
If your code is a model of what you want to do, I suggest storing your chunks of data in cells of a RawData cell array inside your file.
You are also indexing into your array inefficiently, but that does not seem to be causing any performance issues. For MATLAB it would be best if RawDat were (1000,1000,50) in size, so that the page index is the last dimension and each 1000x1000 page is contiguous in MATLAB's column-major storage order.
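A minimal sketch of the page-last layout, again assuming small sizes and a made-up file name under tempdir. The names nPages, n and page are illustrative only, and the variable is preallocated first as recommended above.

```matlab
% Sketch: store pages along the last dimension, i.e. RawDat is n-by-n-by-nPages.
testfile = fullfile(tempdir, 'pagelast_sketch.mat');   % hypothetical path
if exist(testfile, 'file'), delete(testfile); end

nPages = 5;                                  % 50 in the original example
n      = 200;                                % 1000 in the original example
page   = uint8(rand(n, n) * 256);            % one n-by-n page

matObj = matfile(testfile, 'Writable', true);
matObj.RawDat = zeros(n, n, nPages, 'uint8');   % preallocate with pages last

for k = 1:nPages
    matObj.RawDat(1:n, 1:n, k) = page;       % each write targets one whole page
end
```

Reading a page back then uses the same indexing pattern, for example matObj.RawDat(1:n, 1:n, 3).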
  4 Comments
Philip Borghesani on 28 Nov 2011
Did you see my suggestion to use a cell array? A cell array should not need to be preallocated, and each cell is stored separately in the file, so growth will not be an issue.
Thomas Osgood on 28 Nov 2011
Yes, sorry. I will give that a try and report how it goes. Either way, it looks like you have come up with a solution!

More Answers (1)

Walter Roberson on 27 Nov 2011
Keep in mind that save defaults to -v7, which is compressed, whereas matfile uses -v7.3. The -v7.3 format is based on HDF5, and those files appear not to be compressed the way MATLAB uses them (though that may have changed since -v7.3 files were first introduced).
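One way to see how much of the difference comes from the format itself, rather than from growing the variable, is to save the same array once per format and compare the sizes on disk. This is a minimal sketch, assuming small illustrative sizes and made-up file names under tempdir. No particular ratio is claimed, since it depends on the data and on the MATLAB release.

```matlab
% Sketch: save the same array as -v7 and as -v7.3 and compare file sizes.
A = repmat(uint8(rand(1, 200, 200) * 256), [10 1 1]);   % 10 identical pages

f7  = fullfile(tempdir, 'cmp_v7.mat');    % hypothetical file names
f73 = fullfile(tempdir, 'cmp_v73.mat');

save(f7,  'A', '-v7');
save(f73, 'A', '-v7.3');

d7  = dir(f7);
d73 = dir(f73);
fprintf('-v7:   %.2f MB\n', d7.bytes  / 2^20);
fprintf('-v7.3: %.2f MB\n', d73.bytes / 2^20);
```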