Ways to get around using 7.3 version of mat files

I am building a set of tools that will regularly deal with datasets larger than 2 GB, and I want users to be able to save their current state, including all loaded data, so that getting back into the tool is as easy as possible. All of the user's data and setup is contained in a class object. I initially got around the size limit by using the 7.3 version of the MAT-file, but the larger files and the additional save time really hurt efficiency. I have been trying to avoid v7.3, but I am running into some issues. Below are my attempts and the problems I have hit:
  1. Since the 2 GB limit is per variable, I was planning to split the data into multiple variables inside my saveobj method, but as far as I can tell saveobj must return a single variable that will be saved in the MAT-file. This makes sense, and I can see why the question I am about to ask is nonsense, but I'll ask anyway: can saveobj return multiple variables that are saved separately in the MAT-file?
  2. Since I am guessing the answer to #1 is no, I was thinking I could create a separate function within my class that could save the data into multiple mat files. I could then call this separate function with saveobj to split the data and save it. I would then need to know what those extra files were named in the loadobj function so that I could restore the data. My first thought here was to use the mat file name that was used in the call to saveobj and just append something to it. Is there a way to get the mat file name being saved to/loaded from within the saveobj and loadobj functions?
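To make the idea in item 1 concrete outside of saveobj, here is a minimal sketch that stores row blocks as separate variables in a single v7 MAT-file using the `-struct` option of `save`. The variable names (`chunk1`, `chunk2`, ...), the chunk count, and the small random array are assumptions for illustration only; this is an explicit save call, not something saveobj itself can do.

```matlab
% Hypothetical stand-in data; the real arrays would be far larger.
Data = rand(1000, 8);
numChunks = 4;                      % assumed number of row blocks
matFile = [tempname '.mat'];        % stand-in for a user-chosen file name

% Build a struct with one field per block of rows.
rowsPerChunk = ceil(size(Data, 1) / numChunks);
chunks = struct();
for k = 1:numChunks
    rows = (k-1)*rowsPerChunk + 1 : min(k*rowsPerChunk, size(Data, 1));
    chunks.(sprintf('chunk%d', k)) = Data(rows, :);
end

% '-struct' writes each field of the struct as its own variable.
save(matFile, '-struct', 'chunks', '-v7');

% Reload every variable and reassemble the blocks in order.
loaded = load(matFile);
parts = cell(numChunks, 1);
for k = 1:numChunks
    parts{k} = loaded.(sprintf('chunk%d', k));
end
restored = vertcat(parts{:});
```

Each block then counts as its own variable, so only the individual blocks need to stay under the per-variable limit.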
If you have other ideas, I am open to those as well.
Here is a simple code example:
classdef LargeData
    properties
        Data
    end
    methods
        function filenames = saveDataToDisk(obj,filename)
            % Write row blocks of Data to separate MAT-files.
            % Split into 2 sections for demonstrative purposes
            numSplits = 2;
            numRows = size(obj.Data,1);
            numRowsPerSplit = ceil(numRows/numSplits);
            filenames = cell(numSplits,1);
            for iS = 1:numSplits
                startIdx = 1+(iS-1)*numRowsPerSplit;
                endIdx = min(numRows,startIdx + numRowsPerSplit - 1);
                % <= so that a block holding a single row is still saved
                if startIdx <= endIdx
                    filenames{iS} = sprintf('%s_%d',filename,iS);
                    thisSplitData = obj.Data(startIdx:endIdx,:);
                    save(filenames{iS},'thisSplitData','-mat');
                end
            end
        end % saveDataToDisk
        function sobj = saveobj(obj)
            % Assume the data is large, real implementation has protection for
            % small data to avoid complexity
            % Save the data to separate files
            % HOW CAN I DETERMINE THE MAT FILE BEING SAVED TO SO THAT I CAN
            % KEEP THE DATA WITH THE MAT FILE THAT THE USER SAVED TO?
            sobj.dataFiles = obj.saveDataToDisk(tempname);
            % Remove the data so that a v7 mat file can be used
            sobj.origObj.Data = [];
        end % saveobj
    end
    methods (Static)
        function obj = loadobj(s)
            if isstruct(s)
                obj = LargeData;
                if isfield(s,'dataFiles')
                    data = [];
                    for iF = 1:numel(s.dataFiles)
                        % Skip entries for blocks that were never written
                        if isempty(s.dataFiles{iF}), continue; end
                        newData = load(s.dataFiles{iF},'-mat','thisSplitData');
                        data = [data;newData.thisSplitData];
                    end
                    obj.Data = data;
                end
            else
                obj = s;
            end % isstruct
        end % loadobj
    end % Static methods
end
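Because saveobj is called with the object as its only argument, the target file name is not available inside it. One possible workaround, sketched here under that assumption, is an explicit save/load routine that receives the file name and derives the companion file names from it. The `_part%d` suffix, the `partFiles` variable, and the `session_demo.mat` name are hypothetical choices, not part of the class above.

```matlab
% Hypothetical target; in a tool this might come from uiputfile.
mainFile = fullfile(tempdir, 'session_demo.mat');
Data = rand(10, 3);                 % stand-in for the large array
numSplits = 2;

% Companion files sit next to the main file: <name>_part<k>.mat
[folder, base] = fileparts(mainFile);
partName = @(k) sprintf('%s_part%d.mat', base, k);

rowsPerSplit = ceil(size(Data, 1) / numSplits);
partFiles = cell(numSplits, 1);
for k = 1:numSplits
    rows = (k-1)*rowsPerSplit + 1 : min(k*rowsPerSplit, size(Data, 1));
    thisSplitData = Data(rows, :);
    partFiles{k} = partName(k);
    save(fullfile(folder, partFiles{k}), 'thisSplitData', '-v7');
end
% Store only the relative names so the file set can be moved together.
save(mainFile, 'partFiles', '-v7');

% Loading: resolve the parts relative to where the main file is now.
[folder, ~] = fileparts(mainFile);
s = load(mainFile, 'partFiles');
parts = cell(numel(s.partFiles), 1);
for k = 1:numel(s.partFiles)
    p = load(fullfile(folder, s.partFiles{k}), 'thisSplitData');
    parts{k} = p.thisSplitData;
end
restored = vertcat(parts{:});
```

The trade-off is that users would call this routine (for example from a Save button) instead of calling `save` on the object directly.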
  5 Comments
Matt Butts on 29 Apr 2022
The speed is the biggest issue, but all of my trials have suggested that I can get a speed and file size benefit from the v7 format. Using my actual implementation, here are the results I have gotten:
  • Using v7.3 and no compression: 571 s to save 1 file totaling 4.47 GB
  • Using v7.3 and compression: 573 s to save 1 file totaling 2.59 GB
  • Using v7 and compression: 44 s to save 7 files totaling 102 MB
This really seems to suggest that the v7 mat file is the right solution. I just need to figure out how to keep the extra files stored with the filename that was input to the save(...) command.
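A comparison like the one above could be reproduced on a small array with a timing sketch along these lines. The array size, file names, and labels are assumptions, and random data compresses poorly, so the numbers it prints will not resemble the figures quoted for the real dataset.

```matlab
Data = rand(2000, 500);             % small stand-in; real data is multi-GB
base = tempname;                    % hypothetical base name for test files

labels = {'v7.3 raw', 'v7.3', 'v7'};
files  = {[base '_v73nc.mat'], [base '_v73.mat'], [base '_v7.mat']};
times  = zeros(1, 3);

% Time each format; '-nocompression' disables compression for that save.
tic; save(files{1}, 'Data', '-v7.3', '-nocompression'); times(1) = toc;
tic; save(files{2}, 'Data', '-v7.3');                   times(2) = toc;
tic; save(files{3}, 'Data', '-v7');                     times(3) = toc;

% Report elapsed time and size on disk for each file.
for k = 1:numel(files)
    info = dir(files{k});
    fprintf('%-10s %7.2f s %9.1f MB\n', labels{k}, times(k), info.bytes / 2^20);
end
```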


Answers (0)

Release

R2021b
