How do I use the matfile function to access slices of a very large structure in parfor without getting a broadcast variable warning?

I have a very large structure with an array in one of its fields that is approximately 15,000,000 x 20. Currently, I access the structure through the matfile command, in the hope of loading only windows of data to which I can apply a simple function. I am receiving a broadcast variable warning for the structure, even though I thought I had written the code in a way that slices the dataset. I have access to a remote cluster; however, it is running extremely slowly, and I fear that the overhead of the broadcast variable is the cause. I will have to run this same routine on 50+ files, so I can use any performance improvements I can get.
Is there a way to load only slices of the array with the matfile command, so as to speed up this process and avoid broadcasting this very large array? Any help would be appreciated. Here is sample code showing exactly what I am doing:
%Locate the file without loading it into memory
m = matfile('filename.mat');
%Note, m is a large structure with the following:
% m.A = {'Information'}
% m.B = {'Information'}
% m.C = [15,000,000 x 1]
% m.D = [15,000,000 x 20]
%Extract the size of the array
Info = whos(m,'C');
Length = Info.size(1);
%Using parallel processing, step through the data set with Window Size, Win
parfor j = Win:Length
    Result(j,:) = SomeFunc(m.C(j-Win+1,:)); %call function along columns for window size, Win
end
Note: I am using R2017b.

Accepted Answer

Edric Ellis on 3 May 2019
In this case, the warning about broadcasting the matfile object is probably safe to ignore. The point is that the matfile object itself is not large, it simply knows how to load the data on demand. You can easily prove this to yourself using ticBytes and tocBytes:
%% Prepare data
N = 100000;
C = rand(N,1);
fname = tempname();
save(fname, 'C', '-v7.3');
clear C
%% Create matfile and pool
pool = gcp('nocreate');   % return the current pool without auto-creating one
if isempty(pool)
    pool = parpool('local', 4);
end
m = matfile(fname);
%% Use matfile, check bytes transmitted
Info = whos(m, 'C');
Length = Info.size(1);
t = ticBytes(pool);
parfor j = 1:Length
    out(j) = m.C(j, 1).^2;
end
tocBytes(pool, t);
This gives the output:
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________

    1              23408                   2.163e+05
    2              23408                   2.163e+05
    3              23408                   2.163e+05
    4              23408                   2.163e+05
    Total          93632                  8.6522e+05
One thing I would note though is that reading single elements at a time from a matfile can be slow. It might be more efficient to load larger chunks of data, more like this:
%% Load in chunks
chunkSize = 1024;
numChunks = ceil(Length / chunkSize);
t = ticBytes(pool);
parfor j = 1:numChunks
    firstIdx = 1 + ((j-1) * chunkSize);
    lastIdx = min(Length, firstIdx + chunkSize - 1);
    out2{j} = m.C(firstIdx:lastIdx, 1).^2;
end
tocBytes(pool, t);
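The chunked loop leaves one column vector per chunk in the cell array out2. A minimal self-contained sketch of stitching them back into a single result, assuming synthetic data generated with rng(0) and a temporary -v7.3 file (both illustrative stand-ins, not part of the original answer), might look like this:

```matlab
% Self-contained sketch: chunked parfor reads from a MAT-file, then reassembly.
rng(0);                                  % reproducible synthetic data
N = 100000;
C = rand(N, 1);
fname = [tempname(), '.mat'];
save(fname, 'C', '-v7.3');               % matfile partial reads are efficient for -v7.3 files
clear C
m = matfile(fname);

Info = whos(m, 'C');
Length = Info.size(1);
chunkSize = 1024;
numChunks = ceil(Length / chunkSize);
out2 = cell(numChunks, 1);               % one cell per chunk (sliced output)
parfor j = 1:numChunks
    firstIdx = 1 + ((j-1) * chunkSize);
    lastIdx = min(Length, firstIdx + chunkSize - 1);
    out2{j} = m.C(firstIdx:lastIdx, 1).^2;   % one file read per chunk
end

% out2{j} holds rows firstIdx:lastIdx of chunk j, so concatenating the
% cells in index order restores the original row order.
result = vertcat(out2{:});
delete(fname);
```

Because out2 is indexed by the chunk number, the order in which workers finish does not matter for the final vertcat.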

More Answers (1)

Walter Roberson on 2 May 2019
You do not appear to be using the D array in your parfor, and your C array is less than 120 megabytes. Just copy all of m.C into a local variable in your client and then let it be sliced automatically.
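A minimal sketch of this local-copy approach, assuming a small synthetic C written to a temporary -v7.3 file as a stand-in for the real data: the whole variable is read once on the client, and parfor then slices it because it is indexed only by the loop variable.

```matlab
% Sketch: copy the variable to the client, then let parfor slice it.
rng(0);
N = 100000;
C = rand(N, 1);
fname = [tempname(), '.mat'];
save(fname, 'C', '-v7.3');
clear C
m = matfile(fname);

Clocal = m.C;                  % single read of the whole variable on the client
out = zeros(size(Clocal));
parfor j = 1:numel(Clocal)
    % Clocal is indexed only by the loop variable j, so parfor treats it as
    % a sliced input: each worker is sent just the elements it needs.
    out(j) = Clocal(j).^2;
end
delete(fname);
```

This trades one full read on the client (the variable must fit in client memory) for no per-iteration file access on the workers.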
If your code had a typo and really refers to m.D, then that array is about 2 1/4 gigabytes. It might still be worth taking a local copy of it and letting it be sliced.
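The size figures in this answer follow from 8 bytes per element, assuming both arrays are stored as double precision (the thread does not state the class):

```latex
\begin{aligned}
\text{C: } 15\,000\,000 \times 1 \times 8\ \text{B} &= 1.2\times10^{8}\ \text{B} \approx 114\ \text{MiB} \\
\text{D: } 15\,000\,000 \times 20 \times 8\ \text{B} &= 2.4\times10^{9}\ \text{B} \approx 2.24\ \text{GiB}
\end{aligned}
```

A broadcast variable is sent to every worker in full, so a broadcast copy of D would cost roughly 2.24 GiB per worker, whereas a sliced or chunk-read D spreads that volume across the pool.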
1 Comment
Adam on 3 May 2019
Walter,
I apologize for the typo. The parfor loop should actually read:
parfor j=Win:Length
    Results(j,:) = SomeFunc(m.D(j-Win+1:j,:));
end
where each slice has Win rows. The function needs to operate on an array with Win rows, which is why I am trying to slice it this way.
Can you elaborate on your answer with this new information please?
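An index of the form j-Win+1:j is a range, and parfor only classifies a variable as sliced when its loop-dependent index has the form j, j+k, or j-k; so even a local copy of D indexed this way would be broadcast in full. One possible workaround, sketched here as an assumption rather than a tested recommendation, combines the chunked reads from the accepted answer with overlap: each iteration reads Win-1 extra rows before its chunk, so every window is complete. SomeFunc is hypothetical here and is replaced by mean(X, 1); the synthetic D, Win, and chunkSize values are illustrative stand-ins.

```matlab
% Sketch: sliding windows of Win rows over a MAT-file variable D, read in
% overlapping chunks so each parfor iteration performs one file read.
SomeFunc = @(X) mean(X, 1);     % hypothetical stand-in: Win-by-nCols -> 1-by-nCols

rng(0);
D = rand(5000, 20);             % small synthetic stand-in for the real data
fname = [tempname(), '.mat'];
save(fname, 'D', '-v7.3');
clear D
m = matfile(fname);

Info = whos(m, 'D');
Length = Info.size(1);
nCols = Info.size(2);
Win = 50;
chunkSize = 1000;               % window end rows handled per iteration

numEnds = Length - Win + 1;     % windows end at rows Win..Length
numChunks = ceil(numEnds / chunkSize);
parts = cell(numChunks, 1);
parfor k = 1:numChunks
    firstEnd = Win + (k-1) * chunkSize;           % end row of this chunk's first window
    lastEnd = min(Length, firstEnd + chunkSize - 1);
    % Read Win-1 extra rows before firstEnd so the first window is complete.
    chunk = m.D(firstEnd-Win+1:lastEnd, :);
    res = zeros(lastEnd - firstEnd + 1, nCols);
    for e = 1:size(res, 1)
        % The window ending at row firstEnd+e-1 is chunk rows e..e+Win-1.
        res(e, :) = SomeFunc(chunk(e:e+Win-1, :));
    end
    parts{k} = res;
end
Results = vertcat(parts{:});    % row i corresponds to end row j = i + Win - 1
delete(fname);
```

Unlike the loop in the comment, Results here has no leading Win-1 unused rows. For the real 15,000,000-row file, chunkSize roughly sets the memory per iteration ((chunkSize + Win - 1) x 20 elements) against the number of file reads.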
