Setting up parallel computations for a single dataset, as opposed to spmd
Joseph Byrnes
on 14 Apr 2022
Commented: Joseph Byrnes
on 25 Apr 2022
I am working with a program that needs to enter and exit a parfor loop many times, and the data I am working on is the same for all iterations and for all workers. The code, stripped down and commented, is below. The key point is that the data is being sent to the workers over and over again but never changes. spmd or distributed arrays won't help (I believe!) because it's the same dataset each time, i.e. I do not want to carve it up into sections. It's the models I need to change, which are much smaller than the data (allWfs).
Is there a way to, say, predistribute an array (allWfs in this case) to each worker and keep it on the workers for the whole calculation?
Code:
%%%%TD_parameters is predefined
%allWfs is the data I am distributing around. It needs to be the same for all models and is never modified.
[ allWfs, ~ ] = load_data_syn(TD_parameters, t);
%%%%%%%%%%%
for iter = 1:TD_parameters.n_iter
    % This is the parallel loop, where I do one iteration of the Monte Carlo on each model.
    parfor i = 1:TD_parameters.n_chains
        mset(i) = TD_inversion_function_PT(mset(i), t, TD_parameters, allWfs);
    end
    % This is the parallel tempering step that needs to happen after each parfor loop,
    % which is why I am entering and exiting the parallel loop so many times.
    inds = randperm(length(mset));
    for m1 = 1:length(inds)
        for m2 = 1:length(inds)
            if mset(inds(m1)).T == mset(inds(m2)).T || m2 >= m1
                continue
            end
            a = (mset(inds(m2)).llh - mset(inds(m1)).llh)*mset(inds(m1)).T;     % mset[inds[m2]].lp - mset[inds[m1]].lp;
            a = a + (mset(inds(m1)).llh - mset(inds(m2)).llh)*mset(inds(m2)).T; % + mset[inds[m1]].lp - mset[inds[m2]].lp;
            thresh = log(rand());
            if a > thresh
                T1 = mset(inds(m1)).T;
                mset(inds(m1)).T = mset(inds(m2)).T;
                mset(inds(m2)).T = T1;
            end
        end
    end
    % Models are saved here; removed for conciseness.
end
0 Comments
Accepted Answer
Edric Ellis
on 22 Apr 2022
This is precisely the sort of thing that parallel.pool.Constant was designed for. You build a Constant once on the client, the data is transferred to the workers once, and then you can access it in multiple parfor loops (or spmd blocks...). In your case, you'd use it a bit like this:
[allWfs, ~] = load_data_syn(TD_parameters, t);
allWfsConstant = parallel.pool.Constant(allWfs);
for iter = 1:TD_parameters.n_iter
    parfor i = 1:TD_parameters.n_chains
        mset(i) = TD_inversion_function_PT(mset(i), t, TD_parameters, allWfsConstant.Value);
    end
    % etc...
end
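Since the same Constant is also readable inside spmd blocks, a minimal sketch of that usage (an illustration only, assuming the same pool and the allWfsConstant built above; localSummary is just an illustrative variable name) could look like this:
spmd
    % Each worker reads its own copy of the Constant's value;
    % nothing is re-transferred from the client.
    localSummary = sum(allWfsConstant.Value(:));
end
% localSummary comes back to the client as a Composite, one entry per worker.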
0 Comments
More Answers (1)
Joseph Byrnes
on 22 Apr 2022
2 Comments
Edric Ellis
on 25 Apr 2022
Yes, ticBytes / tocBytes should show the difference. Like this:
pool = parpool("local");
data = magic(1000);

%% Without Constant
t = ticBytes(pool);
for ii = 1:10
    parfor jj = 1:10
        sum(data, "all");
    end
end
tocBytes(pool, t)

%% With Constant
t = ticBytes(pool);
dataC = parallel.pool.Constant(data);
for ii = 1:10
    parfor jj = 1:10
        sum(dataC.Value, "all");
    end
end
tocBytes(pool, t)

%% With Constant, using function-handle constructor
t = ticBytes(pool);
% Here we send just the function handle to the workers, and they execute it
% to build |magic(1000)|.
dataC = parallel.pool.Constant(@() magic(1000));
for ii = 1:10
    parfor jj = 1:10
        sum(dataC.Value, "all");
    end
end
tocBytes(pool, t)
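The pattern to expect (an interpretation, not output from the original thread): in the first block, data is broadcast to the workers by each of the ten parfor loops, so tocBytes reports roughly ten copies' worth of bytes sent; in the second block the value is transferred only once, when the Constant is built; in the third block only the function handle is sent and each worker builds magic(1000) locally, so the bytes sent to the workers should be smallest of all.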