Random shuffling (with the shuffle() command) does not work in combination with randomPatchExtractionDatastore objects and/or multi-GPU training

I want to randomly shuffle my partitioned randomPatchExtractionDatastore objects in "workerImds" using the "shuffle()" command, as shown in the following lines:
iteration = 0;
spmd
    % partition datastore
    workerImds = partition(dsTrain,numWorkers,labindex);
    workerImds.MiniBatchSize = workerMiniBatchSize(labindex);
    % loop over epochs
    for epoch = 1:options.MaxEpochs
        % shuffle data every epoch
        reset(workerImds)
        workerImds = shuffle(workerImds);
        % loop over mini-batches
        while gop(@and,hasdata(workerImds))
            iteration = iteration + 1;
            % read mini-batch of data
            [workerXYBatch,workerImdsInfo] = read(workerImds);
            ...
        end
    end
end
This follows the related documentation: https://de.mathworks.com/help/parallel-computing/train-network-in-parallel-with-custom-training-loop.html
But every time I start a new training run, the shuffling produces the same order of indices. This can be seen by adding the following lines to my while-loop and restarting the training over and over again:
if iteration == 1 || mod(iteration-1,numIterationsPerEpoch) == 0
    if labindex == 1
        fprintf("\nEpoch: %2.0f Iteration: %4.0f WorkerIndices: %s",epoch,iteration,mat2str(workerImdsInfo.ImageIndices.'))
    end
end
Lab 1:
Epoch: 1 Iteration: 1 WorkerIndices: [16 23 15 21 49]
Epoch: 2 Iteration: 11 WorkerIndices: [40 17 38 24 15]
Epoch: 3 Iteration: 21 WorkerIndices: [49 21 18 11 12]
Epoch: 4 Iteration: 31 WorkerIndices: [24 6 48 42 40]
Epoch: 5 Iteration: 41 WorkerIndices: [18 36 22 20 23]
This order of indices ([16 23 15 21 49] ...) is what I get every time I run the training. The order differs from epoch to epoch, but the sequence of orders is identical from run to run, so this is not what I would call random shuffling. It is more like a "static shuffling algorithm".
The same problem also occurs without multi-GPU training (without the spmd block) when I try to randomly shuffle the normal (non-partitioned) randomPatchExtractionDatastore object "dsTrain".
However, the problem does not occur when shuffling an ImageDatastore object. In that case, and also when shuffling a minibatchqueue with the shuffle() command, the random shuffling works very well. Unfortunately, in this multi-GPU case I can use neither a minibatchqueue nor a normal ImageDatastore object for my regression use case.
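One check that might narrow this down is to reseed the random number generator on every worker with a different seed per run, and then see whether the printed indices still repeat. The lines below are only a sketch under an unconfirmed assumption, namely that shuffle() on a randomPatchExtractionDatastore draws from the worker's global random stream. The variable baseSeed is a hypothetical name introduced here, and randperm serves only as a stand-in for the datastore shuffle.

```matlab
% Sketch only: assumes (unconfirmed) that shuffle() uses the global
% random stream of each worker. baseSeed is a hypothetical helper variable.
rng('shuffle');                    % client: seed the generator from the clock
baseSeed = randi(2^31 - 1);        % per-run base seed drawn on the client

spmd
    % give every worker its own seed, different on each run
    rng(baseSeed + labindex);

    % stand-in for shuffle(workerImds): print one permutation per worker
    p = randperm(10);
    fprintf("Worker %d: %s\n", labindex, mat2str(p));
end
```

If the permutations change between runs here but the WorkerIndices output above still repeats after adding the same rng call before the epoch loop, that would suggest the datastore does not rely on the global stream for its shuffling.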
Looking forward to help and advice on this subject! Thank you!

Answers (0)
