Parallel reinforcement learning on HPC with warning "Received duplicate id = x from worker"
When I train a reinforcement learning agent on an HPC cluster using Parallel Computing Toolbox, I get the warning "Received duplicate id = 22 from worker" (or another ID) after, for example, 180 training episodes. Training then appears to stop, and no further error or warning is shown. I use these commands to start the .m script:
module load matlab/R2021a
matlab -nodisplay < rl_training.m
When I set
trainOpts.UseParallel = false;
I often get the warning "Error reading character from command line" instead. Does anyone know why these messages occur, and is there a way to continue the training?
5 Comments
Image Analyst
on 2 Dec 2021
If you have a maintenance contract in place, I'd call MathWorks Technical Support on the phone. Of course, you can also use email, as @Raymond Norris said. I never use email or a support page, because when I encounter a problem I need an immediate solution, so I call them.
Walter Roberson
on 5 Dec 2021
I never call them myself; instead I open support cases, where I can describe the problem and include code and results that clearly show what is expected and what is received instead. In 85% of cases the response is going to be "You are right, that's not good, the developers have been notified and it might get fixed some day".
Answers (0)