I am working on path planning and obstacle avoidance using deep reinforcement learning, but training is not converging.
Following is the code for creating the RL agent:
% Critic representation and options
criticOpts = rlRepresentationOptions("LearnRate",1e-3, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo, ...
    "Observation",{'State'},"Action",{'Action'},criticOpts);
% Actor representation and options
actorOptions = rlRepresentationOptions("LearnRate",1e-4, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    "Observation",{'State'},"Action",{'Action'},actorOptions);
% DDPG agent options
agentOpts = rlDDPGAgentOptions( ...
    "SampleTime",sampleTime, ...
    "TargetSmoothFactor",1e-3, ...
    "DiscountFactor",0.995, ...
    "MiniBatchSize",128, ...
    "ExperienceBufferLength",1e6);
% Exploration noise settings
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);
Training options are:
maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",maxEpisodes, ...
    "MaxStepsPerEpisode",maxSteps, ...
    "ScoreAveragingWindowLength",50, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10000, ...
    "Verbose",true, ...
    "Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);
During training, the agent does not converge, as shown in the attached figure.
Answers (1)
Matteo D'Ambrosio on 28 May 2023
Edited: Matteo D'Ambrosio on 28 May 2023
I'm not too familiar with DDPG as I use other agents, but looking at your episode-reward figure, a few things come to mind:
- Try reducing the sparsity of your episode reward. You have some episodes with a reward of 0 and others with a reward of ~10k, which can cause problems with the gradients. Consider adding a multiplier to the rewards you are giving so that your high-reward episodes reach a return of roughly 10, and experiment from there; see the reward-scaling sketch after this list.
- Decrease the learning rate, which usually helps when starting a new RL project, at least until you find a value that works. Try something like 1e-4, 1e-5, or 1e-6; I wouldn't go lower. See the sweep sketch below.
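For the reward scaling, here is a minimal sketch; the variable names (rawReward, rewardScale, scaledReward) are hypothetical, so adapt it to wherever your environment actually computes its reward:
% Scale raw rewards from the 0..10000 range down to roughly 0..10,
% which keeps gradient magnitudes more manageable.
rawReward = 10000;         % example raw reward from a high-reward episode
rewardScale = 1e-3;        % tune this multiplier experimentally
scaledReward = rewardScale*rawReward;   % -> 10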
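And for the learning rate, a sketch of a small sweep that reuses the same options constructors from your question. Only the option construction is shown; you would rebuild the actor/critic representations and retrain inside the loop, then compare the training curves:
% Try a few learning rates and retrain with each one.
for lr = [1e-4 1e-5 1e-6]
    criticOpts = rlRepresentationOptions("LearnRate",lr, ...
        "L2RegularizationFactor",1e-4,"GradientThreshold",1);
    actorOptions = rlRepresentationOptions("LearnRate",lr, ...
        "L2RegularizationFactor",1e-4,"GradientThreshold",1);
    % ... recreate the critic/actor representations, rebuild the agent,
    % and call train() here ...
end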
Hope this helps.