- Try decreasing the sparsity in your episode reward. Some episodes end with 0 reward and others with about 10k, which can cause problems with the gradients. Consider multiplying the rewards by a scale factor so that your high-reward episodes reach a reward of ~10, but play around with it.
- Decrease the learning rate, which usually helps when you start a new RL project, at least until you find a value that works. Maybe try something like 1e-4, 1e-5, or 1e-6; I wouldn't go lower.
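The two suggestions above could look roughly like the following configuration sketch. It reuses `rlRepresentationOptions` and the option names from the question. The names `rewardScale`, `rawReward`, `scaledReward`, and `stopTrainingValue` are hypothetical, and the specific learning rates are only one pick from the suggested range, not tested values. The reward itself is computed inside the environment, which is not shown in this thread, so the scaling lines only illustrate the arithmetic.

```matlab
% Assumption: high-reward episodes currently reach about 10000.
% Pick a scale factor that brings them down to roughly 10.
rewardScale = 10 / 10000;

% Hypothetical raw reward value, standing in for whatever the environment emits.
rawReward = 250;
scaledReward = rewardScale * rawReward;   % apply this inside the reward computation

% If the rewards are scaled, the stop threshold has to be scaled the same way,
% otherwise the "AverageReward" criterion can never be met.
stopTrainingValue = rewardScale * 10000;

% Lower learning rates than the original 1e-3 (critic) and 1e-4 (actor).
% The exact values are assumptions to experiment with.
criticOpts = rlRepresentationOptions("LearnRate",1e-4, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
actorOptions = rlRepresentationOptions("LearnRate",1e-5, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
```

Changing one thing at a time (first the reward scale, then the learning rates) makes it easier to tell which change affected the training curve.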
I am working on path planning and obstacle avoidance using deep reinforcement learning, but training is not converging.
The following is the code for creating the RL agent:
criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
agentOpts = rlDDPGAgentOptions(...
"SampleTime",sampleTime,...
"TargetSmoothFactor",1e-3,...
"DiscountFactor",0.995, ...
"MiniBatchSize",128, ...
"ExperienceBufferLength",1e6);
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);
Training options are:
maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions(...
"MaxEpisodes",maxEpisodes, ...
"MaxStepsPerEpisode",maxSteps, ...
"ScoreAveragingWindowLength",50, ...
"StopTrainingCriteria","AverageReward", ...
"StopTrainingValue",10000, ...
"Verbose", true, ...
"Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);
Training is not converging, as shown in the attached figure:
Answers (1)
Matteo D'Ambrosio
on 28 May 2023
Edited: Matteo D'Ambrosio on 28 May 2023
I'm not too familiar with DDPG, as I use other agents, but looking at your episode reward figure, a few things come to mind:
Hope this helps.