PPO Agent training - Is it possible to control the number of epochs dynamically?
Federico Toso
on 17 Mar 2024
Commented: Federico Toso
on 25 Mar 2024
In the default implementation of the PPO agent in MATLAB, the number of epochs is a static property that must be selected before training starts.
However, I've seen that state-of-the-art implementations of PPO sometimes select the number of epochs dynamically: at each learning phase, the algorithm decides whether to execute another epoch based on the KL divergence it has just calculated. This seems to improve the robustness of the algorithm significantly.
Is it possible for a user to implement such a routine in MATLAB in the context of PPO training, possibly by applying some slight modifications to the default process?
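For reference, the early-stopping rule used by several public PPO implementations is simply to abandon the remaining epochs for the current batch once the mean approximate KL divergence exceeds a fixed budget. A minimal sketch (the names `targetKL` and `meanApproxKL` are illustrative, not part of any MATLAB API):

```matlab
targetKL = 0.01;                  % tunable KL budget per update phase (assumed value)
if meanApproxKL > 1.5 * targetKL  % meanApproxKL: mean KL over the sampled batch
    break                         % skip the remaining epochs for this batch
end
```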
0 Comments
Accepted Answer
Kartik Saxena
on 22 Mar 2024
Hi,
The code snippet below sketches the logic/pseudo-algorithm you can refer to for this purpose:
% Assume env is your environment and agent is your PPO agent.
% collectExperiences, getPolicy, updateAgent and calculateKLDivergence
% are placeholders for your own custom training-loop helpers.
for episode = 1:maxEpisodes
    % Gather a fresh batch of trajectories with the current policy
    experiences = collectExperiences(env, agent);
    klDivergence = 0;   % no policy drift yet for this batch
    epochCount = 0;
    % Run extra epochs on this batch only while the updated policy stays
    % close to the pre-update policy (early stopping on KL divergence)
    while klDivergence < klThreshold && epochCount < maxEpochs
        oldPolicy = getPolicy(agent);             % snapshot before the update
        agent = updateAgent(agent, experiences);  % one epoch of gradient steps
        newPolicy = getPolicy(agent);
        klDivergence = calculateKLDivergence(oldPolicy, newPolicy);
        epochCount = epochCount + 1;
    end
end
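If your policy is discrete, the `calculateKLDivergence` helper above could be implemented along these lines. This is a hedged sketch, assuming `piOld` and `piNew` are N-by-A matrices of action probabilities (one row per sampled observation); the names and shapes are assumptions for illustration, not part of the Reinforcement Learning Toolbox API:

```matlab
function kl = calculateKLDivergence(piOld, piNew)
% Approximate KL(old||new), averaged over the sampled observations.
    eps0 = 1e-12;                               % avoid log(0)
    logRatio = log((piOld + eps0) ./ (piNew + eps0));
    kl = mean(sum(piOld .* logRatio, 2));       % mean over samples
end
```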
Additionally, you can refer to the following documentation and examples to get an idea and use it for your custom implementation of the PPO agent:
I hope it helps!
More Answers (0)