Why do agents trained by the reinforcement learning PPO algorithm get different results each time they are loaded?

Question

ye on 17 Aug 2024

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/2145889-why-do-agents-trained-by-the-reinforcement-learning-ppo-algorithm-get-different-results-each-time-th

Answered: Shivansh on 18 Aug 2024

I have run into a problem with reinforcement learning. At some point during training an effective agent appears, so training is stopped early. However, when the saved agent is run afterwards, its results are worse.

Answer 1

Shivansh on 18 Aug 2024

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/2145889-why-do-agents-trained-by-the-reinforcement-learning-ppo-algorithm-get-different-results-each-time-th#answer_1500264

Hello Ye,

The behaviour you describe suggests that there may be some randomness in the environment. Since the performance gets worse for the saved agent, it is also possible that the agent overfitted to certain episodes during training, so its performance does not carry over to other episodes. Try to remove stochasticity from the model by using fixed random seeds. You can also evaluate the agent over several episodes to get a better understanding of the issue.
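
As a rough illustration of both suggestions, here is a minimal Python sketch. It is not MATLAB and not your actual model: the toy environment, `toy_policy`, `run_episode` and `evaluate` are all hypothetical names made up for this example. It only shows the idea that a fixed seed makes an evaluation repeatable, and that the average over several seeded episodes tells you more than a single run.

```python
import random
import statistics


def run_episode(policy, seed, steps=50):
    """Run one episode of a toy noisy environment and return the total reward."""
    rng = random.Random(seed)  # all randomness in the episode comes from this seed
    state, total = 0.0, 0.0
    for _ in range(steps):
        action = policy(state)
        # Toy dynamics (assumption): the action shifts the state, plus a random disturbance.
        state += action + rng.gauss(0.0, 0.5)
        total += -abs(state)  # reward is higher the closer the state stays to zero
    return total


def evaluate(policy, seeds):
    """Evaluate a policy on several seeded episodes; return mean and standard deviation."""
    returns = [run_episode(policy, s) for s in seeds]
    return statistics.mean(returns), statistics.stdev(returns)


def toy_policy(state):
    """Hypothetical stand-in for a trained agent: push the state back toward zero."""
    return -0.5 * state


# Two evaluations with the same seeds give identical numbers.
mean_a, std_a = evaluate(toy_policy, seeds=range(10))
mean_b, std_b = evaluate(toy_policy, seeds=range(10))
print(f"run A: mean={mean_a:.2f}, std={std_a:.2f}")
print(f"run B: mean={mean_b:.2f}, std={std_b:.2f}")
```

If the two runs match but the spread across seeds is large, a single good episode during training may simply have been a lucky one, which would be consistent with the saved agent looking worse later.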

If the issue still remains unclear, share more information regarding the model and the environment.

I hope it helps!
