Issue with Q0 Convergence during Training using PPO Agent
Hi guys,
I have developed my model and trained it using a PPO agent. Overall, the training process has been successful. However, I have encountered an issue with the Q0 values. The maximum achievable reward is 6000, and I set the training to stop at 98.5% of that maximum (5910).
During the training, I have noticed that the Q0 values did not converge as expected. In fact, they seem to be capped at 100, as indicated by the figures. I am currently seeking an explanation for this behavior and trying to understand why the Q0 values are not reaching the desired convergence.
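For reference, here is a minimal sketch of how the episode reward and the quantity that Q0 estimates can differ. Q0 is the critic's estimate of the discounted long-term reward from the initial observation, while the episode reward is a plain undiscounted sum. All numbers below are assumptions chosen for illustration and are not taken from my model: a constant reward of 1 per step, 6000 steps per episode, and a discount factor of 0.99. Under those assumptions the discounted return saturates near r / (1 - gamma), no matter how long the episode runs.

```python
def discounted_return(rewards, gamma):
    """Return sum(gamma**t * r_t), accumulated backwards from the last step."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical values for illustration only (not from the actual model).
steps = 6000            # assumed episode length
reward_per_step = 1.0   # assumed constant per-step reward
gamma = 0.99            # assumed discount factor

rewards = [reward_per_step] * steps

# What the episode-reward curve tracks: the undiscounted sum.
episode_reward = sum(rewards)

# What a well-trained critic would approach at the initial state.
q0_target = discounted_return(rewards, gamma)

# Upper bound of the geometric series for a constant reward.
geometric_cap = reward_per_step / (1.0 - gamma)

print(f"undiscounted episode reward: {episode_reward:.1f}")
print(f"discounted return from t=0:  {q0_target:.2f}")
print(f"geometric cap r/(1-gamma):   {geometric_cap:.2f}")
```

If the real reward signal and discount factor resemble these assumed values, a Q0 plateau far below the episode reward would be expected behavior and not a convergence failure. Whether that applies here depends on the actual agent options.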

My agent options are as follows:

If anyone has any insights or explanations regarding the behavior of Q0 during training with the PPO agent, I would greatly appreciate your input. Your expertise and guidance would be invaluable in helping me understand and address this issue.
Thank you.
2 Comments
Emmanouil Tzorakoleftherakis
on 10 Jul 2023
Can you share the code with the training options?
Muhammad Fairuz Abdul Jalal
on 11 Jul 2023
Accepted Answer


