PPO agent low reward episodes

Question

Sourabh on 11 Dec 2023


Commented: Sourabh on 3 Jan 2024

I am trying to implement a PPO agent and I am getting the rewards shown. I have tried tuning the hyperparameter settings, but training still looks like this and I don't know what the issue is. I also tried using the IsDone signal to terminate the episode when the reward falls below a certain value, but it made no difference. Can someone help?

1 Comment

Sourabh on 14 Dec 2023

It feels like my algorithm never starts exploiting and just keeps exploring throughout training. How can I reduce the exploration?


Answer 1

Shivansh on 27 Dec 2023


Hi Sourabh,

I understand that you are training a PPO agent which is stuck in exploration and not able to learn a stable policy. This is resulting in poor training as shown in the provided graph.

Here are a few steps to improve the performance of your model:

Try normalizing or scaling down the rewards to a smaller range.
A PPO agent uses entropy regularization to encourage exploration. Try reducing the entropy regularization coefficient gradually and observe the results.
Verify the implementation of your “IsDone” flag. It should terminate the episode at the right time, so that the agent does not keep learning from undesired states.
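To make the entropy point above more concrete, here is a minimal sketch in plain Python (not MATLAB) of how an entropy bonus enters a PPO-style objective and how its coefficient could be annealed. The start and end values, the linear schedule, and all function names are assumptions chosen for illustration. The critic loss and the clipping itself are left out. In Reinforcement Learning Toolbox the corresponding setting is, as far as I know, the `EntropyLossWeight` option of `rlPPOAgentOptions`.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_coefficient(step, total_steps, start=0.01, end=0.001):
    """Linearly anneal the entropy coefficient from `start` to `end`.

    The default values are hypothetical, not recommended settings.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def ppo_objective(clipped_surrogate, probs, coef):
    """Objective to maximize: surrogate term plus weighted entropy bonus."""
    return clipped_surrogate + coef * entropy(probs)


uniform = [0.25, 0.25, 0.25, 0.25]   # policy that is still exploring
peaked = [0.97, 0.01, 0.01, 0.01]    # policy that has mostly committed

for step in (0, 5000, 10000):
    coef = entropy_coefficient(step, 10000)
    # The bonus is larger for the uniform policy, and shrinks as coef decays.
    print(step, coef, coef * entropy(uniform), coef * entropy(peaked))
```

A larger coefficient rewards the near-uniform policy more strongly, which is one way an agent can appear to explore forever. Lowering it gradually reduces that pull.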

Apart from the above points, you can also experiment with learning rates, other hyperparameters, and the network architecture to find the best configuration for your problem.

You can use a known benchmark environment to verify that your PPO implementation works as expected. Once you have confirmed that the agent can learn in a simpler setting, gradually reintroduce complexity.

You can refer to the lander vehicle example to learn more about how a PPO agent works in MATLAB: https://in.mathworks.com/help/reinforcement-learning/ug/train-ppo-agent-to-land-vehicle.html.

For more information on PPO agents, refer to the following documentation: https://in.mathworks.com/help/reinforcement-learning/ug/ppo-agents.html.

If the issue persists, please provide more information about the problem statement and the environment. That will help in understanding the issue you are facing.

Hope it helps!

3 Comments (1 older comment hidden)

Shivansh on 3 Jan 2024

Hi Sourabh!

You don't have to change the structure of the reward function to normalize rewards. You can use reward normalization techniques such as dividing by the maximum possible reward, or statistical normalization (subtracting the mean and dividing by the standard deviation of the rewards).
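The two techniques could look roughly like this. It is a sketch in plain Python, not MATLAB, and the names `scale_by_max` and `RunningNormalizer` as well as the sample reward values are made up for illustration. The running statistics use Welford's online update, so no reward history has to be stored.

```python
import math


def scale_by_max(reward, max_abs_reward):
    """Divide by the largest reward magnitude the environment can produce."""
    return reward / max_abs_reward


class RunningNormalizer:
    """Statistical normalization with running mean and standard deviation."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations from the mean
        self.eps = eps     # avoids division by zero

    def update(self, x):
        """Welford's online update of the mean and squared deviations."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Population standard deviation; 1.0 until two samples are seen."""
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, x):
        return (x - self.mean) / (self.std() + self.eps)


normalizer = RunningNormalizer()
for raw in [-500.0, -300.0, -100.0]:   # hypothetical raw rewards
    normalizer.update(raw)

print(scale_by_max(-250.0, 500.0))     # scaled into [-1, 1]
print(normalizer.normalize(-100.0))    # above-average reward becomes positive
```

Note that subtracting the mean also shifts every reward by a constant, so some implementations only divide by the standard deviation and leave the mean alone.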

Adding a large positive constant might not be a good approach, as it might lead to suboptimal policies. If adding a constant improves learning, the original rewards might be too negative or too sparse. You can try revising your reward function to better balance positive and negative rewards, guiding the agent more effectively towards the optimal policy.
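A small worked example of why a constant offset can change what the agent prefers, sketched in plain Python with made-up numbers (a per-step reward of -1, an offset of +5, and a discount factor of 0.99 are all assumptions). With negative per-step rewards a short episode has the higher return. After the offset every step pays out, so the longer episode wins.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum of rewards, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


short_episode = [-1.0] * 10    # hypothetical: task finished quickly
long_episode = [-1.0] * 100    # hypothetical: agent wandered around
offset = 5.0                   # constant added to every step reward

short_shifted = [r + offset for r in short_episode]
long_shifted = [r + offset for r in long_episode]

# Original rewards: the short episode has the higher (less negative) return.
print(discounted_return(short_episode) > discounted_return(long_episode))

# Shifted rewards: the ordering flips, so lingering is now what pays off.
print(discounted_return(short_shifted) < discounted_return(long_shifted))
```

This is why the offset interacts with the “IsDone” condition: once every step has a positive reward, ending the episode early is no longer something the agent is pushed towards.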

Hope it helps!

Sourabh on 3 Jan 2024

So do I need to modify my script or change something in my environment model? (Sorry, but I am very new to all this and don't have much of an idea.) I am attaching the script and env file for reference.
