
Advantage normalization for PPO Agent

29 views (last 30 days)
Federico Toso on 13 Feb 2024
Commented: Federico Toso on 23 Feb 2024
When working with PPO agents, it is possible to set the "NormalizedAdvantageMethod" option to normalize the advantage values over each mini-batch of experiences. The default value is "none".
While I can intuitively grasp that such normalization may help reduce variance, I could not find any reference online that describes in sufficient detail when and why this procedure is useful. My questions are:
1) Under which circumstances does the normalization of advantage function values turn out to be practically beneficial?
2) If I decide to normalize the advantage values, are there situations where the "moving" option (which uses a restricted window of recent samples) can be more beneficial than the "current" option (which uses all of the currently available samples)? Intuitively, I would say that the "current" option should always perform better.

Accepted Answer

Shivansh on 21 Feb 2024
Hi Federico,
It seems like you are working with Proximal Policy Optimization (PPO) and want to know more about the "NormalizedAdvantageMethod" option.
Normalizing the advantage function is beneficial when the scale of the advantage values varies significantly across different parts of the state space or over the course of training. By rescaling each mini-batch to a common reference, it reduces the variance of the policy-gradient updates, which can speed up training and make the learning rate less sensitive to the reward scale.
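As a rough illustration of what the "current" option does (a minimal Python sketch, not the toolbox's internal code): the advantages in a mini-batch are shifted and scaled to zero mean and unit standard deviation using statistics from that batch alone.

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    """Normalize a mini-batch of advantage estimates to zero mean and
    unit standard deviation, using statistics from this batch only
    (the 'current' behavior). `eps` guards against division by zero."""
    adv = np.asarray(advantages, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)

# Raw advantages on very different scales end up on a common scale:
raw = [120.0, -80.0, 40.0, -40.0]
norm = normalize_advantages(raw)
print(norm.mean())  # ~0
print(norm.std())   # ~1
```

The gradient direction for each sample is preserved; only the common offset and scale change, so the policy update becomes insensitive to the absolute magnitude of the rewards.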
For the second part, the choice between "moving" and "current" normalization depends on the problem and the environment. A few cases where "moving" might be a better option than "current" are:
  • Non-stationary environments: In cases where the environment might change over time, "moving" normalization (which is based on a restricted number of recent samples) can adapt more quickly to the changes than using "current" normalization. A common example can be stock market trading data.
  • Large batches: In the case of large batches, "moving" normalization makes more sense as it can adapt faster than "current" normalization. It can also be useful in case of memory or computation constraints.
There can also be other use cases for "moving" normalization as it can smooth out the learning updates by providing a more consistent reference for advantage estimates over consecutive batches.
While it might seem that "current" normalization should always perform better because it uses all available data, "moving" normalization can provide a more robust and adaptive reference for the advantage function, which can be particularly useful in dynamic or complex environments.
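The difference between the two options can be sketched as follows (a hypothetical Python illustration, assuming a simple sliding-window implementation of "moving"; the toolbox's internal scheme may differ): the normalizing statistics come from a window of recent samples rather than from the current batch alone, so they track distribution shifts gradually.

```python
import numpy as np
from collections import deque

class MovingAdvantageNormalizer:
    """Sketch of 'moving'-style normalization: mean and std are computed
    over a sliding window of recent advantage samples, so the reference
    adapts as the advantage distribution drifts during training."""

    def __init__(self, window_size=1000, eps=1e-8):
        self.buffer = deque(maxlen=window_size)  # recent samples only
        self.eps = eps

    def __call__(self, advantages):
        adv = np.asarray(advantages, dtype=np.float64)
        self.buffer.extend(adv.tolist())
        hist = np.asarray(self.buffer)
        return (adv - hist.mean()) / (hist.std() + self.eps)

norm = MovingAdvantageNormalizer(window_size=8)
# First batch: window statistics equal the batch statistics.
out1 = norm(np.array([1.0, 2.0, 3.0, 4.0]))
# Second batch from a shifted distribution: the window still contains
# the old samples, so the new batch is normalized against a blend of
# old and new statistics rather than against itself.
out2 = norm(np.array([100.0, 101.0, 102.0, 103.0]))
```

With a small window the reference forgets old data quickly (fast adaptation, noisier statistics); with "current" behavior each batch is its own reference, which is stable but reacts only to the batch at hand.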
I hope it resolves your query!

More Answers (0)

