Why does Soft actor critic have Entropy terms instead of Log probability?

Aniruddha Datta on 29 Jun 2021

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/867688-why-does-soft-actor-critic-have-entropy-terms-instead-of-log-probability

Commented: Takeshi Takahashi on 1 Jul 2021

According to the Soft Actor-Critic paper by Haarnoja et al. (2018), the TD learning step, the policy update, and the entropy-coefficient (temperature) update all use the log probability inside the expectation, because of the soft state-value function, and so lead to the entropy only indirectly. I want to know whether the direct use of entropy is a documentation error in MATLAB R2021a or a coding error in the backend. Since I can't find the functions that implement the training loops for these updates, I cannot verify this myself. I will put the formulas here as images for comparison, since they might exceed the character limit.
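For reference, the relationship the question relies on can be written out. This is a sketch in the usual SAC notation, with \alpha denoting the temperature (in the 2018 paper the temperature is absorbed into the reward scale, which corresponds to \alpha = 1). The soft state-value function takes an expectation of the log probability, and that expectation is, by definition, the entropy of the policy:

```latex
\begin{align}
V(s) &= \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q(s,a) - \alpha \log \pi(a \mid s) \big] \\
     &= \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q(s,a) \big] + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s)\big), \\
\mathcal{H}\big(\pi(\cdot \mid s)\big) &= \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ -\log \pi(a \mid s) \big].
\end{align}
```

A formula written with -\log \pi inside the expectation and one written with \mathcal{H} outside it therefore describe the same quantity, provided the entropy term is not wrapped in a second expectation over actions.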

OpenAI's Spinning Up in RL documentation presents a slightly modified version of the algorithm that uses only Q-values. There, only the log probability is used, and it is then summed over.
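Since the formula image did not survive, here is a minimal sketch of the Q-only target as I understand the Spinning Up description: the smaller of two target-critic values at the next state, minus the temperature times the log probability of a freshly sampled next action. The function name, argument names, and default values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sac_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Sketch of a Q-only SAC target (hypothetical helper, not a library API).

    reward, done        : batch of rewards and terminal flags (0 or 1)
    q1_next, q2_next    : target-critic values at (s', a'), with a' sampled
                          from the current policy at s'
    logp_next           : log pi(a' | s') for that same sampled a'
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    logp_next = np.asarray(logp_next, dtype=float)

    # Clipped double-Q: take the smaller of the two target-critic estimates.
    min_q = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))

    # The entropy bonus enters through the sampled log probability only.
    soft_value = min_q - alpha * logp_next

    # No bootstrapping past terminal transitions.
    return reward + gamma * (1.0 - done) * soft_value

y = sac_target(reward=[1.0, 0.5], done=[0, 1],
               q1_next=[2.0, 3.0], q2_next=[1.5, 4.0],
               logp_next=[-1.0, -2.0], gamma=0.9, alpha=0.5)
```

Averaging this target over a minibatch is what turns the per-sample log probability into an entropy estimate; no explicit entropy is computed anywhere.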

In MATLAB's documentation, by contrast, an entropy term appears before the summation.

SAC is an important off-policy reinforcement learning algorithm, valued in research for its sample efficiency. If the mistake in the documentation is reflected in the code, the term will be of a higher degree than entropy and the results will be inaccurate. If it is only a minor documentation error, it still needs to be fixed.

Accepted Answer

Takeshi Takahashi on 1 Jul 2021

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/867688-why-does-soft-actor-critic-have-entropy-terms-instead-of-log-probability#answer_737128

RL toolbox also uses the log of the probability density to approximate the differential entropy.
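A small numerical check can illustrate why the negative log density works as an entropy approximation. This is a sketch under stated assumptions, not the toolbox's code: it uses a plain one-dimensional Gaussian policy (SAC implementations often use a squashed Gaussian instead), and all names and parameter values are made up for the example. The sample mean of -log p(a) is compared with the closed-form differential entropy of the Gaussian.

```python
import math
import numpy as np

def gaussian_log_density(x, mean, std):
    """Log of the N(mean, std^2) density evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def gaussian_entropy(std):
    """Closed-form differential entropy of N(mean, std^2), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)

rng = np.random.default_rng(0)
mean, std = 0.3, 0.7  # illustrative policy parameters

# Draw actions from the policy and evaluate -log p at each sample.
actions = rng.normal(mean, std, size=200_000)
neg_log_density = -gaussian_log_density(actions, mean, std)

# Each entry is a noisy single-sample entropy estimate; their mean converges
# to the differential entropy as the number of samples grows.
entropy_estimate = neg_log_density.mean()
entropy_exact = gaussian_entropy(std)

print(f"Monte Carlo estimate: {entropy_estimate:.4f}")
print(f"Closed form:          {entropy_exact:.4f}")
```

A single sampled -log p(a) is an unbiased but noisy estimate of the entropy, which is why a training loop can use the log density per sample and still optimize the entropy-regularized objective on average.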

6 Comments (4 older comments not shown)

Aniruddha Datta on 1 Jul 2021

Is it possible to look at the source code for the training loop of the policy and Critic?

Takeshi Takahashi on 1 Jul 2021

Yes. You are right that we call the log density Entropy in our SAC documentation.

More Answers (1)

Aniruddha Datta on 1 Jul 2021

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/867688-why-does-soft-actor-critic-have-entropy-terms-instead-of-log-probability#answer_737148

The follow-up paper, "Soft Actor-Critic Algorithms and Applications," is much more consistent in the terms it uses for the soft Q update and soft policy iteration.
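For comparison, these are the objectives of the follow-up paper as I recall them, so please check them against the paper itself before relying on them. Here \mathcal{D} is the replay buffer, \bar{\theta} denotes the target-network parameters, and \bar{\mathcal{H}} is the target entropy. Every update uses the log probability inside an expectation, and none uses an explicit entropy term:

```latex
\begin{align}
J_Q(\theta) &= \mathbb{E}_{(s_t, a_t) \sim \mathcal{D}}\Big[ \tfrac{1}{2}\big( Q_\theta(s_t, a_t)
  - \big( r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1} \sim p}\big[ V_{\bar{\theta}}(s_{t+1}) \big] \big) \big)^2 \Big], \\
V_{\bar{\theta}}(s_t) &= \mathbb{E}_{a_t \sim \pi_\phi}\big[ Q_{\bar{\theta}}(s_t, a_t) - \alpha \log \pi_\phi(a_t \mid s_t) \big], \\
J_\pi(\phi) &= \mathbb{E}_{s_t \sim \mathcal{D}}\Big[ \mathbb{E}_{a_t \sim \pi_\phi}\big[ \alpha \log \pi_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t) \big] \Big], \\
J(\alpha) &= \mathbb{E}_{a_t \sim \pi_t}\big[ -\alpha \log \pi_t(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \big].
\end{align}
```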

1 Comment

Takeshi Takahashi on 1 Jul 2021

Thank you for your feedback!

Version

R2021a
