Episode Q0 increases exponentially

16 Feb. 2021


Can anyone explain why Episode Q0 in RL increases exponentially after the reward has converged to a suboptimal policy?


Hello,

Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
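To illustrate the normalization suggestion, here is a minimal sketch in Python (not Reinforcement Learning Toolbox code). It keeps a running mean and standard deviation for each observation dimension and scales rewards by a constant, so the values the critic sees stay in a modest range. The names `RunningNormalizer` and `scale_reward` and the scale factor `0.01` are hypothetical, chosen only for this example; a suitable scale depends on your environment.

```python
import math


class RunningNormalizer:
    """Running mean/variance per dimension (Welford's algorithm)."""

    def __init__(self, size, eps=1e-8):
        self.n = 0                  # number of samples seen so far
        self.mean = [0.0] * size
        self.m2 = [0.0] * size      # sum of squared deviations from the mean
        self.eps = eps              # avoids division by zero

    def update(self, x):
        """Fold one observation vector into the running statistics."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x):
        """Return x shifted and scaled to roughly zero mean, unit variance."""
        if self.n < 2:
            return list(x)          # not enough data yet, pass through
        out = []
        for i, xi in enumerate(x):
            std = math.sqrt(self.m2[i] / (self.n - 1))
            out.append((xi - self.mean[i]) / (std + self.eps))
        return out


def scale_reward(r, reward_scale=0.01):
    """Scale the raw reward by a constant (hypothetical factor)."""
    return r * reward_scale


# Example: two observation channels with very different magnitudes.
norm = RunningNormalizer(size=2)
for obs in [[100.0, 0.1], [200.0, 0.2], [300.0, 0.3]]:
    norm.update(obs)

print(norm.normalize([300.0, 0.3]))
print(scale_reward(-500.0))
```

The same idea applies in MATLAB: rescale the observations, rewards, and actions in the environment before they reach the agent.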

Hope this helps

DAMODARAN B.K on 17 Feb. 2021

Edited: DAMODARAN B.K on 17 Feb. 2021

Is Episode Q0 the critic network output or the target value?
