Episode Q0 increases exponentially

Can anyone explain why Episode Q0 in reinforcement learning increases exponentially after the reward has converged to a suboptimal policy?

Answers (1)

Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
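
As a rough illustration of the normalization idea, here is a minimal Python sketch (not the toolbox's API). It assumes scalar observations or rewards that are standardized with running statistics, and actions with known bounds that are mapped to [-1, 1]. The names `RunningNormalizer` and `scale_action` are hypothetical and chosen only for this example.

```python
import math


class RunningNormalizer:
    """Running mean/variance (Welford's method) for one scalar signal.

    Hypothetical helper for illustration; not part of any RL library.
    """

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero when the signal is constant

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation; falls back to 1.0 with fewer than 2 samples."""
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / (self.count - 1))

    def normalize(self, x):
        """Shift and scale x to roughly zero mean and unit variance."""
        return (x - self.mean) / (self.std() + self.eps)


def scale_action(a, low, high):
    """Map an action from the assumed bounds [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0
```

In use, one normalizer would be kept per observation channel (and optionally one for rewards), updated with each new sample before the value is passed to the agent. Whether this alone stops the critic estimate from growing depends on the setup.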
Hope this helps

1 Comment

DAMODARAN B.K on 17 Feb 2021
Edited: DAMODARAN B.K on 17 Feb 2021
Is Episode Q0 the critic network's output or the target value?

Asked: 16 Feb 2021
Edited: 17 Feb 2021
