Episode Q0 increases exponentially
Can anyone explain why Episode Q0 in RL training increases exponentially after the episode reward has converged to a suboptimal policy?

Answers (1)
Emmanouil Tzorakoleftherakis
on 16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
Hope this helps
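
To illustrate the normalization suggestion, here is a minimal, framework-agnostic sketch in plain Python. It is not Reinforcement Learning Toolbox code, and every name in it (`RunningNormalizer`, `scale_action`, `scale_reward`, `reward_scale`) is hypothetical. It assumes observations arrive as flat lists of numbers, that the action bounds are known, and that you pick a rough reward magnitude yourself.

```python
import math


class RunningNormalizer:
    """Per-dimension running mean/variance (Welford's algorithm)."""

    def __init__(self, size, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.eps = eps          # avoids division by zero for constant inputs

    def update(self, x):
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x):
        """Return x shifted to zero mean and scaled to unit sample std."""
        if self.count < 2:
            return list(x)  # not enough data to estimate a variance yet
        out = []
        for i, xi in enumerate(x):
            std = math.sqrt(self.m2[i] / (self.count - 1))
            out.append((xi - self.mean[i]) / (std + self.eps))
        return out


def scale_action(a, low, high):
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0


def scale_reward(r, reward_scale):
    """Divide the reward by a user-chosen typical magnitude (an assumption)."""
    return r / reward_scale


# Example: two observation channels with very different ranges.
norm = RunningNormalizer(size=2)
for obs in ([0.0, 100.0], [2.0, 300.0], [4.0, 500.0]):
    norm.update(obs)
scaled_obs = norm.normalize([4.0, 500.0])
```

After scaling, both observation channels have comparable magnitude, so the critic's targets stay in a similar range across inputs. Whether this stops Q0 from growing in a particular setup depends on the agent and the environment.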
1 Comment
DAMODARAN B.K
on 17 Feb 2021
Edited: DAMODARAN B.K on 17 Feb 2021