Expected reward blows up while training (DDPG agent, reinforcement learning)

Sayak Mukherjee on 12 Oct 2020
I am training a DDPG network, and after around 5000 iterations the model does not seem to converge, while the expected reward keeps increasing exponentially. What could be the reason, and how can I solve the issue?

Answers (1)

Edited: Emmanouil Tzorakoleftherakis on 12 Oct 2020
This answer may be helpful.
I would make sure your reward signal outputs values that make sense, and also possibly simplify the critic network.
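One way to check whether the values "make sense" is to compare the expected reward against the largest discounted return the environment could possibly produce. If every per-step reward satisfies |r| <= r_max and the discount factor is gamma < 1, the discounted return can never exceed r_max / (1 - gamma) in magnitude, so an estimate far beyond that bound likely comes from a diverging critic rather than from real performance. The snippet below is a minimal sketch of that check in Python, not toolbox code. The function names, the `slack` tolerance, and the example numbers are all hypothetical, and it assumes the "expected reward" is the critic's estimate of the discounted return.

```python
def max_discounted_return(r_max, gamma, horizon=None):
    """Upper bound on |discounted return| when every |reward| <= r_max.

    horizon=None means an infinite horizon, which requires gamma < 1.
    All names here are illustrative, not part of any RL library.
    """
    if r_max < 0:
        raise ValueError("r_max must be non-negative")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if horizon is None:
        if gamma == 1.0:
            raise ValueError("infinite horizon needs gamma < 1")
        return r_max / (1.0 - gamma)
    if gamma == 1.0:
        # Undiscounted finite episode: at most r_max per step.
        return r_max * horizon
    # Finite geometric sum: r_max * (1 + gamma + ... + gamma^(horizon-1)).
    return r_max * (1.0 - gamma ** horizon) / (1.0 - gamma)


def critic_is_plausible(q_estimate, r_max, gamma, horizon=None, slack=1.5):
    """Return True if the critic's estimate stays within the reachable range.

    slack is an assumed tolerance for approximation error early in training.
    """
    bound = max_discounted_return(r_max, gamma, horizon)
    return abs(q_estimate) <= slack * bound


if __name__ == "__main__":
    # Hypothetical setup: rewards in [-1, 1], discount factor 0.99.
    bound = max_discounted_return(r_max=1.0, gamma=0.99)
    print("largest reachable discounted return:", bound)
    print("estimate of 80 plausible?", critic_is_plausible(80.0, 1.0, 0.99))
    print("estimate of 1e6 plausible?", critic_is_plausible(1e6, 1.0, 0.99))
```

If the check fails, rescaling or bounding the reward so that r_max is small, and reducing the size of the critic, are the two changes suggested above to try first.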

