I'm using the soft actor-critic (SAC) agent from MATLAB in a custom environment, and I am observing that its performance collapses after some training time.
My custom environment simulates a robot moving through a scenario with obstacles. If the robot collides with an obstacle, the episode terminates with a reward of -100; if the robot reaches a target spot in the scenario, the episode terminates with a reward of +100. The agent seems to learn how to get the positive reward, but then loses all of that learning after some more iterations.
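To be concrete, the reward and termination logic in my environment's step function boils down to something like the following sketch (the helper method names here are placeholders, not my actual code):

```matlab
% Sketch of the environment step function's reward/termination logic.
% simulateMotion, collidesWithObstacle, and reachedTarget are
% illustrative placeholders for the environment's own helpers.
function [nextObs, reward, isDone, info] = step(this, action)
    nextObs = simulateMotion(this, action);  % advance robot dynamics

    if collidesWithObstacle(this, nextObs)
        reward = -100;   % collision: terminate with a large penalty
        isDone = true;
    elseif reachedTarget(this, nextObs)
        reward = +100;   % goal reached: terminate with a large bonus
        isDone = true;
    else
        reward = 0;      % sparse reward: nothing between the terminals
        isDone = false;
    end
    info = [];
end
```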
This is a screenshot of the training progress from one example run:
Can someone help me understand what is happening?