Reinforcement Learning Toolbox: Episode Q0 stopped predicting after a few thousand simulations. DQN Agent.

Question

Cecilia S. on 4 Jun 2021


https://de.mathworks.com/matlabcentral/answers/848160-reinforcement-learning-toolbox-episode-q0-stopped-predicting-after-a-few-thousand-simulations-dqn

Commented: Cecilia S. on 9 Jun 2021

The Q0 values were reasonable until episode 2360. After that, Q0 is not completely stuck, but it increases very, very slowly.

I'm using the default generated DQN agent (with continuous observations and discrete actions) with only a few modifications. I'm not sure what the issue is here, or whether this is the correct behaviour and means that my agent has converged to a somewhat stable result.

From the documentation, I understood that Episode Q0 should give a prediction of the "true discounted long-term reward". I assumed this meant the discounted reward of each individual episode, regardless of whether training has converged, but maybe I misunderstood something.

Please help clarify. I made several runs, and they all show the same behaviour after a few thousand episodes (not always the same number).
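Episode Q0 is the critic's estimate, at the initial observation, of the discounted sum of future rewards. To see what that quantity looks like with the DiscountFactor of 0.1 used in this post, here is a minimal MATLAB sketch that computes the discounted return for a made-up reward sequence. The reward values and the helper name discountedReturn are hypothetical, and the sketch assumes Q0 targets the standard discounted sum G0 = sum over k of gamma^k * r(k+1).

```matlab
% Hypothetical reward sequence: 20 steps of -200, then a terminal bonus of +10
r = [-200 * ones(1, 20), 10];

% Hypothetical helper: discounted return from the first step,
% G0 = sum_k gamma^k * r(k+1)
discountedReturn = @(rewards, gamma) sum(gamma .^ (0:numel(rewards)-1) .* rewards);

G_low  = discountedReturn(r, 0.1);    % DiscountFactor used in this post
G_high = discountedReturn(r, 0.99);   % a longer-horizon setting for comparison

% Rough effective horizon 1/(1 - gamma), in time steps
horizonLow  = 1 / (1 - 0.1);
horizonHigh = 1 / (1 - 0.99);

fprintf('gamma = 0.10: G0 = %.2f, horizon ~ %.1f steps\n', G_low, horizonLow);
fprintf('gamma = 0.99: G0 = %.2f, horizon ~ %.1f steps\n', G_high, horizonHigh);
```

With gamma = 0.1 the return is dominated by the first one or two rewards, so under this assumption Q0 mostly reflects the immediate reward from the initial state; a terminal bonus many steps away contributes almost nothing.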

____

The only changes I made were these:

% set critic training options (critic taken from the default agent, e.g. with getCritic)
critic.Options = rlRepresentationOptions(...
    'LearnRate',1e-3,...
    'GradientThreshold',1,...
    'UseDevice','gpu');

% extract agent options
agentOpts = agent.AgentOptions;

% modify agent options
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 0.005;
agentOpts.DiscountFactor = 0.1;

% recreate the agent with the modified critic and options
agent = rlDQNAgent(critic,agentOpts);
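To get a feel for how quickly exploration shuts off with EpsilonDecay = 0.005, here is a small MATLAB sketch of the epsilon-greedy schedule. It assumes the update rule described in the toolbox documentation (at the end of each training step, Epsilon = Epsilon*(1 - EpsilonDecay) while Epsilon > EpsilonMin; worth double-checking for your release) and assumes a starting Epsilon of 1 and EpsilonMin of 0.01, since the post only sets EpsilonDecay.

```matlab
% Assumed initial and minimum epsilon; only EpsilonDecay comes from the post
epsilon      = 1.0;
epsilonMin   = 0.01;
epsilonDecay = 0.005;

% Apply the per-step update until epsilon falls to or below epsilonMin
steps = 0;
while epsilon > epsilonMin
    epsilon = epsilon * (1 - epsilonDecay);
    steps = steps + 1;
end

% Closed-form check: smallest n with (1 - decay)^n <= epsilonMin / epsilon0
stepsClosedForm = ceil(log(epsilonMin / 1.0) / log(1 - epsilonDecay));

fprintf('epsilon reaches %.4f after %d training steps (closed form: %d)\n', ...
    epsilon, steps, stepsClosedForm);
```

Note that the count is in training steps, not episodes: under these assumptions the agent is almost fully greedy after roughly 900 steps, which may be well before episode 2360.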

2 Comments

Emmanouil Tzorakoleftherakis on 9 Jun 2021

Hello,

This behavior is strange. If possible, please create a technical support case so that we can take a closer look.

Cecilia S. on 9 Jun 2021

Hello! It happens every time, after a few thousand runs. Here is another example in which the value decreases very slowly after "getting stuck".

Based on this example, I thought it might be the "correct" behaviour and that I had misunderstood the concept of "true discounted reward":

https://www.mathworks.com/help/reinforcement-learning/ug/train-biped-robot-to-walk-using-reinforcement-learning-agents.html

In that example, Q0 also seems to be "stuck", and that appears to be the expected result.
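One possible reading of a flat Q0 curve: if the initial observation is the same (or very similar) in every episode, Q0 is the critic's output at nearly the same input, so once that output approaches the value the critic settles on, each further change is small. The toy below is a simplified single-value update with a hypothetical fixed target and step size, not the DQN training loop, and whether it explains the plots in this thread is an assumption. It shows how an estimate pushed toward a fixed target changes by geometrically shrinking amounts.

```matlab
% Toy illustration (hypothetical values): one value estimate Q moved
% toward a fixed target by a constant step size alpha on every update
target     = -222.22;   % e.g. a fixed discounted return for a fixed initial state
alpha      = 0.05;      % hypothetical step size, not the critic LearnRate
Q          = 0;         % initial estimate
numUpdates = 200;

stepSize = zeros(1, numUpdates);
for n = 1:numUpdates
    delta = alpha * (target - Q);   % change proportional to the remaining error
    Q = Q + delta;
    stepSize(n) = abs(delta);
end

fprintf('first change: %.3f, last change: %.6f, final Q: %.3f\n', ...
    stepSize(1), stepSize(end), Q);
```

Plotted over updates, Q rises quickly and then drifts very slowly, similar in shape to a Q0 curve that looks "stuck".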

Perhaps the problem is my reward definition? My reward function gives an increasingly negative reward as the system moves away from a target output value, and a single positive reward when the output is in range, which also ends the episode.

Here is the pseudocode for the reward, in case it helps:

if ~IsDone
    if parameter 1 out of range
        Reward = -100*10^(difference between output1 and target value 1);
    elseif parameter 1 is ok but parameter 2 is out of range
        Reward = -1*10^(difference between output2 and target value 2);
    end
else
    Reward = +10;
end
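The pseudocode above can be turned into a runnable MATLAB sketch along these lines. All variable names, targets and tolerances are hypothetical, and two points are assumptions: that "difference" means the absolute error from the target, and that the episode ends when both outputs are within tolerance. Writing the inner branch as a plain else also guarantees Reward is always assigned.

```matlab
% Hypothetical measured outputs, targets and tolerances (not from the post)
y1 = 1.3;  target1 = 1.0;  tol1 = 0.1;
y2 = 2.0;  target2 = 2.0;  tol2 = 0.1;

% Assumed: "difference" is the absolute error from the target
d1 = abs(y1 - target1);
d2 = abs(y2 - target2);

% Assumed: the episode ends when both outputs are within tolerance
IsDone = (d1 <= tol1) && (d2 <= tol2);

if ~IsDone
    if d1 > tol1
        % Output 1 out of range: large, exponentially growing penalty
        Reward = -100 * 10^d1;
    else
        % Output 1 in range, so output 2 must be out of range
        Reward = -1 * 10^d2;
    end
else
    % Both outputs in range: single terminal bonus
    Reward = 10;
end

fprintf('IsDone = %d, Reward = %.3f\n', double(IsDone), Reward);
```

With this shape the penalties span at least two orders of magnitude more than the +10 bonus, so with DiscountFactor = 0.1 the bonus only affects the critic's estimate in states one or two steps from the target.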


Answers (0)
