Reward in training manager higher than it should be

5 views (last 30 days)
Mohammed Eleffendi
Mohammed Eleffendi on 10 Mar 2021
Commented: zhq on 29 Aug 2024
Hi,
I am trying to train a reinforcement learning agent, and I have the environment set up in Simulink. I'm facing two issues:
1- The reward in the training manager appears to be much higher than it should be. As shown in the picture below, the scope connected to the reward signal shows a reward value of 1, which is correct. However, in the training manager it is 70, which is not correct.
2- After a number of episodes, the training stops and I get the following error message:
Error using rl.env.AbstractEnv/simWithPolicy (line 82)
An error occurred while simulating "ADSTestBed" with the agent "falsifier_agent".
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 421)
run(trainer);
Error in rl.train.TrainingManager/run (line 211)
train(this);
Error in rl.agent.AbstractAgent/train (line 78)
TrainingStatistics = run(trainMgr);
Error in ADSTestBedScript (line 121)
trainingStats = train(falsifier_agent,env,trainOpts);
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Unable to compute gradient from representation.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Error using 'backwardLoss' in Layer rl.layer.FcnLossLayer. The function threw an
error and could not be executed.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Number of elements must not change. Use [] as one of the size inputs to
automatically calculate the appropriate size for that dimension.
I should mention that I have another agent in the Simulink model, but that agent is not being trained.
Version: R2020b
Any help is appreciated. Thanks

Accepted Answer

Mohammed Eleffendi
Mohammed Eleffendi on 18 Mar 2021
For the first issue: the reward in the training manager is the cumulative episode reward, whereas the reward in the scope is plotted at every time step. So the reward in the training manager is correct; there is no issue here.
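To make the relationship concrete, here is a minimal sketch of the arithmetic (assuming, as on the scope above, a constant reward of 1 per step and an episode of 70 steps; the numbers are illustrative, not taken from the model):

stepReward = 1;                          % per-step reward, as seen on the scope
numSteps = 70;                           % time steps completed in the episode
episodeReward = stepReward * numSteps;   % cumulative reward, as shown in the training manager
fprintf('Per-step reward: %g, episode reward: %g\n', stepReward, episodeReward);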
For the second issue, it turns out that if you have 'UseDevice' set to 'gpu' you will encounter this error. Change it to 'cpu' and the error disappears. MathWorks Support is investigating what is causing this issue.
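A minimal sketch of the workaround, assuming the critic representation was built with rlRepresentationOptions; criticNet, obsInfo, actInfo, and the layer names here are placeholders for whatever the original script uses:

% Recreate the representation options on the CPU instead of the GPU.
criticOpts = rlRepresentationOptions('UseDevice','cpu');   % was 'gpu'
% Rebuild the critic with the same network but the new device option.
critic = rlQValueRepresentation(criticNet, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'action'}, criticOpts);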

More Answers (1)

Emmanouil Tzorakoleftherakis
Emmanouil Tzorakoleftherakis on 11 Mar 2021
I cannot be sure about the error, but it seems that somewhere in your setup you are changing the number of parameters/inputs (check the inputs to the RL Agent block).
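One way to catch this kind of type/size mismatch before training starts is to validate the environment up front; a minimal sketch, assuming env is the rlSimulinkEnv object created in ADSTestBedScript:

% Runs a short simulation of the environment and errors out if the
% observation, reward, or isdone signals have an invalid type or size.
validateEnvironment(env)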
For your first question: the individual reward at each time step differs from the episode reward shown in the Episode Manager. The latter sums the individual rewards over all time steps of an episode.
  4 comments
zhq
zhq on 29 Aug 2024
I'd like to ask: if I sum the individual rewards over the time steps, shouldn't the result be roughly the same as the episode reward shown in the Episode Manager? I ran into a scenario I don't quite understand: https://ww2.mathworks.cn/matlabcentral/answers/2148684-reinforcement-learning-training-monitor-episode-reward
