DDPG Agent isn't learning (reward 0 for every episode)


Reinforcement Learning on 21 Mar 2021


Direct link to this question: https://de.mathworks.com/matlabcentral/answers/779177-ddpg-agent-isn-t-learning-reward-0-for-every-episode

Commented: yovel atia on 2 Dec 2021

Accepted Answer: Emmanouil Tzorakoleftherakis

Hello,

I suspect that the SampleTime Ts has something to do with it. I set Ts = 1e-6, but training still runs very fast.

I followed the same approach as the Water Tank model example:

https://de.mathworks.com/help/reinforcement-learning/ug/create-simulink-environment-and-train-agent.html?searchHighlight=create%20simulink%20rl&s_tid=srchtitle

Just as in the example, my system is controlled very well by a PI controller, but the DDPG agent isn't learning anything.

What exactly is wrong with this system?


Accepted Answer

Emmanouil Tzorakoleftherakis on 22 Mar 2021


Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/779177-ddpg-agent-isn-t-learning-reward-0-for-every-episode#answer_654827

You see 0 rewards because the IsDone flag (which is used to terminate episodes early) is set to true immediately at the beginning of each episode, so every episode ends before any reward is collected. Either set it to false, or implement the appropriate termination logic for your system.
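To illustrate why a termination flag that is true from the start produces a flat zero reward curve, here is a minimal Python sketch of a generic episode loop. It is not the Reinforcement Learning Toolbox or Simulink implementation: run_episode, per_step_reward and the termination callbacks are hypothetical, and the sketch assumes the flag is checked before a step's reward is accumulated.

```python
from typing import Callable


def run_episode(
    reward_fn: Callable[[int], float],
    is_done_fn: Callable[[int], bool],
    max_steps: int,
) -> float:
    """Return the cumulative reward of one episode (simplified, hypothetical loop)."""
    total = 0.0
    for step in range(max_steps):
        # Termination is checked before the step's reward is collected, so a
        # flag that is already true at step 0 ends the episode with 0 reward.
        if is_done_fn(step):
            break
        total += reward_fn(step)
    return total


def per_step_reward(step: int) -> float:
    return -1.0  # hypothetical constant penalty per step


# Misconfigured flag: true from the very first step.
print(run_episode(per_step_reward, lambda step: True, max_steps=200))   # 0.0
# Flag disabled (always false): the episode runs until max_steps.
print(run_episode(per_step_reward, lambda step: False, max_steps=200))  # -200.0
# Flag driven by a hypothetical condition, here "stop after 50 steps".
print(run_episode(per_step_reward, lambda step: step >= 50, max_steps=200))  # -50.0
```

The first call returns 0.0 no matter what the reward function is, which matches the constant zero curve in the question. In a real model the termination condition would be based on the plant state (for example, a tracking error leaving an allowed band) rather than on a step count.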

Note that other parts of your setup also need fixing, most notably the agent sample time (currently very small) and the episode duration/maximum steps per episode (currently very large). Unless you adjust these to more reasonable values for RL, your training will take days.
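A rough budget shows why a tiny sample time combined with a long episode is a problem: the agent acts once per sample time, so an episode of length Tf contains about Tf/Ts agent steps. The Python sketch below computes this. Ts = 1e-6 is the value from the question; the episode length Tf = 200 s and the cost of 1 ms of wall-clock time per step are hypothetical values chosen only for illustration, since the question does not state them.

```python
import math


def steps_per_episode(tf: float, ts: float) -> int:
    """Number of agent steps in an episode of tf seconds at a sample time of ts seconds."""
    # Round up so that a partial final sample period still counts as a step.
    return math.ceil(tf / ts)


def wall_clock_days(steps: int, seconds_per_step: float) -> float:
    """Estimated wall-clock duration in days for a given per-step cost."""
    return steps * seconds_per_step / 86_400


TF = 200.0               # hypothetical episode duration in seconds
SECONDS_PER_STEP = 1e-3  # hypothetical wall-clock cost of one simulation + agent step

for ts in (1e-6, 1.0):  # the question's Ts versus a coarser, hypothetical Ts
    n = steps_per_episode(TF, ts)
    days = wall_clock_days(n, SECONDS_PER_STEP)
    print(f"Ts = {ts:g} s -> {n} steps/episode, ~{days:.3g} days/episode")
```

With these assumed numbers, Ts = 1e-6 gives about 2 × 10^8 steps, i.e. roughly 2.3 days for a single episode, versus 200 steps with Ts = 1. The actual per-step cost depends on the model and hardware, but the ratio between the two cases does not.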

5 comments (3 older comments are not shown)

Emmanouil Tzorakoleftherakis on 22 Mar 2021

That's what I am saying: you may need to reconsider your inputs/outputs. I am not sure what you mean by "made it even worse". If you are not seeing good training results, there could be many other reasons, including training options, hyperparameters, network architectures, etc. I would give the duty cycle idea another try and spend more time tuning the hyperparameters.

yovel atia on 2 Dec 2021

I have the same issue, except that for me the reward graph converges to a constant -5000.

I do not know how to fix it; I would be very grateful for any help.

Thanks!


