Agent repeats same sequence of actions each episode

Question

Braydon Westmoreland on 1 Jul 2020

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/557872-agent-repeats-same-sequence-of-actions-each-episode

Edited: Emmanouil Tzorakoleftherakis on 2 Jul 2020

Accepted Answer: Emmanouil Tzorakoleftherakis

Can someone please help me understand why my RL agent outputs the same sequence of actions each episode, regardless of the observations it receives from the environment? Here is an example of what I mean:

prev_state = 11.20 11.90 11.30 11.50

action = 0.00 0.00 0.00 0.00

new_state = 11.20 11.90 11.30 11.50

prev_state = 11.20 11.90 11.30 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 11.30 12.00 11.20 11.50

prev_state = 11.30 12.00 11.20 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 11.40 12.00 11.10 11.50

prev_state = 11.40 12.00 11.10 11.50

action = -0.10 -0.10 0.10 0.00

new_state = 11.30 11.90 11.20 11.50

prev_state = 11.30 11.90 11.20 11.50

action = 0.00 0.00 0.10 0.10

new_state = 11.30 11.90 11.30 11.60

prev_state = 12.00 11.20 11.70 11.50

action = 0.00 0.00 0.00 0.00

new_state = 12.00 11.20 11.70 11.50

prev_state = 12.00 11.20 11.70 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 12.00 11.30 11.60 11.50

prev_state = 12.00 11.30 11.60 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 12.00 11.40 11.50 11.50

prev_state = 12.00 11.40 11.50 11.50

action = -0.10 -0.10 0.10 0.00

new_state = 11.90 11.30 11.60 11.50

prev_state = 11.90 11.30 11.60 11.50

action = 0.00 0.00 0.10 0.10

new_state = 11.90 11.30 11.70 11.60
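One way to make the repetition explicit is to split the log into episodes and compare the action sequences directly. The following is only a sketch in Python, not part of the original model: the helper names `parse_steps` and `split_episodes` are hypothetical, and it assumes that a new episode begins whenever `prev_state` does not continue from the previous step's `new_state`.

```python
# Log text copied from the post above (blank lines removed).
LOG = """
prev_state = 11.20 11.90 11.30 11.50
action = 0.00 0.00 0.00 0.00
new_state = 11.20 11.90 11.30 11.50
prev_state = 11.20 11.90 11.30 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 11.30 12.00 11.20 11.50
prev_state = 11.30 12.00 11.20 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 11.40 12.00 11.10 11.50
prev_state = 11.40 12.00 11.10 11.50
action = -0.10 -0.10 0.10 0.00
new_state = 11.30 11.90 11.20 11.50
prev_state = 11.30 11.90 11.20 11.50
action = 0.00 0.00 0.10 0.10
new_state = 11.30 11.90 11.30 11.60
prev_state = 12.00 11.20 11.70 11.50
action = 0.00 0.00 0.00 0.00
new_state = 12.00 11.20 11.70 11.50
prev_state = 12.00 11.20 11.70 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 11.30 11.60 11.50
prev_state = 12.00 11.30 11.60 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 11.40 11.50 11.50
prev_state = 12.00 11.40 11.50 11.50
action = -0.10 -0.10 0.10 0.00
new_state = 11.90 11.30 11.60 11.50
prev_state = 11.90 11.30 11.60 11.50
action = 0.00 0.00 0.10 0.10
new_state = 11.90 11.30 11.70 11.60
"""


def parse_steps(log):
    """Turn 'key = v1 v2 ...' lines into one dict per step.

    Values are kept as strings so that comparisons are exact.
    """
    steps, current = [], {}
    for line in log.splitlines():
        if "=" not in line:
            continue
        key, values = line.split("=", 1)
        current[key.strip()] = tuple(values.split())
        if key.strip() == "new_state":  # new_state closes a step
            steps.append(current)
            current = {}
    return steps


def split_episodes(steps):
    """Assumption: an episode ends when prev_state breaks the state chain."""
    episodes = []
    for step in steps:
        if not episodes or step["prev_state"] != episodes[-1][-1]["new_state"]:
            episodes.append([])
        episodes[-1].append(step)
    return episodes


episodes = split_episodes(parse_steps(LOG))
action_sequences = [[step["action"] for step in ep] for ep in episodes]

print(len(episodes))                                 # 2
print(action_sequences[0] == action_sequences[1])    # True
```

Under that assumption the log splits into two episodes of five steps each. They start from different states but contain exactly the same five actions, which is the behaviour described in the question.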

Let me know if you have any questions about the simulation.

More info on the simulation & my other issues: https://www.mathworks.com/matlabcentral/answers/555799-reinforcement-learning-sample-time


Answer 1

Emmanouil Tzorakoleftherakis on 2 Jul 2020

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/557872-agent-repeats-same-sequence-of-actions-each-episode#answer_460096

Edited: Emmanouil Tzorakoleftherakis on 2 Jul 2020

Hi Braydon,

I am not sure why you are looking only at the first two episodes. RL can take thousands of episodes to converge, so the first few do not give you enough information. In fact, I ran your models for 20 episodes, and the action sequence changed after a few episodes. If nothing else, I would check the reward formulation, since it drives how the neural network weights change and thus how actions are selected (in addition to exploration).

1.0000e-04

prev_state = 11.90 11.90 12.00 11.20

action = 0.00 0.00 0.00 0.00

new_state = 11.90 11.90 12.00 11.20

prev_state = 11.90 11.90 12.00 11.20

action = 0.10 0.10 -0.10 0.00

new_state = 12.00 12.00 11.90 11.20

prev_state = 12.00 12.00 11.90 11.20

action = -0.10 0.00 -0.10 0.10

new_state = 11.90 12.00 11.80 11.30

prev_state = 11.90 12.00 11.80 11.30

action = -0.10 0.10 0.00 -0.10

new_state = 11.80 12.00 11.80 11.20

prev_state = 11.80 12.00 11.80 11.20

action = 0.10 0.00 -0.10 0.00

new_state = 11.90 12.00 11.70 11.20

1.0000e-04

prev_state = 11.70 11.90 11.50 11.60

action = 0.00 0.00 0.00 0.00

new_state = 11.70 11.90 11.50 11.60

prev_state = 11.70 11.90 11.50 11.60

action = 0.10 0.10 -0.10 0.00

new_state = 11.80 12.00 11.40 11.60

prev_state = 11.80 12.00 11.40 11.60

action = -0.10 0.00 -0.10 0.10

new_state = 11.70 12.00 11.30 11.70

prev_state = 11.70 12.00 11.30 11.70

action = -0.10 0.10 0.00 -0.10

new_state = 11.60 12.00 11.30 11.60

prev_state = 11.60 12.00 11.30 11.60

action = 0.10 0.00 -0.10 0.00

new_state = 11.70 12.00 11.20 11.60
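To illustrate the point about the reward formulation, here is a toy sketch in Python. It is not the Reinforcement Learning Toolbox agent from this thread: it assumes a single state, three discrete actions, a table of values `q` in place of neural network weights, and made-up reward functions. The function name `run_episodes` and all parameters are hypothetical.

```python
import random


def run_episodes(reward_fn, n_episodes=5, steps=4, alpha=0.5, epsilon=0.0, seed=0):
    """Toy value-based agent: one state, three actions, values start at zero."""
    rng = random.Random(seed)
    q = [0.0, 0.0, 0.0]
    history = []
    for _ in range(n_episodes):
        actions = []
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(q))   # exploration: pick a random action
            else:
                a = q.index(max(q))         # greedy: ties go to the lowest index
            # The reward is the only signal that changes the stored values.
            q[a] += alpha * (reward_fn(a) - q[a])
            actions.append(a)
        history.append(actions)
    return history


# Uninformative reward: every action earns 0, so the values never change.
flat = run_episodes(lambda a: 0.0)

# Informative reward: action 0 is penalized, action 2 is rewarded.
shaped = run_episodes(lambda a: [-1.0, 0.0, 1.0][a])

print(flat[0], flat[4])       # [0, 0, 0, 0] [0, 0, 0, 0]
print(shaped[0], shaped[1])   # [0, 1, 1, 1] [1, 1, 1, 1]
```

With the uninformative reward the action sequence is identical in every episode, because nothing ever updates the values. With the informative reward the sequence changes after the first update. Note that with `epsilon=0.0` the toy agent still never tries action 2, which is where exploration comes in.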
