Is it possible to change RL action values under certain conditions?


black_cat on 18 May 2021



Edited: black_cat on 20 May 2021

I want my agent to output a target value, but in certain situations (when the reward drops dramatically) I want the agent to look for a better solution by letting it change the target value. I tried using an Initial Condition block so that the target value is used at the start. However, my agent (PPO) always outputs an average value after some training episodes.
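To make the condition above concrete, here is a minimal sketch in plain Python (not MATLAB or Simulink code) of one way to read "keep the target value until the reward drops dramatically". The class name `ActionGate`, the window length, and the threshold are all hypothetical; the post does not say how a dramatic drop is defined, so the moving-average test is only an assumption.

```python
from collections import deque


class ActionGate:
    """Hold a fixed target action until the recent mean reward drops sharply."""

    def __init__(self, target_action, window=3, drop_threshold=-0.5):
        self.target_action = target_action
        self.rewards = deque(maxlen=window)   # keeps only the most recent rewards
        self.drop_threshold = drop_threshold  # assumed meaning of "dramatic"

    def record(self, reward):
        """Store the reward received at the latest step."""
        self.rewards.append(reward)

    def reward_dropped(self):
        """True once the window is full and its mean is below the threshold."""
        if len(self.rewards) < self.rewards.maxlen:
            return False
        return sum(self.rewards) / len(self.rewards) < self.drop_threshold

    def select(self, agent_action):
        """Pass the agent's action through only after a reward drop."""
        return agent_action if self.reward_dropped() else self.target_action


gate = ActionGate(target_action=1)
for r in (0.0, 0.0, 0.0):
    gate.record(r)
before_drop = gate.select(agent_action=3)  # reward is fine: target is held

for r in (-1.0, -1.0, -1.0):
    gate.record(r)
after_drop = gate.select(agent_action=3)   # reward dropped: agent's action passes

print(before_drop, after_drop)
```

The gate sits outside the agent, so the action applied to the environment can differ from the one the agent proposed; whether that is acceptable depends on the training setup.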

5 Comments

black_cat on 20 May 2021

Edited: black_cat on 20 May 2021

I've tried to create a minimal version that illustrates my problem. Here, the agent outputs numbers from 1 to 3. I hope it's easier to understand that way.

black_cat on 20 May 2021

Edited: black_cat on 20 May 2021

Okay, even though the attached example is supposed to be easy to understand, I think I'm able to put my problem in simple terms now:

I'm training my agent to output one of 3 discrete values (1, 2, 3)
I penalize it for not outputting my target value
My target value is 1 for 50% of the time and 3 for the other 50% of the time

When training is done (no matter which agent type I use; they all behave the same in this case), the agent outputs either 1 or 3, and it does so 100% of the time. It never switches between output values; it just uses a single one. This is my problem.
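The simplified setup can be worked through by hand. The sketch below (plain Python, not MATLAB) compares the expected reward of each constant action with that of a policy that can see the current target. Two things are assumptions, not facts from the post: the penalty is taken to be -1 for a miss and 0 for a hit, and it is not known whether the agent's observation actually contains the current target, so both cases are shown.

```python
ACTIONS = (1, 2, 3)
TARGET_PROBS = {1: 0.5, 3: 0.5}  # target is 1 half the time, 3 the other half


def reward(action, target):
    """Assumed penalty: -1 for missing the target, 0 for hitting it."""
    return 0.0 if action == target else -1.0


def expected_reward(policy):
    """Expected reward of a deterministic policy that maps target -> action."""
    return sum(p * reward(policy(t), t) for t, p in TARGET_PROBS.items())


# Case A: the target is not observable, so the policy is a constant action.
blind = {a: expected_reward(lambda t, a=a: a) for a in ACTIONS}

# Case B: the target is observable, so the policy may depend on it.
informed = expected_reward(lambda t: t)

print(blind)     # {1: -0.5, 2: -1.0, 3: -0.5}
print(informed)  # 0.0
```

Under these assumptions, always answering 1 or always answering 3 is the best a policy can do without seeing the target, which matches the behavior described above. A policy that receives the target as part of its observation can reach zero penalty.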
