Enforce action space constraints within the environment

26 views (last 30 days)
Hi,
My agent is training!!! But it's getting pretty much 0 reward every episode right now. I think it might be due to this:
contActor does not enforce constraints set by the action specification, therefore, when using this actor, you must enforce action space constraints within the environment.
How can I do this?
Also, is there a way to view the logged signals as the agent is training?
Thanks!
  1 comment
John Doe
John Doe on 24 Feb 2021
There's something odd going on. It's not 0 reward, but it's not growing. I do have that first-action method I mentioned in the other question implemented (so for 4 of the continuous actions, it only chooses the first action), and for 1 action it's used every time step. I guess I need to check the logged signals to really determine what's going on. I'm too excited to make it work on the first or second try lol


Accepted Answer

Emmanouil Tzorakoleftherakis
If the environment is in Simulink, you can set up scopes and observe what's happening during training. If the environment is in MATLAB, you need to do some extra work and plot things yourself.
For your constraints question, which agent are you using? Some agents are stochastic, and some, like DDPG, add noise for exploration on top of the action output. To be certain, you can use a Saturation block in Simulink or an if statement to clip the action as needed in MATLAB.
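For reference, a minimal sketch of that clipping inside a custom MATLAB environment's step function might look like the following (this assumes an environment created from the rl.env.MATLABEnvironment template with an rlNumericSpec action space; the variable names are placeholders, not code from this thread):

% Inside the environment's step(this, Action) method, before using the action:
actInfo = getActionInfo(this);                          % action specification of this environment
Action  = min(max(Action, actInfo.LowerLimit), ...
              actInfo.UpperLimit);                      % saturate to the spec limits,
                                                        % like a Simulink Saturation block
% ... then apply the clipped Action to the dynamics and compute the
% observation, reward, and is-done flag as usual ...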
  28 comments
John Doe
John Doe on 2 Mar 2021
Edited: John Doe on 2 Mar 2021
How can I do the scaling of the inputs to the network? That seems like the best way forward.
The environment is already constraining the actions, but training is extremely sample-inefficient, and the agent basically bounces between the upper and lower limits of the actions for hundreds of episodes.
Emmanouil Tzorakoleftherakis
Multiply the observations inside the 'step' function by a number that makes sense.
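A sketch of that idea, again assuming a custom MATLAB environment (the scale factors and variable names below are placeholders you would choose from your own observation ranges):

% Inside the environment's step function: scale the raw observations so
% each element is roughly in [-1, 1] before returning them to the agent
obsScale       = [1/100; 1/10; 1/pi];          % placeholder scale factors
NextObs        = rawObservation .* obsScale;   % scaled observation vector passed to the agent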


More Answers (1)

John Doe
John Doe on 17 Mar 2021
Edited: John Doe on 17 Mar 2021
Hi,
I feel like I'm really close to getting this, but I haven't gotten a successful run yet. For thousands of episodes, the agent keeps picking actions way outside the limits. I've tried adding the min/max clipping to force them inside the environment. Do you have any tips on how I can make it converge to stay within the limits? I even tried changing the rewards to encourage staying close to the limits.
I'm wondering whether this is a known issue, and whether having the continuous agent pick actions within the spec limits is on the roadmap?
  5 comments
John Doe
John Doe on 18 Mar 2021
Here's an example training run. I gave it a negative reward for going outside the bounds of the action. This demonstrates how far outside the range the actor is picking. The same thing occurs over more episodes (5000), although I don't have a screenshot for that. Surely there must be something I'm doing wrong? How can I make this converge?
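For context, an out-of-bounds penalty of that kind might look roughly like the following sketch (the limits, the penalty weight, and baseReward are assumptions for illustration, not the values actually used here):

% Penalize the agent in proportion to how far the action exceeds its bounds
lowerLimit = -1;  upperLimit = 1;                         % placeholder action limits
violation  = max(0, Action - upperLimit) + max(0, lowerLimit - Action);
Reward     = baseReward - 10*sum(violation);              % 10 is an assumed penalty weight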
John Doe
John Doe on 25 Mar 2021
I had a bug where I was using normalized values instead of the real values! After fixing that and changing the action to discrete, I was able to solve the environment! Thanks for all your help and this wonderful toolbox!


Version: R2020b
