Custom Action Space DDPG Reinforcement Learning Agent

Question

I have run into a challenge with my reinforcement learning agent and hope you can help me with at least a small hint.

My DDPG agent has a continuous action space, which works totally fine. Unfortunately, it cannot be transferred to a real-life system this way. While searching for the optimal action values in different situations, the agent should avoid certain combinations.

The action space is defined like:

actionInfo = rlNumericSpec([4 1], ...
                           'LowerLimit', [0; 0; 0; 0], ...
                           'UpperLimit', [maxA1; maxA2; maxA3; maxA4]);

But due to restrictions in the real-life system, it should rather look like

A1 = (0 || [minA1; maxA1])

to avoid actions in the range

A1 = ]0; minA1[

Is there any possibility to define my action space this way?

Note:

I have already tried to steer the agent away from actions in this range by penalizing them via the reward, but it doesn't seem to work. Instead of steadily improving over the episodes, the agent now plateaus after reaching a certain (not desirable) level.

Thanks in advance!

Answer 1

Emmanouil Tzorakoleftherakis on 4 Mar 2020

To my knowledge, you cannot implement a custom action space with rlNumericSpec. What you could possibly do (since adding penalty terms to the reward does not help) is add some extra logic that manipulates the agent's actions, i.e. the output of the RL Agent block. Your policy would then be the combination of the neural network and the new logic. Just an idea.
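As a rough illustration of what such extra logic might look like, here is a minimal sketch that maps every raw action into 0 or [minA, maxA]. The numeric limits, the helper names (clipAction, snapAction, toValidAction) and the rule "below minA/2 goes to 0, otherwise to minA" are all assumptions made for this example; they are not part of the toolbox or of the original question.

```matlab
% Placeholder limits (assumed values; use those of the real system).
minA = [1; 2; 1; 0.5];   % smallest allowed nonzero value per action
maxA = [5; 5; 5; 5];     % upper limits, as in the rlNumericSpec

% Step 1: clip the raw agent output to [0, maxA].
clipAction = @(a) min(max(a, 0), maxA);

% Step 2: remove values inside the forbidden open interval (0, minA).
% Assumption: values below minA/2 go to 0, the rest of the band goes to minA.
snapAction = @(a) (a >= minA) .* a + (a >= minA/2 & a < minA) .* minA;

% Combined logic applied to the output of the agent.
toValidAction = @(a) snapAction(clipAction(a));

rawAction   = [0.3; 1.2; 3.0; 7.0];       % example agent output
validAction = toValidAction(rawAction);   % [0; 2; 3; 5]
```

The mapping has no learnable parameters, so it only post-processes what the actor network outputs. Where to split the forbidden band is a design choice; a different threshold or always rounding up to minA may suit the real system better.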

3 Comments (1 older comment not shown)

Emmanouil Tzorakoleftherakis on 5 Mar 2020

This will change the data stored in experience buffers/mini batches during training, as well as logged data when you perform simulations after training. For the latter, you can just choose to log the respective signal after the action transformation. For the former, I don't think it will cause issues. You can think of the additional logic as an extra layer in your neural network that only does algebraic manipulations (like a scaling layer for instance). There are no weights/parameters to be learned.

The three candidate places you mentioned should lead to the same results. Just for visualization purposes (I am assuming you use Simulink since you mentioned 'AgentWrapper'), I would add the logic right after the agent block, and put both under a separate subsystem so that you can treat the agent+logic as your new decision making system.

Hans-Joachim Steinort on 6 Mar 2020

Edited: Hans-Joachim Steinort on 12 Mar 2020

Thank you for your explanation!

This actually helped me wrap my head around the issue. I will definitely try out your suggestion with the additional logic and get back to you afterwards.

EDIT:

It worked the way you suggested, thanks a lot!