How to modify actions in experiences during a reinforcement learning training

Question

0 votes

Hi experts

I am working on a reinforcement learning project. The formulated problem has a huge discrete action set, so instead of using deep Q-learning with discrete actions, I turned to DDPG with a continuous action space. Each time I get an action from the actor network, I want to discretize it to the closest VALID discrete action, and store that discrete action in the experience instead of the original continuous one. By default, DDPG training in MATLAB seems to store the original action generated by the actor network plus noise. Is there any way to MODIFY the stored action before the experience is pushed into the memory buffer? Thanks!
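The discretization step described in the question can be sketched as a nearest-neighbor lookup. This is only a minimal sketch under stated assumptions: the function name nearestValidAction and the matrix validActions (one valid action per row) are hypothetical, and for a truly huge action set an explicit table like this may be too large, so a structured rounding rule might be needed instead.

```matlab
function [aDisc, idx] = nearestValidAction(aCont, validActions)
% nearestValidAction  Snap a continuous action to the closest valid one.
%   aCont        : continuous action, d-by-1 or 1-by-d (e.g. actor output)
%   validActions : K-by-d matrix, one valid discrete action per row
%                  (hypothetical table, not from the original post)
%   aDisc        : closest valid action, returned as a d-by-1 column
%   idx          : row index of that action in validActions

    % Difference of every valid action to the continuous action
    % (implicit expansion, requires R2016b or later).
    diffs = validActions - aCont(:)';

    % Squared Euclidean distance per row, then pick the smallest.
    [~, idx] = min(sum(diffs.^2, 2));

    aDisc = validActions(idx, :)';
end
```

For example, calling nearestValidAction([0.4; 1.7], [0 0; 0 2; 1 2]) selects the second row, because [0 2] is the closest of the three candidates in Euclidean distance.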

1 comment
Ran on 29 Jul 2022

@Emmanouil Tzorakoleftherakis


Answer 1

Emmanouil Tzorakoleftherakis on 29 Jul 2022

1 vote

If you are working in Simulink, you can use the "Last Action" port of the RL Agent block to tell the agent which action was actually applied to the environment.

If your environment is in MATLAB, you can either move it to Simulink with a MATLAB Fcn block and follow the above, or you can write your own custom training loop.
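The custom training loop option could look roughly like the data-collection sketch below. It is not the toolbox's own loop: policyFcn, stepFcn, and discretizeFcn are hypothetical function handles standing in for the actor (plus noise), the environment step, and the discretization rule, and the buffer is a plain struct array rather than a Reinforcement Learning Toolbox object. Observations and actions are assumed to be numeric arrays. The point it illustrates is that the applied (discretized) action is what gets stored.

```matlab
function buffer = collectDiscretizedExperiences(policyFcn, stepFcn, discretizeFcn, obs0, numSteps)
% collectDiscretizedExperiences  Run one episode and store applied actions.
%   All inputs are hypothetical placeholders (assumptions):
%   policyFcn(obs)        -> continuous action (actor output plus noise)
%   stepFcn(obs, action)  -> [nextObs, reward, isDone]
%   discretizeFcn(action) -> closest valid discrete action

    % Empty struct array acting as a simple experience buffer.
    buffer = struct('Observation', {}, 'Action', {}, 'Reward', {}, ...
                    'NextObservation', {}, 'IsDone', {});
    obs = obs0;
    for t = 1:numSteps
        aCont = policyFcn(obs);          % raw continuous action
        aDisc = discretizeFcn(aCont);    % action actually applied
        [nextObs, reward, isDone] = stepFcn(obs, aDisc);

        % Store the discretized action, not the raw actor output.
        buffer(end+1) = struct('Observation', obs, 'Action', aDisc, ...
            'Reward', reward, 'NextObservation', nextObs, ...
            'IsDone', isDone); %#ok<AGROW>

        obs = nextObs;
        if isDone
            break;
        end
    end
end
```

The learning update (sampling mini-batches from the buffer and updating actor and critic) is left out on purpose; it would follow after data collection in a full loop.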

7 comments
Ran on 9 Aug 2022

@Emmanouil Tzorakoleftherakis That makes a lot of sense. One more thing confuses me: to calculate the observations (which I assume are the next states), the reward, and isdone, we need the current state information. But in the examples provided with MATLAB, I don't see any modules that store the current states of the system. Can I use the observation input of the RL Agent block, or should I create variables in the Environment module to store the current states? Thanks!
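One way to keep the current state inside the environment itself is a persistent variable, which also works inside a MATLAB Function block. The sketch below is only an illustration: the function name envStepWithMemory, the doReset input, and the dynamics, reward, and termination rule are all placeholder assumptions, not the model from this thread.

```matlab
function [nextObs, reward, isDone] = envStepWithMemory(action, doReset)
% envStepWithMemory  Hypothetical environment step that remembers its state.
%   action  : applied (already discretized) action, 2-by-1 in this sketch
%   doReset : true at the start of an episode to restore the initial state

    persistent state                  % survives between calls

    if isempty(state) || doReset
        state = zeros(2, 1);          % placeholder initial state (assumption)
    end

    state   = state + action(:);      % placeholder dynamics (assumption)
    nextObs = state;                  % observation equals the new state here
    reward  = -sum(abs(state));       % placeholder reward (assumption)
    isDone  = any(abs(state) > 10);   % placeholder termination (assumption)
end
```

An alternative with the same effect is to feed the environment's output back through a delay-type block, so that the previous observation arrives as an input on the next step.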

Ran on 11 Aug 2022

Hi @Emmanouil Tzorakoleftherakis

I have created a Simulink draft as shown below.

I created a function block to discretize the action that is actually applied to the environment. The environment is another block on the right, with output ports including NextObs, reward, and isdone. The "Delay" block in the top right corner lets the environment derive the next observations from the previous observation. Could you please check whether the draft makes sense?

In particular, two questions confuse me:

1) As RL needs to derive the next states from the current states, how are the current states stored in the environment block?

2) I tried to reset the initial state by doing this:

function in = localResetFcn(in,N_UAV)
% Initial state: all fully charged with E_Cap, all start from ground, hr is
%
state = [2*ones(1,N_UAV),zeros(1,N_UAV),4]';  %/E_Cap*2 because of input normalization
blk = sprintf('Env_UAVChg/Environment/NextObs');
in = setBlockParameter(in,blk,'InitialCondition',num2str(state));
end

But I got an error: "Outport block does not have a parameter named 'InitialCondition'". Could you please advise how to reset the states for each episode? Thanks
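The error message suggests that the block path points at an Outport, which holds no state. A possible variant, sketched below under assumptions, targets a state-holding block instead, such as the Delay block mentioned above, which does have an 'InitialCondition' parameter. The path 'Env_UAVChg/Delay' is hypothetical and must be replaced by the real path in the model. The sketch also uses mat2str, because num2str applied to a column vector returns a multi-row character array rather than a single expression string.

```matlab
function in = localResetFcn(in, N_UAV)
% Initial state, same layout as in the post above.
state = [2*ones(1,N_UAV), zeros(1,N_UAV), 4]';

% Hypothetical path (assumption): the block that stores the state,
% for example the Delay block, not the NextObs Outport.
blk = 'Env_UAVChg/Delay';

% mat2str turns the column vector into one string such as '[2;2;0;0;4]'.
in = setBlockParameter(in, blk, 'InitialCondition', mat2str(state));
end
```

Because a reset function receives only the simulation input object, the extra argument would be bound with an anonymous function, for example env.ResetFcn = @(in) localResetFcn(in, N_UAV), where env is assumed to be the Simulink environment object.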
