How to modify the actions stored in experiences during reinforcement learning training

Hi experts,
I am working on a reinforcement learning project. The formulated problem has a huge discrete action set, so instead of using deep Q-learning with discrete actions, I turned to DDPG with a continuous action space. Each time I get an action from the actor network, I want to discretize it to the closest VALID discrete action, and I want the experience to store that closest discrete action rather than the original continuous one. By default, DDPG training in MATLAB seems to store the original action generated by the actor network plus noise. Is there any way to MODIFY the stored action before the experience is pushed into the memory buffer? Thanks!
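The discretization step described in the question can be sketched as a nearest-neighbor lookup. This is only an illustration: the validActions matrix and the example aCont vector are made-up values, and Euclidean distance is an assumed notion of "closest".

```matlab
% Hypothetical set of valid discrete actions, one action per row.
validActions = [0 0; 0 1; 1 0; 1 1; 2 1];

% Example continuous action (e.g. actor output plus exploration noise).
aCont = [0.8; 0.3];

% Squared Euclidean distance from aCont to every valid action.
dist2 = sum((validActions - aCont.').^2, 2);

% Pick the closest valid action and return it as a column vector.
[~, idx] = min(dist2);
aValid = validActions(idx, :).';
```

Inside a Simulink MATLAB Function block, the same lines could form the function body, with aCont as the input and aValid as the output.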

Answers (1)

If you are working in Simulink, you can use the "Last Action" port of the RL Agent block to indicate which action was actually applied to the environment.
If your environment is in MATLAB, you can either move it to Simulink with a MATLAB Function block and follow the above, or you can write your own custom training loop.
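For the custom training loop option, the key point is that the loop itself decides what goes into the buffer, so the applied (discretized) action can be stored instead of the raw actor output. A minimal sketch in plain MATLAB follows. The actor function handle, the toy dynamics, the reward, and the struct-array buffer are all hypothetical stand-ins; in a real loop they would be replaced by the toolbox's own agent and environment calls (such as getAction and step) and by the learning update.

```matlab
rng(0);                                   % reproducible noise for the sketch

validActions = linspace(-1, 1, 5);        % hypothetical valid scalar actions
actor = @(obs) tanh(0.5*obs);             % hypothetical stand-in for the actor
numSteps = 5;

% Empty experience buffer with fixed fields.
buffer = struct('Observation', {}, 'Action', {}, ...
                'Reward', {}, 'NextObservation', {});

obs = 1.0;
for t = 1:numSteps
    % Continuous action from the actor plus exploration noise.
    aCont = actor(obs) + 0.1*randn;

    % Discretize to the closest valid action; this is what gets applied.
    [~, idx] = min(abs(validActions - aCont));
    aApplied = validActions(idx);

    % Toy environment step (assumed dynamics and reward).
    nextObs = 0.9*obs + 0.1*aApplied;
    reward  = -nextObs^2;

    % Store the applied action, not the raw continuous one.
    buffer(end+1) = struct('Observation', obs, 'Action', aApplied, ...
                           'Reward', reward, 'NextObservation', nextObs);
    obs = nextObs;
end
```

Every Action entry in buffer is then one of the valid discrete actions, which is the behavior asked for in the question.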

7 Comments

Thanks for your reply. You mentioned that I can use the "Last Action" port to indicate which action was actually applied to the environment. Right now I am able to indicate the action applied to the environment, which can be different from the output of the actor network. However, could you please confirm one thing: if I use the "Last Action" port to indicate the action actually applied to the environment, will this last action be stored in the experience buffer, or will the output of the RL Agent block still be stored? This is very important to me. Thanks.
The "Last Action" port value will be the one stored in the experience buffer, not the actual output of the RL Agent block.
@Emmanouil Tzorakoleftherakis That's great. But I am still confused about how to connect this "Last Action" port. It looks like an input of the RL Agent block. In the current setup, the output of the RL Agent block is applied to the environment. What I hope to achieve is that, based on the original output of the RL Agent block, I can decide the actual action applied to the environment. This actual action will be specified through the "Last Action" port and then applied to the environment. Any advice on how to make the connections? Thanks.
Ignoring observations, reward, IsDone, here is an example:
Also, make sure to use the "Last Action" port only with off-policy agents, as mentioned in the documentation. Hope this helps.
@Emmanouil Tzorakoleftherakis That makes a lot of sense. One more thing confuses me: when calculating the observations (which I assume are the next states), the reward, and IsDone, we need the current state information. But in the examples provided with MATLAB, I don't see any modules that store the current states of the system. Can I use the observation input of the RL Agent block, or should I create some variables in the Environment module to store the current states? Thanks!
I have created a Simulink draft as shown below.
I created a function block to discretize the action that is actually applied to the environment. The environment is another block on the right, with output ports NextObs, reward, and isdone. The "delay" block in the top right corner lets the environment derive the next observations based on the previous observation. Could you please help check whether the draft makes sense?
In particular, two questions confuse me:
1) As RL needs to derive the next states from the current states, how are the current states stored in the environment block?
2) I tried to reset the initial state by doing this:
function in = localResetFcn(in,N_UAV)
% Initial state: all fully charged with E_Cap, all start from ground, hr is
%
state = [2*ones(1,N_UAV),zeros(1,N_UAV),4]'; %/E_Cap*2 because of input normalization
blk = sprintf('Env_UAVChg/Environment/NextObs');
in = setBlockParameter(in,blk,'InitialCondition',num2str(state));
end
but I got an error: "Outport block does not have a parameter named 'InitialCondition'". Could you please advise how to reset the states for each episode? Thanks.
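One possible direction for the error above, offered as a sketch rather than a confirmed fix: an Outport block has no 'InitialCondition' parameter, so the reset function presumably needs to target a block that actually holds state, such as the Delay block in the draft. The block path 'Env_UAVChg/Environment/Delay' and the helper initialState are assumptions for illustration only. mat2str is used in place of num2str because num2str applied to a column vector returns a multi-row character array, whereas mat2str returns a single expression such as '[2;2;0;0;4]'.

```matlab
function in = localResetFcn(in, N_UAV)
    % Build the initial state and write it to a block that has an
    % 'InitialCondition' parameter. The Delay block path is hypothetical;
    % replace it with the real path of the state-holding block.
    state = initialState(N_UAV);
    blk = 'Env_UAVChg/Environment/Delay';
    in = setBlockParameter(in, blk, 'InitialCondition', mat2str(state));
end

function state = initialState(N_UAV)
    % Same initial state as in the question: N_UAV entries of 2,
    % N_UAV entries of 0, then 4, as a column vector.
    state = [2*ones(1, N_UAV), zeros(1, N_UAV), 4]';
end
```

It would be attached to the environment with something like env.ResetFcn = @(in) localResetFcn(in, N_UAV);, where env is the Simulink environment object.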

Version

R2021b

Asked: Ran on 28 Jul 2022

Commented: Ran on 11 Aug 2022
