Enforce action space constraints within the environment

26 views (last 30 days)
Hi,
My agent is training!!! But it's getting pretty much 0 reward every episode right now. I think it might be due to this:
contActor does not enforce constraints set by the action specification, therefore, when using this actor, you must enforce action space constraints within the environment.
How can I do this?
Also, is there a way to view the logged signals as the agent is training?
Thanks!
  1 comment
John Doe
John Doe on 24 Feb 2021
There's something odd going on. It's not 0 reward, but it's not growing. I do have that first-action method I mentioned in the other question implemented (so for 4 of the continuous actions, it only chooses the first action), and for 1 action it's used every time step. I guess I need to check the logged signals to really determine what's going on. I'm too excited to make it work on the first or second try lol


Accepted Answer

Emmanouil Tzorakoleftherakis
If the environment is in Simulink, you can set up scopes and observe what's happening during training. If the environment is in MATLAB, you need to do some extra work and plot things yourself.
For your constraints question, which agent are you using? Some agents are stochastic, and some, like DDPG, add noise for exploration on top of the action output. To be certain, you can use a Saturation block in Simulink or an if statement to clip the action as needed in MATLAB.
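For reference, a minimal sketch of that clipping inside a custom MATLAB environment's step function might look like the following (this assumes an environment created from the rl.env.MATLABEnvironment template with an rlNumericSpec action space; the variable names are placeholders, not code from this thread):

% Inside the environment's step(this, Action) method, before using the action:
actInfo = getActionInfo(this);                          % action specification of this environment
Action  = min(max(Action, actInfo.LowerLimit), ...
              actInfo.UpperLimit);                      % saturate to the spec limits,
                                                        % like a Simulink Saturation block
% ... then apply the clipped Action to the dynamics and compute the
% observation, reward, and is-done flag as usual ...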
  28 comments
John Doe
John Doe on 2 Mar 2021
Edited: John Doe on 2 Mar 2021
How can I do the scaling of the inputs to the network? That seems like the best way forward.
The environment is already constraining the actions, but training is extremely sample-inefficient, and the agent basically bounces between the upper and lower limits of the actions for hundreds of episodes.
Emmanouil Tzorakoleftherakis
Multiply the observations inside the 'step' function by a number that makes sense.
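A sketch of that idea, again assuming a custom MATLAB environment (the scale factors and variable names below are placeholders you would choose from your own observation ranges):

% Inside the environment's step function: scale the raw observations so
% each element is roughly in [-1, 1] before returning them to the agent
obsScale       = [1/100; 1/10; 1/pi];          % placeholder scale factors
NextObs        = rawObservation .* obsScale;   % scaled observation vector passed to the agent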


More Answers (1)

John Doe
John Doe on 17 Mar 2021
Edited: John Doe on 17 Mar 2021
Hi,
I feel like I'm really close to getting this, but I haven't gotten a successful run yet. For thousands of episodes, the agent keeps picking actions way outside the limits. I've tried adding the min/max clipping to force them inside the environment. Do you have any tips on how I can make it converge to stay within the limits? I even tried changing the rewards to encourage staying close to the limits.
I'm wondering whether this is a known issue, and whether having the continuous agent pick actions within the spec limits is on the roadmap?
  5 comments
John Doe
John Doe on 18 Mar 2021
Here's an example training run. I gave it a negative reward for going outside the bounds of the action. This demonstrates how far outside the range the actor is picking. The same thing occurs over more episodes (5000), although I don't have a screenshot for that. Surely there must be something I'm doing wrong? How can I make this converge?
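For context, an out-of-bounds penalty of that kind might look roughly like the following sketch (the limits, the penalty weight, and baseReward are assumptions for illustration, not the values actually used here):

% Penalize the agent in proportion to how far the action exceeds its bounds
lowerLimit = -1;  upperLimit = 1;                         % placeholder action limits
violation  = max(0, Action - upperLimit) + max(0, lowerLimit - Action);
Reward     = baseReward - 10*sum(violation);              % 10 is an assumed penalty weight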
John Doe
John Doe on 25 Mar 2021
I had a bug where I was using normalized values instead of the real values! After fixing that and changing the action to discrete, I was able to solve the environment! Thanks for all your help and this wonderful toolbox!


Version: R2020b
