Reinforcement Learning PPO Problem

Hi, I am currently working on a continuous PPO implementation. However, I encountered the following problem with the setup of the actor network. Can anyone suggest a solution? Thank you.
Command Window output:
Caused by:
Layer 'mean&sdev': Input size mismatch. Size of input to this layer is different from the expected input size.
Inputs to this layer:
from layer 'scale' (output size 2)
from layer 'splus' (output size 2)
%%
% L = 100; % number of neurons
% statePath = [
% featureInputLayer(6,'Normalization','none','Name','observation')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% additionLayer(2,'Name','add')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(1,'Name','fc4')];
%
% actionPath = [
% featureInputLayer(2,'Normalization','none','Name','action')
% fullyConnectedLayer(L,'Name','fc5')];
%%
L = 100;
criticNetwork = [
featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(L,'Name','fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(1,'Name','fc4')];
%%
% criticNetwork = layerGraph(statePath);
% criticNetwork = addLayers(criticNetwork,actionPath);
%
% criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
%% Create critic representation
useGPU = false;
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
criticOptions.UseDevice = 'gpu'; % set the device after the options object exists
end
critic = rlValueRepresentation(criticNetwork,observationInfo,...
'Observation',{'observations'},criticOptions);
%%
% input path layers (6 by 1 input and a 2 by 1 output)
inPath = [ featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(2,'Name','fc3')];
% path layers for mean value (2 by 1 input and 2 by 1 output)
% using scalingLayer to scale the range
meanPath = [ tanhLayer('Name','tanh'); % output range: (-1,1)
scalingLayer('Name','scale','Scale',actionInfo.UpperLimit) ]; % output range: (-10,10)
% path layers for standard deviations (2 by 1 input and output)
% using softplus layer to make it non negative
sdevPath = softplusLayer('Name', 'splus');
% concatenate the two inputs (along dimension #3) to form a single (4 by 1) output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');
% add layers to network object
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,outLayer);
% connect layers: the mean value path output MUST be connected to the FIRST input of the concatenationLayer
actorNetwork = connectLayers(actorNetwork,'fc3','tanh/in'); % connect output of inPath to meanPath input
actorNetwork = connectLayers(actorNetwork,'fc3','splus/in'); % connect output of inPath to sdevPath input
actorNetwork = connectLayers(actorNetwork,'scale','mean&sdev/in1'); % connect output of meanPath to 'mean&sdev' input #1
actorNetwork = connectLayers(actorNetwork,'splus','mean&sdev/in2'); % connect output of sdevPath to 'mean&sdev' input #2
% plot network
plot(actorNetwork);
%%
% actorNetwork = [
% featureInputLayer(6,'Normalization','none','Name','observations')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% softmaxLayer('Name','actionProb')];
% Create actor representation
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
actorOptions.UseDevice = 'gpu'; % must be set before the representation is created
end
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'observations'},actorOptions);
agentOptions = rlPPOAgentOptions(...
'SampleTime',Tf,...
'ExperienceHorizon',200,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.01,...
'NumEpoch',3,...
'AdvantageEstimateMethod',"gae",...
'GAEFactor',0.95,...
'DiscountFactor',0.99,...
'MiniBatchSize',64);
% Note: rlPPOAgentOptions has no NoiseOptions property (PPO explores through its
% stochastic policy), so these lines carried over from a DDPG setup do not apply:
% agentOptions.NoiseOptions.Variance = [0.6;0.1];
% agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
%%
agent = rlPPOAgent(actor,critic,agentOptions);
%%
maxepisodes = 5000;
maxsteps = ceil(Ts/Tf); % Ts: total simulation time, Tf: sample time
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'Verbose',true,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',300,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',100);
%'ScoreAveragingWindowLength',250,... % number of consecutive episodes to average over
% In the workspace: save(trainingOpts.SaveAgentDirectory + "/finalAgent.mat",'agent')
% Displaying the options in the workspace shows, for example:
% trainOpts =
%   rlTrainingOptions with properties:
%
%                  MaxEpisodes: 1000
%           MaxStepsPerEpisode: 1000
%   ScoreAveragingWindowLength: 5
%         StopTrainingCriteria: "AverageReward"
%            StopTrainingValue: 480
%            SaveAgentCriteria: "none"
%               SaveAgentValue: "none"
%           SaveAgentDirectory: "savedAgents"
%%
doTraining = false;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainingOpts);
else
% Load a pretrained agent for the example.
load('Trains/savedAgents_1/finalAgent','agent')
end
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
%%
function in = localResetFcn(in)
% reset
in = setVariable(in,'e1_initial', 0.5*(-1+2*rand)); % random value for lateral deviation
in = setVariable(in,'e2_initial', 0.1*(-1+2*rand)); % random value for relative yaw angle
end

Answers (1)

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 8 Jan 2021

0 votes

Hello,
Please take a look at how to create the actor and critic networks for continuous PPO here. It seems there is a dimension mismatch and following the doc example should help.
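For what it's worth, a likely culprit in the code above is the concatenation dimension (an educated guess from the error message, not a verified fix): with featureInputLayer the data is one-dimensional with channels along dimension 1, while concatenationLayer(3,2,...) concatenates along dimension 3, as in the older imageInputLayer-based version of the doc example. A minimal sketch of the actor output stage with the concatenation moved to dimension 1, reusing the layer names and paths (inPath, meanPath, sdevPath) from the question:
% Concatenate mean and standard deviation along the channel dimension (dim 1),
% which is where featureInputLayer places its features
outLayer = concatenationLayer(1,2,'Name','mean&sdev');
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,outLayer);
% the mean path must still feed the FIRST input of the concatenation layer
actorNetwork = connectLayers(actorNetwork,'fc3','tanh/in');
actorNetwork = connectLayers(actorNetwork,'fc3','splus/in');
actorNetwork = connectLayers(actorNetwork,'scale','mean&sdev/in1');
actorNetwork = connectLayers(actorNetwork,'splus','mean&sdev/in2');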
If you are using R2020b, there is a new feature that lets you create a PPO agent without creating the actor and critic neural networks - Reinforcement Learning Toolbox will create a default architecture for you that you can then modify as needed. Please take a look at this example to see how to implement this.
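As a sketch of that default-agent workflow (the observation/action sizes and the ±10 action limits below are taken from the question's code; rlAgentInitializationOptions is the R2020b mechanism for sizing the generated networks):
% Specs describing the environment interface
obsInfo = rlNumericSpec([6 1]); % 6 observations
actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10); % 2 continuous actions
% Let the toolbox build default actor and critic networks (R2020b+)
initOpts = rlAgentInitializationOptions('NumHiddenUnit',100);
agent = rlPPOAgent(obsInfo,actInfo,initOpts);
% The generated actor/critic can be retrieved and modified afterwards
actor = getActor(agent);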

2 Comments

onder kelevic on 10 Jan 2021
Thank you for your answer. I created a PPO agent without creating the actor and critic neural networks, but I still encountered a problem: the learning rate of the actor is shown as 0.01 in the RL Episode Manager, while in my code and workspace it is 1e-4. I am working on highway path-following control, and 0.01 seems too large for convergence. How can I change the actor LearnRate that the Episode Manager reports? Do you have any ideas?
You can change the learn rate using rlRepresentationOptions:
actor.Options.LearnRate = 1e-4;
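For a default agent, one way to apply this (a sketch using the standard getActor/setActor agent accessors; check the API of your release) is to pull the actor out, change its options, and put it back before training:
% Retrieve the actor, lower its learn rate, and reinsert it into the agent
actor = getActor(agent);
actor.Options.LearnRate = 1e-4; % replaces the 0.01 default shown in the Episode Manager
agent = setActor(agent,actor);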

