Reinforcement Learning PPO Problem

Hi, I am currently working on a continuous PPO implementation. However, I encountered the following problem with the setup of the actor network. Can anyone suggest a solution? Thank you.
Command Window output:
Caused by:
Layer 'mean&sdev': Input size mismatch. Size of input to this layer is different from the expected input size.
Inputs to this layer:
from layer 'scale' (output size 2)
from layer 'splus' (output size 2)
%%
% L = 100; % number of neurons
% statePath = [
% featureInputLayer(6,'Normalization','none','Name','observation')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% additionLayer(2,'Name','add')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(1,'Name','fc4')];
%
% actionPath = [
% featureInputLayer(2,'Normalization','none','Name','action')
% fullyConnectedLayer(L,'Name','fc5')];
%%
L = 100;
criticNetwork = [
featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(L,'Name','fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(1,'Name','fc4')];
%%
% criticNetwork = layerGraph(statePath);
% criticNetwork = addLayers(criticNetwork,actionPath);
%
% criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
%% Create critic representation
useGPU = false;
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
criticOptions.UseDevice = 'gpu'; % set the device after the options object exists
end
critic = rlValueRepresentation(criticNetwork,observationInfo,...
'Observation',{'observations'},criticOptions);
%%
% input path layers (6 by 1 input and a 2 by 1 output)
inPath = [ featureInputLayer(6,'Normalization','none','Name','observations')
fullyConnectedLayer(L,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(L,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(2,'Name','fc3')];
% path layers for mean value (2 by 1 input and 2 by 1 output)
% using scalingLayer to scale the range
meanPath = [ tanhLayer('Name','tanh'); % output range: (-1,1)
scalingLayer('Name','scale','Scale',actionInfo.UpperLimit) ]; % output range: (-10,10)
% path layers for standard deviations (2 by 1 input and output)
% using softplus layer to make it non negative
sdevPath = softplusLayer('Name', 'splus');
% concatenate the two inputs (along dimension #3) to form a single (4 by 1) output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');
% add layers to network object
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,outLayer);
% connect layers: the mean value path output MUST be connected to the FIRST input of the concatenationLayer
actorNetwork = connectLayers(actorNetwork,'fc3','tanh/in'); % connect output of inPath to meanPath input
actorNetwork = connectLayers(actorNetwork,'fc3','splus/in'); % connect output of inPath to sdevPath input
actorNetwork = connectLayers(actorNetwork,'scale','mean&sdev/in1'); % connect output of meanPath to 'mean&sdev' input #1
actorNetwork = connectLayers(actorNetwork,'splus','mean&sdev/in2'); % connect output of sdevPath to 'mean&sdev' input #2
% plot network
plot(actorNetwork);
%%
% actorNetwork = [
% featureInputLayer(6,'Normalization','none','Name','observations')
% fullyConnectedLayer(L,'Name','fc1')
% reluLayer('Name','relu1')
% fullyConnectedLayer(L,'Name','fc2')
% reluLayer('Name','relu2')
% fullyConnectedLayer(L,'Name','fc3')
% softmaxLayer('Name','actionProb')];
% Create actor representation
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
if useGPU
actorOptions.UseDevice = 'gpu'; % must be set before the representation is created
end
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'observations'},actorOptions);
agentOptions = rlPPOAgentOptions(...
'SampleTime',Tf,...
'ExperienceHorizon',200,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.01,...
'NumEpoch',3,...
'AdvantageEstimateMethod',"gae",...
'GAEFactor',0.95,...
'DiscountFactor',0.99,...
'MiniBatchSize',64);
% Note: rlPPOAgentOptions has no NoiseOptions property (PPO explores through its
% stochastic policy), so these lines carried over from a DDPG setup do not apply:
% agentOptions.NoiseOptions.Variance = [0.6;0.1];
% agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
%%
agent = rlPPOAgent(actor,critic,agentOptions);
%%
maxepisodes = 5000;
maxsteps = ceil(Ts/Tf); % Ts: total simulation time, Tf: sample time
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'Verbose',true,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',300,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',100);
%'ScoreAveragingWindowLength',250,... % number of consecutive episodes to average over
% In the workspace: save(trainingOpts.SaveAgentDirectory + "/finalAgent.mat",'agent')
% Displaying the options in the workspace shows, for example:
% trainOpts =
%   rlTrainingOptions with properties:
%
%                  MaxEpisodes: 1000
%           MaxStepsPerEpisode: 1000
%   ScoreAveragingWindowLength: 5
%         StopTrainingCriteria: "AverageReward"
%            StopTrainingValue: 480
%            SaveAgentCriteria: "none"
%               SaveAgentValue: "none"
%           SaveAgentDirectory: "savedAgents"
%%
doTraining = false;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainingOpts);
else
% Load a pretrained agent for the example.
load('Trains/savedAgents_1/finalAgent','agent')
end
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
%%
function in = localResetFcn(in)
% reset
in = setVariable(in,'e1_initial', 0.5*(-1+2*rand)); % random value for lateral deviation
in = setVariable(in,'e2_initial', 0.1*(-1+2*rand)); % random value for relative yaw angle
end

Answers (1)

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 8 Jan 2021

0 votes

Hello,
Please take a look at how to create the actor and critic networks for continuous PPO here. It seems there is a dimension mismatch and following the doc example should help.
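For what it's worth, a likely culprit in the code above is the concatenation dimension (an educated guess from the error message, not a verified fix): with featureInputLayer the data is one-dimensional with channels along dimension 1, while concatenationLayer(3,2,...) concatenates along dimension 3, as in the older imageInputLayer-based version of the doc example. A minimal sketch of the actor output stage with the concatenation moved to dimension 1, reusing the layer names and paths (inPath, meanPath, sdevPath) from the question:
% Concatenate mean and standard deviation along the channel dimension (dim 1),
% which is where featureInputLayer places its features
outLayer = concatenationLayer(1,2,'Name','mean&sdev');
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
actorNetwork = addLayers(actorNetwork,outLayer);
% the mean path must still feed the FIRST input of the concatenation layer
actorNetwork = connectLayers(actorNetwork,'fc3','tanh/in');
actorNetwork = connectLayers(actorNetwork,'fc3','splus/in');
actorNetwork = connectLayers(actorNetwork,'scale','mean&sdev/in1');
actorNetwork = connectLayers(actorNetwork,'splus','mean&sdev/in2');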
If you are using R2020b, there is a new feature that lets you create a PPO agent without creating the actor and critic neural networks - Reinforcement Learning Toolbox will create a default architecture for you that you can then modify as needed. Please take a look at this example to see how to implement this.
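As a sketch of that default-agent workflow (the observation/action sizes and the ±10 action limits below are taken from the question's code; rlAgentInitializationOptions is the R2020b mechanism for sizing the generated networks):
% Specs describing the environment interface
obsInfo = rlNumericSpec([6 1]); % 6 observations
actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10); % 2 continuous actions
% Let the toolbox build default actor and critic networks (R2020b+)
initOpts = rlAgentInitializationOptions('NumHiddenUnit',100);
agent = rlPPOAgent(obsInfo,actInfo,initOpts);
% The generated actor/critic can be retrieved and modified afterwards
actor = getActor(agent);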

2 Comments

onder kelevic on 10 Jan 2021
Thank you for your answer. I created a PPO agent without creating the actor and critic neural networks, but I still encountered a problem: the learning rate of the actor is shown as 0.01 in the RL Episode Manager, while in my code and workspace it is 1e-4. I am working on highway path-following control, and 0.01 seems too large for convergence. How can I change the actor LearnRate that the Episode Manager reports? Do you have any ideas?
You can change the learn rate using rlRepresentationOptions:
actor.Options.LearnRate = 1e-4;
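For a default agent, one way to apply this (a sketch using the standard getActor/setActor agent accessors; check the API of your release) is to pull the actor out, change its options, and put it back before training:
% Retrieve the actor, lower its learn rate, and reinsert it into the agent
actor = getActor(agent);
actor.Options.LearnRate = 1e-4; % replaces the 0.01 default shown in the Episode Manager
agent = setActor(agent,actor);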

