PPO with CNN layer
Fabien SANCHEZ
on 30 May 2024
Commented: Fabien SANCHEZ
on 18 Jun 2024
Hi!
I have a PPO agent with a CNN layer which I'm trying to train on a Simulink environment. Here is the agent creation (only the CNN part, which still does not work on its own) and the training setup.
%% Create Deep Learning Network Architecture
%% Actor Net
net = dlnetwork;
sequencenet = [
sequenceInputLayer(242,Name="input_2")
convolution1dLayer(3,32,"Name","conv1d","Padding","causal")
reluLayer("Name","relu_body_1")
globalAveragePooling1dLayer("Name","gapool1d")
fullyConnectedLayer(32,"Name","fc_2")];
net = addLayers(net,sequencenet);
tempNet = [
reluLayer("Name","body_output")
fullyConnectedLayer(3,"Name","fc_action")
softmaxLayer("Name","output")];
net = addLayers(net,tempNet);
clear tempNet;
net = connectLayers(net,"fc_2","body_output");
actor_net = initialize(net);
%% Critic Net
net = dlnetwork;
sequencenet = [
sequenceInputLayer(242,Name="input_2")
convolution1dLayer(3,32,"Name","conv1d","Padding","causal")
reluLayer("Name","relu_body_1")
globalAveragePooling1dLayer("Name","gapool1d")
fullyConnectedLayer(32,"Name","fc_2")];
net = addLayers(net,sequencenet);
tempNet = [
reluLayer("Name","body_output")
fullyConnectedLayer(1,"Name","output")];
net = addLayers(net,tempNet);
clear tempNet;
net = connectLayers(net,"fc_2","body_output");
critic_net = initialize(net);
%% Training
%%
initOpts = rlAgentInitializationOptions(NumHiddenUnit=64);
actor = rlDiscreteCategoricalActor(actor_net,obsInfo,actInfo,ObservationInputNames={"input_2"},UseDevice="gpu");
critic = rlValueFunction(critic_net,obsInfo,ObservationInputNames={"input_2"},UseDevice="gpu");
actorOpts = rlOptimizerOptions(LearnRate=1e-3,GradientThreshold=1);
criticOpts = rlOptimizerOptions(LearnRate=1e-3,GradientThreshold=1);
agentOpts = rlPPOAgentOptions(...
ExperienceHorizon=512,...
MiniBatchSize=128,...
ClipFactor=0.02,...
EntropyLossWeight=0.01,...
ActorOptimizerOptions=actorOpts,...
CriticOptimizerOptions=criticOpts,...
NumEpoch=3,...
AdvantageEstimateMethod="gae",...
GAEFactor=0.95,...
SampleTime=10,...
DiscountFactor=0.997,...
MaxMiniBatchPerEpoch=100);
agent = rlPPOAgent(actor,critic,agentOpts);
%%
trainOpts = rlTrainingOptions(...
MaxEpisodes=1000,...
MaxStepsPerEpisode=8640,...
Plots="training-progress",...
StopTrainingCriteria="AverageReward",...
StopTrainingValue=0,...
ScoreAveragingWindowLength=5);
simOptions = rlSimulationOptions(MaxSteps=600);
simOptions.NumSimulations = 5;
trainingStats = train(agent,env, trainOpts);
To be able to train, I created a Simulink environment with the following code:
%% Setup env
obs(1) = Simulink.BusElement;
obs(1).Dimensions = [242,1];
obs(1).Name = "schedule_sequence";
obsBus = Simulink.Bus;   % bus object must exist in the workspace for bus2RLSpec
obsBus.Elements = obs;
obsInfo = bus2RLSpec("obsBus","Model",mdl);
actInfo = rlFiniteSetSpec(linspace(0,1,3));
agent=mdl+"/Stateflow control/RL Agent";
%% Create env
env = rlSimulinkEnv(mdl,agent,obsInfo,actInfo);
env.ResetFcn = @(in) myResetFunction(in, 0,par_irradiance,par_load,par_schedule, capacity_factor );
validateEnvironment(env)
There is a custom ResetFcn which helps to correctly reset the loaded data in the model.
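For reference, a minimal sketch of what such a reset function could look like (the actual implementation is not shown here; the workspace variable names and the meaning of the second argument are assumptions for illustration):
function in = myResetFunction(in, initialValue, par_irradiance, par_load, par_schedule, capacity_factor)
% Illustrative only: place the per-episode data on the Simulink.SimulationInput
% object so the model picks it up when the episode starts.
in = setVariable(in, "initialValue", initialValue);       % the 0 passed above (assumed meaning)
in = setVariable(in, "par_irradiance", par_irradiance);
in = setVariable(in, "par_load", par_load);
in = setVariable(in, "par_schedule", par_schedule);
in = setVariable(in, "capacity_factor", capacity_factor);
end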
Once I start the training, it goes well until the 100th batch (whatever the batch size: it stops either at time step batch_size*100 or at the end of the episode), and then I get the following error. It should be noted that the input layer takes 242(C)x1(B)x1(T) data due to a Simulink constraint (it cannot support dlarray), whereas 1(C)x1(B)x242(T) would be appropriate.
Error using rl.internal.train.PPOTrainer/run_internal_/nestedRunEpisode (line 335)
There was an error executing the ProcessExperienceFcn for block "PAR_Discrete_SinglePhaseGridSolarPV_modified_typeRL_3/Stateflow control/RL Agent".
Caused by:
Error using gpuArray/reshape
Number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension.
The idea is to train a PPO agent to control a battery system using time sequences, such as a power schedule, as input. It previously worked well with scalar features, and I wanted to try time sequences. Thanks for reading.
2 Comments
Joss Knight
on 14 Jun 2024
Does your input data have 242 channels or is that the sequence length? You should permute the data, not the formats. Your error looks to be a result of the sequence length changing after the 100th iteration but your network cannot tolerate that because you are injecting the sequence as channels and the number of channels isn't allowed to change. It also won't be computing the right results.
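To illustrate the point about permuting the data rather than the formats, here is a minimal sketch (not part of the original comment; x stands in for one 242-step observation):
x = rand(242,1);                                      % one observation: 242 scalar time steps
% Relabelling only the format treats the 242 values as channels, so the
% channel count is fixed and any other sequence length breaks the network:
xAsChannels = dlarray(x, "CB");                       % 242 (C) x 1 (B)
% Permuting the data instead puts the 242 values on the time dimension:
xAsSequence = dlarray(reshape(x, 1, 1, []), "CBT");   % 1 (C) x 1 (B) x 242 (T)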
Accepted Answer
Drew Davis
on 17 Jun 2024
The problem here is that the sequenceInputLayer aligns its size input with the “C” dimension. Therefore, when you specify the input layer as sequenceInputLayer(242,…) a data dimension of [242,1,1] with format CBT is implied. Since your networks do not have any recurrence, the easiest way to work around this issue is to use inputLayer instead of sequenceInputLayer. That way you can easily specify which dimension aligns with the sequence dimension of your data like so:
sequencenet = [
inputLayer([1 242 nan],"CTB",Name="input_2")
convolution1dLayer(3,32,"Name","conv1d","Padding","causal")
reluLayer("Name","relu_body_1")
globalAveragePooling1dLayer("Name","gapool1d")
fullyConnectedLayer(32,"Name","fc_2")];
In addition, you will also want to specify the observation dimension as [1 242] to properly align with the "T" dimension of the network.
obs(1) = Simulink.BusElement;
obs(1).Dimensions = [1,242];
obs(1).Name = "schedule_sequence";
obsBus.Elements = obs;
obsInfo = bus2RLSpec("obsBus","Model",mdl);
We recognize that the error message received during training was not helpful and will work on providing a more meaningful error in cases like this.
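As a quick sanity check (not part of the original answer, and assuming the actor body from the question is rebuilt with the inputLayer shown above), a single observation can be passed through the dlnetwork to confirm the dimensions line up:
obsSample = dlarray(rand(1,242), "CTB");   % 1 channel, 242 time steps, batch of 1
prob = predict(actor_net, obsSample);      % expect a 3-element action probability vector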