Saved agent always gives constant output no matter how or how much I train it
9 views (last 30 days)
Abdul Basith Ashraf
on 5 Apr 2021
Edited: Abdul Basith Ashraf on 8 Apr 2021
I trained a DDPG RL agent in a Simulink environment. The training looked fine to me, and I saved agents during the process.
I trained the RL agent using several different networks, but the saved agents always give a constant output (namely, the LowerLimit of the action).
Please help me. I have been looking for help for the past week.
INPUTMAX = 1E-4;
actionInfo = rlNumericSpec([2 1],'LowerLimit',-INPUTMAX,'UpperLimit', INPUTMAX);
actionInfo.Name = 'Inlet flow rate change';
observationInfo = rlNumericSpec([5 1],'LowerLimit',[300;300;1.64e5;0;0],'UpperLimit',[393;373;6e5;0.01;0.01]);
observationInfo.Name = 'Temperatures, Pressure and flow rates';
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],observationInfo,actionInfo);
L = 25; % number of neurons
%% CRITIC NETWORK
statePath = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    reluLayer('Name','relu1')
    concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(29,'Name','fc2')
    reluLayer('Name','relu3')
    fullyConnectedLayer(29,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','fc4')
    ];
actionPath = [
    featureInputLayer(2,'Normalization','none','Name','action')
    fullyConnectedLayer(4,'Name','fcaction')
    reluLayer('Name','actionrelu')
    ];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'actionrelu','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-4,'UseDevice','gpu');
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
% plot(criticNetwork)
%% ACTOR NETWORK
actorNetwork = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    sigmoidLayer('Name','sig1')
    fullyConnectedLayer(L,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(2,'Name','fc5')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',INPUTMAX*ones(2,1))
    ];
actorNetwork = layerGraph(actorNetwork);
% plot(actorNetwork)
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-5,'UseDevice','gpu');
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'scale'},actorOptions);
agentOptions = rlDDPGAgentOptions(...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e4,...
    'SampleTime',1,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',64,...
    'NumStepsToLookAhead',1,...
    'SaveExperienceBufferWithAgent',true,...
    'ResetExperienceBufferBeforeTraining',false);
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);
maxepisodes = 1000;
maxsteps = 500;
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    'ScoreAveragingWindowLength',50,...
    'StopTrainingCriteria','AverageSteps',...
    'StopTrainingValue',501,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',0);
trainingOpts.UseParallel = true;
trainingOpts.ParallelizationOptions.Mode = 'async';
trainingStats = train(agent,env,trainingOpts);
0 comments
Accepted Answer
Emmanouil Tzorakoleftherakis
on 5 Apr 2021
The problem formulation is not correct. I suspect that even during training you are seeing a lot of bang-bang actions. The biggest issue is that the noise variance is very large compared to your action range; this needs to be fixed. Take a look at this note in the documentation: "It is common to set StandardDeviation*sqrt(Ts) to a value between 1% and 10% of your action range."
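Applying that 1%–10% guideline to the numbers in the question gives a rough sense of the mismatch. This is only a sketch; whether the relevant property is called Variance or StandardDeviation depends on your Reinforcement Learning Toolbox release, and it assumes the Ts = 1 and INPUTMAX = 1e-4 values from the question's code:

```matlab
% Sketch: scale the exploration noise to the action range.
INPUTMAX    = 1e-4;
Ts          = 1;                  % agent sample time (from the question)
actionRange = 2*INPUTMAX;         % UpperLimit - LowerLimit = 2e-4
% Guideline: noise*sqrt(Ts) should be roughly 1%-10% of the action range,
% i.e. between 2e-6 and 2e-5 here -- far below the 0.4 used in the question.
agentOptions.NoiseOptions.Variance = 0.05*actionRange/sqrt(Ts);  % ~1e-5
```

With a noise setting of 0.4 on an action bounded at ±1e-4, the exploration noise dwarfs the actor output, so the clipped action sits at one of its limits almost every step, which is consistent with the constant LowerLimit output described in the question.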
4 comments
Emmanouil Tzorakoleftherakis
on 8 Apr 2021
It decays over global episode steps, so it carries over from episode to episode. Reducing the decay rate would make the agent explore more over time; that may be something worth trying.
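To see how quickly exploration fades with the values in the question, here is a sketch of the decay, assuming the toolbox applies a per-step update of the form Variance <- Variance*(1 - VarianceDecayRate) at every global step:

```matlab
% Sketch: noise variance remaining after k global steps,
% assuming geometric decay Variance*(1 - VarianceDecayRate)^k.
V0    = 0.4;                % initial variance (from the question)
decay = 1e-5;               % VarianceDecayRate (from the question)
k     = 0:500:500*1000;    % global steps: 1000 episodes x 500 steps each
Vk    = V0*(1 - decay).^k; % variance remaining after k steps
halfLife = log(2)/decay;    % roughly 6.9e4 steps (~139 episodes) to halve
```

Under this assumption, the variance only halves after roughly 139 episodes of 500 steps, so with the original settings the (oversized) noise dominates the action for much of the training run.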
More Answers (0)