Saved agent always gives constant output no matter how or how much I train it
9 views (last 30 days)
Abdul Basith Ashraf
on 5 Apr 2021
Edited: Abdul Basith Ashraf on 8 Apr 2021
I trained a DDPG RL agent in a Simulink environment. The training looked fine to me, and I saved agents during the process.
I trained the RL agent using several different networks, but the saved agents always give a constant output (namely, the LowerLimit of the action).
Please help me. I have been looking for help for the past week.
INPUTMAX = 1E-4;
actionInfo = rlNumericSpec([2 1],'LowerLimit',-INPUTMAX,'UpperLimit', INPUTMAX);
actionInfo.Name = 'Inlet flow rate change';
observationInfo = rlNumericSpec([5 1],'LowerLimit',[300;300;1.64e5;0;0],'UpperLimit',[393;373;6e5;0.01;0.01]);
observationInfo.Name = 'Temperatures, Pressure and flow rates';
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],observationInfo,actionInfo);
L = 25; % number of neurons
%% CRITIC NETWORK
statePath = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    reluLayer('Name','relu1')
    concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(29,'Name','fc2')
    reluLayer('Name','relu3')
    fullyConnectedLayer(29,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','fc4')
    ];
actionPath = [
    featureInputLayer(2,'Normalization','none','Name','action')
    fullyConnectedLayer(4,'Name','fcaction')
    reluLayer('Name','actionrelu')
    ];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'actionrelu','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-4,'UseDevice','gpu');
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
% plot(criticNetwork)
%% ACTOR NETWORK
actorNetwork = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    sigmoidLayer('Name','sig1')
    fullyConnectedLayer(L,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(2,'Name','fc5')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',INPUTMAX*ones(2,1))
    ];
actorNetwork = layerGraph(actorNetwork);
% plot(actorNetwork)
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,...
    'L2RegularizationFactor',1e-5,'UseDevice','gpu');
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'scale'},actorOptions);
agentOptions = rlDDPGAgentOptions(...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e4,...
    'SampleTime',1,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',64,...
    'NumStepsToLookAhead',1,...
    'SaveExperienceBufferWithAgent',true,...
    'ResetExperienceBufferBeforeTraining',false);
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);
maxepisodes = 1000;
maxsteps = 500;
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    'ScoreAveragingWindowLength',50,...
    'StopTrainingCriteria','AverageSteps',...
    'StopTrainingValue',501,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',0);
trainingOpts.UseParallel = true;
trainingOpts.ParallelizationOptions.Mode = 'async';
trainingStats = train(agent,env,trainingOpts);
0 comments
Accepted Answer
Emmanouil Tzorakoleftherakis
on 5 Apr 2021
The problem formulation is not correct. I suspect that even during training you are seeing a lot of bang-bang actions. The biggest issue is that the noise variance is very large compared to your action range; this needs to be fixed. Take a look at this note in the documentation: "It is common to set StandardDeviation*sqrt(Ts) to a value between 1% and 10% of your action range."
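Applying that 1%–10% guideline to the numbers in the question gives a rough sense of the mismatch. This is only a sketch; whether the relevant property is called Variance or StandardDeviation depends on your Reinforcement Learning Toolbox release, and it assumes the Ts = 1 and INPUTMAX = 1e-4 values from the question's code:

```matlab
% Sketch: scale the exploration noise to the action range.
INPUTMAX    = 1e-4;
Ts          = 1;                  % agent sample time (from the question)
actionRange = 2*INPUTMAX;         % UpperLimit - LowerLimit = 2e-4
% Guideline: noise*sqrt(Ts) should be roughly 1%-10% of the action range,
% i.e. between 2e-6 and 2e-5 here -- far below the 0.4 used in the question.
agentOptions.NoiseOptions.Variance = 0.05*actionRange/sqrt(Ts);  % ~1e-5
```

With a noise setting of 0.4 on an action bounded at ±1e-4, the exploration noise dwarfs the actor output, so the clipped action sits at one of its limits almost every step, which is consistent with the constant LowerLimit output described in the question.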
4 comments
Emmanouil Tzorakoleftherakis
on 8 Apr 2021
It decays over global episode steps, so it carries over from episode to episode. Reducing the decay rate would make the agent explore more over time; that may be something worth trying.
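To see how quickly exploration fades with the values in the question, here is a sketch of the decay, assuming the toolbox applies a per-step update of the form Variance <- Variance*(1 - VarianceDecayRate) at every global step:

```matlab
% Sketch: noise variance remaining after k global steps,
% assuming geometric decay Variance*(1 - VarianceDecayRate)^k.
V0    = 0.4;                % initial variance (from the question)
decay = 1e-5;               % VarianceDecayRate (from the question)
k     = 0:500:500*1000;    % global steps: 1000 episodes x 500 steps each
Vk    = V0*(1 - decay).^k; % variance remaining after k steps
halfLife = log(2)/decay;    % roughly 6.9e4 steps (~139 episodes) to halve
```

Under this assumption, the variance only halves after roughly 139 episodes of 500 steps, so with the original settings the (oversized) noise dominates the action for much of the training run.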
More Answers (0)