DDPG multiple action noise variance error

15 views (last 30 days)
Tech Logg Ding on 6 Nov 2020
Commented: 勇刚 张 on 30 Mar 2022
Hi,
I am working on developing an adaptive PID controller for the water tank level control system shown in the attached Simulink model screenshot.
The outputs of the RL Agent block are the 3 controller gains. Since the 3 gains have very different ranges of values, I thought it was a good idea to use a different variance for each action, as suggested on the rlDDPGAgentOptions page.
However, when I start training, I get the following error:
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
For 'Output Port 1' of 'rlwatertankAdaptivePID/RL Agent/AgentWrapper', the 'outputImpl' method of the System object
'rl.simulink.blocks.AgentWrapper' returned a value whose size [3x3], does not match the value returned by the 'getOutputSizeImpl' method. Either
change the size of the value returned by 'outputImpl', or change the size returned by 'getOutputSizeImpl'.
I defined the agent options as follows:
%% Specify DDPG agent options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.9;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 5e-3;
% due to large range of action values, variance needs to be individually
% defined for every action [kp, ki and kd]
% range of kp, ki and kd should be taken into account
% kp =[-6, 6], range = 12
% ki = [-0.2, 0.2], range = 0.4
% kd = [-2, 2], range = 4
% rule of thumb: Variance*sqrt(Ts) should be between 1% and 10% of the
% action range (see the worked check after this listing)
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
agentOptions.NoiseOptions.Variance = [0.8, 0.02, 0.2];
%agentOptions.NoiseOptions.Variance = 0.2;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;
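As a quick illustration of that rule of thumb (a sketch only; the sample time is not stated in the post, so Ts = 0.1 s is assumed here), the recommended bounds can be checked like this:
% Illustrative check: Variance*sqrt(Ts) should fall between 1% and 10% of
% each action's range. Ts_check = 0.1 is an assumed value, not from the post.
Ts_check = 0.1;                          % assumed sample time
ranges   = [12, 0.4, 4];                 % ranges of kp, ki, kd
varLow   = 0.01*ranges/sqrt(Ts_check);   % smallest recommended Variance per action
varHigh  = 0.10*ranges/sqrt(Ts_check);   % largest recommended Variance per action
chosen   = [0.8, 0.02, 0.2];
chosen >= varLow & chosen <= varHigh     % all true for these assumed numbers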
How do I work around this?
Note: if I only specify one variance it works fine, but the exploration and the achieved results are not good.
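One possible workaround (a sketch, not a confirmed fix): since the action specification is a [3 1] column vector, the per-action noise Variance can be given the same orientation. A 1x3 row vector can expand against the 3x1 noise state and yield the [3x3] value reported in the error.
% Sketch of the suspected fix: give Variance the same orientation as the
% [3 1] action specification (column vector instead of row vector).
agentOptions.NoiseOptions.Variance = [0.8; 0.02; 0.2];   % kp, ki, kd
% VarianceDecayRate can stay a scalar; it then applies to all actions.
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;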
  3 Comments
张 冠宇 on 18 Nov 2021
May I ask how I can get 3 actions like kp, ki, and kd? Should I set it as follows:
actInfo = rlNumericSpec([3 1]);
or [1 3],
or some other setting?
I ask because I get the following error:
Input data dimensions must match the dimensions specified in the corresponding observation and action info
specifications.
obsInfo = rlNumericSpec([3 1],... % rlNumericSpec: continuous action/observation data; rlFiniteSetSpec: discrete action/observation data
'LowerLimit',[-inf -inf -inf]',...
'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';
numObservations = obsInfo.Dimension(1); % get the dimension of the observation vector
actInfo = rlNumericSpec([3 1]);
actInfo.Name = 'flow';
numActions = actInfo.Dimension(1);
% build the environment interface object
env = rlSimulinkEnv('Load_Freq_Ctrl_rl2','Load_Freq_Ctrl_rl2/RL Agent',...
obsInfo,actInfo);
% set a custom reset function to randomize the model's reference value
env.ResetFcn = @(in)localResetFcn(in);
% specify the simulation time Tf and the agent sample time Ts in seconds
Ts = 0.2;
Tf = 30;
% fix the random generator seed for reproducibility
rng(0)
% create the DDPG agent
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(25,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([3 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(25,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% view the critic network configuration
figure
plot(criticNetwork)
% specify options for the critic representation using rlRepresentationOptions
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
% create the actor
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(3, 'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(3,'Name','Action')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
% create the agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'MiniBatchSize',64, ...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
% train the agent
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);% 'SaveAgentCriteria',"EpisodeReward",'SaveAgentValue',100',
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',5, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',2000); % 155 works better
% set doTraining to true
doTraining = true;
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps,'StopOnError','on');
experiences = sim(env,agent,simOpts);
thank you
勇刚 张 on 30 Mar 2022
Why does the deep network use imageInputLayer() rather than featureInputLayer() as the input layer?
The upper and lower limits in actInfo also need to be declared, just like in obsInfo; remember to use column vectors.
Good luck
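A minimal sketch of what this comment suggests, assuming a release where featureInputLayer is available (R2020b or later); the action limits below are placeholders, since the real bounds are not given in the thread:
% Sketch following the comment above: declare the actInfo limits as column
% vectors (mirroring obsInfo) and use featureInputLayer as the input layer.
actInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-1; -1; -1], ...   % placeholder bounds; use the real action limits
    'UpperLimit',[ 1;  1;  1]);
actInfo.Name = 'flow';
% actor with a feature (vector) input instead of an image input
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(3,'Name','actorFC')
    tanhLayer('Name','actorTanh')
    fullyConnectedLayer(3,'Name','Action')];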


Answers (0)
