
DDPG multiple action noise variance error

10 views (last 30 days)
Tech Logg Ding on 6 Nov 2020
Commented: 勇刚 张 on 30 Mar 2022
Hi,
I am working on developing an adaptive PID controller for a water tank level control system, shown here:
The outputs of the RL Agent block are the three controller gains. Since the three gains have very different ranges of values, I thought it was a good idea to use a different noise variance for each action, as suggested on the rlDDPGAgentOptions page.
However, when I initiate training, I get the following error:
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
For 'Output Port 1' of 'rlwatertankAdaptivePID/RL Agent/AgentWrapper', the 'outputImpl' method of the System object
'rl.simulink.blocks.AgentWrapper' returned a value whose size [3x3], does not match the value returned by the 'getOutputSizeImpl' method. Either
change the size of the value returned by 'outputImpl', or change the size returned by 'getOutputSizeImpl'.
I defined the agent options as follows:
%% Specify DDPG agent options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.9;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 5e-3;
% Because the actions cover very different ranges, the noise variance is
% defined individually for each action [kp, ki, kd], taking each range
% into account:
%   kp = [-6, 6],     range = 12
%   ki = [-0.2, 0.2], range = 0.4
%   kd = [-2, 2],     range = 4
% The rule of thumb is that Variance*sqrt(SampleTime) should lie between
% 1% and 10% of the corresponding action range.
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
agentOptions.NoiseOptions.Variance = [0.8, 0.02, 0.2];
%agentOptions.NoiseOptions.Variance = 0.2;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;
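For reference, a quick numerical check of that rule of thumb is sketched below; the value Ts = 1 is an assumption made only for illustration, since the sample time is defined elsewhere in the script.
% Sketch: check that Variance*sqrt(Ts) falls within 1%-10% of each action range.
% Ts = 1 is assumed for illustration; substitute the model's actual sample time.
Ts     = 1;
ranges = [12; 0.4; 4];         % ranges of kp, ki, kd
vars   = [0.8; 0.02; 0.2];     % chosen noise variances
scaled = vars*sqrt(Ts);        % quantity constrained by the guideline
inBand = scaled >= 0.01*ranges & scaled <= 0.10*ranges   % expected: [1; 1; 1]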
How do I work around this?
Note: if I only specify a single variance it works fine, but the exploration and the achieved results are not good.
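One possibility that may be worth checking (an assumption on my part, not a verified fix): pass the variance as a column vector so that its orientation matches a [3 1] action specification, which should keep the noise sample, and therefore the agent output, at size [3 1] rather than broadcasting it to [3 3].
% Possible workaround (sketch): assumes the action spec is rlNumericSpec([3 1]).
% Specify the variance with the same orientation as the action specification.
agentOptions.NoiseOptions.Variance = [0.8; 0.02; 0.2];   % column vector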
3 Comments
张 冠宇 on 18 Nov 2021
May I ask how to get three actions (kp, ki, kd)? Should I set
actInfo = rlNumericSpec([3 1]);
or [1 3], or some other setting? I am running into this error:
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.
obsInfo = rlNumericSpec([3 1],... % rlNumericSpec: continuous actions/observations; rlFiniteSetSpec: discrete actions/observations
'LowerLimit',[-inf -inf -inf]',...
'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';
numObservations = obsInfo.Dimension(1); % get the observation dimension
actInfo = rlNumericSpec([3 1]);
actInfo.Name = 'flow';
numActions = actInfo.Dimension(1);
% build the environment interface object
env = rlSimulinkEnv('Load_Freq_Ctrl_rl2','Load_Freq_Ctrl_rl2/RL Agent',...
obsInfo,actInfo);
% set a custom reset function to randomize the model's reference value
env.ResetFcn = @(in)localResetFcn(in);
% specify the simulation time Tf and the agent sample time Ts, in seconds
Ts = 0.2;
Tf = 30;
% fix the random generator seed for reproducibility
rng(0)
% create the DDPG agent
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(25,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([3 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(25,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% view the critic network configuration
figure
plot(criticNetwork)
% specify options for the critic representation using rlRepresentationOptions
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
% create the actor network
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(3, 'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(3,'Name','Action')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
% create the agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'MiniBatchSize',64, ...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
% train the agent
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);% 'SaveAgentCriteria',"EpisodeReward",'SaveAgentValue',100',
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',5, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',2000); % 155 works well
% set doTraining to true to run training
doTraining = true;
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps,'StopOnError','on');
experiences = sim(env,agent,simOpts);
thank you
勇刚 张 on 30 Mar 2022
Why did you use imageInputLayer() rather than featureInputLayer() as the input layer when building the deep networks? Also, the upper and lower limits in actInfo should be declared explicitly, just as in obsInfo, and remember to use column vectors.
Good luck
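A minimal sketch of that suggestion follows; the numeric limits reuse the kp/ki/kd ranges from the original question purely as placeholders, so substitute the limits that fit your own model.
% Declare the action limits in actInfo as column vectors, matching the [3 1]
% specification (placeholder limits taken from the original question).
actInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-6; -0.2; -2], ...
    'UpperLimit',[ 6;  0.2;  2]);
actInfo.Name = 'flow';

% For vector observations, featureInputLayer can replace imageInputLayer:
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(50,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(25,'Name','CriticStateFC2')];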


Answers (0)
