Problems tuning PID controller parameters using reinforcement learning
Hello, I am currently learning about reinforcement learning and want to use it to generate the parameters of my PID controller. However, when I build the Simulink model and start training, the reward keeps going to 0, and the actions the agent generates in Simulink keep going to null. I have no idea why this is happening and am looking for some help. Thank you in advance for your assistance.

clc; clear; close all;

% Open the Simulink model that contains the RL Agent block
open_system('BLDCRL')

% Observation: three unbounded signals
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit', [-inf -inf -inf]', ...
    'UpperLimit', [ inf  inf  inf]');
numObservations = obsInfo.Dimension(1);

% Action: the three PID gains
actInfo = rlNumericSpec([3 1]);
numActions = actInfo.Dimension(1);

% Create the environment from the Simulink model and set the reset function
env = rlSimulinkEnv('BLDCRL','BLDCRL/RL Agent',obsInfo,actInfo);
env.ResetFcn = @localResetFcn;

Ts = 5;     % agent sample time
Tf = 100;   % simulation stop time
rng(0)

% Critic network: state path and action path joined by an addition layer
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(500,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(500,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(500,'Name','ActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];

criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'ActionFC1','add/in2');
% figure
% plot(criticNetwork)

criticOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
% critic = rlQValueFunction(criticNetwork,...
%     obsInfo,actInfo, ...
%     "ActionInputNames",{'Action'},"ObservationInputNames",{'State'},"UseDevice","gpu");
critic = rlRepresentation(criticNetwork,obsInfo,actInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOpts);

% Actor network: maps the observations to the three PID gains
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(500,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(500,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numActions,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-05,'GradientThreshold',1);
actor = rlRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'State'},'Action',{'Action'},actorOptions);
% actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo, ...
%     "ObservationInputNames",{'State'},"UseDevice","gpu");
% act = getAction(actor,{rand(obsInfo.Dimension)});
% act{1}

% DDPG agent options, including exploration noise
agentOpts = rlDDPGAgentOptions( ...
    'SampleTime',Ts, ...
    'TargetSmoothFactor',1e-3, ...
    'DiscountFactor',1.0, ...
    'MiniBatchSize',64, ...
    'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);

% Training options
maxepisodes = 500;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'ScoreAveragingWindowLength',5, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',998);
trainingStats = train(agent,env,trainOpts);
Answers (1)
Divyanshu
on 26 Dec 2024
If you are following an official Reinforcement Learning Toolbox example, make sure the Reinforcement Learning Toolbox is installed properly on your system along with MATLAB, because the error indicates that the model file 'BLDCRL' is missing.
However, if this is custom code, make sure the model 'BLDCRL' is on the MATLAB path.
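A quick sanity check you could run before creating the environment (a minimal sketch; it assumes the model is saved as BLDCRL.slx in some folder on disk, adjust the paths and extension to your setup):

ver                              % lists installed products; Reinforcement Learning Toolbox should appear
if exist('BLDCRL','file') ~= 4   % exist returns 4 when a Simulink model is found on the MATLAB path
    error('BLDCRL was not found; addpath the folder containing the model, or cd into it.');
end
which BLDCRL                     % prints the full path of the model file that will be opened

If exist reports 4 and which points to the model you expect, the "missing model" explanation can be ruled out and the issue is more likely inside the model or the reward signal itself.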