Reinforcement learning based control not working for a positioning system.
I've adapted the MATLAB water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) for my project by replacing the water tank model and reward function with those from my own system. Despite adjusting every parameter I could find to match my system's requirements, the controller remains unresponsive: changing the initial and final positions (h0 and hf) does not influence the system's behavior, and the system does not even approach the goal position. Could anyone shed some light on why these modifications are not affecting the controller's response as anticipated?
Thank you for any guidance or suggestions!
Attached are the Simulink model, the reward function, and the physical model; here is the MATLAB code:
% Initial and final position
h0 = 0;
hf = 200;
% Simulation and sample times
Tf = 10;
Ts = 0.01;
open_system('PositionerStepInput');
numObs = 6;
numAct = 1;
oinfo = rlNumericSpec([numObs 1]); % Observation space specification remains unchanged
ainfo = rlNumericSpec([numAct 1], 'LowerLimit', 0, 'UpperLimit', 100); % Define action space with lower and upper limits
env = rlSimulinkEnv('PositionerStepInput','PositionerStepInput/RL Agent',oinfo,ainfo);
rng(100);
% Critic
cnet = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','fc1')
    concatenationLayer(1,2,'Name','concat')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','CriticOutput')];
actionPath = [
    featureInputLayer(numAct,'Normalization','none','Name','Action')
    fullyConnectedLayer(8,'Name','fc2')];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numAct,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},actorOptions);
agentOpts = rlTD3AgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.99, ...
    "ExperienceBufferLength",1e6, ...
    "MiniBatchSize",256);
agentOpts.ExplorationModel.StandardDeviation = 0.5;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;
agentOpts.ExplorationModel.StandardDeviationMin = 0;
agent = rlTD3Agent(actor,[critic1,critic2],agentOpts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1500, ...
    'MaxStepsPerEpisode',ceil(Tf/Ts), ...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-5,...
    'ScoreAveragingWindowLength',20);
doTraining = false;
if doTraining
    trainingStats = train(agent,env,trainOpts);
    save('myTrainedAgent.mat', 'agent')
else
    load('myTrainedAgent.mat', 'agent')
end



Answers (1)
TARUN
on 23 Apr 2025
I understand that the modifications you have made to the water tank model are not affecting the controller's response.
There are a few possible causes:
1. The variables “h0” and “hf” defined in the MATLAB workspace must be linked to the corresponding Simulink blocks (see the sketch below this list). You can verify the link as follows:
- Open your Simulink model.
- Check the source of the initial state, e.g., an Integrator block's initial condition parameter.
- Make sure it is set to “h0” and not to a hard-coded value.
- Similarly, make sure the goal state used in the reward calculation is set to “hf”.
2. The reward appears to depend on the difference between “H” and “xg” (the goal). If “xg” is not updated to match “hf”, the reward will not reflect your intended target, and if “H” (the system state) is not initialized to “h0”, the agent will always start from the same, possibly wrong, state (see the illustrative reward sketch at the end of this answer).
3. Set “doTraining = true” and retrain after every change to the initial/final states or to the reward; with “doTraining = false” your script only loads the previously trained agent, so no change can take effect.
4. Additionally, you can log the values of “h0”, “hf”, “H”, and “xg” during simulation to ensure they match your expectations.
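Here is a minimal sketch for points 1 and 4, assuming the initial state comes from an Integrator block at 'PositionerStepInput/Integrator' (a placeholder path; substitute the actual block path from your model) and that “h0” and “hf” are defined in the base workspace as in your script:
% Check how the block is currently configured; this should return 'h0',
% not a hard-coded number such as '0'.
get_param('PositionerStepInput/Integrator','InitialCondition')
% Link the block to the workspace variable if it is hard-coded.
set_param('PositionerStepInput/Integrator','InitialCondition','h0');
% Optionally, write h0 and hf into the model at the start of every episode
% through the environment's reset function, so each episode starts from the
% intended state and target.
env.ResetFcn = @(in) localResetFcn(in);
function in = localResetFcn(in)
    h0 = 0;     % initial position (keep in sync with the workspace values)
    hf = 200;   % goal position
    in = setBlockParameter(in,'PositionerStepInput/Integrator', ...
        'InitialCondition',num2str(h0));
    in = setVariable(in,'hf',hf);   % make hf visible to the reward path
end
The reset-function approach is a common pattern in the Simulink reinforcement learning examples for writing initial conditions into the model at the start of each episode.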
Feel free to go through the documentation of the water tank example to learn more about the model.
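For point 2, purely as an illustration and not your actual code, a tracking reward implemented in a MATLAB Function block could look like the sketch below, where “H” is the measured position and “xg” is the goal signal that should be driven by “hf”; the function name, weights, and threshold are placeholder assumptions:
function r = computeReward(H, xg)
    % Hypothetical tracking reward: penalize the squared distance to the
    % goal and add a small bonus once the position is close to it.
    e = xg - H;        % tracking error toward the goal (hf)
    r = -0.01*e^2;     % quadratic penalty on the distance to the goal
    if abs(e) < 1
        r = r + 10;    % bonus near the goal to encourage settling
    end
end
If changing “hf” never changes the “xg” that reaches this block, the reward the agent sees stays identical, which would explain why the trained policy ignores your new target.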