Reinforcement learning based control not working for a positioning system.

5 views (last 30 days)
Romina Zarrabi
Romina Zarrabi on 15 Mar 2024
Answered: TARUN on 23 Apr 2025
I've adapted the MATLAB water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) for my project by replacing the water tank model and reward function with those from my own system. Despite adjusting every parameter I could to match my system's requirements, the controller remains unresponsive. Specifically, changing the initial and final positions (h0 and hf) does not influence the system's behavior, and the system does not even approach the goal position. Could anyone shed some light on why these modifications are not affecting the controller's response as expected?
Thank you for any guidance or suggestions!
Attached are the Simulink model, the reward function, and the physical model; here is the code:
% Initial and final position
h0 = 0;
hf = 200;
% Simulation and sample times
Tf = 10;
Ts = 0.01;
open_system('PositionerStepInput');
numObs = 6;
numAct = 1;
oinfo = rlNumericSpec([numObs 1]); % Observation space specification remains unchanged
ainfo = rlNumericSpec([numAct 1], 'LowerLimit', 0, 'UpperLimit', 100); % Define action space with lower and upper limits
env = rlSimulinkEnv('PositionerStepInput','PositionerStepInput/RL Agent',oinfo,ainfo);
rng(100); % Fix the random seed for reproducibility
% Critic
cnet = [
featureInputLayer(numObs,'Normalization','none','Name', 'State')
fullyConnectedLayer(128, 'Name', 'fc1')
concatenationLayer(1,2,'Name','concat')
reluLayer('Name','relu1')
fullyConnectedLayer(128, 'Name', 'fc3')
reluLayer('Name','relu2')
fullyConnectedLayer(1, 'Name', 'CriticOutput')];
actionPath = [
featureInputLayer(numAct,'Normalization','none', 'Name', 'Action')
fullyConnectedLayer(8, 'Name', 'fc2')];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
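% Actor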
actorNetwork = [featureInputLayer(numObs,'Normalization','none','Name','State')
fullyConnectedLayer(128, 'Name','actorFC1')
reluLayer('Name','relu1')
fullyConnectedLayer(128, 'Name','actorFC2')
reluLayer('Name','relu2')
fullyConnectedLayer(numAct,'Name','Action')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},actorOptions);
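% TD3 agent options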
agentOpts = rlTD3AgentOptions("SampleTime",Ts, ...
"DiscountFactor",0.99, ...
"ExperienceBufferLength",1e6, ...
"MiniBatchSize",256);
agentOpts.ExplorationModel.StandardDeviation = 0.5;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;
agentOpts.ExplorationModel.StandardDeviationMin = 0;
agent = rlTD3Agent(actor,[critic1,critic2],agentOpts);
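% Training options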
trainOpts = rlTrainingOptions(...
'MaxEpisodes',1500, ...
'MaxStepsPerEpisode',ceil(Tf/Ts), ...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-5,...
'ScoreAveragingWindowLength',20);
doTraining = false;
if doTraining
trainingStats = train(agent,env,trainOpts);
save('myTrainedAgent.mat', 'agent')
else
load('myTrainedAgent.mat', 'agent')
end

Answers (1)

TARUN
TARUN on 23 Apr 2025
I understand that the modifications you have made to the adapted water tank model are not affecting the controller's response.
There are a few possible causes:
1. The variables h0 and hf defined in the MATLAB workspace must be referenced by the Simulink model blocks. You can check this by following the steps below (see the sketch after this list):
  • Open your Simulink model.
  • Check the source of the initial state, e.g., an Integrator block's initial condition parameter.
  • Make sure it is set to h0 and not to a hard-coded value.
  • Similarly, make sure the goal state used in the reward calculation is set to hf.
2. The reward seems to depend on the difference between H and xg (the goal). If xg is not updated according to hf, the reward will not reflect your intended target, and if H (the system state) is not initialized to h0, the agent will always start from the same, possibly wrong, state.
3. Set doTraining = true and retrain after every change to initial/final states or reward.
4. Additionally, you can log the values of h0, hf, H, and xg during simulation to ensure they match your expectations.
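As a quick check for points 1, 2, and 4, here is a minimal sketch. It assumes the plant state comes from an Integrator block at the hypothetical path 'PositionerStepInput/Integrator'; replace that path with the block that actually holds the position state in your model.
% Minimal verification sketch -- the block path below is an assumption,
% adjust it to the block that holds the plant state in your model.
mdl = 'PositionerStepInput';
blk = [mdl '/Integrator'];   % hypothetical path to the state integrator
load_system(mdl);
% The initial condition should reference the workspace variable h0, not a number
ic = get_param(blk,'InitialCondition');
fprintf('InitialCondition = %s (expected ''h0'')\n', ic);
if ~strcmp(ic,'h0')
    set_param(blk,'InitialCondition','h0');   % point the block at the variable
end
% Confirm the workspace values the model will pick up at simulation time
fprintf('h0 = %g, hf = %g\n', h0, hf);
If the initial condition (or the goal used in the reward) turns out to be hard-coded, fix it in the model, set doTraining = true, and retrain; otherwise the loaded agent will keep reflecting the old setup.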
Feel free to go through the documentation for the water tank reinforcement learning example to learn more about the original model.

Version: R2021b
