Reinforcement learning based control not working for a positioning system.
I've adapted the MATLAB water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) for my project by replacing the water tank model and reward function with those from my own system. Despite adjusting every parameter I could find to match my system's requirements, the controller remains unresponsive: changing the initial and final positions (h0 and hf) does not influence the system's behavior, and the system does not even approach the goal position. Could anyone shed some light on why these modifications are not affecting the controller's response as anticipated?
Thank you for any guidance or suggestions!
Attached are the Simulink model, the reward function, and the physical model; here is the MATLAB code:
% Initial and final position
h0 = 0;
hf = 200;
% Simulation and sample times
Tf = 10;
Ts = 0.01;
open_system('PositionerStepInput');
numObs = 6;
numAct = 1;
oinfo = rlNumericSpec([numObs 1]); % Observation space specification remains unchanged
ainfo = rlNumericSpec([numAct 1], 'LowerLimit', 0, 'UpperLimit', 100); % Define action space with lower and upper limits
env = rlSimulinkEnv('PositionerStepInput','PositionerStepInput/RL Agent',oinfo,ainfo);
rng(100);
% Critic
cnet = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','fc1')
    concatenationLayer(1,2,'Name','concat')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','CriticOutput')];
actionPath = [
    featureInputLayer(numAct,'Normalization','none','Name','Action')
    fullyConnectedLayer(8,'Name','fc2')];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numAct,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,oinfo,ainfo,...
'Observation',{'State'},'Action',{'Action'},actorOptions);
agentOpts = rlTD3AgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.99, ...
    "ExperienceBufferLength",1e6, ...
    "MiniBatchSize",256);
agentOpts.ExplorationModel.StandardDeviation = 0.5;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;
agentOpts.ExplorationModel.StandardDeviationMin = 0;
agent = rlTD3Agent(actor,[critic1,critic2],agentOpts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1500, ...
    'MaxStepsPerEpisode',ceil(Tf/Ts), ...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-5,...
    'ScoreAveragingWindowLength',20);
doTraining = false;
if doTraining
    trainingStats = train(agent,env,trainOpts);
    save('myTrainedAgent.mat', 'agent')
else
    load('myTrainedAgent.mat', 'agent')
end



Answers (1)
TARUN
on 23 Apr 2025
I understand that the modifications you have made to the water tank model are not affecting the controller's response.
There are a few possible causes:
1. The variables “h0” and “hf” defined in the MATLAB workspace must be linked to the corresponding Simulink blocks (see the sketch below this list). You can verify the link as follows:
- Open your Simulink model.
- Check the source of the initial state, e.g., an Integrator block's initial condition parameter.
- Make sure it is set to “h0” and not to a hard-coded value.
- Similarly, make sure the goal state used in the reward calculation is set to “hf”.
2. The reward appears to depend on the difference between “H” and “xg” (the goal). If “xg” is not updated to match “hf”, the reward will not reflect your intended target, and if “H” (the system state) is not initialized to “h0”, the agent will always start from the same, possibly wrong, state (see the illustrative reward sketch at the end of this answer).
3. Set “doTraining = true” and retrain after every change to the initial/final states or to the reward; with “doTraining = false” your script only loads the previously trained agent, so no change can take effect.
4. Additionally, you can log the values of “h0”, “hf”, “H”, and “xg” during simulation to ensure they match your expectations.
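Here is a minimal sketch for points 1 and 4, assuming the initial state comes from an Integrator block at 'PositionerStepInput/Integrator' (a placeholder path; substitute the actual block path from your model) and that “h0” and “hf” are defined in the base workspace as in your script:
% Check how the block is currently configured; this should return 'h0',
% not a hard-coded number such as '0'.
get_param('PositionerStepInput/Integrator','InitialCondition')
% Link the block to the workspace variable if it is hard-coded.
set_param('PositionerStepInput/Integrator','InitialCondition','h0');
% Optionally, write h0 and hf into the model at the start of every episode
% through the environment's reset function, so each episode starts from the
% intended state and target.
env.ResetFcn = @(in) localResetFcn(in);
function in = localResetFcn(in)
    h0 = 0;     % initial position (keep in sync with the workspace values)
    hf = 200;   % goal position
    in = setBlockParameter(in,'PositionerStepInput/Integrator', ...
        'InitialCondition',num2str(h0));
    in = setVariable(in,'hf',hf);   % make hf visible to the reward path
end
The reset-function approach is a common pattern in the Simulink reinforcement learning examples for writing initial conditions into the model at the start of each episode.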
Feel free to go through the documentation of the water tank example to learn more about the model.
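For point 2, purely as an illustration and not your actual code, a tracking reward implemented in a MATLAB Function block could look like the sketch below, where “H” is the measured position and “xg” is the goal signal that should be driven by “hf”; the function name, weights, and threshold are placeholder assumptions:
function r = computeReward(H, xg)
    % Hypothetical tracking reward: penalize the squared distance to the
    % goal and add a small bonus once the position is close to it.
    e = xg - H;        % tracking error toward the goal (hf)
    r = -0.01*e^2;     % quadratic penalty on the distance to the goal
    if abs(e) < 1
        r = r + 10;    % bonus near the goal to encourage settling
    end
end
If changing “hf” never changes the “xg” that reaches this block, the reward the agent sees stays identical, which would explain why the trained policy ignores your new target.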