Training with Deep Q-Network Agent

5 views (last 30 days)
wathek gatri on 7 Jun 2022
Answered: Anay on 21 Aug 2025
Hello,
I'm trying to use the approach described here to train my system with a basic reward function and discrete actions. The states are the (x,y) coordinates of a moving point. Is there a ready-made function that executes the training and the update of ϕ, or do I have to implement it manually?
The guide also says: 'Initialize the critic Q(s,a;ϕ) with random parameter values ϕ, and initialize the target critic parameters ϕt with the same values: ϕt = ϕ.'
But I set up my critic as follows and can't find where the ϕ values are stored:
-------------------------------------------------------------------------
obsInfo = rlNumericSpec([4 1]);                  % 4-element observation
actInfo = rlFiniteSetSpec([2 1 -1]);             % 3 discrete actions
net = [featureInputLayer(4,'Normalization','none')
       fullyConnectedLayer(3,'Name','value')];   % one Q value per action
net = dlnetwork(net);
critic = rlVectorQValueFunction(net,obsInfo,actInfo);
opt = rlDQNAgentOptions;
agent = rlDQNAgent(critic,opt);
-------------------------------------------------------------------------
Thanks!

Answers (1)

Anay on 21 Aug 2025
Hi wathek,
The parameters ϕ are the weights and biases of the critic's neural network. As the MATLAB documentation for rlVectorQValueFunction notes, the learnable parameters of the critic are the weights of its deep neural network.
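You can inspect them directly. A quick sketch, assuming the critic and net from your snippet:
% View the critic's learnable parameters (these are the phi values)
phi = getLearnableParameters(critic)   % cell array of weight/bias arrays
% Equivalently, inspect the underlying dlnetwork directly
net.Learnables                         % table with Layer, Parameter, Value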
You do not need to implement the training or the update of ϕ yourself. Reinforcement Learning Toolbox handles the target network update for DQN; you only need to configure it in the agent options. Here is a sample configuration:
% Configure DQN agent options with target network settings
opt = rlDQNAgentOptions();
opt.TargetSmoothFactor = 1e-3; % Controls how quickly target network updates (τ)
opt.TargetUpdateFrequency = 4; % How often to update target network (in steps)
opt.ExperienceBufferLength = 10000; % Replay buffer size
opt.MiniBatchSize = 64; % Mini-batch size for training
opt.DiscountFactor = 0.99; % Discount factor (γ)
% Create the DQN agent
agent = rlDQNAgent(critic, opt);
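One thing to note: TargetSmoothFactor and TargetUpdateFrequency interact. With TargetSmoothFactor < 1 the agent performs a soft update ϕt ← τϕ + (1−τ)ϕt, while setting TargetSmoothFactor = 1 together with TargetUpdateFrequency > 1 gives the periodic hard copy ϕt = ϕ used in the classic DQN algorithm. For example:
% Periodic hard update: copy phi into phi_t every 4 agent steps
opt.TargetSmoothFactor = 1;
opt.TargetUpdateFrequency = 4;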
To train the agent, you'll need to create an environment and then call the train function. For example:
% Training options
trainOpts = rlTrainingOptions();
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 500;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 200;
% Train the agent (env is your environment)
trainingStats = train(agent, env, trainOpts);
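Once training completes, a quick way to sanity-check the learned policy is to simulate it with sim, for example:
% Simulate the trained agent for one episode and total up the reward
simOpts = rlSimulationOptions('MaxSteps',500);
experience = sim(env, agent, simOpts);
totalReward = sum(experience.Reward.Data);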
You can create the environment in MATLAB or Simulink; the MATLAB documentation has sections on creating custom environments and on configuring and training DQN agents.
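If your environment is just a point moving in the plane, one lightweight option is rlFunctionEnv with custom step and reset functions. The sketch below is hypothetical: the dynamics, reward, and action set are placeholders, and it uses a 2-element (x,y) observation rather than the 4-element spec from your snippet.
% Hypothetical (x,y) point environment built with rlFunctionEnv
obsInfo = rlNumericSpec([2 1]);               % state: (x,y) coordinates
actInfo = rlFiniteSetSpec([-1 0 1]);          % placeholder discrete actions
resetFcn = @() deal([0; 0], struct('State',[0; 0]));
env = rlFunctionEnv(obsInfo, actInfo, @stepFcn, resetFcn);

function [nextObs, reward, isDone, logged] = stepFcn(action, logged)
    % Placeholder dynamics: the action nudges the x coordinate
    nextObs = logged.State + [action; 0.1];
    logged.State = nextObs;
    reward = -norm(nextObs);                  % basic reward: stay near the origin
    isDone = norm(nextObs) > 10;              % end the episode if the point drifts off
end
With env defined this way, the train call above works unchanged.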
Hope this helps!
