rlRepresentationOptions
(Not recommended) Options set for reinforcement learning agent representations (critics and actors)
rlRepresentationOptions is not recommended. Use an rlOptimizerOptions object within an agent options object instead. For more information, see "rlRepresentationOptions is not recommended" under Version History.
Description
Use an rlRepresentationOptions object to specify an options set for critics (rlValueRepresentation, rlQValueRepresentation) and actors (rlDeterministicActorRepresentation, rlStochasticActorRepresentation).
Creation
Description
repOpts = rlRepresentationOptions creates a default option set to use as the last argument when creating a reinforcement learning actor or critic. You can modify the object properties using dot notation.
repOpts = rlRepresentationOptions(Name,Value) creates an options object with the specified Properties using one or more name-value pair arguments.
Properties
LearnRate
Learning rate for the representation, specified as a positive scalar. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.
Example: 'LearnRate',0.025
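The trade-off described above can be illustrated with a minimal sketch in plain MATLAB that does not use the toolbox. It runs gradient descent on the toy function f(w) = 0.5*w^2. The three learning rates and the step count are arbitrary values chosen for illustration, not recommended settings.

```matlab
% Gradient descent on f(w) = 0.5*w^2, whose gradient is simply w.
% The learning rates below are illustrative assumptions only.
learnRates = [0.001 0.5 2.5];   % too low, reasonable, too high
numSteps = 50;
finalW = zeros(size(learnRates));
for k = 1:numel(learnRates)
    w = 1;                       % same starting point for every run
    for step = 1:numSteps
        grad = w;                % gradient of 0.5*w^2 at w
        w = w - learnRates(k)*grad;
    end
    finalW(k) = w;
end
disp(abs(finalW))                % distance from the minimizer w = 0
```

With the smallest rate the iterate has barely moved after 50 steps, with the middle rate it is essentially at the minimum, and with the largest rate it grows without bound.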
Optimizer
Optimizer for training the network of the representation, specified as one of the following values.
- "adam": Use the Adam optimizer. You can specify the decay rates of the gradient and squared gradient moving averages using the GradientDecayFactor and SquaredGradientDecayFactor fields of the OptimizerParameters option.
- "sgdm": Use the stochastic gradient descent with momentum (SGDM) optimizer. You can specify the momentum value using the Momentum field of the OptimizerParameters option.
- "rmsprop": Use the RMSProp optimizer. You can specify the decay rate of the squared gradient moving average using the SquaredGradientDecayFactor field of the OptimizerParameters option.
For more information about these optimizers, see the Algorithms section of trainingOptions in Deep Learning Toolbox™.
Example: 'Optimizer',"sgdm"
OptimizerParameters
Applicable parameters for the optimizer, specified as an OptimizerParameters object with the following parameters.
Parameter | Description |
---|---|
Momentum | Contribution of previous step, specified as a scalar from 0 to 1. A value of 0 means no contribution from the previous step. A value of 1 means maximal contribution. This parameter applies only when Optimizer is "sgdm". |
Epsilon | Denominator offset, specified as a positive scalar. The optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero. This parameter applies only when Optimizer is "adam" or "rmsprop". |
GradientDecayFactor | Decay rate of gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Optimizer is "adam". |
SquaredGradientDecayFactor | Decay rate of squared gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Optimizer is "adam" or "rmsprop". |
When a particular property of OptimizerParameters is not applicable to the optimizer type specified in the Optimizer option, that property is set to "Not applicable".
To change the default values, create an rlRepresentationOptions set and use dot notation to access and change the properties of OptimizerParameters.
repOpts = rlRepresentationOptions;
repOpts.OptimizerParameters.GradientDecayFactor = 0.95;
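To show where Epsilon, GradientDecayFactor, and SquaredGradientDecayFactor enter a parameter update, the following sketch performs one Adam step in plain MATLAB. It uses the commonly published form of the algorithm, including bias correction, which is not necessarily how the toolbox implements it. All variable names and numeric values are hypothetical.

```matlab
% One textbook Adam update for a parameter vector (illustrative only;
% the toolbox implementation may differ in detail).
learnRate   = 0.01;
gradDecay   = 0.9;     % plays the role of GradientDecayFactor
sqGradDecay = 0.999;   % plays the role of SquaredGradientDecayFactor
epsilon     = 1e-8;    % plays the role of Epsilon

w = [1; -2];           % hypothetical parameters
g = [0.5; -0.1];       % hypothetical gradient at w
m = zeros(size(w));    % moving average of the gradient
v = zeros(size(w));    % moving average of the squared gradient
t = 1;                 % iteration counter

m = gradDecay*m + (1 - gradDecay)*g;
v = sqGradDecay*v + (1 - sqGradDecay)*g.^2;
mHat = m/(1 - gradDecay^t);      % bias-corrected gradient average
vHat = v/(1 - sqGradDecay^t);    % bias-corrected squared gradient average

% Epsilon keeps the denominator away from zero.
wNew = w - learnRate*mHat./(sqrt(vHat) + epsilon);
```

On the first iteration the step for each element has magnitude close to learnRate, regardless of the size of the gradient, because the gradient is normalized by its own running magnitude.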
GradientThreshold
Threshold value for the representation gradient, specified as Inf or a positive scalar. If the gradient exceeds this value, the gradient is clipped as specified by the GradientThresholdMethod option. Clipping the gradient limits how much the network parameters change in a training iteration.
Example: 'GradientThreshold',1
GradientThresholdMethod
Gradient threshold method used to clip gradient values that exceed the gradient threshold, specified as one of the following values.
- "l2norm": If the L2 norm of the gradient of a learnable parameter is larger than GradientThreshold, then scale the gradient so that the L2 norm equals GradientThreshold.
- "global-l2norm": If the global L2 norm, L, is larger than GradientThreshold, then scale all gradients by a factor of GradientThreshold/L. The global L2 norm considers all learnable parameters.
- "absolute-value": If the absolute value of an individual partial derivative in the gradient of a learnable parameter is larger than GradientThreshold, then scale the partial derivative to have magnitude equal to GradientThreshold and retain the sign of the partial derivative.
For more information, see Gradient Clipping in the Algorithms section of trainingOptions in Deep Learning Toolbox.
Example: 'GradientThresholdMethod',"absolute-value"
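The three clipping rules can be compared with a short plain-MATLAB sketch. It follows the descriptions above directly and is not the toolbox's internal code. The gradients, stored one per learnable parameter in a cell array, and the threshold are hypothetical values.

```matlab
threshold = 1;                    % plays the role of GradientThreshold
grads = {[3; 4], [0.3; -0.4]};    % hypothetical gradients of two learnable parameters

% "l2norm": rescale each parameter's gradient independently.
clipL2 = grads;
for k = 1:numel(grads)
    n = norm(grads{k});
    if n > threshold
        clipL2{k} = grads{k}*(threshold/n);
    end
end

% "global-l2norm": one norm over all parameters, one common scale factor.
L = sqrt(sum(cellfun(@(g) sum(g.^2), grads)));
clipGlobal = grads;
if L > threshold
    clipGlobal = cellfun(@(g) g*(threshold/L), grads, 'UniformOutput', false);
end

% "absolute-value": clip each partial derivative and keep its sign.
clipAbs = cellfun(@(g) sign(g).*min(abs(g), threshold), grads, ...
    'UniformOutput', false);
```

In this sketch, "l2norm" leaves the second gradient untouched because its norm is already below the threshold, "global-l2norm" shrinks both gradients by the same factor and so preserves their relative direction, and "absolute-value" changes only the elements whose magnitude exceeds the threshold.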
L2RegularizationFactor
Factor for L2 regularization (weight decay), specified as a nonnegative scalar. For more information, see L2 Regularization in the Algorithms section of trainingOptions in Deep Learning Toolbox.
To avoid overfitting when using a representation with many parameters, consider increasing the L2RegularizationFactor option.
Example: 'L2RegularizationFactor',0.0005
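As a rough sketch of what weight decay does to an update, the following plain-MATLAB lines add the usual penalty (lambda/2)*sum(w.^2) to a loss, which contributes lambda*w to the gradient. The weights, gradient, and learning rate are hypothetical, and the update is plain gradient descent rather than any toolbox optimizer.

```matlab
lambda    = 1e-4;        % plays the role of L2RegularizationFactor
learnRate = 0.01;
w     = [2; -3];         % hypothetical weights
gLoss = [0.1; 0.2];      % hypothetical gradient of the unregularized loss

% Regularized loss = loss + (lambda/2)*sum(w.^2), so the gradient gains lambda*w.
gTotal = gLoss + lambda*w;

% The extra term pulls every weight toward zero on each step.
wNew = w - learnRate*gTotal;
```

A larger factor strengthens the pull toward zero, which is why increasing it can reduce overfitting in representations with many parameters.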
UseDevice
Computation device used to perform deep neural network operations such as gradient computation, parameter update, and prediction during training, specified as either "cpu" or "gpu".
The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).
You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.
Note
Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations on a CPU.
To use parallel processing to speed up training, you do not need to set UseDevice. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.
Example: 'UseDevice',"gpu"
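One possible way to choose the device at run time is sketched below. It assumes a MATLAB release that provides the canUseGPU function and an installation of Reinforcement Learning Toolbox. The variable name device is arbitrary.

```matlab
% Fall back to the CPU when no supported GPU is available.
if canUseGPU
    device = "gpu";
else
    device = "cpu";
end

% Pass the selected device when creating the options set.
repOpts = rlRepresentationOptions('UseDevice',device);
```

This keeps the same script usable on machines with and without a supported GPU, at the cost of the small numerical differences between devices described in the note above.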
Object Functions
rlValueRepresentation | (Not recommended) Value function critic representation for reinforcement learning agents |
rlQValueRepresentation | (Not recommended) Q-Value function critic representation for reinforcement learning agents |
rlDeterministicActorRepresentation | (Not recommended) Deterministic actor representation for reinforcement learning agents |
rlStochasticActorRepresentation | (Not recommended) Stochastic actor representation for reinforcement learning agents |
Examples
Create an options object for creating a critic or actor representation for a reinforcement learning agent. Set the learning rate for the representation to 0.05, and set the gradient threshold to 1. You can set the options using Name,Value pairs when you create the options object. Any options that you do not explicitly set have their default values.
repOpts = rlRepresentationOptions( ...
    LearnRate=5e-2, ...
    GradientThreshold=1)
repOpts = 
  rlRepresentationOptions with properties:

                  LearnRate: 0.0500
          GradientThreshold: 1
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  UseDevice: "cpu"
                  Optimizer: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]
Alternatively, create a default options object and use dot notation to change some of the values.
repOpts = rlRepresentationOptions;
repOpts.LearnRate = 5e-2;
repOpts.GradientThreshold = 1
repOpts = 
  rlRepresentationOptions with properties:

                  LearnRate: 0.0500
          GradientThreshold: 1
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  UseDevice: "cpu"
                  Optimizer: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]
If you want to change the properties of the OptimizerParameters option, use dot notation to access them.
repOpts.OptimizerParameters.Epsilon = 1e-7;
repOpts.OptimizerParameters
ans = 
  OptimizerParameters with properties:

                      Momentum: "Not applicable"
                       Epsilon: 1.0000e-07
           GradientDecayFactor: 0.9000
    SquaredGradientDecayFactor: 0.9990
Version History
Introduced in R2019a
rlRepresentationOptions is not recommended
rlRepresentationOptions objects are no longer recommended. To specify optimization options for actors and critics, use rlOptimizerOptions objects instead.
Specifically, you can create an agent options object and set its CriticOptimizerOptions and ActorOptimizerOptions properties to suitable rlOptimizerOptions objects. Then you pass the agent options object to the function that creates the agent. This workflow is shown in the following table.
rlRepresentationOptions: Not Recommended | rlOptimizerOptions: Recommended |
---|---|
crtOpts = rlRepresentationOptions('GradientThreshold',1); critic = rlValueRepresentation(net,obsInfo,'Observation',{'obs'},crtOpts) | criticOpts = rlOptimizerOptions('GradientThreshold',1); agentOpts = rlACAgentOptions('CriticOptimizerOptions',criticOpts); agent = rlACAgent(actor,critic,agentOpts) |
Alternatively, you can create the agent and then use dot notation to access the optimization options for the agent actor and critic, for example:
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
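The recommended workflow can be sketched a little more fully as follows, assuming a release that provides rlOptimizerOptions and rlACAgentOptions. The learning rates and threshold are placeholder values. Creating the agent itself is omitted because it requires actor and critic objects that are not defined here.

```matlab
% Separate optimizer options for the critic and the actor
% (these replace the rlRepresentationOptions sets). Values are placeholders.
criticOpts = rlOptimizerOptions('LearnRate',5e-2,'GradientThreshold',1);
actorOpts  = rlOptimizerOptions('LearnRate',1e-3,'GradientThreshold',1);

% Attach both option sets to the agent options object.
agentOpts = rlACAgentOptions( ...
    'CriticOptimizerOptions',criticOpts, ...
    'ActorOptimizerOptions',actorOpts);

% agentOpts would then be passed as the last argument to rlACAgent,
% together with previously created actor and critic objects.
```

Keeping the two option sets separate makes it straightforward to give the actor and the critic different learning rates.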