getGreedyPolicy
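Description
policy = getGreedyPolicy(agent) returns the greedy (deterministic) policy object extracted from the reinforcement learning agent agent.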
Examples
Extract Policy Object from Agent
For this example, load the PG agent trained in Train PG Agent to Balance Cart-Pole System.
load("MATLABCartpolePG.mat","agent")
Extract the agent's greedy policy using getGreedyPolicy.
policyDtr = getGreedyPolicy(agent)
policyDtr = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 1
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1
Note that, in the extracted policy object, the UseMaxLikelihoodAction property is set to true. This means that the policy object always generates the maximum likelihood action in response to a given observation, and is therefore greedy (and deterministic).
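One way to check this determinism is to query the policy twice with the same observation and compare the results. The following is a minimal sketch, assuming policyDtr is in the workspace from the previous step and that getAction accepts the observation wrapped in a cell array. The random observation is only a placeholder, not a meaningful cart-pole state.

```matlab
% Sketch: check that the greedy policy returns the same action for the same input.
% Assumes policyDtr was created above with getGreedyPolicy(agent).
obsInfo = policyDtr.ObservationInfo;   % observation specification stored in the policy
obs = {rand(obsInfo.Dimension)};       % placeholder observation (arbitrary values)

act1 = getAction(policyDtr,obs);       % first query
act2 = getAction(policyDtr,obs);       % second query with the same observation

% For a greedy policy, both queries are expected to match.
sameAction = isequal(act1,act2)
```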
Alternatively, you can extract a stochastic policy using getExplorationPolicy.
policyXpl = getExplorationPolicy(agent)
policyXpl = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 0
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1
This time, the extracted policy object has the UseMaxLikelihoodAction property set to false. This means that, given an observation, the policy object samples a random action from the probability distribution produced by the actor instead of always choosing the most likely one. The policy is therefore stochastic and useful for exploration.
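The difference between the two policies comes down to how an action is selected from the probabilities that the actor assigns to each discrete action. The sketch below illustrates the idea in plain MATLAB, without the toolbox. The action values and probabilities are made-up numbers, and the inverse-CDF sampling scheme is an illustration of the concept, not necessarily how the policy object samples internally.

```matlab
% Illustrative values only (not taken from the trained agent)
actions = [-10 10];      % hypothetical discrete action set
probs   = [0.25 0.75];   % hypothetical action probabilities (sum to 1)

% Greedy selection (as with UseMaxLikelihoodAction = true):
% always pick the most probable action
[~,idxGreedy] = max(probs);
greedyAction = actions(idxGreedy);

% Stochastic selection (as with UseMaxLikelihoodAction = false):
% sample an action according to probs using the cumulative distribution
idxSample = find(rand <= cumsum(probs),1,"first");
sampledAction = actions(idxSample);
```

With these numbers, greedyAction is the same on every run, while sampledAction can change from run to run.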
Input Arguments
agent - Reinforcement learning agent
reinforcement learning agent object
Reinforcement learning agent that contains a critic, specified as one of the following objects:
- rlPGAgent (when using a critic to estimate a baseline value function)
Note
If agent is an rlMBPOAgent object, extract the greedy policy using getGreedyPolicy(agent.BaseAgent).
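If your code handles agents of different types, the note above can be wrapped in a small guard. This is a sketch that assumes agent is already in the workspace and that only rlMBPOAgent objects expose a BaseAgent property. Verify that assumption for the agent classes you use.

```matlab
% Sketch: extract the greedy policy, unwrapping the base agent when one exists.
% Assumption: a BaseAgent property is present only on rlMBPOAgent objects.
if isprop(agent,"BaseAgent")
    % Model-based (MBPO) agent: use the underlying base agent
    policy = getGreedyPolicy(agent.BaseAgent);
else
    policy = getGreedyPolicy(agent);
end
```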
Output Arguments
policy - Reinforcement learning policy object
rlMaxQPolicy object | rlDeterministicActorPolicy object | rlStochasticActorPolicy object
Policy object, returned as one of the following:
- rlMaxQPolicy object: returned when agent is an rlQAgent, rlSARSAAgent, or rlDQNAgent object.
- rlDeterministicActorPolicy object: returned when agent is an rlDDPGAgent or rlTD3Agent object.
- rlStochasticActorPolicy object, with the UseMaxLikelihoodAction property set to true: returned when agent is an rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object. Because the returned policy object has the UseMaxLikelihoodAction property set to true, it always generates the deterministic maximum likelihood action in response to a given observation.
Version History
Introduced in R2022a
See Also
Functions
Objects
rlMaxQPolicy | rlEpsilonGreedyPolicy | rlAdditiveNoisePolicy | rlDeterministicActorPolicy | rlStochasticActorPolicy
Blocks