getGreedyPolicy

Extract greedy (deterministic) policy object from agent

Since R2022a

Syntax

policy = getGreedyPolicy(agent)

Description

policy = getGreedyPolicy(agent) returns a deterministic policy object from the specified reinforcement learning agent.

Examples

Extract Policy Object from Agent

For this example, load the PG agent trained in Train PG Agent to Balance Discrete Cart-Pole System.

load("MATLABCartpolePG.mat","agent")

Extract the greedy policy from the agent using getGreedyPolicy.

policyDtr = getGreedyPolicy(agent)

policyDtr = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 1
             Normalization: "none"
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1

Note that, in the extracted policy object, the UseMaxLikelihoodAction property is set to true. This means that the policy object always generates the maximum likelihood action in response to a given observation, and is therefore greedy (and deterministic).
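As a quick check of this behavior, the following sketch queries the greedy policy twice with the same observation and compares the results. It assumes that policyDtr from the previous step is still in the workspace and that the policy has a single observation channel; the random observation is an arbitrary placeholder chosen for illustration, not a meaningful cart-pole state.

```matlab
% Assumption: policyDtr was created above with getGreedyPolicy(agent)
% and has a single observation channel.
obsDim = policyDtr.ObservationInfo.Dimension;

% Arbitrary observation, used only for illustration.
obs = {rand(obsDim)};

% Query the policy twice with the same observation.
act1 = getAction(policyDtr,obs);
act2 = getAction(policyDtr,obs);

% For a greedy policy, both calls should return the same action.
sameAction = isequal(act1,act2)
```

Running the same comparison on a policy whose UseMaxLikelihoodAction property is false may give different actions from call to call, because that policy samples its action.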

Alternatively, you can extract a stochastic policy using getExplorationPolicy.

policyXpl = getExplorationPolicy(agent)

policyXpl = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 0
             Normalization: "none"
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1

This time, the extracted policy object has the UseMaxLikelihoodAction property set to false. This means that, given an observation, the policy object samples an action at random from the probability distribution returned by the actor instead of always returning the most likely action. The policy is therefore stochastic and useful for exploration.

Input Arguments

`agent` — Reinforcement learning agent
reinforcement learning agent object

Reinforcement learning agent that contains a critic, specified as one of the following objects:

rlQAgent
rlSARSAAgent
rlDQNAgent
rlPGAgent (when using a critic to estimate a baseline value function)
rlDDPGAgent
rlTD3Agent
rlACAgent
rlSACAgent
rlPPOAgent
rlTRPOAgent
rlMBPOAgent

Note

If agent is an rlMBPOAgent object, extract the greedy policy from its base agent by using getGreedyPolicy(agent.BaseAgent).

Output Arguments

`policy` — Reinforcement learning policy object
`rlMaxQPolicy` object | `rlDeterministicActorPolicy` object | `rlStochasticActorPolicy` object

Policy object, returned as one of the following:

rlMaxQPolicy object — Returned when agent is an rlQAgent, rlSARSAAgent, or rlDQNAgent object.
rlDeterministicActorPolicy object — Returned when agent is an rlDDPGAgent or rlTD3Agent object.
rlStochasticActorPolicy object, with the UseMaxLikelihoodAction property set to true — Returned when agent is an rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object. Because the returned policy object has the UseMaxLikelihoodAction property set to true, it always generates the maximum likelihood action in response to a given observation, and is therefore deterministic.
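To see how the returned policy type depends on the agent type, the following minimal sketch builds a DQN agent with default networks and inspects the class of its greedy policy. The observation and action specifications are arbitrary values chosen for illustration, and creating a default agent this way is assumed to require Deep Learning Toolbox.

```matlab
% Illustrative specifications (assumed values, not from a real environment).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Create a DQN agent with default networks.
dqnAgent = rlDQNAgent(obsInfo,actInfo);

% Extract the greedy policy and inspect its class.
dqnPolicy = getGreedyPolicy(dqnAgent);
class(dqnPolicy)
```

According to the list above, the policy extracted from an rlDQNAgent object is expected to be an rlMaxQPolicy object.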

Version History

Introduced in R2022a
