
getExplorationPolicy

Extract exploratory (stochastic) policy object from agent

Since R2023a

    Description

    policy = getExplorationPolicy(agent) returns a stochastic policy object from the specified reinforcement learning agent. Stochastic policies are useful for exploration.
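    As a minimal sketch of the basic workflow (the observation and action specifications below are illustrative assumptions, not taken from this page), you can create a default agent and extract its exploration policy:

    % Illustrative sketch: create a default DQN agent from assumed observation
    % and action specifications, then extract its exploration policy.
    obsInfo = rlNumericSpec([4 1]);         % assumed continuous observation space
    actInfo = rlFiniteSetSpec([-10 10]);    % assumed discrete action set
    agent = rlDQNAgent(obsInfo,actInfo);    % default DQN agent
    policy = getExplorationPolicy(agent)    % for a DQN agent, an rlEpsilonGreedyPolicy object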


    Examples


    For this example, load the PG agent trained in Train PG Agent to Balance Discrete Cart-Pole System.

    load("MATLABCartpolePG.mat","agent")

    Extract the agent's greedy policy using getGreedyPolicy.

    policyDtr = getGreedyPolicy(agent)
    policyDtr = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 1
                 Normalization: "none"
               ObservationInfo: [1x1 rl.util.rlNumericSpec]
                    ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                    SampleTime: 1
    
    

    Note that, in the extracted policy object, the UseMaxLikelihoodAction property is set to true. This means that the policy object always generates the maximum likelihood action in response to a given observation, and is therefore greedy (and deterministic).
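    As a quick check (a hedged sketch; the observation value below is an arbitrary stand-in consistent with the four-element cart-pole observation), repeated calls to getAction with the same observation return the same action:

    % Sketch: a greedy policy is deterministic, so the same observation
    % always yields the same (maximum likelihood) action.
    obs = {rand(4,1)};              % arbitrary observation for illustration
    a1 = getAction(policyDtr,obs);
    a2 = getAction(policyDtr,obs);
    isequal(a1,a2)                  % expected: 1 (true)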

    Alternatively, you can extract a stochastic policy using getExplorationPolicy.

    policyXpl = getExplorationPolicy(agent)
    policyXpl = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 0
                 Normalization: "none"
               ObservationInfo: [1x1 rl.util.rlNumericSpec]
                    ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                    SampleTime: 1
    
    

    This time, the extracted policy object has the UseMaxLikelihoodAction property set to false. This means that the policy object generates a random action in response to a given observation. The policy is therefore stochastic and useful for exploration.
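    To see the stochasticity (a hedged sketch; the observation value is again an arbitrary stand-in), sample the policy several times for the same observation:

    % Sketch: a stochastic policy samples actions from the actor's probability
    % distribution, so repeated calls can return different actions.
    obs = {rand(4,1)};              % arbitrary observation for illustration
    acts = zeros(1,10);
    for k = 1:10
        a = getAction(policyXpl,obs);   % sample one action
        acts(k) = a{1};
    end
    acts                            % typically mixes the possible force values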

    Input Arguments


    Reinforcement learning agent from which to extract the exploration policy, specified as one of the following objects:

    • rlQAgent, rlSARSAAgent, or rlDQNAgent object

    • rlDDPGAgent or rlTD3Agent object

    • rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object

    Note

    If agent is an rlMBPOAgent object, to extract the exploration policy, use getExplorationPolicy(agent.BaseAgent).
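    For example (a hedged sketch; mbpoAgent stands for a hypothetical rlMBPOAgent object):

    % Sketch: for a model-based agent, extract the policy from the base agent.
    policyXpl = getExplorationPolicy(mbpoAgent.BaseAgent);   % mbpoAgent is hypothetical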

    Output Arguments


    Policy object, returned as one of the following:

    • rlEpsilonGreedyPolicy object — Returned when agent is an rlQAgent, rlSARSAAgent, or rlDQNAgent object.

    • rlAdditiveNoisePolicy object — Returned when agent is an rlDDPGAgent or rlTD3Agent object.

    • rlStochasticActorPolicy object, with UseMaxLikelihoodAction set to false — Returned when agent is an rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object. Since the returned policy object has the UseMaxLikelihoodAction property set to false, it always generates a random action (sampled from the policy probability distribution) in response to a given observation, and is therefore exploratory (and stochastic).
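    As a brief sketch of this mapping (the specifications below are illustrative assumptions), a default DDPG agent yields an additive-noise policy:

    % Sketch: the class of the returned policy depends on the agent type.
    obsInfo = rlNumericSpec([3 1]);             % assumed observation specification
    actInfo = rlNumericSpec([1 1]);             % assumed continuous action specification
    ddpgAgent = rlDDPGAgent(obsInfo,actInfo);   % default DDPG agent
    policyXpl = getExplorationPolicy(ddpgAgent) % an rlAdditiveNoisePolicy object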

    Version History

    Introduced in R2023a