rlMBPOAgentOptions

Options for MBPO agent

Since R2022a

    Description

    Use an rlMBPOAgentOptions object to specify options for model-based policy optimization (MBPO) agents. To create an MBPO agent, use rlMBPOAgent.

    For more information, see Model-Based Policy Optimization (MBPO) Agents.

    Creation

    Description

    opt = rlMBPOAgentOptions creates an option object for use as an argument when creating an MBPO agent using all default options. You can modify the object properties using dot notation.

    opt = rlMBPOAgentOptions(Name=Value) creates the options set opt and sets its properties using one or more name-value arguments. For example, rlMBPOAgentOptions(DiscountFactor=0.95) creates an option set with a discount factor of 0.95. You can specify multiple name-value pair arguments.

    Properties

    Number of epochs for training the environment model, specified as a positive integer.

    Example: NumEpochForTrainingModel=2

    Number of mini-batches used in each environment model training epoch, specified as a positive integer or "all". When you set NumMiniBatches to "all", the agent selects the number of mini-batches such that all samples in the base agent's experience buffer are used to train the model.

    Example: NumMiniBatches=20

    Size of the random experience mini-batch for training the environment model, specified as a positive integer. During each model training epoch, the agent randomly samples experiences from the experience buffer when computing gradients for updating the environment model properties. Large mini-batches reduce the variance when computing gradients but increase the computational effort.

    Example: MiniBatchSize=256
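
    The preceding model-training options can be set together using dot notation. The following is a minimal sketch with illustrative values.

      opt = rlMBPOAgentOptions;
      opt.NumEpochForTrainingModel = 2;   % number of model training epochs
      opt.NumMiniBatches = 20;            % mini-batches per model training epoch
      opt.MiniBatchSize = 256;            % experiences per mini-batch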

    Transition function optimizer options, specified as one of the following:

    • rlOptimizerOptions object — When your neural network environment has a single transition function, or when you want to use the same options for multiple transition functions, specify a single options object.

    • Array of rlOptimizerOptions objects — When your neural network environment has multiple transition functions and you want to use different optimizer options for them, specify an array of options objects with length equal to the number of transition functions.

    Using these objects, you can specify training parameters for the transition deep neural network approximators as well as the optimizer algorithms and parameters.

    If you have previously trained transition models and do not want the MBPO agent to modify these models during training, set TransitionOptimizerOptions.LearnRate to 0.
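
    For example, the following sketch specifies different optimizer options for a neural network environment that is assumed to have two transition functions (the learning rates are illustrative).

      opt = rlMBPOAgentOptions;
      opt.TransitionOptimizerOptions = [ ...
          rlOptimizerOptions(LearnRate=1e-3), ...
          rlOptimizerOptions(LearnRate=5e-4)];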

    Reward function optimizer options, specified as an rlOptimizerOptions object. Using this object, you can specify training parameters for the reward deep neural network approximator as well as the optimizer algorithm and its parameters.

    If you specify a ground-truth reward function using a custom function, the MBPO agent ignores these options.

    If you have a previously trained reward model and do not want the MBPO agent to modify the model during training, set RewardOptimizerOptions.LearnRate to 0.

    Is-done function optimizer options, specified as an rlOptimizerOptions object. Using this object, you can specify training parameters for the is-done deep neural network approximator as well as the optimizer algorithm and its parameters.

    If you specify a ground-truth is-done function using a custom function, the MBPO agent ignores these options.

    If you have a previously trained is-done model and do not want the MBPO agent to modify the model during training, set IsDoneOptimizerOptions.LearnRate to 0.
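
    For example, the following sketch keeps previously trained reward and is-done models fixed during training by setting the corresponding learning rates to 0, while leaving the transition optimizer options at their defaults.

      opt = rlMBPOAgentOptions;
      opt.RewardOptimizerOptions = rlOptimizerOptions(LearnRate=0);   % do not update the reward model
      opt.IsDoneOptimizerOptions = rlOptimizerOptions(LearnRate=0);   % do not update the is-done model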

    Generated experience buffer size, specified as a positive integer. When the agent generates experiences, they are added to the model experience buffer.

    Example: ModelExperienceBufferLength=50000

    Model roll-out options for controlling the number and length of generated experience trajectories, specified as an rlModelRolloutOptions object with the following fields. At the start of each epoch, the agent generates the roll-out trajectories and adds them to the model experience buffer. To modify the roll-out options, use dot notation, as shown in the sketch after the noise model options below.

    Number of trajectories for generating samples, specified as a positive integer.

    Example: NumRollout=4000

    Initial trajectory horizon, specified as a positive integer.

    Example: Horizon=2

    Option for increasing the horizon length, specified as one of the following values.

    • "none" — Do not increase the horizon length.

    • "piecewise" — Increase the horizon length by one after every N model training epochs, where N is equal to HorizonUpdateFrequency.

    Example: HorizonUpdateSchedule="piecewise"

    Number of model training epochs after which the horizon length increases, specified as a positive integer. When HorizonUpdateSchedule is "none", this option is ignored.

    Example: HorizonUpdateFrequency=200

    Maximum horizon length, specified as a positive integer greater than or equal to Horizon. When HorizonUpdateSchedule is "none", this option is ignored.

    Example: HorizonMax=5

    Training epoch at which to start generating trajectories, specified as a positive integer.

    Example: HorizonUpdateStartEpoch=100

    Exploration model options for generating experiences using the internal environment model, specified as one of the following:

    • [] — Use the exploration policy of the base agent. You must use this option when training a SAC base agent.

    • EpsilonGreedyExploration object — You can use this option when training a DQN base agent.

    • GaussianActionNoise object — You can use this option when training a DDPG or TD3 base agent.

    The exploration model uses only the initial noise option values and does not update the values during training.

    To specify NoiseOptions, create a default model object. Then, specify any nondefault model properties using dot notation.

    • Specify epsilon greedy exploration options.

      opt = rlMBPOAgentOptions;
      opt.ModelRolloutOptions.NoiseOptions = ...
          rl.option.EpsilonGreedyExploration;
      opt.ModelRolloutOptions.NoiseOptions.EpsilonMin = 0.03;
    • Specify Gaussian action noise options.

      opt = rlMBPOAgentOptions;
      opt.ModelRolloutOptions.NoiseOptions = ...
          rl.option.GaussianActionNoise;
      opt.ModelRolloutOptions.NoiseOptions.StandardDeviation = sqrt(0.15);

    For more information on noise models, see Noise Models.
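
    For example, the following sketch (with illustrative values) configures a roll-out schedule that starts with a horizon of 1 and increases it by one every 100 model training epochs, up to a maximum of 3.

      opt = rlMBPOAgentOptions;
      opt.ModelRolloutOptions.NumRollout = 2000;
      opt.ModelRolloutOptions.Horizon = 1;
      opt.ModelRolloutOptions.HorizonUpdateSchedule = "piecewise";
      opt.ModelRolloutOptions.HorizonUpdateFrequency = 100;
      opt.ModelRolloutOptions.HorizonMax = 3;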

    Ratio of real experiences in a mini-batch for agent training, specified as a nonnegative scalar less than or equal to 1. For example, if the base agent trains on mini-batches of 100 experiences and RealSampleRatio is 0.2, each mini-batch contains approximately 20 real experiences and 80 experiences generated by the environment model.

    Example: RealSampleRatio=0.1

    Options to save additional agent data, specified as a structure containing the Optimizer field.

    You can save an agent object in several ways, for example:

    • Using the save command

    • Specifying saveAgentCriteria and saveAgentValue in an rlTrainingOptions object

    • Specifying an appropriate logging function within a FileLogger object.

    When you save an agent using any method, the fields in the InfoToSave structure determine whether the corresponding data is saved with the agent. For example, if you set the Optimizer field to true, then the transition, reward, and is-done function optimizers are saved along with the agent.

    You can modify the InfoToSave property only after the agent options object is created.

    Example: options.InfoToSave.Optimizer=true

    Option to save the agent optimizers, specified as a logical value. If the Optimizer field is set to false, then the transition, reward, and is-done function optimizers (which are hidden properties of the agent and can contain internal states) are not saved along with the agent, which saves disk space and memory. However, when the optimizers contain internal states, the state of the saved agent is not identical to the state of the original agent.

    Example: true
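
    For example, the following sketch saves the optimizer states along with the agent. The agent creation step is omitted; agent here stands for a previously created rlMBPOAgent object.

      opt = rlMBPOAgentOptions;
      opt.InfoToSave.Optimizer = true;   % include optimizer states when saving
      % ... create and train the MBPO agent using opt, then:
      save("mbpoAgent.mat","agent")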

    Object Functions

    rlMBPOAgent    Model-based policy optimization (MBPO) reinforcement learning agent

    Examples

    Create an MBPO agent options object, specifying the ratio of real experiences to use for training the agent as 30%.

    opt = rlMBPOAgentOptions(RealSampleRatio=0.3)
    opt = 
      rlMBPOAgentOptions with properties:
    
           NumEpochForTrainingModel: 1
                     NumMiniBatches: 10
                      MiniBatchSize: 128
         TransitionOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
             RewardOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
             IsDoneOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
        ModelExperienceBufferLength: 100000
                ModelRolloutOptions: [1x1 rl.option.rlModelRolloutOptions]
                    RealSampleRatio: 0.3000
                         InfoToSave: [1x1 struct]
    
    

    You can modify options using dot notation. For example, set the mini-batch size to 64.

    opt.MiniBatchSize = 64;

    Version History

    Introduced in R2022a