

Reset environment, agent, experience buffer, or policy object



    initialObs = reset(env) resets the specified MATLAB® environment to an initial state and returns the resulting initial observation value.

    Do not use reset for Simulink® environments, which are implicitly reset when running a new simulation. Instead, customize the reset behavior using the ResetFcn property of the environment.
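    For example, a custom reset for a Simulink environment can be specified as a function handle that modifies the Simulink.SimulationInput object before each simulation. In this sketch, the environment variable env, the model name "myModel", and the workspace variable "x0" are hypothetical placeholders:

    % Randomize the initial state variable x0 in the model workspace
    % before every simulation (names are illustrative, not prescriptive)
    env.ResetFcn = @(in) setVariable(in,"x0",randn,Workspace="myModel");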


    agent = reset(agent) resets and returns the specified agent. Resetting a built-in agent performs the following actions, if applicable:

    • Empties the experience buffer

    • Sets the recurrent neural network states of the actor and critic networks to zero

    • Resets the states of any noise models used by the agent


    reset(buffer) resets the specified replay memory buffer by removing all the experiences.


    resetPolicy = reset(policy) sets any recurrent neural network states of the specified policy object to zero and resets any noise model states. This syntax has no effect if the policy object does not use a recurrent neural network and does not have a noise model with state.



    Create a reinforcement learning environment. For this example, create a continuous-time cart-pole system.

    env = rlPredefinedEnv("CartPole-Continuous");

    Reset the environment and return the initial observation.

    initialObs = reset(env)
    initialObs = 4×1

    Create observation and action specifications.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([1 1]);

    Create a default DDPG agent using these specifications.

    initOptions = rlAgentInitializationOptions(UseRNN=true);
    agent = rlDDPGAgent(obsInfo,actInfo,initOptions);

    Reset the agent.

    agent = reset(agent);

    Create observation and action specifications.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([1 1]);

    Create a replay memory experience buffer.

    buffer = rlReplayMemory(obsInfo,actInfo,10000);

    Add experiences to the buffer. For this example, add 20 random experiences.

    for i = 1:20
        expBatch(i).Observation = {obsInfo.UpperLimit.*rand(4,1)};
        expBatch(i).Action = {actInfo.UpperLimit.*rand(1,1)};
        expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(4,1)};
        expBatch(i).Reward = 10*rand(1);
        expBatch(i).IsDone = 0;
    end
    expBatch(20).IsDone = 1;

    append(buffer,expBatch);

    Reset and clear the buffer.

    reset(buffer)


    Create observation and action specifications.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);

    Create a deep neural network.

    obsPath = [featureInputLayer(4,'Normalization','none') ...
        fullyConnectedLayer(1,'Name','obsout')];
    actPath = [featureInputLayer(1,'Normalization','none') ...
        fullyConnectedLayer(1,'Name','actout')];
    comPath = [additionLayer(2,'Name','add') ...
        fullyConnectedLayer(1,'Name','output')];
    net = addLayers(layerGraph(obsPath),actPath);
    net = addLayers(net,comPath);
    net = connectLayers(net,'obsout','add/in1');
    net = connectLayers(net,'actout','add/in2');
    net = dlnetwork(net);

    Create an epsilon-greedy policy object using a Q-value function approximator.

    critic = rlQValueFunction(net,obsInfo,actInfo);
    policy = rlEpsilonGreedyPolicy(critic);

    Reset the policy.

    policy = reset(policy);

    Input Arguments


    Reinforcement learning environment, specified as one of the following objects.

    Reinforcement learning agent, specified as one of the following objects.

    Experience buffer, specified as an rlReplayMemory object.

    Policy object, specified as one of the following objects.

    • rlDeterministicActorPolicy

    • rlAdditiveNoisePolicy

    • rlEpsilonGreedyPolicy

    • rlMaxQPolicy

    • rlStochasticActorPolicy

    For more information on a policy object, at the MATLAB command line, type help followed by the policy object name.

    Output Arguments


    Initial environment observation after reset, returned as one of the following:

    • Array with dimensions matching the observation specification for an environment with a single observation channel.

    • Cell array with length equal to the number of observation channels for an environment with multiple observation channels. Each element of the cell array contains an array with dimensions matching the corresponding element of the environment observation specifications.
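    As a minimal sketch of handling either return form (assuming env is an environment object created earlier), you can branch on whether the returned value is a cell array:

    initialObs = reset(env);
    if iscell(initialObs)
        % Multiple observation channels: one array per channel
        firstChannelObs = initialObs{1};
    else
        % Single observation channel: the array itself
        firstChannelObs = initialObs;
    end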

    Reset policy, returned as a policy object of the same type as policy but with its recurrent neural network states set to zero and any noise model states reset.

    Version History

    Introduced in R2022a