rlValueRepresentation

Value function critic representation for reinforcement learning agents

Description

This object implements a value function approximator to be used as a critic within a reinforcement learning agent. A value function is a function that maps an observation to a scalar value. The output represents the expected total long-term reward when the agent starts from the given observation and takes the best possible action. Value function critics therefore only need observations (but not actions) as inputs. After you create an rlValueRepresentation critic, use it to create an agent relying on a value function critic, such as an rlACAgent, rlPGAgent, or rlPPOAgent. For an example of this workflow, see Create Actor and Critic Representations. For more information on creating representations, see Create Policy and Value Function Representations.

Creation

Description

critic = rlValueRepresentation(net,observationInfo,'Observation',obsName) creates the value function based critic from the deep neural network net. This syntax sets the ObservationInfo property of critic to the input observationInfo. obsName must contain the names of the input layers of net.

critic = rlValueRepresentation(tab,observationInfo) creates the value function based critic with a discrete observation space, from the value table tab, which is an rlTable object containing a column array with as many elements as the possible observations. This syntax sets the ObservationInfo property of critic to the input observationInfo.

critic = rlValueRepresentation({basisFcn,W0},observationInfo) creates the value function based critic using a custom basis function as the underlying approximator. The first input argument is a two-element cell in which the first element contains the handle basisFcn to a custom basis function, and the second element contains the initial weight vector W0. This syntax sets the ObservationInfo property of critic to the input observationInfo.

critic = rlValueRepresentation(___,options) creates the value function based critic using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of critic to the options input argument. You can use this syntax with any of the previous input-argument combinations.
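
As an illustration of the options syntax, the following minimal sketch appends an option set to the basis-function syntax. The basis function, the zero initial weights, and the option values are hypothetical choices made for this sketch only.

```matlab
% Observation space: a 4-by-1 continuous vector (same spec as the examples on this page).
obsInfo = rlNumericSpec([4 1]);

% Hypothetical basis function and initial weights, for illustration only.
basisFcn = @(obs) [obs(1); obs(2)^2; abs(obs(3)) + obs(4)];
W0 = [0; 0; 0];

% Option set passed as the last input argument.
opts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlValueRepresentation({basisFcn,W0},obsInfo,opts);

% The Options property of the critic holds the option set that was passed in.
learnRate = critic.Options.LearnRate;
```

The same trailing options argument can follow the network and table syntaxes.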

Input Arguments

Deep neural network used as the underlying approximator within the critic. In the examples on this page, the network is specified as an array of layer objects.

The network input layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo. Also, the names of these input layers must match the observation names listed in obsName.

rlValueRepresentation objects support recurrent deep neural networks.

For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policy and Value Function Representations.

Observation names, specified as a cell array of strings or character vectors. The observation names must be the names of the input layers in net. These network layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo.

Example: {'my_obs'}

Value table, specified as an rlTable object containing a column vector with length equal to the number of observations. Element i is the expected cumulative long-term reward when the agent starts from the i-th possible observation and takes the best possible action. The elements of this vector are the learnable parameters of the representation.

Custom basis function, specified as a function handle to a user-defined function. The user-defined function can either be an anonymous function or a function on the MATLAB path. The output of the critic is c = W'*B, where W is a weight vector and B is the column vector returned by the custom basis function. c is the expected cumulative long-term reward when the agent starts from the given observation and takes the best possible action. The learnable parameters of this representation are the elements of W.

When creating a value function critic representation, your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN)

Here obs1 to obsN are observations in the same order and with the same data type and dimensions as the signals defined in ObservationInfo.

Example: @(obs1,obs2,obs3) [obs3(1)*obs1(1)^2; abs(obs2(5)+obs1(2))]

Initial value of the basis function weights, W, specified as a column vector having the same length as the vector returned by the basis function.
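
The relationship between the basis function signature, the size of W0, and the critic output c = W'*B can be sketched in plain MATLAB, without creating a representation object. The observation values and the weights below are hypothetical; only the basis function is taken from the example above.

```matlab
% Hypothetical observations for three channels (sizes chosen for illustration).
obs1 = [2; 3];
obs2 = [1; 1; 1; 1; -7];
obs3 = 0.5;

% Basis function taking one argument per observation channel.
myBasisFunction = @(obs1,obs2,obs3) [obs3(1)*obs1(1)^2; abs(obs2(5)+obs1(2))];

B = myBasisFunction(obs1,obs2,obs3);   % 2-by-1 column vector: [2; 4]

% W0 must be a column vector with the same length as B.
W0 = [1; 0.5];
assert(iscolumn(W0) && numel(W0) == numel(B))

c = W0'*B;   % scalar critic output: 1*2 + 0.5*4 = 4
```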

Properties

Representation options, specified as an rlRepresentationOptions object. Available options include the optimizer used for training and the learning rate.

Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as the dimensions, data types, and names of the observation signals.

rlValueRepresentation sets the ObservationInfo property of critic to the input observationInfo.

You can extract ObservationInfo from an existing environment or agent using getObservationInfo. You can also construct the specs manually using a specification command such as rlFiniteSetSpec or rlNumericSpec.
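
As a brief sketch of the manual route, the following code builds one continuous and one discrete specification and reads back the properties that the examples on this page rely on. The variable names are chosen for this sketch only.

```matlab
% Continuous observation: a 4-by-1 vector of doubles.
contObsInfo = rlNumericSpec([4 1]);

% Discrete observation: one of four possible values.
discObsInfo = rlFiniteSetSpec([1 3 5 7]);

% Inspect the properties used elsewhere on this page.
numObs = contObsInfo.Dimension(1);        % number of elements in one observation
numValues = numel(discObsInfo.Elements);  % number of possible discrete observations
```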

Object Functions

rlACAgent     Actor-critic reinforcement learning agent
rlPGAgent     Policy gradient reinforcement learning agent
rlPPOAgent    Proximal policy optimization reinforcement learning agent
getValue      Obtain estimated value function representation

Examples

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing 4 doubles.

obsInfo = rlNumericSpec([4 1]);

Create a deep neural network to approximate the value function within the critic. The input of the network (here called myobs) must accept a four-element vector (the observation vector defined by obsInfo), and the output must be a scalar (the value, representing the expected cumulative long-term reward when the agent starts from the given observation).

net = [featureInputLayer(4, 'Normalization','none','Name','myobs') 
       fullyConnectedLayer(1,'Name','value')];

Create the critic using the network, observation specification object, and name of the network input layer.

critic = rlValueRepresentation(net,obsInfo,'Observation',{'myobs'})
critic = 
  rlValueRepresentation with properties:

    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the value of a random observation, using the current network weights.

v = getValue(critic,{rand(4,1)})
v = single
    0.7904

You can now use the critic (along with an actor) to create an agent relying on a value function critic (such as rlACAgent or rlPGAgent).

Create an actor representation and a critic representation that you can use to define a reinforcement learning agent such as an actor-critic (AC) agent.

For this example, create actor and critic representations for an agent that can be trained against the cart-pole environment described in Train AC Agent to Balance Cart-Pole System. First, create the environment. Then, extract the observation and action specifications from the environment. You need these specifications to define the actor and critic representations.

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

For a state-value-function critic such as those used for AC or PG agents, the inputs are the observations and the output should be a scalar value, the state value. For this example, create the critic representation using a deep neural network with one output, and with observation signals corresponding to x, xdot, theta, and thetadot as described in Train AC Agent to Balance Cart-Pole System. You can obtain the number of observations from the obsInfo specification. Name the network input layer 'observation'.

numObservation = obsInfo.Dimension(1);
criticNetwork = [
    featureInputLayer(numObservation,'Normalization','none','Name','observation')
    fullyConnectedLayer(1,'Name','CriticFC')];

Specify options for the critic representation using rlRepresentationOptions. These options control the learning of the critic network parameters. For this example, set the learning rate to 0.05 and the gradient threshold to 1.

repOpts = rlRepresentationOptions('LearnRate',5e-2,'GradientThreshold',1);
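
To give a rough idea of what these two settings do, the following plain MATLAB sketch applies one update to a parameter vector. It assumes that the gradient threshold is applied by rescaling the gradient's L2 norm and that the update is a plain gradient-descent step; the optimizer actually used depends on the option set, so treat this as a simplified illustration. The parameter and gradient values are hypothetical.

```matlab
learnRate = 5e-2;      % 'LearnRate' from the options above
gradThreshold = 1;     % 'GradientThreshold' from the options above

w = [0; 0];            % hypothetical parameter vector
g = [3; 4];            % hypothetical gradient, L2 norm = 5

% Rescale the gradient if its L2 norm exceeds the threshold (assumed clipping rule).
gNorm = norm(g);
if gNorm > gradThreshold
    g = g*(gradThreshold/gNorm);   % now [0.6; 0.8], L2 norm 1
end

% Plain gradient-descent step (a simplification of the real optimizer).
w = w - learnRate*g;   % [-0.03; -0.04]
```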

Create the critic representation using the specified neural network and options. Also, specify the observation information for the critic. Set the observation name to 'observation', which is the name of the criticNetwork input layer.

critic = rlValueRepresentation(criticNetwork,obsInfo,'Observation',{'observation'},repOpts)
critic = 
  rlValueRepresentation with properties:

    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

Similarly, create a network for the actor. An AC agent decides which action to take given observations using an actor representation. For an actor, the inputs are the observations, and the output depends on whether the action space is discrete or continuous. For the actor of this example, there are two possible discrete actions, -10 or 10. To create the actor, use a deep neural network with the same observation input as the critic and with one output element for each of these two actions. You can obtain the number of actions from the actInfo specification. Name the output 'action'.

numAction = numel(actInfo.Elements); 
actorNetwork = [
    featureInputLayer(numObservation,'Normalization','none','Name','observation')
    fullyConnectedLayer(numAction,'Name','action')];

Create the actor representation using the observation name and specification and the same representation options.

actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},repOpts)
actor = 
  rlStochasticActorRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

Create an AC agent using the actor and critic representations.

agentOpts = rlACAgentOptions(...
    'NumStepsToLookAhead',32,...
    'DiscountFactor',0.99);
agent = rlACAgent(actor,critic,agentOpts)
agent = 
  rlACAgent with properties:

    AgentOptions: [1x1 rl.option.rlACAgentOptions]

Create a finite set observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment with a discrete observation space). For this example, define the observation space as a finite set consisting of 4 possible values.

obsInfo = rlFiniteSetSpec([1 3 5 7]);

Create a table to approximate the value function within the critic.

vTable = rlTable(obsInfo);

The table is a column vector in which each entry stores the expected cumulative long-term reward for each possible observation as defined by obsInfo. You can access the table using the Table property of the vTable object. The initial value of each element is zero.

vTable.Table
ans = 4×1

     0
     0
     0
     0

You can also initialize the table to any value, in this case, an array containing all the integers from 1 to 4.

vTable.Table = reshape(1:4,4,1)
vTable = 
  rlTable with properties:

    Table: [4x1 double]

Create the critic using the table and the observation specification object.

critic = rlValueRepresentation(vTable,obsInfo)
critic = 
  rlValueRepresentation with properties:

    ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the value of a given observation, using the current table entries.

v = getValue(critic,{7})
v = 4
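
The result can be understood as a table lookup: the observation 7 is the fourth element of the finite set, so the critic returns the fourth table entry. The following plain MATLAB sketch reproduces that lookup by hand. It is a conceptual illustration with variable names chosen for this sketch, not the toolbox's internal implementation.

```matlab
elements = [1 3 5 7];        % possible observations, as in obsInfo
tableValues = (1:4)';        % same entries as vTable.Table

obs = 7;
idx = find(elements == obs); % position of the observation in the finite set: 4
v = tableValues(idx);        % 4, matching the getValue result above
```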

You can now use the critic (along with an actor) to create an agent relying on a value function critic (such as rlACAgent or rlPGAgent).

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing 4 doubles.

obsInfo = rlNumericSpec([4 1]);

Create a custom basis function to approximate the value function within the critic. The custom basis function must return a column vector. Each vector element must be a function of the observations defined by obsInfo.

myBasisFcn = @(myobs) [myobs(2)^2; myobs(3)+exp(myobs(1)); abs(myobs(4))]
myBasisFcn = function_handle with value:
    @(myobs)[myobs(2)^2;myobs(3)+exp(myobs(1));abs(myobs(4))]

The output of the critic is the scalar W'*myBasisFcn(myobs), where W is a weight column vector that must have the same size as the custom basis function output. This output is the expected cumulative long-term reward when the agent starts from the given observation and takes the best possible action. The elements of W are the learnable parameters.

Define an initial parameter vector.

W0 = [3;5;2];

Create the critic. The first argument is a two-element cell containing both the handle to the custom function and the initial weight vector. The second argument is the observation specification object.

critic = rlValueRepresentation({myBasisFcn,W0},obsInfo)
critic = 
  rlValueRepresentation with properties:

    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the value of a given observation, using the current parameter vector.

v = getValue(critic,{[2 4 6 8]'})
v = 
  1x1 dlarray

  130.9453
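
As a cross-check, the same value can be computed by hand in plain MATLAB by evaluating the basis function and taking the inner product with W0. The names obs and vManual are chosen for this sketch only.

```matlab
myBasisFcn = @(myobs) [myobs(2)^2; myobs(3)+exp(myobs(1)); abs(myobs(4))];
W0 = [3;5;2];
obs = [2 4 6 8]';

B = myBasisFcn(obs);   % [16; 6+exp(2); 8]
vManual = W0'*B;       % 3*16 + 5*(6+exp(2)) + 2*8, approximately 130.9453
```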

You can now use the critic (along with an actor) to create an agent relying on a value function critic (such as rlACAgent or rlPGAgent).

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a recurrent deep neural network for the critic. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include at least one lstmLayer.

criticNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8, 'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(1,'Name','output')];

Create a value function representation object for the critic.

criticOptions = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,obsInfo,...
    'Observation','state',criticOptions);

Introduced in R2020a