rlVectorQValueFunction
Vector Q-Value function approximator for reinforcement learning agents
Description
This object implements a vector Q-value function approximator to be used as a critic with a discrete action space for a reinforcement learning agent. A vector Q-value function is a function that maps an environment state to a vector in which each element represents the predicted discounted cumulative long-term reward when the agent starts from the given state and executes the action corresponding to the element number. A Q-value function critic therefore needs only the environment state as input. After you create an rlVectorQValueFunction critic, use it to create an agent with a discrete action space, such as an rlQAgent, rlDQNAgent, or rlSARSAAgent. For more information on creating representations, see Create Policies and Value Functions.
Creation
Syntax
Description
critic = rlVectorQValueFunction(net,observationInfo,actionInfo) creates the multi-output Q-value function critic with a discrete action space. Here, net is the deep neural network used as an approximator, and must have only the observations as input and a single output layer having as many elements as the number of possible discrete actions. The network input layers are automatically associated with the environment observation channels according to the dimension specifications in observationInfo. This function sets the ObservationInfo and ActionInfo properties of critic to the observationInfo and actionInfo input arguments, respectively.
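For example, a minimal sketch of this syntax (the observation dimension, action set, and layer sizes below are illustrative assumptions, not tied to any particular environment):

% Hypothetical specifications: a 4-element observation and three discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Network taking only the observation as input; the single output layer
% must have one element per possible discrete action.
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ]);

critic = rlVectorQValueFunction(net,obsInfo,actInfo);

You can then pass critic to a discrete-action agent constructor, for example agent = rlDQNAgent(critic).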
critic = rlVectorQValueFunction(net,observationInfo,actionInfo,ObservationInputNames=netObsNames) specifies the names of the network input layers to be associated with the environment observation channels. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation channels, as ordered in observationInfo.
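A sketch of this syntax, again with illustrative specifications, naming the network input layer and associating it explicitly with the observation channel:

obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Give the input layer a name so it can be matched to the observation channel.
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1),Name="obsInLyr")
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ]);

critic = rlVectorQValueFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obsInLyr");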
critic = rlVectorQValueFunction({basisFcn,W0},observationInfo,actionInfo) creates the multi-output Q-value function critic with a discrete action space using a custom basis function as the underlying approximator. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight matrix W0. Here, the basis function must have only the observations as inputs, and W0 must have as many columns as the number of possible actions. The function sets the ObservationInfo and ActionInfo properties of critic to the observationInfo and actionInfo input arguments, respectively.
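A sketch using a hand-written basis function (the observation dimension, basis features, and action set are illustrative assumptions):

obsInfo = rlNumericSpec([2 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Basis function of the observation only, returning a column feature vector.
basisFcn = @(obs) [obs(1); obs(2); obs(1)*obs(2); 1];

% Initial weights: one row per basis element, one column per possible action.
W0 = zeros(4,numel(actInfo.Elements));

critic = rlVectorQValueFunction({basisFcn,W0},obsInfo,actInfo);

With this model, the critic output is W'*basisFcn(obs), where W is the current weight matrix, giving one Q-value per possible action.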
critic = rlVectorQValueFunction(___,UseDevice=useDevice) specifies the device used to perform computational operations on the critic object, and sets the UseDevice property of critic to the useDevice input argument. You can use this syntax with any of the previous input-argument combinations.
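For example, a sketch that reuses net, obsInfo, and actInfo from the earlier sketches and creates the critic on a GPU (assuming a supported GPU and Parallel Computing Toolbox are available; the default device is "cpu"):

critic = rlVectorQValueFunction(net,obsInfo,actInfo,UseDevice="gpu");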
Input Arguments
Properties
Object Functions
rlDQNAgent | Deep Q-network reinforcement learning agent
rlQAgent | Q-learning reinforcement learning agent
rlSARSAAgent | SARSA reinforcement learning agent
getValue | Obtain estimated value from a critic given environment observations and actions
getMaxQValue | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
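As a sketch of how these object functions might be used with the illustrative critic created above (the observation value is arbitrary):

% 4-element observation matching the earlier sketch.
obs = {rand(4,1)};

% Vector of Q-values, one element per possible action.
qValues = getValue(critic,obs);

% Largest Q-value and the index of the corresponding action.
[maxQ,actionIndex] = getMaxQValue(critic,obs);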