Output differs between generated policy and trained agent

Victor Bayer on 22 Sep 2021
Edited: Victor Bayer on 22 Sep 2021
Dear Mathworks Team,
I have trained a DDPG agent that receives 2 observations.
By using the approach described in:
I generated a function evaluatePolicy.m that accepts an input of shape (2,1,1) and outputs a scalar. However, its output differs from that of my agent during training.
During training, the following lines define the action properties. They are part of the environment and training setup (createSineAgent.m), not of the agent's neural network definition (createDDGPNetworks.m):
numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';
This prevents agent outputs greater than 1 or less than 0 from being applied. During training, the output always lies between 0 and 1 and is clipped at those values.
However, the output of the corresponding evaluatePolicy.m seems to range between -1 and 1 rather than between 0 and 1. Why is that?
Examples:
>> evaluatePolicy(reshape([-0.1515581,-0.1515581],2,1,1))
ans = 0.9986
>> evaluatePolicy(reshape([-0.1515581,-0.6],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,100],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,-100],2,1,1))
ans = -1
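The values above suggest that the generated function returns the network output as is. As a minimal sketch, assuming that clipping to the limits from actionInfo is the intended behavior (the post does not establish how the limits are applied during training), the raw output could be saturated after the call. The helper name saturate is hypothetical and not part of Reinforcement Learning Toolbox; the two raw values are the ones reported above.

```matlab
% Limits copied from the rlNumericSpec definition in this post.
lowerLimit = 0;
upperLimit = 1;

% Hypothetical helper (not a toolbox function): clamp a raw policy
% output to [lowerLimit, upperLimit].
saturate = @(a) min(max(a, lowerLimit), upperLimit);

% Raw outputs of evaluatePolicy as reported above.
rawActions = [0.9986, -1];

% Values inside the limits pass through; values below 0 become 0.
clippedActions = saturate(rawActions);
disp(clippedActions)
```

In practice the call would look like saturate(evaluatePolicy(obs)), with obs shaped as in the examples above.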
I was expecting the output to be between 0 and 1, as defined in
numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';
.
Does the approach described in:
not take the ActionInfo into account?
The output for
type evaluatePolicy.m
returns
>> type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 22-Sep-2021 19:49:51
action1 = localEvaluate(observation1);
end
%% Local Functions
function action1 = localEvaluate(observation1)
persistent policy
if isempty(policy)
policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
action1 = predict(policy, observation1);
end
while it states in
that the output should look more like
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 23-Feb-2021 18:52:32
actionSet = [-10 10];
% Select action from sampled probabilities
probabilities = localEvaluate(observation1);
% Normalize the probabilities
p = probabilities(:)'/sum(probabilities);
% Determine which action to take
edges = min([0 cumsum(p)],1);
edges(end) = 1;
[~,actionIndex] = histc(rand(1,1),edges); %#ok<HISTC>
action1 = actionSet(actionIndex);
end
%% Local Functions
function probabilities = localEvaluate(observation1)
persistent policy
if isempty(policy)
policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
observation1 = observation1(:)';
probabilities = predict(policy, observation1);
end
.
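To make the sampling steps in this listing easier to follow, here is a small worked example. In the listing, actionSet is indexed by actionIndex, so it acts as a lookup table of discrete actions. The probabilities below are made up for illustration, a fixed draw u replaces rand(1,1) so the result is reproducible, and discretize stands in for histc; none of these values come from the original listing.

```matlab
% Discrete actions, as in the listing above.
actionSet = [-10 10];

% Made-up network output for illustration (assumption, not from the post).
probabilities = [0.2 0.8];

% Normalize and build cumulative bin edges, as in the listing.
p = probabilities(:)' / sum(probabilities);
edges = min([0 cumsum(p)], 1);
edges(end) = 1;

% Fixed draw instead of rand(1,1) so the example is reproducible.
u = 0.5;

% Find the bin that contains u; the bin index selects the action.
actionIndex = discretize(u, edges);
action1 = actionSet(actionIndex);
disp(action1)
```

With these numbers the draw falls into the second bin, so the second entry of actionSet is selected.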
In this output I can see a parameter
actionSet = [-10 10];
which seems to account for the action boundaries.
In my example, this is missing.
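For comparison, a continuous-action counterpart to such a mapping could be an affine rescaling of the raw output onto the range given by actionInfo. This is only a sketch under two assumptions that the post does not confirm: that the actor network's output lies in [-1, 1] (for example because of a final tanh layer), and that rescaling rather than clipping is wanted. The function handle toActionRange is hypothetical and not part of Reinforcement Learning Toolbox.

```matlab
% Limits copied from the rlNumericSpec definition in this post.
lowerLimit = 0;
upperLimit = 1;

% Hypothetical helper: map a raw output in [-1, 1] linearly onto
% [lowerLimit, upperLimit].
toActionRange = @(raw) lowerLimit + (raw + 1) / 2 * (upperLimit - lowerLimit);

% Illustrative raw outputs at the ends and the middle of [-1, 1].
rawActions = [-1 0 1];
scaledActions = toActionRange(rawActions);
disp(scaledActions)
```

Clipping and rescaling give different values for the same raw output, so which of the two reproduces the training behavior depends on how the limits were applied there.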

Answers (0)
