Output between generated Policy and trained Agent different

Question

Victor Bayer on 22 Sep 2021

Direct link to this question:

https://de.mathworks.com/matlabcentral/answers/1458549-output-between-generated-policy-and-trained-agent-different

Edited: Victor Bayer on 22 Sep 2021

Dear Mathworks Team,

I have trained a DDPG agent that receives 2 observations.

By using the approach described in:

https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rldqnagent.generatepolicyfunction.html

I generated a function evaluatePolicy.m that accepts an input of shape (2,1,1) and outputs a scalar. However, its output differs from that of my agent during training.

During training, the following lines define the action properties. They are part of the environment and training setup (createSineAgent.m), not of the agent's neural network definition (createDDGPNetworks.m).

numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';

This prevents agent outputs greater than 1 or less than 0 from being applied. During training, the output is always between 0 and 1 and is clipped at those values.
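To make the clipping concrete, here is a minimal sketch of what saturating an action to the limits in actionInfo amounts to. The raw action values are made up for illustration, and this is not claimed to be how the toolbox implements it internally.

```matlab
% Illustrative values only; the limits mirror actionInfo above.
lowerLimit = 0;
upperLimit = 1;
rawActions = [-0.4 0.25 1.7];   % hypothetical unclipped agent outputs

% Saturate each action to [lowerLimit, upperLimit].
clippedActions = min(max(rawActions, lowerLimit), upperLimit);
disp(clippedActions)
```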

However, the output of the corresponding evaluatePolicy.m seems to range between -1 and 1 rather than between 0 and 1. Why is that?

Examples:

>> evaluatePolicy(reshape([-0.1515581,-0.1515581],2,1,1))
ans = 0.9986
>> evaluatePolicy(reshape([-0.1515581,-0.6],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,100],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,-100],2,1,1))
ans = -1
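Outputs that saturate at exactly -1 and approach 1 are what a tanh output layer would produce. The network definition is not shown in this post, so that is only an assumption. Under that assumption, a raw output in [-1, 1] could be mapped to [0, 1] with an affine rescaling, roughly like this:

```matlab
% Assumption: the actor's last layer is a tanh, so raw outputs lie in [-1, 1].
rawOutput = tanh([-5 0 5]);     % stand-in for the network output
lowerLimit = 0;
upperLimit = 1;

% Affine map from [-1, 1] to [lowerLimit, upperLimit].
scaled = lowerLimit + (rawOutput + 1) * (upperLimit - lowerLimit) / 2;
disp(scaled)
```

Note that rescaling and clipping are not equivalent: rescaling maps a raw 0 to 0.5, whereas clipping leaves it at 0.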

I was expecting the output to be between 0 and 1, as defined in

numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';

Does the approach described in:

https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rldqnagent.generatepolicyfunction.html

not take the ActionInfo into account?
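If the generated function really does return the raw network output, one possible workaround is to apply the limits by hand after calling it. The sketch below uses a hypothetical helper applyLimits and a stand-in policy so that it runs without the toolbox; in practice the handle would be @evaluatePolicy.

```matlab
% Hypothetical helper: apply the actionInfo limits to any policy function handle.
applyLimits = @(policyFcn, obs, lo, hi) min(max(policyFcn(obs), lo), hi);

% Stand-in policy for demonstration only; replace with @evaluatePolicy.
standInPolicy = @(obs) tanh(sum(obs(:)));

obs = reshape([-0.1515581, -0.6], 2, 1, 1);
action = applyLimits(standInPolicy, obs, 0, 1);   % negative raw output is clipped to 0
disp(action)
```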

The command

type evaluatePolicy.m

returns

>> type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 22-Sep-2021 19:49:51
action1 = localEvaluate(observation1);
end
%% Local Functions
function action1 = localEvaluate(observation1)
persistent policy
if isempty(policy)
	policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
action1 = predict(policy, observation1);
end

whereas the documentation at

https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rldqnagent.generatepolicyfunction.html

suggests that the output should look more like

function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 23-Feb-2021 18:52:32
actionSet = [-10 10];
% Select action from sampled probabilities
probabilities = localEvaluate(observation1);
% Normalize the probabilities
p = probabilities(:)'/sum(probabilities);
% Determine which action to take
edges = min([0 cumsum(p)],1);
edges(end) = 1;
[~,actionIndex] = histc(rand(1,1),edges); %#ok<HISTC>
action1 = actionSet(actionIndex);
end
%% Local Functions
function probabilities = localEvaluate(observation1)
persistent policy
if isempty(policy)
	policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
observation1 = observation1(:)';
probabilities = predict(policy, observation1);
end

In this output I can see a parameter

actionSet = [-10 10];

which seems to account for the action boundaries.

In my example this is missing.
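Reading the documented function more closely, actionSet looks like a list of discrete actions to choose from rather than a pair of bounds, which would suggest that example is for an agent with a discrete action space. The sketch below reproduces that selection step with made-up probabilities; it uses find on the cumulative sum instead of histc, which is a substitution and not the generated code.

```matlab
actionSet = [-10 10];           % discrete actions, as in the documented example
probabilities = [0.2 0.8];      % hypothetical network output

% Normalize, then build cumulative upper edges for each action.
p = probabilities(:)' / sum(probabilities);
edges = cumsum(p);
edges(end) = 1;                 % guard against round-off

% Pick the first action whose cumulative probability covers the random draw.
actionIndex = find(rand(1,1) <= edges, 1);
action1 = actionSet(actionIndex);
disp(action1)
```

With a continuous action space such as the rlNumericSpec above, there is no such set to sample from, which may be why the generated function contains only the predict call.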

Answers (0)
