Why do I get a different action every time for the same sample observations after deploying trained RL policies?

Question

de y on 23 Feb 2021


Direct link to this question

https://de.mathworks.com/matlabcentral/answers/753454-why-i-get-a-different-action-result-every-new-time-with-same-sample-observations-after-deploying-tra

Edited: liang zhang on 2 Mar 2022

Accepted answer: Emmanouil Tzorakoleftherakis

% Load the trained agent and rebuild the test environment.
load("agent0218_300016_40000.mat","agent");
obsInfo = getObservationInfo(agent);
actInfo = getActionInfo(agent);
ResetHandle = @() myResetFunction(test_sss);
StepHandle = @(Action,LoggedSignals) myStepFunction(Action,LoggedSignals,test_sss);
envT = rlFunctionEnv(obsInfo,actInfo,StepHandle,ResetHandle);

% Method 1: simulate the agent in the environment and collect its actions.
simOpts = rlSimulationOptions('MaxSteps',size(test_sss,1));
experience = sim(envT,agent,simOpts);
ac3 = squeeze(experience.Action.bs.Data);

% Method 2: generate evaluatePolicy.m and feed it the same observations.
generatePolicyFunction(agent);
action1 = zeros(size(ac3,1),1);   % preallocate the result vector
for iii = 1:size(ac3,1)
    observation1 = test_sss{iii,:};
    action1(iii,1) = evaluatePolicy(observation1);
end

% Total absolute difference between the two sets of actions.
sum(abs(ac3-action1))


Answer 1

Emmanouil Tzorakoleftherakis on 23 Feb 2021


Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/753454-why-i-get-a-different-action-result-every-new-time-with-same-sample-observations-after-deploying-tra#answer_631234

Which agent are you using? Some agents are stochastic, meaning that the action is sampled from a probability distribution, so by construction they won't give you the same result each time.

Another possible reason is the reset function. It seems you are saving simulation data and then running inference again, but every time you call 'sim', the reset function is called first. If it randomizes any initial conditions or parameters, you are not comparing against the same data.
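To make the first point concrete, here is a minimal sketch in base MATLAB, assuming a hypothetical Gaussian policy. The names `policyMean`, `policyStd`, and `sampleAction` are made-up stand-ins for a trained actor, not Reinforcement Learning Toolbox functions. It shows that a sampled action changes from call to call, that the mean action does not, and that seeding the random number generator makes a sampled sequence repeatable.

```matlab
% Hypothetical Gaussian policy: fixed mean and standard deviation for a
% given observation (stand-ins for the outputs of a trained actor network).
policyMean = @(obs) 0.5 * sum(obs);   % assumed deterministic network output
policyStd  = 0.1;                     % assumed fixed standard deviation

obs = [1; 2; 3];

% Stochastic evaluation: the action is sampled, so repeated calls differ.
sampleAction = @(obs) policyMean(obs) + policyStd * randn();
a1 = sampleAction(obs);
a2 = sampleAction(obs);
fprintf('sampled twice:   %.4f vs %.4f\n', a1, a2);

% Deterministic evaluation: use the mean (most likely action) instead.
g1 = policyMean(obs);
g2 = policyMean(obs);
fprintf('mean twice:      %.4f vs %.4f\n', g1, g2);

% Seeding the generator makes the sampled sequence repeatable, which helps
% when comparing two runs of the same stochastic policy.
rng(0); s1 = sampleAction(obs);
rng(0); s2 = sampleAction(obs);
fprintf('sampled, seeded: %.4f vs %.4f\n', s1, s2);
```

The same idea applies to a reset function that draws random initial conditions: unless the generator is seeded identically before each run, the two runs start from different data.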

1 Comment

liang zhang on 2 Mar 2022

Edited: liang zhang on 2 Mar 2022

I encountered the same problem when I used a DDPG agent for verification. My reset function doesn't randomize any initial conditions or parameters. Could the trained DDPG agent also be adding its own noise? Shouldn't a trained agent be a fixed set of neural network parameters?
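As a rough illustration of that question, the sketch below separates the two pieces in base MATLAB: a fixed actor and an optional noise term added on top of it. The linear map `W`, `b` and the handles `actor` and `explore` are hypothetical stand-ins. This is not a description of how the toolbox's DDPG agent behaves internally, only of why fixed parameters alone give repeatable outputs while added exploration noise does not.

```matlab
% Hypothetical trained actor: a fixed linear map standing in for the network.
W = [0.2, -0.1, 0.4];
b = 0.05;
actor = @(obs) W * obs + b;

% Exploration adds zero-mean noise on top of the actor output.
noiseStd = 0.3;                                   % assumed noise level
explore  = @(obs) actor(obs) + noiseStd * randn();

obs = [1; 2; 3];
greedy = [actor(obs), actor(obs)];     % identical: parameters are fixed
noisy  = [explore(obs), explore(obs)]; % differ: noise is resampled per call
disp(greedy);
disp(noisy);
```

If the deployed policy still varies for identical observations, it is worth checking whether a noise or sampling step like `explore` is active when the policy is evaluated.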


Answer 2

de y on 24 Feb 2021


Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/753454-why-i-get-a-different-action-result-every-new-time-with-same-sample-observations-after-deploying-tra#answer_631674

Thanks a lot, @Emmanouil Tzorakoleftherakis.

I am using PPO and SAC agents, and the same problem occurs with both. Training converged to a satisfactory, stable result, and I want to use the trained agent to choose actions. As I understand it, sim is one way to run the policy, and generatePolicyFunction() followed by evaluatePolicy is another. The observations at every step are the same, so why does evaluatePolicy return actions that differ from those produced by sim every time I run it? This confuses me because nothing in my setup randomizes initial conditions or parameters.
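Since PPO and SAC are stochastic agents, one thing to try is switching the agent to deterministic action selection before calling both sim and generatePolicyFunction. The sketch below assumes Reinforcement Learning Toolbox and the MAT-file from the question. The property names are an assumption that depends on the release: to my knowledge, older releases expose `UseDeterministicExploitation` in the PPO/SAC agent options, and newer releases expose `UseExplorationPolicy` on the agent. Please check the documentation for your version.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox and the MAT-file from
% the question. Property names depend on the release (assumption, see text).
load("agent0218_300016_40000.mat","agent");

if isprop(agent, "UseExplorationPolicy")
    % Newer releases (assumed): the flag lives on the agent itself.
    agent.UseExplorationPolicy = false;
elseif isprop(agent.AgentOptions, "UseDeterministicExploitation")
    % Older releases (assumed): the flag lives in the PPO/SAC agent options.
    agent.AgentOptions.UseDeterministicExploitation = true;
end

% Regenerate the policy so that evaluatePolicy reflects the setting.
generatePolicyFunction(agent);
```

For a fair comparison, the sim call from the question should be rerun with the same setting, so that both paths use the same action-selection rule.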
