I want to know how does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from reinforcement learning toolboxof matlab 2019b works? Whether the look ahead is done on target networks? (like modification in critic objective, from {r+gamma*Qt - Q} to {r+ sum(gamma**i*Qt) -Q} Or the look ahead is done on reward sampling itself? ( like changing reward "r" from each sample to "r+gamma*r_t+gamma**2*r_t+1+... Any help is highly appreciated.

number of look ahead steps in DDPG Agent Options

3 Ansichten (letzte 30 Tage)

ALOK RANJAN SWAIN am 21 Feb. 2020

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/506744-number-of-look-ahead-steps-in-ddpg-agent-options

Kommentiert: Dingshan Sun am 1 Sep. 2022

I want to know how does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from reinforcement learning toolboxof matlab 2019b works?

Whether the look ahead is done on target networks? (like modification in critic objective, from {r+gamma*Qt - Q} to {r+ sum(gamma**i*Qt) -Q}
Or the look ahead is done on reward sampling itself? ( like changing reward "r" from each sample to "r+gamma*r_t+gamma**2*r_t+1+...

Any help is highly appreciated.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Antworten (1)

Anh Tran am 1 Mär. 2020

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/506744-number-of-look-ahead-steps-in-ddpg-agent-options#answer_417996

I am not sure what does reward sampling mean. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of DDPG training algorithm.

Assume g is the discount factor, the critic target will be as followed

4 Kommentare
2 ältere Kommentare anzeigen2 ältere Kommentare ausblenden

ALOK RANJAN SWAIN am 4 Mär. 2020

Thanks for your help.??

Dingshan Sun am 1 Sep. 2022

Could you give a hint how R_t,R_t_1,,R_t+2,...,R_t+n-1 can be obtained in an online off-policy algorithm? Especially for DRL methods that use an experience replay?

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Kategorien

Control Systems Reinforcement Learning Toolbox Environments

Mehr zu Environments finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by