Any RL Toolbox A3C example?
14 views (last 30 days)
Heesu Kim
on 4 Apr 2021
Commented: Heesu Kim on 5 Apr 2021
Hi.
I'm currently trying to implement an actor-critic-based model with pixel input on the R2021a version.
Since I want to consider temporal context as well, I'm trying to combine it with LSTM.
I came up with three options: (1) DDPG with LSTM, (2) A3C with LSTM, and (3) Batched A2C with LSTM.
I've tried all three, but none of them worked, for the following reasons.
(1) DDPG with LSTM
The sequenceInputLayer does not allow another input path. The LSTM requires a sequenceInputLayer, while DDPG requires multiple inputs (state and action) for the critic network, so there is a conflict.
(2) A3C with LSTM
There's no A3C example or guideline on how to implement A3C. The A2C agent documentation says the agent also supports A3C, but I cannot find anything further about A3C.
(3) Batched A2C with LSTM
There's no option to set a batch size. Training without batching (replay buffer + mini-batches) does not converge successfully.
So my questions, including the one in the title, are:
(1) Is there any way or example of DDPG + LSTM?
(2) Is there any example of A3C?
(3) Is there any way to set a batch option for A2C?
Thanks for reading the long questions.
0 comments
Accepted Answer
Emmanouil Tzorakoleftherakis
on 5 Apr 2021
Edited: Emmanouil Tzorakoleftherakis on 5 Apr 2021
Hello,
To get an idea of what an actor/critic architecture may look like, you can use the 'default agent' feature that creates a default network architecture for you by parsing observation and action info.
To your questions:
1) The following creates an agent with multiple input channels:
% load predefined environment
env = rlPredefinedEnv('SimplePendulumModel-Continuous')
% obtain observation and action specifications
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agentDDPG = rlDDPGAgent(obsInfo,actInfo,rlAgentInitializationOptions('UseRNN',true))
2) Similarly, for an AC agent:
agentAC = rlACAgent(obsInfo,actInfo,rlAgentInitializationOptions('UseRNN',true))
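To train this AC agent in A3C fashion, you run it with asynchronous parallel workers. A minimal sketch, assuming the Parallel Computing Toolbox is available (the MaxEpisodes value is illustrative):
```matlab
% Sketch: an AC agent becomes A3C when trained with asynchronous
% parallel workers. Requires Parallel Computing Toolbox.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000, ...       % illustrative value
    'UseParallel',true);
trainOpts.ParallelizationOptions.Mode = 'async'; % asynchronous updates = A3C
trainResults = train(agentAC,env,trainOpts);
```
With Mode set to 'sync' instead, the same setup behaves as synchronous A2C.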
Note that "getModel" does not yet work for RNN agents that have multiple input paths. You can get an idea of the architectures by typing:
help rlDDPGAgent
help rlACAgent
3) The AC implementation in RL Toolbox is on-policy, which explains the lack of a replay buffer. You can, however, select a 'mini-batch' size by tuning the StepsUntilDataIsSent parameter in the parallel training options.
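As a sketch of that last point, assuming parallel training is used (the value 32 is illustrative, not a recommendation):
```matlab
% Sketch: StepsUntilDataIsSent controls how many environment steps each
% parallel worker accumulates before sending data back to the host,
% which effectively acts as a mini-batch size for the on-policy AC agent.
trainOpts = rlTrainingOptions('UseParallel',true);
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = 32; % illustrative
```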
Hope this helps.
More Answers (0)