What is the best activation function to get actions between 0 and 1 in a DDPG network?

I am using a DDPG network to run a control algorithm whose inputs (the actions of the RL agent, 23 in total) vary between 0 and 1. I am defining this using rlNumericSpec:
actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);
Then I am using a tanhLayer as the final layer of the actor network (similar to the bipedal robot example) and creating the representation with:
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4, 'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo, 'Observation',{'observation'}, 'Action',{'ActorTanh1'},actorOptions);
But I find that the model mostly takes the extreme actions, i.e., 0 and 1.
Would it be better to use a sigmoid function to get better action estimates?

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
With DDPG, a common pattern for the final three layers of the actor is a fully connected layer, a tanh layer, and a scaling layer. Tanh bounds the output of that layer between -1 and 1, and the scaling layer can then scale/shift the values as needed to match the actuator specifications in your problem.
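For example, a minimal sketch of those final layers for your case (layer names are illustrative, and numAct is your 23 actions; scalingLayer is part of the Reinforcement Learning Toolbox):
% Final three layers of the actor: fully connected -> tanh -> scaling
fullyConnectedLayer(numAct,'Name','ActorFC3')
tanhLayer('Name','ActorTanh1')
scalingLayer('Name','ActorScaling','Scale',0.5,'Bias',0.5) % y = 0.5*x + 0.5 maps [-1,1] to [0,1]
Note that with this change the 'Action' argument in your rlRepresentation call should point to 'ActorScaling' rather than 'ActorTanh1', since the scaling layer is now the output of the network.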
It seems the problem here is due to the noise that is added during training with DDPG to allow sufficient exploration (for example, see step 1 here). The default noise options have a fairly high variance, so when the noise is added to the output of the tanh layer, the result often falls outside the [0, 1] range and gets clipped. This is why you are only getting the two extremes.
Try adjusting the DDPG noise options, particularly the variance (make it smaller, e.g. <= 0.1). Also, see here for some best practices when choosing noise parameters.
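For example, a minimal sketch, assuming a release where the agent is created with rlDDPGAgent and the Ornstein-Uhlenbeck noise is configured through rlDDPGAgentOptions (Ts and critic are placeholders for your own sample time and critic representation):
agentOpts = rlDDPGAgentOptions('SampleTime',Ts);
agentOpts.NoiseOptions.Variance = 0.1;            % smaller exploration noise so clipping at [0,1] is rare
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;  % optionally anneal the noise over training
agent = rlDDPGAgent(actor,critic,agentOpts);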
Hope that helps