DDPG does not converge

Question

Asked by Esan freedom on 17 May 2024

Direct link to this question: https://de.mathworks.com/matlabcentral/answers/2119806-ddpg-does-not-converge

Last comment: Alan on 27 Jun 2024

Attached image: simulink.PNG

Hello

I am using a DDPG agent that generates four continuous actions (two positive values and two negative values). The sum of the two positive actions must equal the positive part of a reference value, and the sum of the two negative actions must equal the negative part of the reference value. However, the agent does not learn to track the reference. I have tried different reward functions and hyperparameters, but after a while it always chooses the extreme values of the defined action ranges ([-1 -1 1 1]).

I would appreciate any suggestions.
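To make the tracking requirement concrete, here is a minimal sketch of how the two tracking errors could be computed for a single time step. The numeric values of `ref` and `act` are hypothetical, and the quadratic penalty is only one possible reward shape (an assumption, not the reward used in the model). The action ordering follows the limits in the action specification: the first two actions are the negative ones, the last two the positive ones.

```matlab
% Hypothetical values for illustration only (not taken from the model)
ref = -0.6;                        % reference value at one time step
act = [-0.4; -0.2; 0; 0];          % [neg1; neg2; pos1; pos2]

refPos = max(ref, 0);              % positive part of the reference
refNeg = min(ref, 0);              % negative part of the reference

errNeg = refNeg - sum(act(1:2));   % error of the two negative actions
errPos = refPos - sum(act(3:4));   % error of the two positive actions

% One possible reward: quadratic penalty on both errors (assumption)
reward = -(errNeg^2 + errPos^2);
```

With this formulation the reward is zero only when both sums match their part of the reference, and it becomes more negative as either sum drifts away.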

mdl = 'MEMG_RL';  % model name, inferred from the agent block path below
open_system(mdl)

% Observation and action specifications
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';
numObservations = obsInfo.Dimension(1);
actInfo = rlNumericSpec([4 1],...
    LowerLimit=[-1 -1 0 0]',...
    UpperLimit=[0 0 1 1]');
numActions = actInfo.Dimension(1);

% Build the environment interface object
agentblk = 'MEMG_RL/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);

Ts = 2e-2;  % agent sample time
Tf = 60;    % simulation time per episode

%% critic
% Observation path
statepath = [featureInputLayer(numObservations, Name = 'stateinp')
    fullyConnectedLayer(96, Name = 'stateFC1')
    reluLayer
    fullyConnectedLayer(74, Name = 'stateFC2')
    reluLayer
    fullyConnectedLayer(36, Name = 'stateFC3')];

% Action path
actionpath = [featureInputLayer(numActions, Name = 'actinp')
    fullyConnectedLayer(72, Name = 'actFC1')
    reluLayer
    fullyConnectedLayer(36, Name = 'actFC2')];

% Common path: adds both 36-unit outputs and maps them to a scalar Q-value
commonpath = [additionLayer(2, Name = 'add')
    fullyConnectedLayer(96, Name = 'FC1')
    reluLayer
    fullyConnectedLayer(72, Name = 'FC2')
    reluLayer
    fullyConnectedLayer(24, Name = 'FC3')
    reluLayer
    fullyConnectedLayer(1, Name = 'output')];

% Assemble the critic network and connect the two input paths
critic_network = layerGraph();
critic_network = addLayers(critic_network,actionpath);
critic_network = addLayers(critic_network,statepath);
critic_network = addLayers(critic_network,commonpath);
critic_network = connectLayers(critic_network,'actFC2','add/in1');
critic_network = connectLayers(critic_network,'stateFC3','add/in2');
plot(critic_network)

critic = dlnetwork(critic_network);
criticOptions = rlOptimizerOptions('LearnRate',3e-04,'GradientThreshold',1);
critic = rlQValueFunction(critic,obsInfo,actInfo,...
    'ObservationInputNames','stateinp','ActionInputNames','actinp');

%% actor
actorNetwork = [featureInputLayer(numObservations, Name = 'observation')
    fullyConnectedLayer(72, Name = 'actorFC1')
    reluLayer
    fullyConnectedLayer(48, Name = 'actorFc2')
    reluLayer
    fullyConnectedLayer(36, Name = 'actorFc3')
    reluLayer
    fullyConnectedLayer(numActions, Name = 'output')
    tanhLayer
    % max(actInfo.UpperLimit) is 1, so every output channel spans [-1, 1]
    scalingLayer(Name = 'actorscaling', Scale = max(actInfo.UpperLimit))];

actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',3e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);

%% agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'ActorOptimizerOptions',actorOptions,...
    'CriticOptimizerOptions',criticOptions,...
    'ExperienceBufferLength',1e6,...
    'MiniBatchSize',128);

% Exploration noise settings
agentOptions.NoiseOptions.StandardDeviation = 0.1; %.07/sqrt(Ts) ;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-6;

% Training options: stop only after the full episode count
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'ScoreAveragingWindowLength',20, ...
    'Verbose',false, ...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeCount',...
    'StopTrainingValue',5000);

agent = rlDDPGAgent(actor,critic,agentOptions);
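One detail in the actor listing above that may be worth checking: the final scaling layer uses a single scale of 1, so after the tanh layer all four outputs span [-1, 1], while the action specification declares [-1, 0] for the first two actions and [0, 1] for the last two. The sketch below shows one common way to derive a per-channel scale and bias from the limits so that the tanh output maps exactly onto each declared range. This is a suggestion under the assumption that the range mismatch matters here, not a confirmed cause of the saturation.

```matlab
% Limits copied from the action specification in the question
LL = [-1 -1 0 0]';                 % lower limits
UL = [ 0  0 1 1]';                 % upper limits

scale = (UL - LL)/2;               % half-width of each action range
bias  = (UL + LL)/2;               % midpoint of each action range

% A tanh output t in [-1, 1] is mapped to scale.*t + bias
aMin = scale.*(-1) + bias;         % value reached when tanh saturates low
aMax = scale.*( 1) + bias;         % value reached when tanh saturates high
disp([aMin aMax])

% In the actor, the last layer could then be written as (needs the
% Reinforcement Learning Toolbox, so it is left as a comment here):
%   scalingLayer(Name = 'actorscaling', Scale = scale, Bias = bias)
```

With these values the two saturation points of each channel coincide with its lower and upper limit, so the network cannot request an action outside the declared range.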

2 Comments

Esan freedom on 20 May 2024

Attached image: Captura de pantalla 2024-05-20 131632.png

@Emmanouil Tzorakoleftherakis

It learns and then gets lost again, as the reward plot shows.

I would appreciate a quick reply.
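One thing that can be checked directly from the posted options is how slowly the exploration noise decays. The sketch below assumes the documented per-step update of the noise standard deviation, sigma <- sigma*(1 - decayRate), and ignores any minimum standard deviation. It estimates how many episodes pass before the noise level halves and what is left at the end of training. Whether this is related to the behaviour in the reward plot cannot be confirmed from the information posted so far.

```matlab
% Values copied from the question
Ts = 2e-2;  Tf = 60;
sigma0      = 0.1;                 % NoiseOptions.StandardDeviation
decayRate   = 1e-6;                % NoiseOptions.StandardDeviationDecayRate
maxepisodes = 5000;
stepsPerEpisode = ceil(Tf/Ts);

% Assumed update rule, applied once per agent step:
%   sigma <- sigma*(1 - decayRate)
stepsToHalve    = log(0.5)/log(1 - decayRate);
episodesToHalve = stepsToHalve/stepsPerEpisode;

% Standard deviation left after the full training run
sigmaEnd = sigma0*(1 - decayRate)^(maxepisodes*stepsPerEpisode);

fprintf('Episodes until sigma halves: %.0f\n', episodesToHalve);
fprintf('Sigma after %d episodes: %.2e\n', maxepisodes, sigmaEnd);
```

Comparing the episode at which the reward starts to degrade with these numbers gives a rough idea of how much exploration noise was still active at that point.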

Alan on 27 Jun 2024

Hi Esan,

Could you provide the .slx file of the Simulink model?

Regards.


Answers (0)
