Constraints on the actor action outputs in DDPG RL LQR type control

Question

Muhammad Nadeem on 17 Oct 2023


https://de.mathworks.com/matlabcentral/answers/2034529-constraints-on-the-actor-action-outputs-in-ddpg-rl-lqr-type-control

Commented: Emmanouil Tzorakoleftherakis on 2 Nov 2023

Hello Everyone,

I am trying to train an agent for LQR-type control. My observation is a 59x1 vector of states and my control input is a 6x1 vector. The control inputs are voltage and power setpoints, which need to be constrained. They are ordered as [voltage1, power1, voltage2, power2, voltage3, power3]. Each voltage must stay between 0.95 and 1.05, and each power must be positive and bounded above by its own Pmax. I am a bit confused about how to enforce these constraints in the actor neural network. Any help will be appreciated. My sample code is as follows:

%% Critic neural network
obsPath = featureInputLayer(obsInfo.Dimension(1),Name="obsIn");
actPath = featureInputLayer(actInfo.Dimension(1),Name="actIn");
commonPath = [
    concatenationLayer(1,2,Name="concat")
    quadraticLayer
    fullyConnectedLayer(1,Name="value", ...
        BiasLearnRateFactor=0,Bias=0)
    ];

% Add layers to layerGraph object
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);

% Connect layers
criticNet = connectLayers(criticNet,"obsIn","concat/in1");
criticNet = connectLayers(criticNet,"actIn","concat/in2");
criticNet = dlnetwork(criticNet);
critic = rlQValueFunction(criticNet, ...
    obsInfo,actInfo, ...
    ObservationInputNames="obsIn",ActionInputNames="actIn");
getValue(critic,{rand(obsInfo.Dimension)},{rand(actInfo.Dimension)})

%% Actor neural network
Biass = zeros(6,1); % no biasing linear actor
actorNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(actInfo.Dimension(1), ...
        BiasLearnRateFactor=0,Bias=Biass)
    ];
actorNet = dlnetwork(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
agent = rlDDPGAgent(actor,critic);
getAction(agent,{rand(obsInfo.Dimension)}) %% getting error while executing this line of command
%%


Answer 1

Shivansh on 25 Oct 2023


https://de.mathworks.com/matlabcentral/answers/2034529-constraints-on-the-actor-action-outputs-in-ddpg-rl-lqr-type-control#answer_1340181


Hi Muhammad,

I understand that you want to constrain the voltage and power values produced by the actor neural network. Here is an example approach that you can adapt to your environment and problem requirements.

Since the voltage and power constraints are different, it's better to separate them into two different layers in your actor network. This way, you can apply different constraints to each output.

Now, to enforce the voltage constraints between 0.95 and 1.05, you can add a custom activation layer after the voltage output layer. This activation layer should clamp the output values within the desired range. Here's an example of how you can add the voltage constraints:

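% Voltage branch: three voltage setpoints clamped to [0.95, 1.05]
voltagePath = [
    fullyConnectedLayer(3, 'Name', 'voltageFC', 'BiasLearnRateFactor', 0, 'Bias', Biass(1:3))
    customClampLayer(0.95, 1.05, 'VoltageClamp')
    ];
% Power branch: clamped to [0, Pmax]; Pmax is assumed to be your own 3x1 vector of power limits
powerPath = [
    fullyConnectedLayer(3, 'Name', 'powerFC', 'BiasLearnRateFactor', 0, 'Bias', Biass(4:6))
    customClampLayer(zeros(3,1), Pmax, 'PowerClamp')
    ];
% Both branches read the observation and are concatenated, so the action order
% becomes [voltage1..3, power1..3]; the environment must use the same order
actorNet = layerGraph(featureInputLayer(obsInfo.Dimension(1), 'Name', 'obsIn'));
actorNet = addLayers(actorNet, voltagePath);
actorNet = addLayers(actorNet, powerPath);
actorNet = addLayers(actorNet, concatenationLayer(1, 2, 'Name', 'actOut'));
actorNet = connectLayers(actorNet, 'obsIn', 'voltageFC');
actorNet = connectLayers(actorNet, 'obsIn', 'powerFC');
actorNet = connectLayers(actorNet, 'VoltageClamp', 'actOut/in1');
actorNet = connectLayers(actorNet, 'PowerClamp', 'actOut/in2');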

In this example, customClampLayer is a custom layer that clamps the values between a specified range. You can implement it as follows:

classdef customClampLayer < nnet.layer.Layer
    properties
        LowerBound
        UpperBound
    end
    methods
        function layer = customClampLayer(lowerBound, upperBound, name)
            layer.LowerBound = lowerBound;
            layer.UpperBound = upperBound;
            layer.Name = name;
        end
        function Z = predict(layer, X)
            Z = max(layer.LowerBound, min(layer.UpperBound, X));
        end
    end
end

Similarly, to enforce the power constraints individually for each power output, add a custom activation layer after the power output layer. This activation layer should keep each power value non-negative and below its own maximum power limit.
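As a rough illustration of the per-output limits, the clamp expression used in predict also works elementwise when the bounds are vectors, so one set of bounds can cover the interleaved [voltage1, power1, voltage2, power2, voltage3, power3] ordering from the question. The Pmax values and the raw action below are made-up placeholders, not numbers from this thread:

```matlab
% Placeholder power limits (assumed values, replace with your own)
Pmax = [2.0; 1.5; 3.0];

% Bounds in the order [voltage1, power1, voltage2, power2, voltage3, power3]
lowerBound = [0.95; 0;       0.95; 0;       0.95; 0];
upperBound = [1.05; Pmax(1); 1.05; Pmax(2); 1.05; Pmax(3)];

% Hypothetical unconstrained actor output
raw = [1.20; -0.3; 0.90; 1.0; 1.00; 5.0];

% Same expression as in customClampLayer.predict, applied elementwise
clamped = max(lowerBound, min(upperBound, raw));
disp(clamped.')   % clamped is [1.05; 0; 0.95; 1.0; 1.0; 3.0]
```

Passing such vectors as LowerBound and UpperBound to customClampLayer should therefore constrain all six outputs with a single layer, provided the bounds have the same size as the layer input.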

The custom activation layers above constrain the power and voltage values produced by the actor neural network. The approach and code snippet are a starting point that you can modify to fit your requirements.

For more information regarding the custom deep learning layer, you can refer to the following documentation https://www.mathworks.com/help/deeplearning/ug/define-custom-deep-learning-intermediate-layers.html.

Hope it helps!

2 Comments

Muhammad Nadeem on 29 Oct 2023

Perfect. Thank you so much for the detailed response.

Emmanouil Tzorakoleftherakis on 2 Nov 2023

By the way, another way would be to add upper and lower limits to the action space definition. That said, to keep the agent's outputs from constantly hitting those limits, you also need to add a tanh layer and a scaling layer as the final layers of your actor. That way the output is scaled into the desired range before any final clamping happens.
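A minimal sketch of that suggestion, assuming Reinforcement Learning Toolbox with name=value syntax as in the question. The Pmax values are placeholders, and the 59x1 and 6x1 sizes come from the question. Since tanh outputs lie in [-1, 1], a scale of (upper - lower)/2 and a bias of (upper + lower)/2 map them elementwise onto [lower, upper]:

```matlab
% Placeholder power limits (assumed values, not from the thread)
Pmax = [2.0; 1.5; 3.0];

% Limits in the order [voltage1, power1, voltage2, power2, voltage3, power3]
lowerLimit = [0.95; 0;       0.95; 0;       0.95; 0];
upperLimit = [1.05; Pmax(1); 1.05; Pmax(2); 1.05; Pmax(3)];

% Observation and action specifications; the action spec carries the limits
obsInfo = rlNumericSpec([59 1]);
actInfo = rlNumericSpec([6 1], LowerLimit=lowerLimit, UpperLimit=upperLimit);

% Map tanh output in [-1, 1] onto [lowerLimit, upperLimit] elementwise
scale = (upperLimit - lowerLimit)/2;
bias  = (upperLimit + lowerLimit)/2;

actorNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(actInfo.Dimension(1))
    tanhLayer
    scalingLayer(Scale=scale, Bias=bias)
    ];
actorNet = dlnetwork(actorNet);
actor = rlContinuousDeterministicActor(actorNet, obsInfo, actInfo);

% Sample action for a random observation; it should lie within the limits
act = getAction(actor, {rand(obsInfo.Dimension)});
disp(act{1}.')
```

Note that the tanh layer makes the actor nonlinear, so this is no longer the purely linear state-feedback actor from the question; whether that trade-off is acceptable depends on how close to LQR structure the policy needs to stay.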
