How to add a stability constraint (eigenvalues of the closed-loop system A - BK have negative real parts) in a DDPG-based LQR controller

11 views (last 30 days)
Hello Everyone,
I am trying to train an agent for LQR-type control. The system is in state-space form, x_dot = Ax + Bu. Is there any way I can add a stability constraint during training? That is, I want to ensure that, for the feedback gain K computed by the actor (the weights of the actor network), the eigenvalues of the closed-loop system A - BK always have negative real parts. My actor and critic models are given as follows:
%% Critic neural network
obsPath = featureInputLayer(obsInfo.Dimension(1),Name="obsIn");
actPath = featureInputLayer(actInfo.Dimension(1),Name="actIn");
commonPath = [
    concatenationLayer(1,2,Name="concat")
    quadraticLayer
    fullyConnectedLayer(1,Name="value", ...
        BiasLearnRateFactor=0,Bias=0)
    ];
% Add layers to layerGraph object
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);
% Connect layers
criticNet = connectLayers(criticNet,"obsIn","concat/in1");
criticNet = connectLayers(criticNet,"actIn","concat/in2");
criticNet = dlnetwork(criticNet);
critic = rlQValueFunction(criticNet, ...
    obsInfo,actInfo, ...
    ObservationInputNames="obsIn",ActionInputNames="actIn");
%% Actor neural network
Biass = zeros(actInfo.Dimension(1),1); % linear actor with no bias term
actorNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(actInfo.Dimension(1), ...
        BiasLearnRateFactor=0,Bias=Biass)
    ];
actorNet = dlnetwork(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
getAction(actor,{rand(obsInfo.Dimension)})
Any suggestions would be much appreciated.
Thanks,
Nadeem

Answers (1)

Kartik Saxena on 23 Nov 2023
Hi,
I understand that you want to ensure the stability of a linear system by constraining the eigenvalues of the closed-loop system (A - BK) to have negative real parts.
There are some approaches you can take to encourage the stability of the learned control policy:
1. Reward Shaping
Modify the reward function to penalize unstable behavior. This can be done by detecting when the system is becoming unstable (e.g., by monitoring the magnitude of the states or their growth rate) and assigning a large negative reward.
2. Custom Training Loop
Implement a custom training loop in which you check the eigenvalues of (A - BK) after each update of the actor network. If the updated policy results in an unstable system, you can revert to the previous policy or apply a penalty to the reward (see the sketch after this list for one way to read K out of the actor and perform this check).
3. Projection Layer
Add a custom layer to the actor network that projects the output (the control matrix K) onto the set of stabilizing controllers. This is a complex approach because the set of stabilizing controllers is not convex, but some approximate methods might be applied.
4. Lyapunov Function
Use a Lyapunov function to ensure stability. You can design a neural network to approximate a Lyapunov function and train it alongside the actor so that the learned policy decreases the Lyapunov function along trajectories, which implies stability.
5. Pre-Training with LQR
Pre-train the actor network to output the optimal LQR gain K before starting the reinforcement learning training. This provides a starting point that is known to be stabilizing, and subsequent training can fine-tune the policy from there (the sketch after this list also shows this initialization).
6. Constrained Optimization
Use constrained optimization techniques during training to ensure that the policy updates satisfy the stability constraint. This is a more advanced approach and may require significant modification to the training algorithm.
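As a concrete illustration of approaches 2 and 5, here is a minimal sketch of how you might (a) initialize the linear actor with an LQR gain before creating the agent and (b) read the gain back out of the trained actor and check the closed-loop eigenvalues. It is only a sketch: it assumes the single fully connected, bias-free actor from the question, plant matrices A and B, and cost weights Q and R of your choice; lqr requires the Control System Toolbox, and the variable names (kInit, W, Acl) are purely illustrative.
%% Minimal sketch: LQR warm start and closed-loop eigenvalue check (assumptions above)
% --- Approach 5: initialize the actor with the LQR gain before creating the agent ---
kInit  = lqr(A, B, Q, R);                 % LQR gain for the convention u = -kInit*x
params = getLearnableParameters(actor);   % {weights, bias} of the fully connected layer
params{1} = single(-kInit);               % the actor computes u = W*x, so set W = -K
actor  = setLearnableParameters(actor, params);
% --- Approach 2: during training, read K back from the agent and check stability ---
trainedActor = getActor(agent);
params = getLearnableParameters(trainedActor);
W = params{1};                            % weight matrix of the fully connected layer
if isa(W,'dlarray')
    W = extractdata(W);                   % convert to a plain numeric matrix
end
Acl = A + B*double(W);                    % closed loop for u = W*x (equals A - B*K when W = -K)
if ~all(real(eig(Acl)) < 0)
    % e.g., revert to previously saved parameters or penalize this update
end
Note the sign convention: with the actor output u = W*x and plant x_dot = A*x + B*u, the closed-loop matrix is A + B*W, which coincides with A - B*K exactly when W = -K.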
Refer to the MathWorks documentation for the Reinforcement Learning Toolbox to learn more about these approaches.
I hope this resolves your issue.
4 Comments
Kartik Saxena on 24 Nov 2023
To integrate Lyapunov stability into the reward function of a reinforcement learning (RL) algorithm in MATLAB, you would need to define a Lyapunov function that is suitable for your system. The Lyapunov function should be positive definite and its derivative along the system trajectories should be negative definite.
Below is a conceptual example of how you might incorporate a Lyapunov function into the reward function. This example assumes you have a linear system and a quadratic Lyapunov function V(x) = x'*P*x, where P is a positive definite matrix. The goal is to ensure that the eigenvalues of the closed-loop system matrix (A - BK) have negative real parts, where K is the control gain matrix.
This code is not a complete solution but rather a starting point to illustrate how you might begin to integrate a Lyapunov function into your RL framework. You will likely need to refine the Lyapunov function and adjust the reward computation to fit your specific system and learning algorithm.
Refer to the following code snippet to get an idea of how this can be achieved:
function reward = lyapunovStabilityReward(x, u, A, B, K, P)
% x: Current state vector
% u: Control action (output of the actor network)
% A, B: System matrices
% K: Control gain matrix (reshaped from the actor's output)
% P: Positive definite matrix for Lyapunov function

    % Closed-loop system matrix
    A_cl = A - B * K;

    % Check if the closed-loop system is stable
    eigVals  = eig(A_cl);
    isStable = all(real(eigVals) < 0);

    % Compute the Lyapunov function value
    V = x' * P * x;

    % Compute the derivative of the Lyapunov function along the system trajectory
    V_dot = x' * (A_cl' * P + P * A_cl) * x;

    % Check if the derivative of the Lyapunov function is negative definite
    isLyapunovDecreasing = V_dot < 0;

    % Define the reward
    if isStable && isLyapunovDecreasing
        % Positive reward for stability and decreasing Lyapunov function
        reward = 100;
    else
        % Negative reward for instability or non-decreasing Lyapunov function
        reward = -1000;
    end
end
To use this reward function, you would need to call it at each step of your training process, passing in the current state 'x', the action 'u', the system matrices 'A' and 'B', and the control gain matrix 'K'. The positive definite matrix 'P' can be found by solving the Lyapunov equation for a given 'A' matrix:
% Assuming a stable A matrix for demonstration purposes
A = [-1 0; 0 -2];
Q = eye(size(A)); % Choose a positive definite Q matrix
P = lyap(A', Q);  % Solves A'*P + P*A + Q = 0, so that V_dot = x'*(A'*P + P*A)*x = -x'*Q*x < 0
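As a hedged, self-contained illustration (with placeholder matrices different from the snippet above, and assuming the lyapunovStabilityReward function above is on the path), a single call could look like the following; here P is computed from a nominal closed-loop matrix rather than the open-loop A:
% Illustrative one-step call; all numerical values are placeholders
A = [0 1; -2 -3];
B = [0; 1];
K = [1 2];                 % candidate gain, e.g. reshaped from the actor's weights
Q = eye(2);
P = lyap((A - B*K)', Q);   % P for the nominal closed loop: Acl'*P + P*Acl = -Q
x = [0.5; -0.2];           % current state
u = -K*x;                  % control action
r = lyapunovStabilityReward(x, u, A, B, K, P)  % returns 100 for this stable example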
Please note that this code is meant to serve as a conceptual guide and will need to be adapted to fit into your specific RL training framework. The reward function must be integrated with the rest of your RL training loop, and you may need to adjust the parameters and the Lyapunov function to ensure that it is appropriate for your system. Additionally, the reward values (100 and -1000 in the example) are arbitrary and should be tuned based on the specifics of your problem and learning algorithm.
I hope this resolves your issue.
Muhammad Nadeem on 24 Nov 2023
Thank you for the response. The main issue is actually how to get K (the actor network weights) at every step of the episode, since I cannot pass the agent to the step function and then use
actor = getActor(agent);
params = getLearnableParameters(actor);
and so on, as this does not work.
Any idea how I can extract K at every step of the episode?
Thanks again,
