How to write correct code to simulate adaptive optimal control using value iteration
Hello, I am Putri. I am a beginner MATLAB user, and I am trying to simulate adaptive optimal control using value iteration.
However, I am somewhat lost and am having difficulty defining the initial value function and coding the iteration.
If someone could show me how to do it, I would appreciate it.
Here is the code I have tried to build, but it is still far from doing what I intend.
close all;
clc;
clear; % 'clear' is preferred over 'clear all'
% To simulate the system
A=[-0.0665 8 0 0; 0 -3.663 3.663 0; -6.86 0 -13.736 -13.736; 0.6 0 0 0]; %matrix A
B=[0; 0; 13.7335; 0]; %matrix B
C=eye(4); %output matrix
D=0;
x=[0;0.1;0;0]; %initial state
% Weights component
R=eye(1);
Q=eye(4);
% Discretize the system
Ts=0.01;%sampling time
sysc=ss(A,B,C,D); %continuous-time system
sysd=c2d(sysc,Ts,'zoh'); %discretize with zero order hold
% Solution of Discrete-ARE
[P,K,L] = idare(sysd.A,sysd.B,Q,R,[],[]);
P_DARE=P/100; %correction factor
x0 = [0;0.1;0;0];
t = 0:0.01:6;
x = initial(sysd,x0,t);
% Define the parameters for the value iteration algorithm
gamma = 1; % discount factor
max_iterations = 100; % maximum number of iterations
tolerance = 1e-6; % tolerance for convergence
% Define the state space of the MDP
P = [48.0537812864796 47.7268296813581 6.03847752285629 47.7170406287952;
47.7268296813581 78.9295235076091 12.4038561385049 38.5521293459321;
6.03847752285629 12.4038561385049 5.66971382690978 3.03582894965622;
47.7170406287952 38.5521293459321 3.03582894965622 235.144841887140];
%P = idare(sysd.A,sysd.B,Q,R,[],[]);
% Define the initial weight vector for the value function
W = [P(1,1); 2*P(1,2); 2*P(1,3); 2*P(1,4); P(2,2); 2*P(2,3); 2*P(2,4); P(3,3); 2*P(3,4); P(4,4)]; % weights of the quadratic value function x'*P*x
WP=[W; 0];
% Value Update
x1 = x0(1);
x2 = x0(2);
x3 = x0(3);
x4 = x0(4);
state = [x1 x2 x3 x4]; % define the state
%action= [u(1) u(2) u(3) u(4)]; % define action
% Define the features of the state space
psi = [x1^2; x1*x2; x1*x3; x1*x4; x2^2; x2*x3; x2*x4; x3^2; x3*x4; x4^2]; % quadratic basis functions (must match dphi below)
% Initialize the value function
V = W'*psi;
% Iteration for parameter P
% Define the current policy
theta = [x1; x2; x3; x4]; % current state
[~,K_lqr] = idare(sysd.A,sysd.B,Q,R,[],[]); % the gain is the SECOND output of idare
u = -K_lqr*theta; % initial stabilizing policy
h = u;
% Define the reward (stage cost) function
r = theta'*Q*theta + u'*R*u;
% Define the dynamics of the MDP
g = [0; 0; 13.7335; 0];
% Initialize a vector to store the value function at each iteration
V_iterations = zeros(max_iterations, 1);
% Batch least squares
Fsamples=60; %length of the simulation in samples
T=0.15; % sample time
dphi=[2*x1 0 0 0;
x2 x1 0 0;
x3 0 x1 0;
x4 0 0 x1;
0 2*x2 0 0;
0 x3 x2 0;
0 x4 0 x2;
0 0 2*x3 0;
0 0 x4 x3;
0 0 0 2*x4];
% Iterate until convergence or the maximum number of iterations is reached
for i = 1:max_iterations
    % Value update: V_{k+1}(x) = r(x,u) + gamma*V_k(x')
    V_new = r + gamma * V;
    V_iterations(i) = V_new; % store the value function at this iteration
    % Policy improvement: u = -(gamma/2)*R^(-1)*g'*dV/dx, with dV/dx = dphi'*W
    h_new = -gamma/2 * (R \ (g' * (dphi' * W)));
    % Check for convergence
    if norm(V_new - V) < tolerance
        break;
    end
    % Update the value function and policy
    V = V_new;
    h = h_new;
end
% Plot the value function over the number of iterations
figure;
plot(1:i, V_iterations(1:i));
xlabel('Iteration');
ylabel('Value Function');
It seems I am confused about how to represent my algorithm in code.
Looking forward to some clues.
Thank you.
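For reference, here is a minimal sketch of plain model-based value iteration for this discrete-time LQR problem (my own illustration, assuming the same A, B, Q, R, and Ts as above, not the data-driven adaptive version). It iterates the Riccati value update P_{k+1} = Q + Ad'*P_k*Ad - Ad'*P_k*Bd*(R + Bd'*P_k*Bd)^(-1)*Bd'*P_k*Ad starting from P_0 = 0; the iterates converge to the idare solution, and the greedy gain converges to the LQR gain.

```matlab
% Model-based value iteration sketch for the discrete-time LQR problem above
% (assumes the same A, B, Q, R, and Ts as in the question).
A = [-0.0665 8 0 0; 0 -3.663 3.663 0; -6.86 0 -13.736 -13.736; 0.6 0 0 0];
B = [0; 0; 13.7335; 0];
Q = eye(4); R = eye(1); Ts = 0.01;
sysd = c2d(ss(A, B, eye(4), 0), Ts, 'zoh');
Ad = sysd.A; Bd = sysd.B;

P = zeros(4);                 % initial value function V_0(x) = x'*P*x = 0
for iter = 1:100000
    % Bellman value update: V_{k+1}(x) = min_u [x'*Q*x + u'*R*u + V_k(Ad*x + Bd*u)]
    Pnew = Q + Ad'*P*Ad - (Ad'*P*Bd)*((R + Bd'*P*Bd)\(Bd'*P*Ad));
    if norm(Pnew - P, 'fro') < 1e-12
        P = Pnew;
        break;
    end
    P = Pnew;
end
K = (R + Bd'*P*Bd)\(Bd'*P*Ad); % greedy feedback gain, u = -K*x

% Cross-check against the direct DARE solution
[Pdare, Kdare] = idare(Ad, Bd, Q, R, [], []);
fprintf('||P - Pdare||/||Pdare|| = %g\n', norm(P - Pdare,'fro')/norm(Pdare,'fro'));
fprintf('||K - Kdare|| = %g\n', norm(K - Kdare));
```

The same recursion is what the W'*psi parameterization above approximates from data; comparing against idare is a useful sanity check before moving to the adaptive (model-free) version.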
4 comments
Sam Chak
on 31 Jan 2023
I suggest that you show the mathematical algorithm (equations) of your adaptive optimal controller design, so that we can check whether you have coded it correctly in MATLAB.
Vina Putri
on 31 Jan 2023
Dinh Tuan
on 20 Mar 2024
Have you solved this problem yet? I'm also having the same problem as you. Can you help me?
Sam Chak
on 20 Mar 2024
Hi @Dinh Tuan, could you please open a new thread and post your control problem there? Simply click 'Ask' to start. Provide the control algorithm and share the code by clicking the indentation icon; this would allow users to test and investigate the issue.
Answers (0)