Generate Policy Block for Deployment
This example shows how to generate a Policy block ready for deployment from an agent object. You generate the policy from the agent in Train TD3 Agent for PMSM Control, then simulate it to validate its performance. The policy is simulated to validate performance. If Embedded Coder® is installed, a software-in-the-loop (SIL) simulation is run to validate the generated code of the policy.
In general, the workflow for the deployment of a reinforcement learning policy via a Simulink® model is:
Train the agent (see Train TD3 Agent for PMSM Control).
Generate a Policy block from the trained agent.
Replace the RL Agent block with the Policy block.
Configure the model for code generation.
Simulate the policy and verify policy performance.
Generate code for the policy, simulate the generated code, and verify policy performance.
Deploy to hardware for testing.
In this example, you do steps 2 through 6.
Load the motor parameters along with the trained TD3 agent.
sim_data;
### The Lq is observed to be lower than Ld. ### ### Using the lower of these two for the Ld (internal variable) ### ### and higher of these two for the Lq (internal variable) for computations. ### ### The Lq is observed to be lower than Ld. ### ### Using the lower of these two for the Ld (internal variable) ### ### and higher of these two for the Lq (internal variable) for computations. ### model: 'Maxon-645106' sn: '2295588' p: 7 Rs: 0.2930 Ld: 8.7678e-05 Lq: 7.7724e-05 Ke: 5.7835 J: 8.3500e-05 B: 7.0095e-05 I_rated: 7.2600 QEPSlits: 4096 N_base: 3476 N_max: 4300 FluxPM: 0.0046 T_rated: 0.3471 PositionOffset: 0.1650 model: 'BoostXL-DRV8305' sn: 'INV_XXXX' V_dc: 24 I_trip: 10 Rds_on: 0.0020 Rshunt: 0.0070 CtSensAOffset: 2295 CtSensBOffset: 2286 CtSensCOffset: 2295 ADCGain: 1 EnableLogic: 1 invertingAmp: 1 ISenseVref: 3.3000 ISenseVoltPerAmp: 0.0700 ISenseMax: 21.4286 R_board: 0.0043 CtSensOffsetMax: 2500 CtSensOffsetMin: 1500 model: 'LAUNCHXL-F28379D' sn: '123456' CPU_frequency: 200000000 PWM_frequency: 5000 PWM_Counter_Period: 20000 ADC_Vref: 3 ADC_MaxCount: 4095 SCI_baud_rate: 12000000 V_base: 13.8564 I_base: 21.4286 N_base: 3476 T_base: 1.0249 P_base: 445.3845
load("rlPMSMAgent.mat","agent");
Generate Policy Block
Open the Simulink model used for training the TD3 agent.
mdl_rl = "mcb_pmsm_foc_sim_RL";
open_system(mdl_rl);
Open the subsystem containing the RL Agent block.
agentblk = mdl_rl + ... "/Current Control/Control_System" + ... "/Closed Loop Control/Reinforcement Learning/RL Agent"; open_system(get_param(agentblk,"Parent"));
To create a deployable model, you replace the RL Agent block with a Policy block.
Set the agent's UseExplorationPolicy
property to false so the generated policy takes the greedy action at each time step. Generate the policy block using generatePolicyBlock
and specify the name of the MAT-file containing the policy data for the block.
% To ensure that the generated policy is greedy, % set UseExplorationPolicy to false agent.UseExplorationPolicy = false; % Specify the MAT-file name for the policy data fname = "PMSMPolicyBlockData.mat"; % Delete the file if it already exists if isfile(fname) delete(fname); end % Generate the block and the policy data generatePolicyBlock(agent,MATFileName=fname)
Alternatively, you can generate the policy block can be generated by clicking Generate greedy policy block from the block mask. Use open_system(agentblk)
to open the RL Agent block mask, or simply double-click the block.
Simulate the Policy
For this example, the Policy block has already replaced the RL Agent block inside of the pmsm_current_control
model. This model has been configured for code generation, and the Policy block loads the trained policy from PMSMPolicyBlockData.mat
.
mdl_current_ctrl = "pmsm_current_control";
open_system(mdl_current_ctrl);
Obtain the path of the policy block in the Simulink model.
policyblk = mdl_current_ctrl + ... "/Current Control/Control_System" + ... "/Closed Loop Control/Reinforcement Learning/Policy";
Use open_system(policyblk)
to open the Policy block mask, or simply double-click the block.
The model mcb_pmsm_foc_sim_policy references pmsm_current_control using a model reference block. Simulate the top-level model and plot the responses for the inner and outer control loops.
Setup the Simulation Data Inspector (SDI).
Simulink.sdi.clear; Simulink.sdi.setSubPlotLayout(3,1);
Open the model and get the path to the Current Control
block.
mdl_policy = "mcb_pmsm_foc_sim_policy"; open_system(mdl_policy); current_ctrl_blk = mdl_policy + "/Current Control";
To temporarily change the model and easily run multiple simulations with such changes, Simulink.SimulationInput
(Simulink) object.
in = Simulink.SimulationInput(mdl_policy);
Simulate the model with the current controller, run the simulation in normal mode
in = setBlockParameter(in,current_ctrl_blk, ... "SimulationMode","Normal"); out_sim = sim(in);
Get results from the latest SDI run.
runSim = Simulink.sdi.Run.getLatest;
Extract the outer control loop signals.
speedSim = getSignalsByName(runSim,"Speed_fb" ); speedRefSim = getSignalsByName(runSim,"Speed_Ref");
Plot both signals.
plotOnSubPlot(speedSim ,1,1,true); plotOnSubPlot(speedRefSim,1,1,true);
Extract the inner control loop signals.
idSim = getSignalsByName(runSim,"id" ); iqSim = getSignalsByName(runSim,"iq" ); idRefSim = getSignalsByName(runSim,"id_ref"); iqRefSim = getSignalsByName(runSim,"iq_ref");
Plot the extracted signals.
plotOnSubPlot(idSim ,2,1,true); plotOnSubPlot(idRefSim,2,1,true); plotOnSubPlot(iqSim ,3,1,true); plotOnSubPlot(iqRefSim,3,1,true);
Open the Simulation Data Inspector.
Simulink.sdi.view;
Validate Generated Code for Policy
If Embedded Coder is installed, the current controller model reference can be run in SIL mode. Running the current controller in SIL mode generates code for the current controller model, including the policy block.
The Simulink model is configured only on Windows®, so display a message and stop execution when attempting to run on a different operating system.
if ~ispc disp("The model ""pmsm_current_control"" is configured " + ... "for SIL simulations only on Windows system.") return end
Simulate the model with the current controller.
Enable SIL mode.
in = setBlockParameter(in,current_ctrl_blk, ... "SimulationMode","Software-in-the-loop");
Simulate the model. Use evalc
to capture the text output from code generation, for possible later inspection.
txt_out = evalc("out_sil = sim(in)");
Get results from the latest SDI run.
runSIL = Simulink.sdi.Run.getLatest;
Extract the speed response when run in SIL mode.
speedSIL = getSignalsByName(runSIL,"Speed_fb");
Compare the SIL response to the simulated response. The SIL response should be close to the response simulated in normal mode.
speedSim.AbsTol = 1e-3; cr = Simulink.sdi.compareSignals(speedSim.ID,speedSIL.ID);
Display the largest signal difference.
cr.MaxDifference
ans = 5.0485e-05
Open the Simulation Data Inspector.
Simulink.sdi.view;
Once you are satisfied with the performance of the policy in simulation, you can use an appropriate target/hardware support package to deploy the policy to hardware.