RL training with constraint enforcement fails after a certain number of episodes

Vasu Sharma on 27 Aug 2024
Answered: Shivansh on 6 Sep 2024
Hi,
I am trying to train an RL agent with a Simulink environment. I have the Constraint Enforcement block on the outputs, as detailed here:
https://www.mathworks.com/help/slcontrol/ug/train-reinforcement-learning-agent-with-constraint-enforcement.html
My training seems to fail deterministically around episode 2300 with the following error:
Episode: 2351/15000 | Episode reward: -12.93 | Episode steps: 1875 | Average reward: 13.88 | Step Count: 4408125 | Episode Q0: 0.64
Episode: 2352/15000 | Episode reward: 15.98 | Episode steps: 1875 | Average reward: 12.84 | Step Count: 4410000 | Episode Q0: 0.61
Error using rl.internal.train.PPOTrainer/run_internal_/nestedRunEpisode (line 335)
An error occurred while running the simulation for model 'MPC_RL_H2DF' with the following RL agent blocks:
MPC_RL_H2DF/RL Agent
Error in rl.internal.train.PPOTrainer/run_internal_ (line 406)
out = nestedRunEpisode(policy);
Error in rl.internal.train.PPOTrainer/run_ (line 40)
result = run_internal_(this);
Error in rl.internal.train.Trainer/run (line 8)
result = run_(this);
Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 112)
trainResult = run(trainer);
Error in rl.internal.trainmgr.TrainingManager/run (line 4)
result = run_(this);
Error in rl.agent.AbstractAgent/train (line 86)
trainingResult = run(tm);
Error in run_H2DFRL_GRU (line 191)
trainingStats = train(agent, env, trainOpts);
Caused by:
Error using rl.env.internal.reportSimulinkSimError (line 29)
'f' must contain only finite values.
Error in quadprog.p
Error in utilSolveQP.m (line 34)
Error in 'MPC_RL_H2DF/Subsystem1/Constraint Enforcement1/computeSafeAction' (line 39)
2 Comments
Siddharth Jawahar on 28 Aug 2024
Hi Vasu,
From the error log, it looks like at least one of the actions becomes unbounded around episode 2300, so the QP solver (used by the Constraint Enforcement block) fails to find a solution.
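As a minimal sketch (made-up matrices, not taken from your model), the same quadprog error can be reproduced when the linear term f of the QP is non-finite, which is what an unbounded action feeding the constraint-enforcement QP would cause:
% Minimal sketch with placeholder values: a non-finite linear term f
% makes quadprog fail in the same way as in the training error log.
H = eye(2);
f = [Inf; 0];        % non-finite linear term, e.g. from an unbounded action
A = [1 0; 0 1];
b = [1; 1];
try
    quadprog(H, f, A, b);
catch err
    disp(err.message)    % should report that 'f' must contain only finite values
end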
Best,
Sid
Vasu Sharma on 3 Sep 2024
The block has been set up so that the constraints are ignored if the QP is infeasible. Are you referring to the input actions? If so, these come directly from the MATLAB RL Agent block with defined limits. Could there be a reason for this to fail?
Best,
Vasu


Answers (1)

Shivansh on 6 Sep 2024
Hi Vasu,
If the block is set up to ignore the constraints when the quadratic program (QP) is infeasible and you are still encountering this error, it suggests there is some other issue in the RL model.
Make sure that the actions from the RL Agent block respect the defined limits and are correctly scaled, and check for any transformations that might introduce non-finite values; see the sketch below.
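For example, a hedged sketch of how you might define a bounded action specification and validate or saturate a sampled action before it reaches the constraint block (the dimensions and limits below are placeholders, not your model's values):
% Placeholder action spec; replace dimensions and limits with your model's values.
actInfo = rlNumericSpec([2 1], 'LowerLimit', [-1; -1], 'UpperLimit', [1; 1]);

a = [0.7; 1.4];                                   % example action sample
assert(all(isfinite(a)), 'Action contains non-finite values.');
aSat = min(max(a, actInfo.LowerLimit), actInfo.UpperLimit);   % saturate to the defined limits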
Verify that the fallback strategy for infeasible QPs produces valid outputs and that the logic in "computeSafeAction" handles all cases. You can also log the inputs, outputs, and intermediate values of the constraint block to identify patterns leading to the error; one possible approach is sketched below.
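One possible way to do this from MATLAB, assuming the action line is the first output of the RL Agent block (the block path and signal name below are assumptions):
% Enable signal logging on the RL Agent output that feeds the constraint block.
ph = get_param('MPC_RL_H2DF/RL Agent', 'PortHandles');
set_param(ph.Outport(1), 'DataLogging', 'on');
set_param(ph.Outport(1), 'DataLoggingNameMode', 'Custom');
set_param(ph.Outport(1), 'DataLoggingName', 'rlAction');

% After a run, scan the logged signal (logsout from the simulation output)
% for non-finite values that would break the QP.
sig = logsout.get('rlAction').Values;
if any(~isfinite(sig.Data), 'all')
    warning('Non-finite action values detected before the QP solve.');
end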
It is difficult to state the exact reason for the error without looking at the model and environment. I recommend working through the above suggestions and sharing more information about the model if the issue persists.
