How can I let the reinforcement learning agent know exactly which action to take?

52 views (last 30 days)
Aaron Bramhasta on 5 Nov 2024 at 17:06
Answered: Maneet Kaur Bagga on 21 Nov 2024 at 6:52
Dear MATLAB Experts,
I am currently running a reinforcement learning simulation integrated with a discrete-event system in Simulink. My main discrete-event simulation uses a bus element containing multiple entities: some serve as observations for the RL agent (via an entity-to-signal conversion), and one is used to impose the action the RL agent chooses (via a signal-to-entity conversion). I implemented a policy in the DES where, given certain requirements, an entity attribute is assigned a value that switches an entity gate to determine which course of action to take. However, my reinforcement learning agent does not seem to understand this rule: it assigns the entity value randomly from the available values. Is there a way to make this rule, which is present in the DES, understandable to the RL agent as well?
Thank you so much in advance! I am attaching my model for reference.
Best regards,
Aaron.

Answers (1)

Maneet Kaur Bagga on 21 Nov 2024 at 6:52
Hi,
As per my understanding, the issue arises because your DES contains specific policies, such as switching gates based on entity attributes. These rules are likely hard-coded and not part of the RL environment's observation or reward structure, so the RL agent explores actions based only on the observations it receives and the policy it has learned.
Please refer to the following workarounds:
Incorporate the Rule into the Observations: Add flags or variables to the observation vector that indicate the rule's state (e.g. "gate should switch" = 1/0). Ensure these conditions are updated dynamically during simulation; see the sketch below.
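As a minimal sketch, a MATLAB Function block along these lines could build the extended observation (queueLength, serverBusy, and threshold are hypothetical stand-ins for your model's actual signals):
function obs = buildObservation(queueLength, serverBusy, threshold)
% Encode the DES gate-switching rule as an explicit observation flag
ruleFlag = double(queueLength > threshold); % 1 when the rule says "switch the gate"
obs = [queueLength; serverBusy; ruleFlag];  % vector fed to the RL Agent block
end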
Augment the Reward Structure: Add a reward for actions that align with the DES rules, or a penalty for actions that violate them. This encourages the RL agent to learn rule-conforming behavior:
% Bonus when the agent's action matches the one the DES rule expects
reward = reward + (agentAction == expectedAction) * rewardFactor;
Pretrain the Agent: Use supervised learning to pretrain the RL agent so that the DES rules form its baseline policy, then fine-tune with reinforcement learning; a sketch follows below.
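As a rough sketch, you could generate observation/action pairs from the DES rule and pretrain a classification network with supervised learning (desRuleAction is a hypothetical function returning the rule's discrete action index for an observation; the layer sizes and action count are assumptions):
numObs = 3; numSamples = 5000;
obsData = rand(numSamples, numObs); % sampled observations, one per row
labels = categorical(arrayfun(@(k) desRuleAction(obsData(k,:)), 1:numSamples))';
layers = [
    featureInputLayer(numObs)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2) % assuming two discrete actions
    softmaxLayer
    classificationLayer];
opts = trainingOptions("adam", MaxEpochs=20, Verbose=false);
pretrainedNet = trainNetwork(obsData, labels, layers, opts);
The pretrained layers can then be used to initialize the agent's actor before fine-tuning with "train".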
Custom Environment Dynamics: Modify the environment (the DES model) so that the DES rules are enforced during interaction; for instance, override the agent's selected action whenever it violates a rule:
% violatesRule and enforceRule are placeholder helpers encoding your DES policy
if violatesRule(action, currentState)
    action = enforceRule(currentState); % override the agent's choice
end
Regularization: Include constraints in the training process that mimic the DES rules, for example by penalizing rule violations in the loss so that the policy network learns to output rule-conforming actions:
% Conceptual: add a rule-violation penalty inside a custom training loss
loss = loss + ruleViolationPenalty * countViolations(actions, state);
Rule-Based Hybrid Approach: Use the "getAction" function to query the agent's action in specific scenarios and compare it against the DES policy to identify mismatches, for example:
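A small sketch of such a check (currentObservation and desRuleAction are hypothetical placeholders):
obs = {currentObservation};          % observations are passed as a cell array
agentAction = getAction(agent, obs); % the action is returned as a cell array
if ~isequal(agentAction{1}, desRuleAction(currentObservation))
    disp("Agent action deviates from the DES rule")
end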
Please refer to the MathWorks documentation for "getAction" for a better understanding.
Hope this helps!
