Collaborative DDPG/Actor-Critic Example
I have developed a DDPG model that optimizes traffic at intersections along one direction. I am now looking to implement four copies of the same model, one per direction (North-South, South-North, East-West, and West-East); that is, I would like to run 4 DDPG models simultaneously, each with its own local reward function. I have attempted to combine all 4 approaches, but unfortunately the model appears to confuse actions in one direction with observations in another.
For example, if the agent sends a signal to a vehicle in the east-west lane to change its speed while simultaneously doing the same for another vehicle in the north-south lane, the system considers the sum of all rewards across all actions performed. As a result, optimal actions on one approach are overshadowed by subpar actions on another.
It is for this reason that I believe a collaborative multi-agent approach may be ideal, but I cannot find anything in the MATLAB documentation that indicates how this may be done beyond very simple Simulink examples. I have noted the following, which still leaves significant gaps:
My current model uses a custom environment that interfaces with another software's COM server to generate a sample environment from which observations are taken and to which actions are applied. I am not currently using Simulink because of the need for this external traffic simulation software. My current system uses an rlNumericSpec observation space with 10 variables and a continuous action space that performs 2 actions.
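For reference, my current single-agent setup looks roughly like the sketch below (the step and reset function names are placeholders for my COM-interface code; the speed bounds of 20 to 40 are illustrative):

```matlab
% Observation: 10 continuous variables read from the external traffic simulator via COM
obsInfo = rlNumericSpec([10 1]);
obsInfo.Name = 'intersection observations';

% Action: 2 continuous speed commands, bounded between 20 and 40
actInfo = rlNumericSpec([2 1], 'LowerLimit', [20; 20], 'UpperLimit', [40; 40]);
actInfo.Name = 'speed commands';

% Custom MATLAB environment built from step/reset function handles
% (myStepFunction/myResetFunction wrap the COM calls to the traffic simulator)
env = rlFunctionEnv(obsInfo, actInfo, @myStepFunction, @myResetFunction);
```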
I would like to simultaneously run four of the same DDPG agents (or other actor-critic models if necessary), each with its own independent reward and action space. Is this possible with the Reinforcement Learning Toolbox as of 2020, and if so, how may one approach it? More specifically:
- How would one specify the 4 different sets of observations/actions, and how would this be done in the same custom constructor function? Each observation spec is of the form rlNumericSpec([10 1]), for a total of 40 observations, and the combined action space is of the form rlNumericSpec([8 1],'LowerLimit',[20;20],'UpperLimit',[40;40]). I have tried following this example (Train Multiple Agents for Path Following Control - MATLAB & Simulink (mathworks.com)) for the actInfo and obsInfo syntax, i.e. obsInfo = {obsInfo1, obsInfo2, ...}, which has thus far returned an error.
- For applying said actions to the custom environment, how would those actions appear once the model is running? Would they simply be of the form Action1(), Action2(), etc.?
- How would the individual localized reward functions be set within the step function? By default, for a single agent the reward is simply stored as "Reward"; is there a form such that the rewards would be split into Reward_agent1, Reward_agent2, etc.?
- Is it an absolute must to use Simulink, or can this be done with my existing custom environment setup?
- Are there any additional resources I may have missed that could help me achieve this?
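For concreteness, the cell-array spec syntax I attempted, following the path-following example above, looks like the sketch below (all variable names are illustrative):

```matlab
% One spec pair per approach (N-S, S-N, E-W, W-E)
obsInfo = cell(1, 4);
actInfo = cell(1, 4);
for k = 1:4
    obsInfo{k} = rlNumericSpec([10 1]);                              % 10 observations per approach
    actInfo{k} = rlNumericSpec([2 1], ...                            % 2 speed commands per approach
        'LowerLimit', [20; 20], 'UpperLimit', [40; 40]);
end

% The documented multi-agent example passes these cell arrays to a Simulink
% environment, one agent block per agent:
%   env = rlSimulinkEnv(mdl, agentBlocks, obsInfo, actInfo);
% Passing the same cell arrays into my custom MATLAB environment constructor
% is what raises the error mentioned above.
```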
I understand that this is quite a large question, but I hope it will also help others looking to use this software for more complex multi-agent applications without Simulink. Thank you in advance for your assistance.
