PPO issue with Nans
I am working on a project in which we simulate a "flock" and insert a rogue agent that is trained to match the flock's behavior. Right now there is a simple reward function based on matching the positions of the agents. The problem I am having is that after some number of episodes (the number changes when I adjust parameters such as the learning rate and clip factor), NaNs are introduced. I have verified that they are not created inside the plant or in any code I introduced, so they appear to come from the policy itself.
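One way I have been checking where the NaNs appear (a sketch, assuming the PPO agent variable `agentA` created below, and that the learnable parameters come back as dlarrays) is to inspect the actor's weights after training:

```matlab
% Sketch: check whether any of the actor network's learnable parameters
% have become NaN. Assumes an already-created agent variable agentA.
actor = getActor(agentA);                 % extract the actor from the agent
params = getLearnableParameters(actor);   % cell array of parameter arrays
for k = 1:numel(params)
    % extractdata assumes the parameters are dlarrays
    if any(isnan(extractdata(params{k})), 'all')
        fprintf("NaN found in actor parameter %d\n", k);
    end
end
```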
Train Multiple Agents to Perform Collaborative Task
This example is modified from the multi-agent training Simulink® environment example in which you train two agents to collaboratively perform the task of moving an object. I changed it to have only one agent (Rogue) that tries to interact with a flock of agents following the flocking control laws from Tanner et al. (2003).
First we set all the parameter values, such as initial conditions, mass, interaction radius, and coefficients for the control laws.
rng(10); % seed the random number generator so results are repeatable
rlCollaborativeTaskParams_Esposito % a simple script with parameter values
Open the Simulink model if desired.
mdl = "rlCollaborativeTask_esposito";
open_system(mdl)
Environment
% Number of observations
numObs = N*4+2; % I think this comes from the sum of all the states (4 per agent) plus the inputs (Fx and Fy)
% Number of actions
numAct = 2; %inputs
% I/O specifications for each agent
oinfo = rlNumericSpec([numObs,1]);
ainfo = rlNumericSpec([numAct,1], ...
UpperLimit= maxU, ...
LowerLimit= -maxU);
oinfo.Name = "observations";
ainfo.Name = "forces";
blks = ["rlCollaborativeTask_esposito/Agent A"];
obsInfos = oinfo;
actInfos = ainfo;
env = rlSimulinkEnv(mdl,blks,obsInfos,actInfos);
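Before training, it can help to confirm that the specifications match the model. The toolbox function validateEnvironment runs a short simulation and errors if the observation or action dimensions disagree:

```matlab
% Optional sanity check: briefly simulate the environment to confirm
% that oinfo/ainfo match what the Simulink model actually produces.
validateEnvironment(env)
```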
The reset function resetRobots_Esposito calls rlCollaborativeTaskParams_Esposito, which ensures that the robots start from random initial positions at the beginning of each episode. A plotting function is also called inside it.
env.ResetFcn = @(in) resetRobots_Esposito(in,R,boundaryR, N);
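For reference, the reset function follows the usual Simulink ResetFcn pattern. A minimal sketch (the variable names x0 and y0 are illustrative, not the ones in my script, and R is unused here) looks like:

```matlab
% Sketch of a ResetFcn: `in` is a Simulink.SimulationInput object; the
% function randomizes initial positions inside the boundary and returns
% the modified input. Variable names x0/y0 are illustrative.
function in = resetRobotsSketch(in, R, boundaryR, N)
    theta = 2*pi*rand(N,1);            % random angles
    r = boundaryR*sqrt(rand(N,1));     % random radii, uniform over the disk
    in = setVariable(in, "x0", r.*cos(theta));
    in = setVariable(in, "y0", r.*sin(theta));
end
```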
Agents
This example uses a Proximal Policy Optimization (PPO) agent with a continuous action space. The agent applies external forces on the robot that result in motion. To learn more about PPO agents, see Proximal Policy Optimization Agents.
The agent collects experiences until the experience horizon is reached. After trajectory completion, the agent learns from mini-batches of experiences. An objective function clip factor of 0.2 is used to improve training stability, and a discount factor of 0.99 is used to encourage long-term rewards.
Specify the agent options for this example.
agentOptions = rlPPOAgentOptions(...
ExperienceHorizon=600,...
ClipFactor=0.2,...
EntropyLossWeight=0.01,...
MiniBatchSize=300,...
NumEpoch=4,...
AdvantageEstimateMethod="gae",...
GAEFactor=0.95,...
SampleTime=Ts,...
DiscountFactor=0.99);
Set the learning rate for the actor and critic.
agentOptions.ActorOptimizerOptions.LearnRate = .00001;
agentOptions.CriticOptimizerOptions.LearnRate = .00001;
The action limits are already enforced through the UpperLimit and LowerLimit of ainfo above, so they do not need to be set on the actor separately (and at this point the actor does not exist yet).
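One option worth trying against the NaN problem is gradient clipping through the optimizer options. GradientThreshold is a standard rlOptimizerOptions property; the value 1 below is only an illustrative starting point:

```matlab
% Optional: clip gradients to reduce the chance of the policy update
% diverging into NaNs. The threshold of 1 is an illustrative value.
agentOptions.ActorOptimizerOptions.GradientThreshold = 1;
agentOptions.CriticOptimizerOptions.GradientThreshold = 1;
```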
Create the agents using the default agent creation syntax. For more information see rlPPOAgent.
agentA = rlPPOAgent(oinfo, ainfo, ...
rlAgentInitializationOptions(NumHiddenUnit= 20), agentOptions);
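After creating the agent, a quick sanity check (a sketch; the random observation is just a placeholder) is to evaluate the untrained policy once and confirm the action is finite:

```matlab
% Evaluate the untrained policy on a placeholder observation.
obs = {rand(numObs,1)};          % random observation of the right size
act = getAction(agentA, obs);    % returns a cell array of actions
assert(all(isfinite(act{1})), "Initial policy already produces non-finite actions")
```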
Training
With only one agent, I modified this significantly from the example. For more information on multi-agent training, type help rlMultiAgentTrainingOptions. That example includes options like this:
opts = rlMultiAgentTrainingOptions( ...
    AgentGroups={[1,2], 4, [3,5]}, ...
    LearningStrategy=["centralized","decentralized","centralized"])
But with only one agent you use rlTrainingOptions. (NOTE: I kept getting an error about parallelization options.)
trainOpts = rlTrainingOptions(...
MaxEpisodes=5000,...
MaxStepsPerEpisode=100,...
ScoreAveragingWindowLength=30,...
StopTrainingCriteria="AverageReward",...
StopTrainingValue=200);
Train the agent using the train function. Training can take several hours to complete depending on the available computational power. To save time, load the MAT-file that contains a pretrained agent. To train the agent yourself, set doTraining to true.
doTraining = true;
if doTraining
trainResults = train(agentA,env,trainOpts);
else
load("TrainedRogueToMatchHeading.mat");
end
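After (or instead of) training, the agent can be simulated in the environment to inspect its behavior. rlSimulationOptions and sim are the standard toolbox functions; the step count below is illustrative:

```matlab
% Simulate the (pre)trained agent for one episode to inspect its behavior.
simOpts = rlSimulationOptions(MaxSteps=100);   % illustrative step count
experience = sim(env, agentA, simOpts);
```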