How to input action in reinforcement learning template environment?

Question

Yang Chen on 7 Mar 2023

https://de.mathworks.com/matlabcentral/answers/1924530-how-to-input-action-in-reinforcement-learning-template-environment

Commented: Emmanouil Tzorakoleftherakis on 9 Mar 2023

Accepted Answer: Emmanouil Tzorakoleftherakis

I have modified the template environment to fit my scenario. My current action consists of two vectors. The action configuration is as follows.

function this = EdgeEnvironment()
    % Initialize observation settings
    ObservationInfo(1) = rlNumericSpec([1 10]);
    ObservationInfo(1).Name = 'schedule';
    ObservationInfo(1).Description = 'schedule';

    ObservationInfo(2) = rlNumericSpec([1 20]);
    ObservationInfo(2).Name = 'ppath';
    ObservationInfo(2).Description = 'ppath';

    ObservationInfo(3) = rlNumericSpec([1 1]);
    ObservationInfo(3).Name = 'completionTime';
    ObservationInfo(3).Description = 'completionTime';

    ObservationInfo(4) = rlNumericSpec([1 1]);
    ObservationInfo(4).Name = 'computeDuring';
    ObservationInfo(4).Description = 'computeDuring';

    % Initialize action settings (two action channels)
    ActionInfo(1) = rlNumericSpec([1 10]);
    ActionInfo(1).Name = 'schedule';

    ActionInfo(2) = rlNumericSpec([1 20]);
    ActionInfo(2).Name = 'ppath';

    % Call the superclass constructor, which implements the built-in functions of the RL environment
    this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
end
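As far as I know, most built-in Reinforcement Learning Toolbox agents expect a single action channel (check the documentation of the agent you plan to use). One option is therefore to declare one 30-element action spec and split it inside step. A minimal sketch, assuming elements 1 to 10 hold the schedule and elements 11 to 30 hold the path; the layout, the variable names nSchedule/nPpath, and the name 'schedule_ppath' are all hypothetical:

```matlab
% Hypothetical layout: elements 1-10 = schedule, elements 11-30 = ppath
nSchedule = 10;
nPpath = 20;

% One action channel instead of two
ActionInfo = rlNumericSpec([1, nSchedule + nPpath]);
ActionInfo.Name = 'schedule_ppath';
ActionInfo.Description = 'schedule (1:10) followed by ppath (11:30)';
```

Inside EdgeEnvironment this would replace the ActionInfo(1)/ActionInfo(2) lines, and the superclass call stays the same: `this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);`.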

The step function is as follows.

function [Observation,Reward,IsDone,LoggedSignals] = step(this, Action)
    LoggedSignals = [];

    % Distance between nodes
    node_distance = zeros(this.device_count, this.device_count);
    distance = getDistance(this, node_distance);

    % Parameter list
    parameter_list = getstruct(this, distance);

    % Parameter list of the devices
    device_list = get_device_list(this);

    % Extract action
    [schedule_act, ppath_act] = get_act(Action);
    % schedule_act = Action{1,1};
    % ppath_act = Action{1,2};

    % Unpack state vector
    last_schedule = schedule_act;
    last_ppath = ppath_act;
    last_completionTime = this.State{1,3};
    last_computeDuring = this.State{1,4};

    % Update system states
    [schedule, stay_node_list, completionTime] = ComScheduling(last_completionTime, ...
        last_schedule, last_ppath, device_list, parameter_list);
    [ppath, stay_node_list, completionTime, computeDuring] = PathPlanning(last_completionTime, ...
        last_ppath, schedule, stay_node_list, device_list, parameter_list);

    % Accept the new solution with probability prob (a logistic function of the change in completion time)
    prob = 1 / (1 + exp((completionTime - last_completionTime)/parameter_list.omega));
    dice = rand(1);
    if dice <= prob
        last_ppath = ppath;
        last_schedule = schedule;
        last_completionTime = completionTime;
        last_computeDuring = computeDuring;
        % completionTime_iter must be initialized (e.g., to []) before it can be appended to
        completionTime_iter(end + 1) = completionTime;
    else
        completionTime_iter(end + 1) = last_completionTime;
    end

    % stay_node_list is not part of the observation, so it is not carried over between steps
    ppath = last_ppath;
    schedule = last_schedule;
    completionTime = last_completionTime;
    computeDuring = last_computeDuring;

    Observation = {schedule, ppath, completionTime, computeDuring};
    this.State = Observation;

    % Check terminal condition (braces extract the numeric values from the cell array)
    completionTime = Observation{3};
    computeDuring = Observation{4};
    IsDone = completionTime < this.completionTime_threshold || computeDuring < this.computeDuring_threshold;
    this.IsDone = IsDone;

    % Get reward
    Reward = -completionTime;
end
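Two details of step can be checked in isolation: how values are read back out of the Observation cell array, and how the acceptance probability behaves. A small sketch with made-up numbers (the observation values and omega = 1 are hypothetical stand-ins for the real outputs and parameter_list.omega):

```matlab
% Hypothetical observation with the same layout as in step
Observation = {zeros(1,10), zeros(1,20), 5, 2};

a = Observation(3);   % parentheses: a 1x1 cell array containing 5
b = Observation{3};   % braces: the double value 5
% b < 10 works; a < 10 errors because relational operators are not defined for cells

% Acceptance rule from step, written as an anonymous function
omega = 1;                                   % hypothetical parameter_list.omega
acceptProb = @(newT, oldT) 1 ./ (1 + exp((newT - oldT) ./ omega));
p_equal  = acceptProb(5, 5);   % no change in completion time: 0.5
p_better = acceptProb(4, 5);   % shorter completion time: about 0.731
p_worse  = acceptProb(6, 5);   % longer completion time: about 0.269
```

A worse solution is still accepted occasionally, and a smaller omega makes that less likely.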

The action values are extracted by the following function.

function [schedule_act, ppath_act] = get_act(action)
    schedule_act = action{1,1};
    ppath_act = action{1,2};
end
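A version of this extraction that works whether the action arrives as a cell array (one cell per channel) or as a single numeric vector could branch on iscell. A sketch, assuming the hypothetical single-vector layout of 10 schedule values followed by 20 path values; the random Action is only a stand-in:

```matlab
% Stand-in for the Action argument: one numeric vector (hypothetical layout)
nSchedule = 10;
Action = rand(1, 30);

if iscell(Action)
    % Two action channels: one cell per channel
    schedule_act = Action{1};
    ppath_act = Action{2};
else
    % One action channel: split the numeric vector by position
    Action = reshape(Action, 1, []);   % accept row or column input
    schedule_act = Action(1:nSchedule);
    ppath_act = Action(nSchedule+1:end);
end
```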

When I run the validateEnvironment function, I get the following error.

How can I fix it?

Answer 1

Emmanouil Tzorakoleftherakis on 7 Mar 2023

https://de.mathworks.com/matlabcentral/answers/1924530-how-to-input-action-in-reinforcement-learning-template-environment#answer_1187560

The easiest thing you can do is add a breakpoint and display what the "action" variable is. It is evidently not a cell array, so you cannot access it with braces {} in the "get_act" function. That's why you are getting the error.
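Instead of a manual breakpoint, running `dbstop if error` before validateEnvironment makes MATLAB pause in the debugger at the failing line, where the workspace of get_act can be inspected. Alternatively, a line like the following could be pasted at the top of get_act; here the value of action is a hypothetical stand-in:

```matlab
% Stand-in for the value that arrives in get_act (hypothetical)
action = rand(1, 30);

% Print what the variable actually is before indexing into it
fprintf('class: %s, size: %s, iscell: %d\n', ...
    class(action), mat2str(size(action)), iscell(action));
% prints: class: double, size: [1 30], iscell: 0
```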

8 Comments (the 6 earlier comments are not included here)

Yang Chen on 9 Mar 2023

The problem is the size of my discrete action space. For example, with three elements my action space is {[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}, i.e., all possible orderings of 1 to 3. When the number of elements increases to 20, the data size exceeds the system limit.
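The growth described here is factorial: every ordering of 1..n is one discrete action, so there are n! of them. A quick check of the numbers (the 8-bytes-per-element storage estimate assumes each action is stored as a row of doubles, which is an illustration, not how the toolbox necessarily stores them):

```matlab
% n = 3: enumerate all orderings explicitly
actions_small = perms(1:3);           % 6x3 matrix, one permutation per row
count_small = size(actions_small, 1); % 6

% n = 20: far too many to enumerate
n_large = 20;
count_large = factorial(n_large);           % about 2.43e18 actions
bytes_needed = count_large * n_large * 8;   % about 3.9e20 bytes as rows of doubles
```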

Emmanouil Tzorakoleftherakis on 9 Mar 2023

Thanks for clarifying. This is the curse of dimensionality; unfortunately, there is not much you can do about it other than use a continuous action space.
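One way to act on that suggestion (not from this thread; it is sometimes called random-key encoding) is to let the agent output one continuous score per item and turn the scores into an ordering by sorting them. A sketch with four hypothetical scores; for the 20-element case the agent would output 20 scores:

```matlab
% Hypothetical continuous action: one score per item
scores = [0.7 -1.2 0.3 2.5];

% Sorting the scores yields a permutation of 1..n
[~, order] = sort(scores, 'descend');
% order is [4 1 3 2]: item 4 has the highest score, item 2 the lowest
```

The action spec then only needs n continuous values instead of n! discrete elements.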
