Clarification on NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency in rlAgentDDPGOptions

Question

Fabián on 13 Sep 2025

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/2179939-clarification-on-numepoch-maxminibatchperepoch-and-learningfrequency-in-rlagentddpgoptions

Commented: Umar on 14 Sep 2025

Hello,

I'm currently working with the rlAgentDDPGOptions and while I have a solid understanding of most of the configuration parameters, I am having some trouble understanding three specific options: NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency.

My main confusion revolves around how and when the networks are updated, considering the agent's sampling times. Specifically:

NumEpoch: How does this parameter relate to the overall training process, and how does it affect network updates?
MaxMiniBatchPerEpoch: How does this limit the number of mini-batches processed during an epoch, and how does it interact with the sampling process?
LearningFrequency: How does this parameter influence the frequency of updates relative to the agent’s sampling rate?

Any clarification on these points would be greatly appreciated!

Thank you in advance for your help!

Best regards,

Fabián.

5 Comments

Umar on 14 Sep 2025

Hi @Fabian,

Thank you for your follow-up and for clarifying your definition of a training step as one environment sampling step.

You're absolutely right that `LearningFrequency` controls how many such environment steps must occur before a learning step is executed — for instance, with `LearningFrequency = 4`, the agent performs one learning step every 4 environment interactions.

Your Main Question:

“Do `NumEpoch` and `MaxMiniBatchPerEpoch` relate to what happens during a learning step or a training step?”

Both `NumEpoch` and `MaxMiniBatchPerEpoch` apply during a learning step, not during individual environment sampling steps (which you're referring to as training steps).

`NumEpoch` specifies how many passes are made over the sampled experience data during a single learning step. `MaxMiniBatchPerEpoch` sets the upper limit on how many mini-batches are processed per epoch during that learning step.

In other words:

Every time a learning step is triggered (based on `LearningFrequency`), the agent may perform multiple gradient updates, controlled by these two parameters.
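As a rough sketch of how these settings would be written in practice, the fragment below sets the three options on a DDPG options object. It assumes the object is created with `rlDDPGAgentOptions` from Reinforcement Learning Toolbox and that your release exposes `NumEpoch`, `MaxMiniBatchPerEpoch`, and `LearningFrequency` (older releases do not have them). The numeric values and the variable name `maxUpdatesPerLearningStep` are illustrative only.

```matlab
% Illustrative values only; assumes a release of Reinforcement Learning
% Toolbox in which rlDDPGAgentOptions exposes these three properties.
opts = rlDDPGAgentOptions;
opts.MiniBatchSize        = 32;  % experiences per gradient update
opts.LearningFrequency    = 4;   % environment steps between learning steps
opts.NumEpoch             = 3;   % passes over the sampled data per learning step
opts.MaxMiniBatchPerEpoch = 2;   % upper limit on mini-batches per epoch

% Upper bound on gradient updates in one learning step (3 * 2 = 6 here)
maxUpdatesPerLearningStep = opts.NumEpoch * opts.MaxMiniBatchPerEpoch;
```

With these values, a learning step would be triggered after every 4 environment steps and would run at most 6 gradient updates.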

Simulation Script + Plot

To help visualize this, please see the MATLAB script below.

%% DDPG Learning Schedule Simulation + Plotting
% Author: Umar
% Date: 09-13-25

clc; clear;

% ==== Parameters ====
LearningFrequency = 4;
NumEpoch = 3;
MaxMiniBatchPerEpoch = 2;
MiniBatchSize = 32;
TotalSteps = 20;

% Simulated Replay Buffer
ReplayBuffer = 1:500;

% Tracking metrics
learningSteps = [];          % Steps where learning happened
updateCounts = [];           % Count of updates per learning step
totalUpdateCount = 0;        % Total number of network updates

fprintf('--- DDPG Learning Schedule Simulation ---\n\n');

for step = 1:TotalSteps
    fprintf('Environment Step %d\n', step);

    % Trigger learning once every LearningFrequency environment steps
    if mod(step, LearningFrequency) == 0
        fprintf('  > Learning Triggered (Step %d)\n', step);
        learningSteps(end+1) = step;

        % Simulate sampling a batch from the replay buffer
        % (datasample requires Statistics and Machine Learning Toolbox)
        batchSize = 256;
        largeBatch = datasample(ReplayBuffer, batchSize, 'Replace', false);

        updatePerStep = 0;

        % Epoch loop
        for epoch = 1:NumEpoch
            fprintf('    Epoch %d/%d\n', epoch, NumEpoch);

            for mb = 1:MaxMiniBatchPerEpoch
                % Draw a mini-batch; it only stands in for one gradient update
                miniBatch = datasample(largeBatch, MiniBatchSize, ...
                    'Replace', false);
                fprintf('      Updating with MiniBatch %d/%d (Size: %d)\n', ...
                    mb, MaxMiniBatchPerEpoch, MiniBatchSize);
                updatePerStep = updatePerStep + 1;
                totalUpdateCount = totalUpdateCount + 1;
            end
        end

        updateCounts(end+1) = updatePerStep;
    end
end

fprintf('\n--- Simulation Complete ---\n');
fprintf('Total Environment Steps: %d\n', TotalSteps);
fprintf('Total Learning Steps: %d\n', numel(learningSteps));
fprintf('Total Network Updates: %d\n', totalUpdateCount);
fprintf('Updates per Learning Step: %s\n', mat2str(updateCounts));

%% ==== Plotting ====

figure;
bar(learningSteps, updateCounts, 0.5, 'FaceColor', [0.2 0.6 0.8]);
xlabel('Environment Step');
ylabel('Network Updates');
title('DDPG Learning Triggers and Network Updates');
grid on;
xticks(learningSteps);
ylim([0 max(updateCounts)+1]);

text(learningSteps, updateCounts + 0.2, ...
   compose('%d updates', updateCounts), ...
   'HorizontalAlignment', 'center', 'FontSize', 9);

This script simulates:

20 environment sampling steps
Learning triggered every 4 steps (`LearningFrequency = 4`)
Each learning step performs 3 epochs, with 2 mini-batches per epoch

Printed Output (Excerpt)

Total Environment Steps: 20
Total Learning Steps: 5
Total Network Updates: 30
Updates per Learning Step: [6 6 6 6 6]

Plot:

The bar chart shows:

Which steps triggered learning (steps 4, 8, 12, 16, 20)
That each learning step performed 6 network updates (`3 epochs × 2 mini-batches`)

This directly illustrates that:

`LearningFrequency` regulates when learning occurs (based on environment steps)
`NumEpoch` and `MaxMiniBatchPerEpoch` regulate how much learning happens within each learning step
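
The counts in the printed output follow from a short calculation. Here $T$ is the number of environment steps, $F$ is `LearningFrequency`, $E$ is `NumEpoch`, and $M$ is `MaxMiniBatchPerEpoch`. These symbols are introduced only for illustration, and the calculation assumes, as the script does, that every epoch uses exactly $M$ mini-batches:

```latex
\begin{aligned}
N_{\text{learn}} &= \left\lfloor \frac{T}{F} \right\rfloor = \left\lfloor \frac{20}{4} \right\rfloor = 5 \\
U_{\text{step}}  &= E \times M = 3 \times 2 = 6 \\
U_{\text{total}} &= N_{\text{learn}} \times U_{\text{step}} = 5 \times 6 = 30
\end{aligned}
```

Because `MaxMiniBatchPerEpoch` is a maximum, $E \times M$ is best read as an upper bound on the updates per learning step in a real agent.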

Please feel free to experiment with the script by adjusting any of the parameters to match your own setup.

Let me know if you'd like me to clarify anything further.

Fabián on 14 Sep 2025

Hi @Umar,

Thank you so much for your response. With your clarifications, I fully understand the implementation of the DDPG algorithm in MATLAB. You deserve a cold beer.

Best regards!

Fabián.

Umar on 14 Sep 2025

Hi @Fabian, Haha, I’ll happily take that cold beer — thanks! 😄 Glad to hear the explanation helped clarify things. Feel free to reach out anytime if more questions come up.
