Why does layerNormalizationLayer in Deep Learning Toolbox include the T dimension in the per-sample statistics?

Hello,
While implementing a ViT transformer in MATLAB, I found that layerNormalizationLayer includes the T dimension in the statistics calculated for each sample in the batch. This is problematic when implementing a transformer, since tokens correspond to the T dimension and reference implementations calculate the statistics separately for each token.
Thanks
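
To make the difference concrete, here is a minimal sketch in plain MATLAB (no toolbox calls) that compares the two choices of statistics. The C-by-B-by-T array layout, the sizes, the per-token offsets, and the epsilon value are illustrative assumptions, not taken from the question.

```matlab
% Illustrative sizes (assumptions): C channels, B samples, T tokens.
C = 4; B = 2; T = 3;
rng(0);
% Give each token a different mean so the effect is easy to see.
X = randn(C, B, T) + reshape(10*(1:T), 1, 1, T);
epsilon = 1e-5;

% Per-token statistics (what reference transformer implementations use):
% normalize over the channel dimension only, separately for each (b, t).
muTok  = mean(X, 1);
varTok = var(X, 1, 1);                  % population variance along dim 1
YTok   = (X - muTok) ./ sqrt(varTok + epsilon);

% Per-sample statistics that also pool over T (the behavior described
% in the question): one mean and one variance per sample b.
muAll  = mean(X, [1 3]);
varAll = var(X, 1, [1 3]);
YAll   = (X - muAll) ./ sqrt(varAll + epsilon);

% Largest absolute per-token mean after each kind of normalization.
maxTokMeanPerToken = max(abs(mean(YTok, 1)), [], "all");
maxTokMeanPooled   = max(abs(mean(YAll, 1)), [], "all");
```

With per-token statistics every token ends up zero-mean across its channels. With statistics pooled over T, tokens that differ in overall level keep a nonzero mean, so the two variants are not interchangeable.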

Accepted Answer

John Smith on 24 Mar 2023
It seems MathWorks has listened and changed the behavior of layerNormalizationLayer in R2023a:
Starting in R2023a, by default, the layer normalizes sequence data over the channel and spatial dimensions. In previous versions, the software normalizes over all dimensions except for the batch dimension (the spatial, time, and channel dimensions). Normalization over the channel and spatial dimensions is usually better suited for this type of data. To reproduce the previous behavior, set OperationDimension to "batch-excluded".
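
As a minimal sketch of the option named in the quoted note (assuming R2023a or later with Deep Learning Toolbox installed; the variable names are illustrative), the previous behavior can be requested when the layer is created:

```matlab
% Requires Deep Learning Toolbox, R2023a or later (assumption).
% Default construction: uses the new R2023a default behavior.
lnNew = layerNormalizationLayer;

% Reproduce the pre-R2023a behavior: normalize over every dimension
% except the batch dimension (spatial, time, and channel).
lnOld = layerNormalizationLayer(OperationDimension="batch-excluded");
```

For a transformer, the default layer is the one that matches the per-token statistics asked about in the question; "batch-excluded" is only needed to reproduce results from older releases.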

More Answers (1)

Matt J on 13 Mar 2023
Perhaps you can fold your T dimension into the C dimension and use a groupNormalizationLayer instead, with the groups defined so that different T belong to different groups.
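
A rough sketch of why this folding idea can work, written with plain array math rather than the toolbox layer. The sizes, the C-by-T-by-B layout, and the assumption that groups are made of consecutive channels are all illustrative and should be checked against the groupNormalizationLayer documentation before relying on them.

```matlab
% Illustrative sizes (assumptions): C channels, T tokens, B samples.
C = 4; T = 3; B = 2;
rng(0);
X = randn(C, T, B);
epsilon = 1e-5;

% Fold T into C. MATLAB is column-major, so the C channels of each token
% stay contiguous: rows 1..C are token 1, rows C+1..2C are token 2, ...
Xfolded = reshape(X, C*T, B);

% Emulate group normalization with T groups of C consecutive channels
% (plain array math, not the toolbox layer).
G  = reshape(Xfolded, C, T, B);         % split the folded channels into groups
mu = mean(G, 1);
v  = var(G, 1, 1);                      % population variance within each group
Yfolded = reshape((G - mu) ./ sqrt(v + epsilon), C*T, B);

% Reference: per-token layer normalization on the unfolded data.
Yref = (X - mean(X, 1)) ./ sqrt(var(X, 1, 1) + epsilon);

% The two results agree once the folded output is unfolded again.
maxDiff = max(abs(reshape(Yfolded, C, T, B) - Yref), [], "all");
```

Under these assumptions the toolbox equivalent would be a groupNormalizationLayer with T groups applied to the C*T folded channels. One caveat: the learnable scale and offset of a group normalization layer are per channel, so after folding they would not be shared across tokens the way they are in a standard transformer layer norm.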
  7 Comments
John Smith on 15 Mar 2023
Perhaps lamenting would cause someone from MathWorks to take notice and add the capability to the code base. Sigh ...
Matt J on 15 Mar 2023
That happens sometimes, but usually you have to submit a formal enhancement request.

Version

R2022b
