gelu

Apply Gaussian error linear unit (GELU) activation

Since R2022b

Syntax

Y = gelu(X)

Y = gelu(X,Approximation=method)

Description

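The Gaussian error linear unit (GELU) activation operation weights each input element by the probability that a standard Gaussian random variable is less than or equal to that element, that is, by the standard Gaussian cumulative distribution function evaluated at the input.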

This operation is given by

$\mathrm{GELU}(x) = \frac{x}{2} \left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$

where erf denotes the error function.

Note

This function applies the GELU operation to dlarray data. If you want to apply the GELU activation within a dlnetwork object, use geluLayer.

Y = gelu(X) applies the GELU activation to the input data X.

Y = gelu(X,Approximation=method) also specifies the approximation method for the GELU operation. For example, Approximation="tanh" specifies the tanh approximation of the underlying error function.

Examples

Apply GELU Operation

Create a formatted dlarray object containing a batch of 128 28-by-28 images with three channels. Specify the format "SSCB" (spatial, spatial, channel, batch).

miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
X = dlarray(X,"SSCB");

View the size and format of the input data.

size(X)

ans = 1×4

    28    28     3   128

dims(X)

ans = 
'SSCB'

Apply the GELU activation.

Y = gelu(X);

View the size and format of the output.

size(Y)

ans = 1×4

    28    28     3   128

dims(Y)

ans = 
'SSCB'
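
The Approximation option can be explored in the same way. The following is a minimal sketch, not part of the original example, that applies gelu to an unformatted dlarray with and without the tanh approximation and reports the largest absolute difference. The variable names (YExact, YTanh, maxAbsDiff) and the input range are illustrative assumptions.

```matlab
% Sketch: compare gelu with and without the tanh approximation.
% Assumes Deep Learning Toolbox, R2022b or later.
X = dlarray(linspace(-4,4,101));           % unformatted dlarray input

YExact = gelu(X);                          % default, Approximation="none"
YTanh  = gelu(X,Approximation="tanh");     % tanh approximation

% Extract the underlying numeric data before comparing.
maxAbsDiff = max(abs(extractdata(YExact) - extractdata(YTanh)))
```

The difference should be small relative to the activations, which is why the two variants are usually interchangeable except when reproducing a model trained with a specific one.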

Input Arguments

`X` — Input data
`dlarray` object

Input data, specified as a formatted or unformatted dlarray object.

`method` — Approximation method
`"none"` (default) | `"tanh"`

Approximation method, specified as one of these values:

"none" — Do not use approximation.
"tanh" — Approximate the underlying error function using

$\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right) \approx \tanh\left(\sqrt{\frac{2}{\pi}} \left(x + 0.044715 x^{3}\right)\right).$

Tip

In MATLAB®, computing the tanh approximation is typically less accurate, and, for large input sizes, slower than computing the GELU activation without using an approximation. Use the tanh approximation when you want to reproduce models that use this approximation, such as BERT and GPT-2.

Output Arguments

`Y` — GELU activations
`dlarray` object

GELU activations, returned as a dlarray object. The output Y has the same underlying data type as the input X.

If the input data X is a formatted dlarray object, then Y has the same dimension format as X. If the input data is not a formatted dlarray object, then Y is an unformatted dlarray object with the same dimension order as the input data.
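
The type and format rules above can be checked with a short sketch such as the following. The array sizes and variable names are illustrative assumptions, and the code assumes that dims returns an empty character vector for an unformatted dlarray.

```matlab
% Sketch: inspect the output type and format for formatted and
% unformatted inputs. Assumes Deep Learning Toolbox, R2022b or later.
XFormatted   = dlarray(single(rand(28,28,3,16)),"SSCB");
XUnformatted = dlarray(rand(10,5));

YFormatted   = gelu(XFormatted);
YUnformatted = gelu(XUnformatted);

dims(YFormatted)                 % same dimension format as XFormatted
class(extractdata(YFormatted))   % same underlying data type as XFormatted
isempty(dims(YUnformatted))      % unformatted output carries no labels
size(YUnformatted)               % same size and dimension order as input
```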

Algorithms

Gaussian Error Linear Unit Activation

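The Gaussian error linear unit (GELU) activation operation weights each input element by the probability that a standard Gaussian random variable is less than or equal to that element, that is, by the standard Gaussian cumulative distribution function evaluated at the input.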

This operation is given by

$\mathrm{GELU}(x) = \frac{x}{2} \left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$

where erf denotes the error function given by

$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}} \, dt.$

When the Approximation option is "tanh", the software approximates the error function using

$\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right) \approx \tanh\left(\sqrt{\frac{2}{\pi}} \left(x + 0.044715 x^{3}\right)\right).$
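
To make the two formulas concrete, the following sketch evaluates them directly on a numeric vector using the base MATLAB erf and tanh functions. It illustrates the equations above and is not a description of how gelu is implemented internally. The variable names and the evaluation grid are illustrative assumptions.

```matlab
% Sketch: evaluate the GELU formulas directly on numeric data.
x = linspace(-4,4,81);

% Exact form: x/2 * (1 + erf(x/sqrt(2)))
geluExact = x/2 .* (1 + erf(x/sqrt(2)));

% Tanh form: replace erf(x/sqrt(2)) with the tanh approximation
geluTanh = x/2 .* (1 + tanh(sqrt(2/pi)*(x + 0.044715*x.^3)));

% Largest absolute difference between the two forms on this grid
maxAbsDiff = max(abs(geluExact - geluTanh))
```

Both forms return 0 at x = 0 and approach x for large positive inputs and 0 for large negative inputs.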

References

[1] Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." Preprint, submitted June 27, 2016. https://arxiv.org/abs/1606.08415

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The gelu function supports GPU array input with these usage notes and limitations:

When the input argument X is a dlarray with underlying data of type gpuArray, this function runs on the GPU.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
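
As a minimal sketch of the usage note above, the following code moves the data to the GPU only when one is available and then checks where the result lives. It assumes Parallel Computing Toolbox and a supported GPU for the GPU branch. The variable names are illustrative.

```matlab
% Sketch: run gelu on the GPU when one is available.
X = single(rand(28,28,3,128));

if canUseGPU
    X = gpuArray(X);             % underlying data becomes a gpuArray
end

X = dlarray(X,"SSCB");
Y = gelu(X);

% The underlying data of Y is a gpuArray only if the input data was.
onGPU = isgpuarray(extractdata(Y))
```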

Version History

Introduced in R2022b
