gelu

Apply Gaussian error linear unit (GELU) activation

    Description

    The Gaussian error linear unit (GELU) activation operation weights the input by its probability under a Gaussian distribution.

    This operation is given by

    $\mathrm{GELU}(x) = \frac{x}{2}\left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$

    where erf denotes the error function.

    Note

    This function applies the GELU operation to dlarray data. If you want to apply the GELU activation within a layerGraph object or Layer array, use the geluLayer layer.
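As a minimal sketch of the layer-based route, assuming the layer meant here is geluLayer from Deep Learning Toolbox. The input size and convolution settings below are illustrative assumptions, not values taken from this page:

```matlab
% Illustrative layer array; input size and filter settings are arbitrary assumptions.
layers = [
    imageInputLayer([28 28 3])
    convolution2dLayer(3,16,Padding="same")
    geluLayer];
```

The gelu function itself is the better fit when you work with dlarray data directly, for example inside a custom training loop.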

    Y = gelu(X) applies the GELU activation to the input data X.

    Y = gelu(X,Approximation=method) also specifies the approximation method for the GELU operation. For example, Approximation="tanh" specifies the tanh approximation of the underlying error function.
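The two syntaxes can be compared side by side. This is a minimal sketch, assuming Deep Learning Toolbox is available; the variable names (Yexact, Ytanh, maxDiff) and the input size are illustrative only:

```matlab
% Random unformatted input; the size is an arbitrary choice for illustration.
X = dlarray(randn(4,5));

Yexact = gelu(X);                       % no approximation method specified
Ytanh  = gelu(X,Approximation="tanh");  % tanh approximation of the error function

% Largest elementwise difference between the two results.
maxDiff = max(abs(extractdata(Yexact) - extractdata(Ytanh)),[],"all");
```

The difference should be small relative to the activations, but it is generally not zero, which matters when you need to match a reference model exactly.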

    Examples

    Create a formatted dlarray object containing a batch of 128 28-by-28 images with three channels. Specify the format "SSCB" (spatial, spatial, channel, batch).

    miniBatchSize = 128;
    inputSize = [28 28];
    numChannels = 3;
    X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
    X = dlarray(X,"SSCB");

    View the size and format of the input data.

    size(X)
    ans = 1×4
    
        28    28     3   128
    
    
    dims(X)
    ans = 
    'SSCB'
    

    Apply the GELU activation.

    Y = gelu(X);

    View the size and format of the output.

    size(Y)
    ans = 1×4
    
        28    28     3   128
    
    
    dims(Y)
    ans = 
    'SSCB'
    

    Input Arguments

    Input data X, specified as a formatted or unformatted dlarray object.

    Approximation method, set with the Approximation name-value argument and specified as one of these values:

    • "none" — Do not use approximation.

    • "tanh" — Approximate the underlying error function using

      $\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right) \approx \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715x^{3}\right)\right).$

    Tip

    In MATLAB®, computing the tanh approximation is typically less accurate, and, for large input sizes, slower than computing the GELU activation without using an approximation. Use the tanh approximation when you want to reproduce models that use this approximation, such as BERT and GPT-2.
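One way to check the speed claim on your own hardware is to time both variants. This is a rough sketch, assuming Deep Learning Toolbox is available; the input size and the names tExact and tTanh are illustrative, and the measured times depend on the machine, data type, and release:

```matlab
% Illustrative input size; change it to match your own workload.
X = dlarray(randn(1000,1000,"single"));

% timeit calls each function handle several times and returns a typical time in seconds.
tExact = timeit(@() gelu(X));
tTanh  = timeit(@() gelu(X,Approximation="tanh"));

fprintf("exact: %.4g s, tanh: %.4g s\n",tExact,tTanh)
```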

    Output Arguments

    GELU activations, returned as a dlarray object. The output Y has the same underlying data type as the input X.

    If the input data X is a formatted dlarray object, then Y has the same dimension format as X. If the input data is not a formatted dlarray object, then Y is an unformatted dlarray object with the same dimension order as the input data.
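The following sketch illustrates both cases described above, assuming Deep Learning Toolbox is available. The array sizes and variable names are arbitrary choices for illustration:

```matlab
% Formatted single-precision input: 8-by-8 spatial, 3 channels, batch of 2.
Xf = dlarray(rand(8,8,3,2,"single"),"SSCB");
Yf = gelu(Xf);
fmtF  = dims(Yf);                  % format carried over from Xf
typeF = class(extractdata(Yf));    % underlying data type of the output

% Unformatted double-precision input.
Xu = dlarray(rand(8,3));
Yu = gelu(Xu);
fmtU = dims(Yu);                   % empty, because Yu has no format
```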

    Algorithms

    Gaussian Error Linear Unit Activation

    The Gaussian error linear unit (GELU) activation operation weights the input by its probability under a Gaussian distribution.

    This operation is given by

    $\mathrm{GELU}(x) = \frac{x}{2}\left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$

    where erf denotes the error function given by

    $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt.$

    When the Approximation option is "tanh", the software approximates the error function using

    $\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right) \approx \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715x^{3}\right)\right).$
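The two formulas can be written out with base MATLAB functions to see how closely they agree. This is only a reference sketch of the math, not the software's internal implementation; the handles geluExact and geluTanh and the evaluation grid are hypothetical names and choices made for illustration:

```matlab
% Reference formulas written with base MATLAB functions (no dlarray needed).
geluExact = @(x) x./2 .* (1 + erf(x./sqrt(2)));
geluTanh  = @(x) x./2 .* (1 + tanh(sqrt(2/pi) .* (x + 0.044715.*x.^3)));

% Evaluate both on a grid and record the largest gap on that grid.
x = linspace(-5,5,1001);
maxErr = max(abs(geluExact(x) - geluTanh(x)));
```

On this grid the gap stays well below the scale of the activations, which is consistent with the tanh form being an approximation rather than an identity.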

    References

    [1] Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." Preprint, submitted June 27, 2016. https://arxiv.org/abs/1606.08415

    Version History

    Introduced in R2022b