dlaccelerate

Accelerate deep learning function for custom training loops

Since R2021a

    Description

    Use dlaccelerate to speed up deep learning function evaluation for custom training loops.

    The returned AcceleratedFunction object caches the traces of calls to the underlying function and reuses the cached result when the same input pattern reoccurs.

    Try using dlaccelerate for function calls that:

    • are long-running

    • have dlarray objects, structures of dlarray objects, or dlnetwork objects as inputs

    • do not have side effects like writing to files or displaying output

    Invoke the accelerated function as you would invoke the underlying function. Note that the accelerated function is not a function handle.
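
    As a minimal sketch of the point above, the following script calls a function directly and through its accelerated counterpart using the same syntax, and then confirms that the accelerated object is not a function handle. The function scaleAndShift and the variable names are hypothetical and exist only for this illustration.

```matlab
% scaleAndShift is a hypothetical function used only for illustration.
accfun = dlaccelerate(@scaleAndShift);
clearCache(accfun)

X = dlarray(rand(3,3,"single"));

Y1 = scaleAndShift(X);   % call the underlying function
Y2 = accfun(X);          % call the accelerated function with the same syntax

% The accelerated function is an AcceleratedFunction object, not a function handle.
isHandle = isa(accfun,"function_handle");

function Y = scaleAndShift(X)
% Simple elementwise operation on a dlarray input.
Y = 2*X + 1;
end
```

    Because accfun is an object, code that checks for a function_handle input (for example, with isa) does not treat it as one.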

    Note

    When using the dlfeval function, the software automatically accelerates the forward and predict functions for dlnetwork input. If you accelerate a deep learning function where the majority of the computation takes place in calls to the forward or predict functions for dlnetwork input, then you might not see an improvement in training time.

    For more information, see Deep Learning Function Acceleration for Custom Training Loops.

    accfun = dlaccelerate(fun) creates an AcceleratedFunction object that retains the underlying traces of the specified function handle fun.

    Caution

    An AcceleratedFunction object is not aware of updates to the underlying function. If you modify the function associated with the accelerated function, then clear the cache using the clearCache object function or alternatively use the command clear functions.

    Examples

    Accelerate Model Loss Function

    Load the dlnetwork object and class names from the MAT file dlnetDigits.mat.

    s = load("dlnetDigits.mat");
    net = s.net;
    classNames = s.classNames;

    Accelerate the model loss function modelLoss listed at the end of the example.

    fun = @modelLoss;
    accfun = dlaccelerate(fun);

    Clear any previously cached traces of the accelerated function using the clearCache function.

    clearCache(accfun)

    View the properties of the accelerated function. Because the cache is empty, the Occupancy property is 0.

    accfun
    accfun = 
      AcceleratedFunction with properties:
    
              Function: @modelLoss
               Enabled: 1
             CacheSize: 50
               HitRate: 0
             Occupancy: 0
             CheckMode: 'none'
        CheckTolerance: 1.0000e-04
    
    

    The returned AcceleratedFunction object stores the traces of underlying function calls and reuses the cached result when the same input pattern reoccurs. To use the accelerated function in a custom training loop, replace calls to the model loss function with calls to the accelerated function. You can invoke the accelerated function as you would invoke the underlying function. Note that the accelerated function is not a function handle.

    Evaluate the accelerated model loss function with random data using the dlfeval function.

    X = rand(28,28,1,128,"single");
    X = dlarray(X,"SSCB");
    
    T = categorical(classNames(randi(10,[128 1])));
    T = onehotencode(T,2)';
    T = dlarray(T,"CB");
    
    [loss,gradients,state] = dlfeval(accfun,net,X,T);

    View the Occupancy property of the accelerated function. Because the function has been evaluated, the cache is nonempty.

    accfun.Occupancy
    ans = 2
    

    Model Loss Function

    The modelLoss function takes a dlnetwork object net and a mini-batch of input data X with corresponding target labels T, and returns the loss, the gradients of the loss with respect to the learnable parameters in net, and the network state. To compute the gradients, use the dlgradient function.

    function [loss,gradients,state] = modelLoss(net,X,T)
    
    [Y,state] = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
    
    end
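
    The example above evaluates the accelerated function once. As a rough sketch of how the same call might sit inside a custom training loop, the following script trains a small network on random data. The tiny network, the random mini-batches, the iteration count, and the use of sgdmupdate with its default settings are assumptions made for illustration; they are not part of the example above, which loads its network from dlnetDigits.mat.

```matlab
% Hypothetical small network standing in for the network in dlnetDigits.mat.
layers = [
    featureInputLayer(4)
    fullyConnectedLayer(3)
    softmaxLayer];
net = dlnetwork(layers);

accfun = dlaccelerate(@modelLoss);
clearCache(accfun)

vel = [];             % SGDM velocity, initialized empty
numIterations = 5;    % illustrative value only

for iteration = 1:numIterations
    % Random mini-batch of 16 observations (illustration only).
    X = dlarray(rand(4,16,"single"),"CB");
    labels = categorical(randi(3,[1 16]),1:3);
    T = dlarray(single(onehotencode(labels,1)),"CB");

    % Call the accelerated function where the loop would call modelLoss.
    [loss,gradients,state] = dlfeval(accfun,net,X,T);
    net.State = state;

    [net,vel] = sgdmupdate(net,gradients,vel);
end

function [loss,gradients,state] = modelLoss(net,X,T)
[Y,state] = forward(net,X);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,net.Learnables);
end
```

    Every mini-batch in this sketch has the same size and format, so the loop presents the same input pattern on each iteration.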

    Clear Cache of Accelerated Function

    Load the dlnetwork object and class names from the MAT file dlnetDigits.mat.

    s = load("dlnetDigits.mat");
    net = s.net;
    classNames = s.classNames;

    Accelerate the model loss function modelLoss listed at the end of the example.

    fun = @modelLoss;
    accfun = dlaccelerate(fun);

    Clear any previously cached traces of the accelerated function using the clearCache function.

    clearCache(accfun)

    View the properties of the accelerated function. Because the cache is empty, the Occupancy property is 0.

    accfun
    accfun = 
      AcceleratedFunction with properties:
    
              Function: @modelLoss
               Enabled: 1
             CacheSize: 50
               HitRate: 0
             Occupancy: 0
             CheckMode: 'none'
        CheckTolerance: 1.0000e-04
    
    

    The returned AcceleratedFunction object stores the traces of underlying function calls and reuses the cached result when the same input pattern reoccurs. To use the accelerated function in a custom training loop, replace calls to the model loss function with calls to the accelerated function. You can invoke the accelerated function as you would invoke the underlying function. Note that the accelerated function is not a function handle.

    Evaluate the accelerated model loss function with random data using the dlfeval function.

    X = rand(28,28,1,128,"single");
    X = dlarray(X,"SSCB");
    
    T = categorical(classNames(randi(10,[128 1])));
    T = onehotencode(T,2)';
    T = dlarray(T,"CB");
    
    [loss,gradients,state] = dlfeval(accfun,net,X,T);

    View the Occupancy property of the accelerated function. Because the function has been evaluated, the cache is nonempty.

    accfun.Occupancy
    ans = 2
    

    Clear the cache using the clearCache function.

    clearCache(accfun)

    View the Occupancy property of the accelerated function. Because the cache has been cleared, the Occupancy property is 0.

    accfun.Occupancy
    ans = 0
    

    Model Loss Function

    The modelLoss function takes a dlnetwork object net and a mini-batch of input data X with corresponding target labels T, and returns the loss, the gradients of the loss with respect to the learnable parameters in net, and the network state. To compute the gradients, use the dlgradient function.

    function [loss,gradients,state] = modelLoss(net,X,T)
    
    [Y,state] = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
    
    end

    Check Accuracy of Accelerated Function

    This example shows how to check that the outputs of accelerated functions match the outputs of the underlying function.

    In some cases, the outputs of accelerated functions differ from the outputs of the underlying function. For example, you must take care when accelerating functions that use random number generation, such as a function that generates random noise to add to the network input. When caching the trace of a function that generates random numbers that are not dlarray objects, the accelerated function caches the resulting random numbers in the trace. When reusing the trace, the accelerated function uses the cached random values and does not generate new ones.

    To check that the outputs of the accelerated function match the outputs of the underlying function, use the CheckMode property of the accelerated function. When the CheckMode property of the accelerated function is 'tolerance' and the outputs differ by more than a specified tolerance, the accelerated function throws a warning.

    Accelerate the function myUnsupportedFun, listed at the end of the example, using the dlaccelerate function. The function myUnsupportedFun generates random noise and adds it to the input. This function does not support acceleration because the function generates random numbers that are not dlarray objects.

    accfun = dlaccelerate(@myUnsupportedFun)
    accfun = 
      AcceleratedFunction with properties:
    
              Function: @myUnsupportedFun
               Enabled: 1
             CacheSize: 50
               HitRate: 0
             Occupancy: 0
             CheckMode: 'none'
        CheckTolerance: 1.0000e-04
    
    

    Clear any previously cached traces using the clearCache function.

    clearCache(accfun)

    To check that the outputs of reused cached traces match the outputs of the underlying function, set the CheckMode property to 'tolerance'.

    accfun.CheckMode = 'tolerance'
    accfun = 
      AcceleratedFunction with properties:
    
              Function: @myUnsupportedFun
               Enabled: 1
             CacheSize: 50
               HitRate: 0
             Occupancy: 0
             CheckMode: 'tolerance'
        CheckTolerance: 1.0000e-04
    
    

    Evaluate the accelerated function with an array of ones as input, specified as a dlarray input.

    dlX = dlarray(ones(3,3));
    dlY = accfun(dlX)
    dlY = 
      3×3 dlarray
    
        1.8147    1.9134    1.2785
        1.9058    1.6324    1.5469
        1.1270    1.0975    1.9575
    
    

    Evaluate the accelerated function again with the same input. Because the accelerated function reuses the cached random noise values instead of generating new random values, the outputs of the reused trace differ from the outputs of the underlying function. When the CheckMode property of the accelerated function is 'tolerance' and the outputs differ, the accelerated function throws a warning.

    dlY = accfun(dlX)
    Warning: Accelerated outputs differ from underlying function outputs.
    
    dlY = 
      3×3 dlarray
    
        1.8147    1.9134    1.2785
        1.9058    1.6324    1.5469
        1.1270    1.0975    1.9575
    
    

    Random number generation using the 'like' option of the rand function with a dlarray object supports acceleration. To use random number generation in an accelerated function, ensure that the function uses the rand function with the 'like' option set to a traced dlarray object (a dlarray object that depends on an input dlarray object).

    Accelerate the function mySupportedFun, listed at the end of the example. The function mySupportedFun adds noise to the input by generating noise using the 'like' option with a traced dlarray object.

    accfun2 = dlaccelerate(@mySupportedFun);

    Clear any previously cached traces using the clearCache function.

    clearCache(accfun2)

    To check that the outputs of reused cached traces match the outputs of the underlying function, set the CheckMode property to 'tolerance'.

    accfun2.CheckMode = 'tolerance';

    Evaluate the accelerated function twice with the same input as before. Because the outputs of the reused cache match the outputs of the underlying function, the accelerated function does not throw a warning.

    dlY = accfun2(dlX)
    dlY = 
      3×3 dlarray
    
        1.7922    1.0357    1.6787
        1.9595    1.8491    1.7577
        1.6557    1.9340    1.7431
    
    
    dlY = accfun2(dlX)
    dlY = 
      3×3 dlarray
    
        1.3922    1.7060    1.0462
        1.6555    1.0318    1.0971
        1.1712    1.2769    1.8235
    
    

    Checking that the outputs match requires extra processing and increases the time required for function evaluation. After checking the outputs, set the CheckMode property to 'none'.

    accfun.CheckMode = 'none';
    accfun2.CheckMode = 'none';
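
    The property displays in this example show a CheckTolerance value of 1.0000e-04. As a small sketch, assuming that CheckTolerance can be assigned in the same way as CheckMode, the following script tightens the tolerance for a temporary check and then turns checking off again. The function addLikeNoise and the variable accfunTol are hypothetical names introduced for this illustration, and the tolerance value is arbitrary.

```matlab
% addLikeNoise is a hypothetical function that mirrors mySupportedFun.
accfunTol = dlaccelerate(@addLikeNoise);
clearCache(accfunTol)

% Enable checking with a tighter tolerance than the displayed value of 1e-4.
accfunTol.CheckMode = 'tolerance';
accfunTol.CheckTolerance = 1e-6;   % arbitrary illustrative value

X = dlarray(ones(3,3));
Y1 = accfunTol(X);   % first call creates a trace
Y2 = accfunTol(X);   % second call has the same input pattern

% Checking adds overhead, so disable it once the check is complete.
accfunTol.CheckMode = 'none';

function out = addLikeNoise(X)
% Generate noise with the 'like' option so that the noise is a dlarray.
noise = rand(size(X),'like',X);
out = X + noise;
end
```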

    Example Functions

    The function myUnsupportedFun generates random noise and adds it to the input. This function does not support acceleration because the function generates random numbers that are not dlarray objects.

    function out = myUnsupportedFun(dlX)
    
    sz = size(dlX);
    noise = rand(sz);
    out = dlX + noise;
    
    end

    The function mySupportedFun adds noise to the input by generating noise using the 'like' option with a traced dlarray object.

    function out = mySupportedFun(dlX)
    
    sz = size(dlX);
    noise = rand(sz,'like',dlX);
    out = dlX + noise;
    
    end

    Input Arguments

    fun

    Deep learning function to accelerate, specified as a function handle.

    To learn more about developing deep learning functions for acceleration, see Deep Learning Function Acceleration for Custom Training Loops.

    Example: @modelLoss

    Data Types: function_handle

    Output Arguments

    accfun

    Accelerated deep learning function, returned as an AcceleratedFunction object.

    More About

    Acceleration Considerations

    Because of the nature of caching traces, not all functions support acceleration.

    The caching process can cache values that you might expect to change or that depend on external factors. You must take care when you accelerate functions that:

    • have inputs with random or frequently changing values

    • have outputs with frequently changing values

    • generate random numbers

    • use if statements and while loops with conditions that depend on the values of dlarray objects

    • have inputs that are handles or that depend on handles

    • read data from external sources (for example, by using a datastore or a minibatchqueue object)

    Because the caching process requires extra computation, acceleration can lead to longer running code in some cases. This scenario can happen when the software spends time creating new caches that do not get reused often. For example, when you pass multiple mini-batches of different sequence lengths to the function, the software triggers a new trace for each unique sequence length.
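
    One way to see whether traces are being reused is to inspect the Occupancy and HitRate properties after a few calls. The following sketch passes inputs with several different sequence lengths to an accelerated function and then reads both properties. The function doubleInput, the "CBT" format, and the chosen lengths are assumptions made for illustration.

```matlab
% doubleInput is a hypothetical function used only for illustration.
accfun = dlaccelerate(@doubleInput);
clearCache(accfun)

% Sequence lengths to try. Some lengths repeat and some do not.
sequenceLengths = [10 20 30 10 20 30];

for len = sequenceLengths
    % 4 channels, 1 observation, len time steps.
    X = dlarray(rand(4,1,len,"single"),"CBT");
    Y = accfun(X);
end

% Inspect how much of the cache is in use and how often traces were reused.
occupancy = accfun.Occupancy
hitRate = accfun.HitRate

function Y = doubleInput(X)
Y = 2*X;
end
```

    If the occupancy keeps growing while the hit rate stays low, the function is spending time creating traces that are rarely reused, which is the scenario described above.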

    Accelerated functions can do the following only when calculating a new trace:

    • modify the global state, such as the random number stream or global variables

    • use file input or output

    • display data using graphics or the command line display

    When you use accelerated functions in parallel, such as in a parfor loop, each worker maintains its own cache. The cache is not transferred to the host.

    Functions and custom layers used in accelerated functions must also support acceleration.

    For more information, see Deep Learning Function Acceleration for Custom Training Loops.

    dlode45 Does Not Support Acceleration When GradientMode is "direct"

    The dlaccelerate function does not support accelerating the dlode45 function when the GradientMode option is "direct". To accelerate the code that calls the dlode45 function, set the GradientMode option to "adjoint" or accelerate parts of your code that do not call the dlode45 function with the GradientMode option set to "direct".
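
    As a minimal sketch of the first option, the following script accelerates a function that calls dlode45 with the GradientMode option set to "adjoint". The functions odeModel and odeFun, the parameter structure theta, and the numeric values are hypothetical and chosen only to keep the illustration small.

```matlab
% odeModel and odeFun are hypothetical functions used only for illustration.
accfun = dlaccelerate(@odeModel);
clearCache(accfun)

Y0 = dlarray([1; 0]);             % initial condition
theta.A = dlarray([0 1; -1 0]);   % ODE parameters

Y = accfun(Y0,theta);

function Y = odeModel(Y0,theta)
tspan = [0 1];
% Use the adjoint gradient mode so that the function supports acceleration.
Y = dlode45(@odeFun,tspan,Y0,theta,GradientMode="adjoint");
end

function dy = odeFun(~,y,theta)
% Linear ODE dy/dt = A*y.
dy = theta.A*y;
end
```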

    Version History

    Introduced in R2021a