Custom Layer Function Acceleration

If you do not specify a backward function when you define a custom layer, then the software automatically determines the gradients using automatic differentiation.

When you train a network with a custom layer without a backward function, the software traces each input dlarray object of the custom layer forward function to determine the computation graph used for automatic differentiation. This tracing process can take some time and can end up recomputing the same trace. By optimizing, caching, and reusing the traces, you can speed up gradient computation when training a network. The software can also reuse these traces to speed up network predictions after training.

The trace depends on the size, format, and underlying data type of the layer inputs. That is, the layer triggers a new trace for inputs with a size, format, or underlying data type not contained in the cache. Inputs that differ only by value from a previously cached trace do not trigger a new trace.
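As an illustration, consider these formatted dlarray inputs (the sizes and "SSCB" format here are assumptions for the sake of example):

% Same size, format, and underlying type: both inputs use one cached trace.
X1 = dlarray(rand(28,28,3,128,"single"),"SSCB");
X2 = dlarray(rand(28,28,3,128,"single"),"SSCB");   % values differ, trace reused

% A different batch size changes the input size, so this triggers a new trace.
X3 = dlarray(rand(28,28,3,64,"single"),"SSCB");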

To indicate that the custom layer supports acceleration, also inherit from the nnet.layer.Acceleratable class when defining the custom layer. When a custom layer inherits from nnet.layer.Acceleratable, the software automatically caches traces when passing data through a dlnetwork object.

For example, to indicate that the custom layer myLayer supports acceleration, use this syntax:

classdef myLayer < nnet.layer.Layer & nnet.layer.Acceleratable
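A minimal sketch of such a layer follows. The layer name and the predict body are hypothetical; the key point is inheriting from both nnet.layer.Layer and nnet.layer.Acceleratable.

classdef myLayer < nnet.layer.Layer & nnet.layer.Acceleratable
    % Custom layer that supports acceleration.
    methods
        function layer = myLayer(name)
            layer.Name = name;
        end
        function Z = predict(layer,X)
            % Operations on the traced dlarray X support acceleration.
            Z = 2*X;
        end
    end
end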

Acceleration Considerations

Because of the nature of caching traces, not all functions support acceleration.

The caching process can cache values or code structures that you might expect to change or that depend on external factors. You must take care when accelerating custom layers that:

  • Generate random numbers.

  • Use if statements and while loops with conditions that depend on the values of dlarray objects.

Because the caching process requires extra computation, acceleration can lead to longer running code in some cases. This scenario can happen when the software spends time creating new caches that do not get reused often. For example, when you pass multiple mini-batches of different sequence lengths to the function, the software triggers a new trace for each unique sequence length.

When custom layer acceleration causes slowdown, you can disable acceleration by removing the Acceleratable mixin or by disabling acceleration of the dlnetwork object functions predict and forward by setting the Acceleration option to "none".
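For example, assuming net is a trained dlnetwork object and X is a formatted dlarray input, you can disable acceleration for a single prediction like this:

% Disable trace caching for this call only.
Y = predict(net,X,Acceleration="none");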

Functions with Random Number Generation

You must take care when accelerating functions that use random number generation, such as functions that generate random noise to add to the input. When the software caches the trace of a function that generates random numbers that are not dlarray objects, the software caches the resulting random samples in the trace. When reusing the trace, the accelerated function uses the cached random sample. The accelerated function does not generate new random values.

Random number generation using the "like" option of the rand function with a dlarray object supports acceleration. To use random number generation in an accelerated function, ensure that the function uses the rand function with the "like" option set to a traced dlarray object (a dlarray object that depends on an input dlarray object).

For example, consider the following layer predict function, which adds random noise to the input.

function Z = predict(layer,X)
    sz = size(X);
    noise = rand(sz);    % noise is not a dlarray, so the trace caches this sample
    Z = X + noise;
end


To ensure that the rand function generates a new value for each evaluation, use the "like" option with the traced dlarray object X.

function Z = predict(layer,X)
    sz = size(X);
    noise = rand(sz,"like",X);    % "like" a traced dlarray: new values each evaluation
    Z = X + noise;
end


Functions with if Statements and while Loops

You must take care when accelerating functions that use if statements and while loops. In particular, you can get unexpected results when you accelerate functions with if statements or while loops that yield different code paths for function inputs of the same size and format.

Accelerating functions with if statement or while loop conditions that depend on the values of the function input, or on values from external sources (for example, the results of random number generation), can lead to unexpected behavior. When the accelerated function caches a new trace and the function contains an if statement or while loop, the software caches the trace of only the code path taken by the if statement or while loop condition for that particular trace. Because changes in the value of a dlarray input do not trigger a new trace, reusing the trace with different values follows the same cached code path, even when a difference in value should result in a different code path.

Usually, accelerating functions that contain if statements or while loops with conditions that do not depend on the values of the function input or on external factors (for example, while loops that iterate over the elements of an array) does not result in unexpected behavior. For example, because a change in the size of a dlarray input triggers a new trace, the cached code path for inputs of a given size remains consistent when the trace is reused, even when the values differ.
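For example, a loop like the following is safe to accelerate, because its iteration count depends only on the input size, and a change in size triggers a new trace anyway (the predict body here is a hypothetical illustration):

function Z = predict(layer,X)
    % The loop bound depends on size(X,1), not on the values in X,
    % so the cached code path stays valid for all inputs of this size.
    Z = X;
    for i = 1:size(X,1)
        Z(i,:) = 2*Z(i,:);
    end
end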

To avoid unexpected behavior from caching code paths of if statements, you can refactor your code so that it determines the correct result by combining the results of all branches and extracting the desired solution.

For example, consider this code.

if tf
    Y = funcA(X);
else
    Y = funcB(X);
end
To support acceleration, you can replace it with code of the following form.
Y = tf*funcA(X) + ~tf*funcB(X);
Alternatively, to avoid unnecessary multiply operations, you can also use this replacement.
Y = cat(3,funcA(X),funcB(X));
Y = Y(:,:,[tf ~tf]);
Note that these techniques can result in longer running code because they require executing the code used in both branches of the if statement.
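Putting the pieces together, a predict function using the branch-combining pattern might look like this (the condition on the mean of X, and the functions funcA and funcB, are hypothetical):

function Z = predict(layer,X)
    % tf depends on the values of X, so a cached if statement would
    % replay only one code path. Evaluate both branches and combine.
    tf = mean(X,"all") > 0;
    Z = tf*funcA(X) + ~tf*funcB(X);
end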

dlode45 Does Not Support Acceleration When GradientMode Is "direct"

The software does not support accelerating the dlode45 function when the GradientMode option is "direct". In this case, the accelerated function might return unexpected results. To accelerate code that calls the dlode45 function, set the GradientMode option to "adjoint".
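For example, a call of the following form supports acceleration (odefun, tspan, y0, and theta are assumed to be defined as in a typical neural ODE computation):

% Use adjoint gradients so that the dlode45 call can be accelerated.
Z = dlode45(@odefun,tspan,y0,theta,GradientMode="adjoint");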
