Optimize Loops in Generated Code

Transform loops in the generated code to suit your execution speed and memory requirements. Loop control objects instruct the code generator to optimize loops in the generated code.

You can append these transforms to a loop control object to optimize the generated loops:

interchange: Interchanging nested loops can improve cache performance when accessing array elements.
Usually, accessing an array element loads an entire block of data from memory into cache. Interchanging loops can improve execution speed when it makes consecutive iterations access adjacent array elements, because those elements are already in cache and readily available to the processor.
parallelize: Parallelized loop execution might improve execution speed by utilizing available threads.
Each available thread is assigned a portion of the loop iterations and processes those iterations one by one. Use this optimization when your loop accesses array elements sequentially and each operation is independent of the other array elements.
reverse: Reverse loop iteration order.
Use this transform when you know the upper bound of the loop iterator.
tile: Tiling loop nests can reduce memory access latency.
Tiling partitions the iteration space of a loop into smaller blocks, which helps data remain in cache until it is reused. In effect, a large array is processed in blocks that are small enough to fit in your cache. Use this transform when you have limited cache availability.
unrollAndJam: Unrolling and jamming loops can improve cache locality.
Unroll and jam transforms are usually applied to perfectly nested loops, or where all the data elements are accessed within the inner loop. This transform unrolls the body of the inner loop according to the loop index of the outer loop.
vectorize: Generate code for loops that use SIMD instructions to apply multiple operations simultaneously.
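
To make the tile and unrollAndJam descriptions above more concrete, here is a minimal hand-written C sketch of both transforms applied to a matrix transpose. It is not output from the code generator. The function names, the matrix size N, and the tile size TILE are hypothetical and chosen only for illustration. The sketch assumes that N is even and a multiple of TILE.

```c
#include <string.h>

#define N 8      /* hypothetical matrix size (assumed even) */
#define TILE 4   /* hypothetical tile size; N is assumed to be a multiple of TILE */

/* Fill src with distinct values so that a wrong transpose is detectable. */
void fill_matrix(double src[N][N])
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      src[i][j] = (double)(i * N + j);
    }
  }
}

/* Baseline: plain nested loop that transposes src into dst. */
void transpose_plain(double src[N][N], double dst[N][N])
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      dst[j][i] = src[i][j];
    }
  }
}

/* Tiled: the two outer loops step over TILE x TILE blocks and the two
   inner loops visit every element inside the current block. */
void transpose_tiled(double src[N][N], double dst[N][N])
{
  for (int ii = 0; ii < N; ii += TILE) {
    for (int jj = 0; jj < N; jj += TILE) {
      for (int i = ii; i < ii + TILE; i++) {
        for (int j = jj; j < jj + TILE; j++) {
          dst[j][i] = src[i][j];
        }
      }
    }
  }
}

/* Unroll and jam: the outer loop is unrolled by a factor of 2 and the two
   resulting copies of the inner loop are fused (jammed) into one body. */
void transpose_unroll_jam(double src[N][N], double dst[N][N])
{
  for (int i = 0; i < N; i += 2) {
    for (int j = 0; j < N; j++) {
      dst[j][i] = src[i][j];
      dst[j][i + 1] = src[i + 1][j];
    }
  }
}

/* Returns 1 when both matrices hold identical values, 0 otherwise. */
int same_matrix(double a[N][N], double b[N][N])
{
  return memcmp(a, b, sizeof(double) * N * N) == 0;
}
```

All three functions compute the same result. Only the order in which elements are visited changes, which is what affects cache behavior.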

Instruct the code generator to optimize loops in the generated code by:

Using objects of coder.loop.Control
Calling its member functions

Optimize Loops By Using Objects of coder.loop.Control

Create objects of coder.loop.Control in your MATLAB® code and append the required transformations to the object. For example, to apply the vectorize transform by using a coder.loop.Control object, follow this pattern:

function out = applyVectorize
out = zeros(1,100);

loopObj = coder.loop.Control;
loopObj = loopObj.vectorize('loopId');
loopObj.apply;

for loopId = 1:100
    out = out + loopId;
end

To generate code for this function for an Intel® target processor, use these commands:

cfg = coder.config('lib');
cfg.InstructionSetExtensions = "SSE2";
codegen -config cfg applyVectorize -launchreport

The generated code uses the SSE2 SIMD instruction set.

void applyVectorize(double out[100])
{
  int i;
  int loopId;
  memset(&out[0], 0, 100U * sizeof(double));
  for (loopId = 0; loopId < 100; loopId++) {
    for (i = 0; i <= 98; i += 2) {
      __m128d r;
      r = _mm_loadu_pd(&out[i]);
      _mm_storeu_pd(&out[i], _mm_add_pd(r, _mm_set1_pd((double)loopId + 1.0)));
    }
  }
}
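
To show what the SSE2 loop above computes, here is a portable C sketch without intrinsics. It is a hand-written illustration, not generator output, and the function names and the LEN macro are hypothetical. The second function mirrors the i += 2 stride of the generated loop by updating two adjacent elements per step, which is roughly what one 128-bit _mm_add_pd operation does for two doubles.

```c
#define LEN 100  /* matches the 100-element output array in the example */

/* Scalar reference: add (loopId + 1) to every element, one element at a time. */
void accumulate_scalar(double out[LEN])
{
  for (int i = 0; i < LEN; i++) {
    out[i] = 0.0;
  }
  for (int loopId = 0; loopId < 100; loopId++) {
    for (int i = 0; i < LEN; i++) {
      out[i] += (double)loopId + 1.0;
    }
  }
}

/* Two-lane emulation: processes elements in pairs, like the i += 2 SSE2 loop. */
void accumulate_pairs(double out[LEN])
{
  for (int i = 0; i < LEN; i++) {
    out[i] = 0.0;
  }
  for (int loopId = 0; loopId < 100; loopId++) {
    /* Same value in both lanes, playing the role of _mm_set1_pd. */
    double addend = (double)loopId + 1.0;
    for (int i = 0; i <= LEN - 2; i += 2) {
      out[i] += addend;      /* lane 0 */
      out[i + 1] += addend;  /* lane 1 */
    }
  }
}
```

Both functions leave every element equal to the sum of the integers from 1 through 100. The vectorized form performs half as many inner-loop steps.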

You can append multiple transforms to the same loop control object. Call the apply method before defining the loops in your code. For example, you can parallelize the outer loop unconditionally and add a vectorize transform for the inner loop only if a variable inputVal is greater than some threshold value.

...
loopObj = coder.loop.Control;
loopObj = loopObj.parallelize('i');
if inputVal > threshold
    loopObj = loopObj.vectorize('inputVal');
end
...
loopObj.apply;
for i = 1:10    
    for inputVal = 1:10
        ...
    end
end
...
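
As a rough picture of what combining parallelize on an outer loop with vectorize on an inner loop means at the C level, here is a hand-written sketch. It assumes OpenMP as the threading mechanism, which is an assumption made for illustration and not a statement about what the code generator emits for this example. The function name scale_grid and the array sizes are hypothetical.

```c
#define OUTER 10  /* hypothetical outer trip count */
#define INNER 10  /* hypothetical inner trip count */

/* Multiply every element of grid by factor. */
void scale_grid(double grid[OUTER][INNER], double factor)
{
  int i;
  /* Outer loop: iterations are independent, so they can be divided among
     threads. A compiler without OpenMP enabled ignores this pragma and
     runs the loop serially with the same result. */
#pragma omp parallel for
  for (i = 0; i < OUTER; i++) {
    /* Inner loop: contiguous, independent updates that a compiler
       can map to SIMD instructions. */
    for (int j = 0; j < INNER; j++) {
      grid[i][j] *= factor;
    }
  }
}
```

The split works because no iteration of either loop reads a value written by another iteration.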

Optimize Loops By Calling Member Functions Independently

You can apply loop transformations by calling the loop optimization functions, such as coder.loop.interchange, immediately before the loop itself. Follow the pattern shown here:

function out = applyInterchange
out = rand(10,7);

coder.loop.interchange('loopA','loopB');
for loopA = 1:10
    for loopB = 1:7
        out(loopA,loopB) = out(loopA,loopB) + loopA;
    end
end

Alternatively, if you want to declare the loop transform prior to the loop, you can store the loop object returned by the coder.loop.interchange function call. However, you must call the apply method for the returned object before defining the loop.

function out = applyInterchange
out = rand(10,7);

loopObj = coder.loop.interchange('loopA','loopB');
...

loopObj.apply;
for loopA = 1:10
    for loopB = 1:7
        out(loopA,loopB) = out(loopA,loopB) + loopA;
    end
end

Generate code for these functions by running this command:

codegen -config:lib applyInterchange -launchreport

The generated code is shown here. The loop order is interchanged: loopB is the outer loop and loopA is the inner loop.

void applyInterchange(double out[70])
{
  int loopA;
  int loopB;
  if (!isInitialized_applyInterchange) {
    applyInterchange_initialize();
  }
  b_rand(out);
  for (loopB = 0; loopB < 7; loopB++) {
    for (loopA = 0; loopA < 10; loopA++) {
      int out_tmp;
      out_tmp = loopA + 10 * loopB;
      out[out_tmp] += (double)loopA + 1.0;
    }
  }
}
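
The following C sketch illustrates why the interchanged order in the listing above can help cache performance. It is a hand-written illustration, and the function names are hypothetical. It uses the same linear index as the generated code, loopA + 10 * loopB, and reports the largest jump between consecutive inner-loop accesses for each loop order.

```c
#define ROWS 10  /* extent of loopA in the example */
#define COLS 7   /* extent of loopB in the example */

/* Linear index used by the generated code: out_tmp = loopA + 10 * loopB. */
int linear_index(int row, int col)
{
  return row + ROWS * col;
}

/* Returns the largest index jump between consecutive inner-loop accesses.
   interchange == 0: rows outer, columns inner (the order in the MATLAB source).
   interchange != 0: columns outer, rows inner (the order in the generated code). */
int inner_stride(int interchange)
{
  int outer_n = interchange ? COLS : ROWS;
  int inner_n = interchange ? ROWS : COLS;
  int worst = 0;
  for (int o = 0; o < outer_n; o++) {
    int prev = -1;  /* reset so only jumps inside the inner loop are measured */
    for (int n = 0; n < inner_n; n++) {
      int row = interchange ? n : o;
      int col = interchange ? o : n;
      int idx = linear_index(row, col);
      if (prev >= 0 && idx - prev > worst) {
        worst = idx - prev;
      }
      prev = idx;
    }
  }
  return worst;
}
```

With the original order, each inner-loop step jumps 10 elements. With the interchanged order, each step moves to the adjacent element, so consecutive accesses are more likely to hit data that is already in cache.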