Loop Rolling
The Target Language Compiler has built-in support for loop rolling as one of its optimization features. Depending on a specified threshold, the code generated for a looping operation is either unrolled into individual statements or left as a loop (rolled).
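As a rough illustration of the two forms, the following hand-written C sketch multiplies a small signal by two, once as a loop and once as straight-line statements. It is not output from the Target Language Compiler; the function names and the width of 4 are hypothetical, chosen only to keep the unrolled form short.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH 4  /* hypothetical signal width, chosen only for illustration */

/* Rolled form: a single for loop covers every element. */
static void times_two_rolled(const double u[WIDTH], double y[WIDTH])
{
    size_t i;
    assert(u != NULL && y != NULL);
    for (i = 0; i < WIDTH; i++) {
        y[i] = u[i] * 2.0;
    }
}

/* Unrolled form: one statement per element, with no loop counter or branch. */
static void times_two_unrolled(const double u[WIDTH], double y[WIDTH])
{
    assert(u != NULL && y != NULL);
    y[0] = u[0] * 2.0;
    y[1] = u[1] * 2.0;
    y[2] = u[2] * 2.0;
    y[3] = u[3] * 2.0;
}
```

Both functions compute the same result. The unrolled form avoids loop overhead at the cost of code size, which is why a width threshold is a natural way to choose between them.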
Coupled with loop rolling is the concept of noncontiguous signals. Consider the following model:
The input to the timestwo S-function comes from two arrays located at two different memory locations, one for the output of block source1 and one for the output of block source2. This is because of an optimization that makes the Mux block virtual, meaning that code is not explicitly generated for the Mux block and thus processor cycles are not spent evaluating it (i.e., it becomes a pure graphical convenience for the block diagram). In this case, the S-function block is represented in the model.rtw file as follows:
Block {
  Type                    "S-Function"
  MaskType                "S-function: timestwo"
  BlockIdx                [0, 0, 2]
  SL_BlockIdx             2
  GrSrc                   [0, 1]
  ExprCommentInfo {
    SysIdxList              []
    BlkIdxList              []
    PortIdxList             []
  }
  ExprCommentSrcIdx {
    SysIdx                  -1
    BlkIdx                  -1
    PortIdx                 -1
  }
  Name                    "<Root>/timestwo C-MEX S-Function"
  SLName                  "<Root>/timestwo \nC-MEX S-Function"
  Identifier              timestwoCMEXSFunction
  TID                     0
  RollRegions             [0:19, 20:49]
  NumDataInputPorts       1
  DataInputPort {
    SignalSrc               [b0@20, b1@30]
    SignalOffset            [0:19, 0:29]
    Width                   50
    RollRegions             [0:19, 20:49]
  }
  NumDataOutputPorts      1
  DataOutputPort {
    SignalSrc               [b2@50]
    SignalOffset            [0:49]
    Width                   50
  }
  Connections {
    InputPortContiguous     [no]
    InputPortConnected      [yes]
    OutputPortConnected     [yes]
    OutputPortBeingMerged   [no]
    DirectSrcConn           [no]
    DirectDstConn           [yes]
    DataOutputPort {
      NumConnPoints           1
      ConnPoint {
        SrcSignal               [0, 50]
        DstBlockAndPortEl       [0, 4, 0, 0]
      }
    }
  }
  .
  .
  .
From this fragment of the model.rtw file you can see that the block and input port RollRegions entries are not a single range, but two ranges, 0:19 and 20:49. This denotes two groupings in memory for the input signal.
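The ranges in RollRegions read as inclusive index ranges, so 0:19 covers 20 elements and 20:49 covers 30, which together account for the port Width of 50. The short C sketch below only checks that arithmetic. The RollRegion type and both helper functions are hypothetical names introduced for illustration and are not part of any Simulink or TLC API.

```c
#include <assert.h>

/* Hypothetical helper type: an inclusive index range such as 0:19. */
typedef struct {
    int first;
    int last;
} RollRegion;

/* Number of elements in an inclusive range first:last. */
static int region_length(RollRegion r)
{
    assert(r.last >= r.first);
    return r.last - r.first + 1;
}

/* Sum of the region lengths; when the regions cover the whole port,
 * this equals the port width. */
static int total_width(const RollRegion *regions, int count)
{
    int i;
    int width = 0;
    for (i = 0; i < count; i++) {
        width += region_length(regions[i]);
    }
    return width;
}
```

Applied to the two regions in the fragment, the lengths are 20 and 30, and their sum is 50.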
The generated code looks like this:
/* S-Function Block: <Root>/timestwo C-MEX S-Function */
/* Multiply input by two */
{
  int_T i1;
  const real_T *u0 = &contig_sample_B.u[0];
  real_T *y0 = contig_sample_B.timestwoCMEXSFunction_m;

  for (i1=0; i1 < 20; i1++) {
    y0[i1] = u0[i1] * 2.0;
  }

  u0 = &contig_sample_B.u_o[0];
  y0 = &contig_sample_B.timestwoCMEXSFunction_m[20];

  for (i1=0; i1 < 30; i1++) {
    y0[i1] = u0[i1] * 2.0;
  }
}
Notice that two loops are generated, and between them the input signal is redirected from the first base address, &contig_sample_B.u[0], to the second base address, &contig_sample_B.u_o[0]. If you do not want to support this in your S-function or your generated code, you can use
ssSetInputPortRequiredContiguous(S, 0, 1);
in the mdlInitializeSizes function, where the second argument is the input port index and the third is the flag, to cause Simulink® to implicitly generate code that performs a buffering operation. This option uses both extra memory and CPU cycles at run time, but might be worth it if your algorithm's performance increases enough to offset the overhead of the buffering.
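The trade-off between the two access patterns can be sketched in plain C. The code below is a hand-written illustration, not generated code: the function names and the local buffer are assumptions, and the widths 20 and 30 are taken from the example. The first function reads each memory region in its own loop. The second copies both regions into one contiguous array and then runs a single loop, which is roughly what a buffering step costs.

```c
#include <assert.h>
#include <string.h>

#define N1 20            /* width of the first source, as in the example */
#define N2 30            /* width of the second source, as in the example */
#define WIDTH (N1 + N2)  /* total input port width */

/* Noncontiguous access: one loop per memory region. */
static void times_two_by_region(const double *u_a, const double *u_b, double *y)
{
    int i;
    assert(u_a != NULL && u_b != NULL && y != NULL);
    for (i = 0; i < N1; i++) {
        y[i] = u_a[i] * 2.0;
    }
    for (i = 0; i < N2; i++) {
        y[N1 + i] = u_b[i] * 2.0;
    }
}

/* Buffered access: copy both regions into one contiguous array first
 * (extra memory and copy time), then run a single loop over it. */
static void times_two_buffered(const double *u_a, const double *u_b, double *y)
{
    double buf[WIDTH];  /* the extra memory that buffering costs */
    int i;
    assert(u_a != NULL && u_b != NULL && y != NULL);
    memcpy(buf, u_a, N1 * sizeof(double));
    memcpy(buf + N1, u_b, N2 * sizeof(double));
    for (i = 0; i < WIDTH; i++) {
        y[i] = buf[i] * 2.0;
    }
}
```

Both functions produce identical output. For a simple element-wise operation like this one the copy is pure overhead; buffering tends to pay off only when the algorithm itself benefits from a single contiguous input.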
Use the %roll directive to generate loops. See %roll for the reference entry, and Input Signal Functions for a discussion of the behavior of %roll.