Distributed Pipelining: Speed Optimization

This example shows how to use distributed pipelining to optimize a design for speed in HDL Coder™.

Introduction

Distributed pipelining is a subsystem-wide optimization that HDL Coder supports for generating hardware that runs at a high clock speed. When you turn on distributed pipelining, HDL Coder redistributes the input pipeline registers and output pipeline registers of the subsystem, together with the registers inside the subsystem, to positions that minimize the combinatorial logic between registers. This maximizes the clock speed of the chip synthesized from the generated HDL code.

Consider the following example model of a symmetric FIR filter. The combinatorial logic from an input or a register to an output or another register contains a product block and an adder tree. Distributed pipelining moves the output registers set at the subsystem level to reduce the levels of the combinatorial logic.

bdclose all;
load_system('sfir_fixed');
open_system('sfir_fixed/symmetric_fir');

Set Output Pipeline Stage

To increase the clock speed, you can set a number of pipeline stages for any subsystem. If distributed pipelining is off, the specified number of registers is added to each of the output ports of the subsystem and the registers stay there. Some synthesis tools support optimizations, such as retiming, that optimize the position of the registers during synthesis.
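Before changing any pipeline settings, it can help to check what the subsystem currently specifies. The following is a minimal sketch, assuming the sfir_fixed model is on the MATLAB path and HDL Coder is installed. It reads the block-level properties with hdlget_param; the variable names are illustrative only.

```matlab
% Minimal sketch: read the current pipelining settings of the subsystem.
% Assumes the 'sfir_fixed' example model is on the MATLAB path.
load_system('sfir_fixed');
dut = 'sfir_fixed/symmetric_fir';

% Each call returns the value currently stored for that HDL block property.
inStages  = hdlget_param(dut, 'InputPipeline');
outStages = hdlget_param(dut, 'OutputPipeline');
distPipe  = hdlget_param(dut, 'DistributedPipelining');

disp(inStages);
disp(outStages);
disp(distPipe);
```

Recording these values first makes it easier to restore the original settings after experimenting.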

To see the effects of distributed pipelining in reducing the critical path and increasing the clock frequency, enable CriticalPathEstimation to estimate a critical path for your design. When you enable critical path estimation, HDL Coder uses a target-specific timing database to estimate the critical path. If you do not set a Synthesis Tool Chip Family and Synthesis Tool Speed Value to specify the specific target timing database, HDL Coder sets default values for both parameters for critical path estimation, and generates a warning when generating HDL code. To prevent the warning, specify the Synthesis Tool Chip Family and Synthesis Tool Speed Value. You can do this with or without a Synthesis Tool specified. For more information, see Critical Path Estimation Without Running Synthesis.

hdlset_param('sfir_fixed', 'CriticalPathEstimation', 'on');
hdlset_param('sfir_fixed', 'SynthesisToolChipFamily', 'virtex7', 'SynthesisToolSpeedValue', '-1')

In this example, the number of output pipeline stages for the subsystem is set to 2.

The code generation model explicitly shows the registers inserted at the output ports of the subsystem (highlighted in orange).

hdlset_param('sfir_fixed/symmetric_fir', 'OutputPipeline', 2);
makehdl('sfir_fixed/symmetric_fir');
open_system('gm_sfir_fixed/symmetric_fir');
set_param('gm_sfir_fixed', 'SimulationCommand', 'update');
### Generating HDL for 'sfir_fixed/symmetric_fir'.
### Using the config set for model sfir_fixed for HDL code generation parameters.
### Running HDL checks on the model 'sfir_fixed'.
### Begin compilation of the model 'sfir_fixed'...
### Applying HDL optimizations on the model 'sfir_fixed'...
### The code generation and optimization options you have chosen have introduced additional pipeline delays.
### The delay balancing feature has automatically inserted matching delays for compensation.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 1: 2 cycles.
### Output port 2: 2 cycles.
### Begin model generation.
### Model generation complete.
### Estimated critical path for design: hdlsrc\sfir_fixed\criticalPathEstimated.m
### To clear highlighting, click the following MATLAB script: hdlsrc\sfir_fixed\clearhighlighting.m
### Begin VHDL Code Generation for 'sfir_fixed'.
### Working on sfir_fixed/symmetric_fir as hdlsrc\sfir_fixed\symmetric_fir.vhd.
### Generating package file hdlsrc\sfir_fixed\symmetric_fir_pkg.vhd.
### Code Generation for 'sfir_fixed' completed.
### Generating HTML files for code generation report at sfir_fixed_codegen_rpt.html
### Creating HDL Code Generation Check Report file://C:\Users\clewis\OneDrive - MathWorks\Documents\MATLAB\Examples\hdlcoder-ex37495842\hdlsrc\sfir_fixed\symmetric_fir_report.html
### HDL check for 'sfir_fixed' complete with 0 errors, 0 warnings, and 1 messages.
### HDL code generation complete.

The critical path estimated without distributed pipelining is 10.269 ns. The estimated critical path appears in the Code Generation Report > Timing and Area Report > Critical Path Estimation tab.

Set Distributed Pipelining

Distributed pipelining is one of the subsystem block options. After you turn it on, the registers in the subsystem, including the input pipeline registers and output pipeline registers, are repositioned to achieve the best clock speed. This is equivalent to retiming at the subsystem level.

The code generation model explicitly reflects the distributed registers in the subsystem (highlighted in orange).

hdlset_param('sfir_fixed/symmetric_fir', 'DistributedPipelining', 'on');
makehdl('sfir_fixed/symmetric_fir', 'GeneratedModelNamePrefix', 'gm2_');
open_system('gm2_sfir_fixed/symmetric_fir');
set_param('gm2_sfir_fixed', 'SimulationCommand', 'update');
### Generating HDL for 'sfir_fixed/symmetric_fir'.
### Using the config set for model sfir_fixed for HDL code generation parameters.
### Running HDL checks on the model 'sfir_fixed'.
### Begin compilation of the model 'sfir_fixed'...
### Applying HDL optimizations on the model 'sfir_fixed'...
### The code generation and optimization options you have chosen have introduced additional pipeline delays.
### The delay balancing feature has automatically inserted matching delays for compensation.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 1: 2 cycles.
### Output port 2: 2 cycles.
### Begin model generation.
### Model generation complete.
### Estimated critical path for design: hdlsrc\sfir_fixed\criticalPathEstimated.m
### To clear highlighting, click the following MATLAB script: hdlsrc\sfir_fixed\clearhighlighting.m
### Begin VHDL Code Generation for 'sfir_fixed'.
### Working on sfir_fixed/symmetric_fir as hdlsrc\sfir_fixed\symmetric_fir.vhd.
### Generating package file hdlsrc\sfir_fixed\symmetric_fir_pkg.vhd.
### Code Generation for 'sfir_fixed' completed.
### Generating HTML files for code generation report at sfir_fixed_codegen_rpt.html
### Creating HDL Code Generation Check Report file://C:\Users\clewis\OneDrive - MathWorks\Documents\MATLAB\Examples\hdlcoder-ex37495842\hdlsrc\sfir_fixed\symmetric_fir_report.html
### HDL check for 'sfir_fixed' complete with 0 errors, 0 warnings, and 1 messages.
### HDL code generation complete.

The critical path estimated with distributed pipelining on is now 7.443 ns.

Use Synthesis Timing Estimates for Distributed Pipelining

To distribute pipeline registers using timing data that more accurately reflects how components behave on hardware, and to maximize the clock frequency for your target device, enable the model configuration parameter Use synthesis timing estimates for Distributed Pipelining.

hdlset_param('sfir_fixed', 'UseSynthesisEstimatesForDistributedPipelining', 'on');
makehdl('sfir_fixed/symmetric_fir', 'GeneratedModelNamePrefix', 'gm3_');
open_system('gm3_sfir_fixed/symmetric_fir');
set_param('gm3_sfir_fixed', 'SimulationCommand', 'update');
### Generating HDL for 'sfir_fixed/symmetric_fir'.
### Using the config set for model sfir_fixed for HDL code generation parameters.
### Running HDL checks on the model 'sfir_fixed'.
### Begin compilation of the model 'sfir_fixed'...
### Applying HDL optimizations on the model 'sfir_fixed'...
### The code generation and optimization options you have chosen have introduced additional pipeline delays.
### The delay balancing feature has automatically inserted matching delays for compensation.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 1: 2 cycles.
### Output port 2: 2 cycles.
### Begin model generation.
### Model generation complete.
### Estimated critical path for design: hdlsrc\sfir_fixed\criticalPathEstimated.m
### To clear highlighting, click the following MATLAB script: hdlsrc\sfir_fixed\clearhighlighting.m
### Begin VHDL Code Generation for 'sfir_fixed'.
### Working on sfir_fixed/symmetric_fir as hdlsrc\sfir_fixed\symmetric_fir.vhd.
### Generating package file hdlsrc\sfir_fixed\symmetric_fir_pkg.vhd.
### Code Generation for 'sfir_fixed' completed.
### Generating HTML files for code generation report at sfir_fixed_codegen_rpt.html
### Creating HDL Code Generation Check Report file://C:\Users\clewis\OneDrive - MathWorks\Documents\MATLAB\Examples\hdlcoder-ex37495842\hdlsrc\sfir_fixed\symmetric_fir_report.html
### HDL check for 'sfir_fixed' complete with 0 errors, 0 warnings, and 1 messages.
### HDL code generation complete.

Using synthesis timing estimates increases the run time of the makehdl function, but it can reduce the critical path and increase the clock frequency of your design. For more information, see Distributed Pipelining Using Synthesis Timing Estimates.

The critical path estimation report shows that the critical path is now 6.320 ns. This value is only an estimate based on a target-specific timing database. To obtain the actual critical path on your target hardware, you must run the generated HDL code through synthesis.
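To put the three estimates side by side, the following sketch converts each estimated critical path into the clock frequency it implies, using the relation f (MHz) = 1000 / T (ns). The path values are the ones reported in this example; the labels and variable names are illustrative, and the resulting frequencies are estimates, not synthesis results.

```matlab
% Estimated critical paths reported in this example, in nanoseconds.
labels  = {'Output pipeline only', ...
           'Distributed pipelining', ...
           'Distributed pipelining with synthesis estimates'};
tCritNs = [10.269, 7.443, 6.320];

% A period of T ns corresponds to a clock frequency of 1000/T MHz.
fMaxMHz = 1000 ./ tCritNs;

% Improvement of each configuration relative to the first one.
speedup = tCritNs(1) ./ tCritNs;

for k = 1:numel(tCritNs)
    fprintf('%-50s %7.3f ns  %7.2f MHz  %.2fx\n', ...
        labels{k}, tCritNs(k), fMaxMHz(k), speedup(k));
end
```

The first configuration implies roughly 97.4 MHz and the last roughly 158.2 MHz, an estimated improvement of about 1.6 times.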

Synthesis Comparison of Distributed Pipelining With and Without Synthesis Timing Estimates

In the HDL Workflow Advisor, generate HDL code and perform FPGA synthesis using the Generic ASIC/FPGA workflow with these settings:

  • Synthesis tool set to Xilinx Vivado

  • Family of the synthesis tool set to virtex7

  • Speed of the synthesis tool set to -1

  • Target Frequency (MHz) set to 120

For more information on the code generation and synthesis steps, see HDL Code Generation and FPGA Synthesis from Simulink Model.
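The same target settings can also be stored on the model from the command line. This is a sketch under assumptions: it assumes that Xilinx Vivado is installed and has been registered with HDL Coder (for example, with hdlsetuptoolpath), and that the family name is accepted in the form used earlier in this example. It only stores the settings on the model. It does not run synthesis, which still happens in the HDL Workflow Advisor.

```matlab
% Sketch: store the synthesis target settings on the model.
% Assumes Xilinx Vivado is installed and registered with HDL Coder.
load_system('sfir_fixed');

% Set the tool first, because changing the tool can reset the
% family and speed settings to tool-specific defaults.
hdlset_param('sfir_fixed', 'SynthesisTool', 'Xilinx Vivado');
hdlset_param('sfir_fixed', 'SynthesisToolChipFamily', 'virtex7');
hdlset_param('sfir_fixed', 'SynthesisToolSpeedValue', '-1');

% Target clock frequency in MHz.
hdlset_param('sfir_fixed', 'TargetFrequency', 120);
```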

The synthesis results without distributed pipelining enabled are:

The synthesis results show negative slack, indicating that the timing constraints are not met. The clock frequency is not calculated when the timing constraints are not met.

The synthesis results with distributed pipelining enabled are:

The synthesis results show positive slack and a clock frequency of 178.05 MHz, indicating that the timing constraints, including the target clock frequency of 120 MHz, are met.

The synthesis results with distributed pipelining using synthesis timing estimates are:

The synthesis results show a clock frequency of 222.40 MHz, indicating that the timing constraints are met. Using synthesis timing estimates for distributed pipelining has increased the clock speed.
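As a rough cross-check of these numbers, the sketch below relates the reported clock frequencies to the 120 MHz target. It assumes that the reported frequency corresponds to the minimum achievable clock period, so that the implied slack is the target period minus that minimum period. This is an assumption about how the figure is derived, not something the synthesis report states, and the variable names are illustrative.

```matlab
% Target clock frequency and the frequencies reported after synthesis (MHz).
targetMHz   = 120;
achievedMHz = [178.05, 222.40];  % [distributed pipelining, with synthesis estimates]

% Convert frequencies to clock periods in nanoseconds.
targetPeriodNs = 1000 / targetMHz;
minPeriodNs    = 1000 ./ achievedMHz;

% Assumed relation: slack = target period - minimum achievable period.
impliedSlackNs = targetPeriodNs - minPeriodNs;

fprintf('Target period: %.3f ns\n', targetPeriodNs);
fprintf('Minimum period: %.3f ns, implied slack: %.3f ns\n', ...
    [minPeriodNs; impliedSlackNs]);
```

Under this assumption, both configurations leave positive slack against the 8.333 ns target period, and the configuration that uses synthesis timing estimates leaves more of it.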

Distributed Pipelining Across Subsystem Hierarchies

Because distributed pipelining is a subsystem-level parameter, subsystems at different levels of the hierarchy can specify different pipeline stage values and different distributed pipelining settings. By default, HDL Coder distributes the registers of a subsystem only within that subsystem, not through its lower-level subsystems. If you want cross-hierarchy distribution, set DistributedPipelining to on for the lower-level subsystems and enable the global option Hierarchical Distributed Pipelining. When both the local and global options are on, the entire subsystem, including the lower-level subsystems, is treated as a single subsystem when registers are distributed.
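The combination of local and global settings described above might look like the following sketch. The model and subsystem names (my_model, top_subsystem, lower_subsystem) are hypothetical placeholders, because the symmetric_fir subsystem in this example has no lower-level subsystems. The sketch assumes that the global option corresponds to the HierarchicalDistPipelining model property.

```matlab
% Hypothetical model and subsystem names; replace them with your own.
mdl      = 'my_model';
topSub   = [mdl '/top_subsystem'];
lowerSub = [topSub '/lower_subsystem'];
load_system(mdl);

% Local options: pipeline stages on the top subsystem, and distributed
% pipelining on both the top and the lower-level subsystem.
hdlset_param(topSub,   'OutputPipeline', 2);
hdlset_param(topSub,   'DistributedPipelining', 'on');
hdlset_param(lowerSub, 'DistributedPipelining', 'on');

% Global option: allow registers to move across the subsystem hierarchy.
hdlset_param(mdl, 'HierarchicalDistPipelining', 'on');

makehdl(topSub);
```

If the lower-level subsystem leaves DistributedPipelining off, it acts as a boundary and the registers of the top subsystem are not moved into it.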