Deploy YAMNet Networks to FPGAs with and Without Cross-Layer Equalization

This example shows how to deploy a YAMNet network to an FPGA with and without cross-layer equalization. Cross-layer equalization can improve quantized network performance by reducing the channel-wise variance of the network learnable parameters while maintaining the original network mapping. You can then compare the accuracies of the quantized network with and without cross-layer equalization.

Prerequisites

  • Xilinx® Zynq® UltraScale+™ ZCU102 SoC development kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Audio Toolbox™

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

  • Deep Learning Toolbox Model Quantization Library Support Package

  • GPU Coder™

  • GPU Coder Interface for Deep Learning Libraries

Load Pretrained YAMNet Network and Download Data

The YAMNet sound classification network is trained on the AudioSet data set to predict audio events from the AudioSet ontology. This example classifies sounds from air compressors. The data set used in this example consists of recordings from air compressors [1].

The data set is classified into one healthy state and seven faulty states, for a total of eight classes.

To download and load the pretrained YAMNet network and a set of air compressor sounds, run these commands.

url = 'https://ssd.mathworks.com/supportfiles/audio/YAMNetTransferLearning.zip';
AirCompressorLocation = pwd;
dataFolder = fullfile(AirCompressorLocation,'YAMNetTransferLearning');

if ~exist(dataFolder,'dir')
    disp('Downloading pretrained network ...')
    unzip(url,AirCompressorLocation)
end
Downloading pretrained network ...
addpath(fullfile(AirCompressorLocation,'YAMNetTransferLearning'))

The final element of the Layers property is the classification output layer. The Classes property of this layer contains the names of the classes learned by the network.

load("airCompressorNet.mat");
net = airCompressorNet;
net.Layers(end).Classes
ans = 8×1 categorical
     Bearing 
     Flywheel 
     Healthy 
     LIV 
     LOV 
     NRV 
     Piston 
     Riderbelt 

Use the analyzeNetwork function to obtain information about the network layers. The function returns a graphical representation of the network that contains detailed parameter information for every layer in the network.

analyzeNetwork(net)

Create Calibration Data

The YAMNet network accepts inputs of size 96-by-64-by-1. Use the getyamnet_CLEInput helper function to obtain preprocessed mel spectrograms of size 96-by-64-by-1. See Helper Functions. For the best quantization results, the calibration data must be representative of actual inputs to the YAMNet network. To expedite the calibration process, reduce the calibration data set to 20 mel spectrograms.

numSpecs = 20;
[imOutCal, imdsCal] = getyamnet_CLEInput(numSpecs);
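
The getyamnet_CLEInput helper is listed in the Helper Functions section. As a hedged sketch of the kind of preprocessing it performs, the Audio Toolbox yamnetPreprocess function converts an audio signal into the 96-by-64-by-1 mel spectrograms that YAMNet expects. The file name below is illustrative, not a file guaranteed to exist in the downloaded data set.

```matlab
% Illustrative preprocessing sketch; 'sampleRecording.wav' is a placeholder
% for one of the air compressor recordings in the downloaded data set.
[audioIn, fs] = audioread('sampleRecording.wav');
melSpecs = yamnetPreprocess(audioIn, fs);   % size: 96-by-64-by-1-by-numSpectra
```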

Calculate Accuracy of Quantized YAMNet Without Cross-Layer Equalization

Create a YAMNet network that does not use cross-layer equalization, quantize the network, and calculate its accuracy.

Calibrate and Quantize Network

Create a quantized network by using the dlquantizer object. Set the target execution environment to FPGA.

dlQuantObj = dlquantizer(net,ExecutionEnvironment='FPGA');

Use the calibrate function to exercise the network with sample inputs and collect the dynamic ranges of the weights and biases. The function returns a table in which each row contains range information for a learnable parameter of the quantized network.

calibrate(dlQuantObj,imdsCal);
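
To inspect the collected ranges, you can capture the output of calibrate instead of discarding it. This sketch repeats the calibration; the variable name calResults is illustrative.

```matlab
% Capture the calibration table to inspect parameter dynamic ranges.
% Each row lists a learnable parameter with its observed min and max values.
calResults = calibrate(dlQuantObj, imdsCal);
head(calResults)
```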

Deploy the Quantized Network

Define the target FPGA board programming interface by using the dlhdl.Target object. Specify that the interface is for a Xilinx board with an Ethernet interface.

hTarget = dlhdl.Target('Xilinx',Interface='Ethernet');

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and the bitstream name. Ensure that the bitstream matches the data type and the FPGA board. In this example, the target FPGA is the Xilinx ZCU102 SoC board. The bitstream uses an int8 data type.

hW = dlhdl.Workflow(Network=dlQuantObj,Bitstream='zcu102_int8',Target=hTarget);

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. Because the total number of frames exceeds the default value of 30, set the InputFrameNumberLimit to 100 to run predictions in chunks of 100 frames to prevent timeouts.

dn = compile(hW,InputFrameNumberLimit=100)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'input_1'                    Image Input                  96×64×1 images                                                             (SW Layer)
     2   'conv2d'                     2-D Convolution              32 3×3×1 convolutions with stride [2  2] and padding 'same'                (HW Layer)
     3   'activation'                 ReLU                         ReLU                                                                       (HW Layer)
     4   'depthwise_conv2d'           2-D Grouped Convolution      32 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     5   'activation_1'               ReLU                         ReLU                                                                       (HW Layer)
     6   'conv2d_1'                   2-D Convolution              64 1×1×32 convolutions with stride [1  1] and padding 'same'               (HW Layer)
     7   'activation_2'               ReLU                         ReLU                                                                       (HW Layer)
     8   'depthwise_conv2d_1'         2-D Grouped Convolution      64 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'    (HW Layer)
     9   'activation_3'               ReLU                         ReLU                                                                       (HW Layer)
    10   'conv2d_2'                   2-D Convolution              128 1×1×64 convolutions with stride [1  1] and padding 'same'              (HW Layer)
    11   'activation_4'               ReLU                         ReLU                                                                       (HW Layer)
    12   'depthwise_conv2d_2'         2-D Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    13   'activation_5'               ReLU                         ReLU                                                                       (HW Layer)
    14   'conv2d_3'                   2-D Convolution              128 1×1×128 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    15   'activation_6'               ReLU                         ReLU                                                                       (HW Layer)
    16   'depthwise_conv2d_3'         2-D Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'   (HW Layer)
    17   'activation_7'               ReLU                         ReLU                                                                       (HW Layer)
    18   'conv2d_4'                   2-D Convolution              256 1×1×128 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    19   'activation_8'               ReLU                         ReLU                                                                       (HW Layer)
    20   'depthwise_conv2d_4'         2-D Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    21   'activation_9'               ReLU                         ReLU                                                                       (HW Layer)
    22   'conv2d_5'                   2-D Convolution              256 1×1×256 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    23   'activation_10'              ReLU                         ReLU                                                                       (HW Layer)
    24   'depthwise_conv2d_5'         2-D Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'   (HW Layer)
    25   'activation_11'              ReLU                         ReLU                                                                       (HW Layer)
    26   'conv2d_6'                   2-D Convolution              512 1×1×256 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    27   'activation_12'              ReLU                         ReLU                                                                       (HW Layer)
    28   'depthwise_conv2d_6'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    29   'activation_13'              ReLU                         ReLU                                                                       (HW Layer)
    30   'conv2d_7'                   2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    31   'activation_14'              ReLU                         ReLU                                                                       (HW Layer)
    32   'depthwise_conv2d_7'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    33   'activation_15'              ReLU                         ReLU                                                                       (HW Layer)
    34   'conv2d_8'                   2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    35   'activation_16'              ReLU                         ReLU                                                                       (HW Layer)
    36   'depthwise_conv2d_8'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    37   'activation_17'              ReLU                         ReLU                                                                       (HW Layer)
    38   'conv2d_9'                   2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    39   'activation_18'              ReLU                         ReLU                                                                       (HW Layer)
    40   'depthwise_conv2d_9'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    41   'activation_19'              ReLU                         ReLU                                                                       (HW Layer)
    42   'conv2d_10'                  2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    43   'activation_20'              ReLU                         ReLU                                                                       (HW Layer)
    44   'depthwise_conv2d_10'        2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    45   'activation_21'              ReLU                         ReLU                                                                       (HW Layer)
    46   'conv2d_11'                  2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    47   'activation_22'              ReLU                         ReLU                                                                       (HW Layer)
    48   'depthwise_conv2d_11'        2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'   (HW Layer)
    49   'activation_23'              ReLU                         ReLU                                                                       (HW Layer)
    50   'conv2d_12'                  2-D Convolution              1024 1×1×512 convolutions with stride [1  1] and padding 'same'            (HW Layer)
    51   'activation_24'              ReLU                         ReLU                                                                       (HW Layer)
    52   'depthwise_conv2d_12'        2-D Grouped Convolution      1024 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'  (HW Layer)
    53   'activation_25'              ReLU                         ReLU                                                                       (HW Layer)
    54   'conv2d_13'                  2-D Convolution              1024 1×1×1024 convolutions with stride [1  1] and padding 'same'           (HW Layer)
    55   'activation_26'              ReLU                         ReLU                                                                       (HW Layer)
    56   'global_average_pooling2d'   2-D Global Average Pooling   2-D global average pooling                                                 (HW Layer)
    57   'dense'                      Fully Connected              8 fully connected layer                                                    (HW Layer)
    58   'softmax'                    Softmax                      softmax                                                                    (SW Layer)
    59   'Sounds'                     Classification Output        crossentropyex with 'Bearing' and 7 other classes                          (SW Layer)
                                                                                                                                            
### Notice: The layer 'input_1' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Sounds' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv2d>>activation_26 ...
### Compiling layer group: conv2d>>activation_26 ... complete.
### Compiling layer group: global_average_pooling2d ...
### Compiling layer group: global_average_pooling2d ... complete.
### Compiling layer group: dense ...
### Compiling layer group: dense ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "8.0 MB"        
    "OutputResultOffset"        "0x00800000"     "4.0 MB"        
    "SchedulerDataOffset"       "0x00c00000"     "4.0 MB"        
    "SystemBufferOffset"        "0x01000000"     "28.0 MB"       
    "InstructionDataOffset"     "0x02c00000"     "4.0 MB"        
    "ConvWeightDataOffset"      "0x03000000"     "32.0 MB"       
    "FCWeightDataOffset"        "0x05000000"     "4.0 MB"        
    "EndOffset"                 "0x05400000"     "Total: 84.0 MB"

### Network compilation complete.
dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]

To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the programming file to program the FPGA board and downloads the network weights and biases. The deploy function displays progress messages and the time required to deploy the network.

deploy(hW)
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_int8.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 06-Jul-2023 16:07:40
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 06-Jul-2023 16:07:40

Test Network

Prepare the test data for prediction by using the entire data set of preprocessed mel spectrograms. Compare the predictions of the quantized network to the predictions from Deep Learning Toolbox.

[imOutPred, imdPred] = getyamnet_CLEInput(88);

Calculate the accuracy of the predictions of the quantized network with respect to the predictions from Deep Learning Toolbox by using the getNetworkAccuracy helper function. See Helper Functions.

quantizedAccuracy = getNetworkAccuracy(hW,imOutPred,net)
### Finished writing input activations.
### Running in multi-frame mode with 88 inputs.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    7978514                  0.03191                      88          701741839             31.4
    conv2d                   27553                  0.00011 
    depthwise_conv2d         26068                  0.00010 
    conv2d_1                 80316                  0.00032 
    depthwise_conv2d_1       27113                  0.00011 
    conv2d_2                 69613                  0.00028 
    depthwise_conv2d_2       32615                  0.00013 
    conv2d_3                126269                  0.00051 
    depthwise_conv2d_3       21334                  0.00009 
    conv2d_4                 90407                  0.00036 
    depthwise_conv2d_4       27856                  0.00011 
    conv2d_5                171756                  0.00069 
    depthwise_conv2d_5       22016                  0.00009 
    conv2d_6                312844                  0.00125 
    depthwise_conv2d_6       29744                  0.00012 
    conv2d_7                617984                  0.00247 
    depthwise_conv2d_7       29674                  0.00012 
    conv2d_8                617664                  0.00247 
    depthwise_conv2d_8       30024                  0.00012 
    conv2d_9                617954                  0.00247 
    depthwise_conv2d_9       30084                  0.00012 
    conv2d_10               617534                  0.00247 
    depthwise_conv2d_10      29724                  0.00012 
    conv2d_11               617444                  0.00247 
    depthwise_conv2d_11      26122                  0.00010 
    conv2d_12              1207736                  0.00483 
    depthwise_conv2d_12      38636                  0.00015 
    conv2d_13              2409106                  0.00964 
    global_average_pooling2d     20015                  0.00008 
    dense                     3231                  0.00001 
 * The clock frequency of the DL processor is: 250MHz
quantizedAccuracy = 0.6477
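
The getNetworkAccuracy helper is listed in the Helper Functions section. As a hedged sketch only, a helper of this kind can run the deployed network through the dlhdl.Workflow predict method and compare the resulting class indices against the Deep Learning Toolbox classify output; the details may differ from the helper used in this example.

```matlab
function accuracy = getNetworkAccuracy(hW, images, net)
% Sketch of an accuracy helper: compare deployed-network predictions with
% Deep Learning Toolbox predictions on the same inputs. Details may differ
% from the helper used in this example.
classes = net.Layers(end).Classes;
scores = predict(hW, images, Profile='on');  % scores from the FPGA
[~, idx] = max(scores, [], 2);
predFPGA = classes(idx);
predRef = classify(net, images);             % software reference predictions
accuracy = nnz(predFPGA == predRef) / numel(predRef);
end
```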

Calculate Accuracy of Quantized YAMNet with Cross-Layer Equalization

Create a YAMNet network with cross-layer equalization, quantize the network, and calculate its accuracy.

Create, Calibrate, and Quantize a Cross-Layer Equalized Network

Create a cross-layer equalized network by using the equalizeLayers function.

netCLE = equalizeLayers(net);
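
Cross-layer equalization exploits the positive scale invariance of the ReLU activation: dividing an output channel of one layer by a factor s > 0 and multiplying the matching input channel of the next layer by s leaves the overall mapping unchanged, which lets the scales be chosen to balance the per-channel weight ranges. This minimal numeric sketch illustrates the invariance; it is not the equalizeLayers implementation and ignores biases.

```matlab
% Two fully connected layers with a ReLU between them: rescaling matched
% channels preserves the mapping because relu(a/s) = relu(a)/s for s > 0.
rng(0);
x  = randn(1,4);            % input row vector
W1 = randn(4,3);            % first layer weights (3 output channels)
W2 = randn(3,2);            % second layer weights
s  = [0.5 2 4];             % per-channel scale factors, s > 0

y1 = max(x*W1, 0) * W2;                 % original mapping
y2 = max(x*W1 ./ s, 0) * (W2 .* s');    % equalized mapping
assert(max(abs(y1 - y2)) < 1e-12)       % outputs match
```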

Create a quantized YAMNet network with cross-layer equalization by using the dlquantizer object. Set the target execution environment to FPGA.

dlQuantObjCLE = dlquantizer(netCLE,ExecutionEnvironment='FPGA');

Use the calibrate function to exercise the network with sample inputs and collect the dynamic ranges of the weights and biases. The function returns a table in which each row contains range information for a learnable parameter of the quantized network.

calibrate(dlQuantObjCLE,imdsCal);

Deploy Quantized Network

Define the target FPGA board programming interface by using the dlhdl.Target object. Specify that the interface is for a Xilinx board with an Ethernet interface.

hTargetCLE = dlhdl.Target('Xilinx',Interface='Ethernet');

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and the bitstream name. Ensure that the bitstream matches the data type and the FPGA board. In this example, the target FPGA is the Xilinx ZCU102 SoC board. The bitstream uses an int8 data type.

hWCLE = dlhdl.Workflow(Network=dlQuantObjCLE,Bitstream='zcu102_int8',Target=hTargetCLE);

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. Because the total number of frames exceeds the default value of 30, set the InputFrameNumberLimit to 100 to run predictions in chunks of 100 frames to prevent timeouts.

dnCLE = compile(hWCLE,InputFrameNumberLimit=100)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### The network includes the following layers:
     1   'input_1'                    Image Input                  96×64×1 images                                                             (SW Layer)
     2   'conv2d'                     2-D Convolution              32 3×3×1 convolutions with stride [2  2] and padding 'same'                (HW Layer)
     3   'activation'                 ReLU                         ReLU                                                                       (HW Layer)
     4   'depthwise_conv2d'           2-D Grouped Convolution      32 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     5   'activation_1'               ReLU                         ReLU                                                                       (HW Layer)
     6   'conv2d_1'                   2-D Convolution              64 1×1×32 convolutions with stride [1  1] and padding 'same'               (HW Layer)
     7   'activation_2'               ReLU                         ReLU                                                                       (HW Layer)
     8   'depthwise_conv2d_1'         2-D Grouped Convolution      64 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'    (HW Layer)
     9   'activation_3'               ReLU                         ReLU                                                                       (HW Layer)
    10   'conv2d_2'                   2-D Convolution              128 1×1×64 convolutions with stride [1  1] and padding 'same'              (HW Layer)
    11   'activation_4'               ReLU                         ReLU                                                                       (HW Layer)
    12   'depthwise_conv2d_2'         2-D Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    13   'activation_5'               ReLU                         ReLU                                                                       (HW Layer)
    14   'conv2d_3'                   2-D Convolution              128 1×1×128 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    15   'activation_6'               ReLU                         ReLU                                                                       (HW Layer)
    16   'depthwise_conv2d_3'         2-D Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'   (HW Layer)
    17   'activation_7'               ReLU                         ReLU                                                                       (HW Layer)
    18   'conv2d_4'                   2-D Convolution              256 1×1×128 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    19   'activation_8'               ReLU                         ReLU                                                                       (HW Layer)
    20   'depthwise_conv2d_4'         2-D Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    21   'activation_9'               ReLU                         ReLU                                                                       (HW Layer)
    22   'conv2d_5'                   2-D Convolution              256 1×1×256 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    23   'activation_10'              ReLU                         ReLU                                                                       (HW Layer)
    24   'depthwise_conv2d_5'         2-D Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'   (HW Layer)
    25   'activation_11'              ReLU                         ReLU                                                                       (HW Layer)
    26   'conv2d_6'                   2-D Convolution              512 1×1×256 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    27   'activation_12'              ReLU                         ReLU                                                                       (HW Layer)
    28   'depthwise_conv2d_6'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    29   'activation_13'              ReLU                         ReLU                                                                       (HW Layer)
    30   'conv2d_7'                   2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    31   'activation_14'              ReLU                         ReLU                                                                       (HW Layer)
    32   'depthwise_conv2d_7'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    33   'activation_15'              ReLU                         ReLU                                                                       (HW Layer)
    34   'conv2d_8'                   2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    35   'activation_16'              ReLU                         ReLU                                                                       (HW Layer)
    36   'depthwise_conv2d_8'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    37   'activation_17'              ReLU                         ReLU                                                                       (HW Layer)
    38   'conv2d_9'                   2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    39   'activation_18'              ReLU                         ReLU                                                                       (HW Layer)
    40   'depthwise_conv2d_9'         2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    41   'activation_19'              ReLU                         ReLU                                                                       (HW Layer)
    42   'conv2d_10'                  2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    43   'activation_20'              ReLU                         ReLU                                                                       (HW Layer)
    44   'depthwise_conv2d_10'        2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'   (HW Layer)
    45   'activation_21'              ReLU                         ReLU                                                                       (HW Layer)
    46   'conv2d_11'                  2-D Convolution              512 1×1×512 convolutions with stride [1  1] and padding 'same'             (HW Layer)
    47   'activation_22'              ReLU                         ReLU                                                                       (HW Layer)
    48   'depthwise_conv2d_11'        2-D Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [2  2] and padding 'same'   (HW Layer)
    49   'activation_23'              ReLU                         ReLU                                                                       (HW Layer)
    50   'conv2d_12'                  2-D Convolution              1024 1×1×512 convolutions with stride [1  1] and padding 'same'            (HW Layer)
    51   'activation_24'              ReLU                         ReLU                                                                       (HW Layer)
    52   'depthwise_conv2d_12'        2-D Grouped Convolution      1024 groups of 1 3×3×1 convolutions with stride [1  1] and padding 'same'  (HW Layer)
    53   'activation_25'              ReLU                         ReLU                                                                       (HW Layer)
    54   'conv2d_13'                  2-D Convolution              1024 1×1×1024 convolutions with stride [1  1] and padding 'same'           (HW Layer)
    55   'activation_26'              ReLU                         ReLU                                                                       (HW Layer)
    56   'global_average_pooling2d'   2-D Global Average Pooling   2-D global average pooling                                                 (HW Layer)
    57   'dense'                      Fully Connected              8 fully connected layer                                                    (HW Layer)
    58   'softmax'                    Softmax                      softmax                                                                    (SW Layer)
    59   'Sounds'                     Classification Output        crossentropyex with 'Bearing' and 7 other classes                          (SW Layer)
                                                                                                                                            
### Notice: The layer 'input_1' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Sounds' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv2d>>activation_26 ...
### Compiling layer group: conv2d>>activation_26 ... complete.
### Compiling layer group: global_average_pooling2d ...
### Compiling layer group: global_average_pooling2d ... complete.
### Compiling layer group: dense ...
### Compiling layer group: dense ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "8.0 MB"        
    "OutputResultOffset"        "0x00800000"     "4.0 MB"        
    "SchedulerDataOffset"       "0x00c00000"     "4.0 MB"        
    "SystemBufferOffset"        "0x01000000"     "28.0 MB"       
    "InstructionDataOffset"     "0x02c00000"     "4.0 MB"        
    "ConvWeightDataOffset"      "0x03000000"     "32.0 MB"       
    "FCWeightDataOffset"        "0x05000000"     "4.0 MB"        
    "EndOffset"                 "0x05400000"     "Total: 84.0 MB"

### Network compilation complete.
dnCLE = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]

To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the programming file to program the FPGA board and downloads the network weights and biases. The deploy function programs the FPGA device and displays progress messages and the time required to deploy the network.

deploy(hWCLE)
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_int8.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 06-Jul-2023 16:13:55
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 06-Jul-2023 16:13:55

Test Network

Prepare the test data for prediction. Use the entire data set, which consists of preprocessed mel spectrograms. Compare the predictions of the quantized network to the predictions from Deep Learning Toolbox.

[imOutPredCLE, imdPredCLE] = getyamnet_CLEInput(88);

Compare the accuracy of the quantized network predictions against the accuracy of the predictions from Deep Learning Toolbox by using the getNetworkAccuracy helper function. See Helper Functions.

quantizedAccuracyCLE = getNetworkAccuracy(hWCLE, imOutPredCLE, netCLE)
### Finished writing input activations.
### Running in multi-frame mode with 88 inputs.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    7977295                  0.03191                      88          701748593             31.4
    conv2d                   27523                  0.00011 
    depthwise_conv2d         26158                  0.00010 
    conv2d_1                 80284                  0.00032 
    depthwise_conv2d_1       27109                  0.00011 
    conv2d_2                 69334                  0.00028 
    depthwise_conv2d_2       32244                  0.00013 
    conv2d_3                126831                  0.00051 
    depthwise_conv2d_3       20814                  0.00008 
    conv2d_4                 90320                  0.00036 
    depthwise_conv2d_4       27841                  0.00011 
    conv2d_5                171883                  0.00069 
    depthwise_conv2d_5       22036                  0.00009 
    conv2d_6                312914                  0.00125 
    depthwise_conv2d_6       29674                  0.00012 
    conv2d_7                618094                  0.00247 
    depthwise_conv2d_7       29604                  0.00012 
    conv2d_8                617444                  0.00247 
    depthwise_conv2d_8       30064                  0.00012 
    conv2d_9                617944                  0.00247 
    depthwise_conv2d_9       30054                  0.00012 
    conv2d_10               617384                  0.00247 
    depthwise_conv2d_10      29704                  0.00012 
    conv2d_11               617474                  0.00247 
    depthwise_conv2d_11      26122                  0.00010 
    conv2d_12              1207626                  0.00483 
    depthwise_conv2d_12      38466                  0.00015 
    conv2d_13              2408966                  0.00964 
    global_average_pooling2d     20127                  0.00008 
    dense                     3179                  0.00001 
 * The clock frequency of the DL processor is: 250MHz
quantizedAccuracyCLE = 0.8636

Using cross-layer equalization improves the prediction accuracy of the network. The accuracy of the network with cross-layer equalization (netCLE) is 86.36% and the accuracy of the network without cross-layer equalization (net) is 64.77%.
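To summarize the comparison, you can collect the two accuracies reported above into a table. This sketch hard-codes the values from this example; in your own workflow, substitute the accuracy variables returned by the getNetworkAccuracy helper function for each network.

```matlab
% Tabulate the accuracies of the networks with and without
% cross-layer equalization (values taken from this example's results).
accuracyTable = table(["Without CLE"; "With CLE"], [0.6477; 0.8636], ...
    'VariableNames', {'Network', 'Accuracy'});
disp(accuracyTable)
```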

Helper Functions

The getyamnet_CLEInput helper function obtains preprocessed mel spectrograms of size 96-by-64-by-1.

function [imOut,imds] = getyamnet_CLEInput(NumOfImg)
    if NumOfImg > 88
        error('Provide an input less than or equal to the size of this dataset of 88 images.');
    end
    
    spectData = load('yamnetInput.mat');
    spectData = spectData.melSpectYam;
    imOut = spectData(:,:,:,1:NumOfImg);
    imds = augmentedImageDatastore([96 64],imOut);
end

The getNetworkAccuracy helper function retrieves predictions from the FPGA and compares them to the predictions from Deep Learning Toolbox™.

function accuracy = getNetworkAccuracy(workFlow, data, network)
% Compare predictions from the dlhdl.Workflow object against predictions
% from Deep Learning Toolbox and return the fraction that match.

% Predictions from the workflow object
hwPred = workFlow.predict(data,Profile="on");   % Matrix of probabilities for each class
[~, hwIdx] = max(hwPred,[],2);                  % Index of the class with maximum probability
hwClasses = network.Layers(end).Classes(hwIdx); % Class names from indices

% Predictions from Deep Learning Toolbox
dlPred = predict(network, data);                % Matrix of probabilities for each class
[~, dlIdx] = max(dlPred,[],2);                  % Index of the class with maximum probability
dlClasses = network.Layers(end).Classes(dlIdx); % Class names from indices

accuracy = nnz(dlClasses == hwClasses)/size(data,4); % Matching predictions / total predictions

end

References

[1] Verma, Nishchal K., et al. “Intelligent Condition Based Monitoring Using Acoustic Signals for Air Compressors.” IEEE Transactions on Reliability, vol. 65, no. 1, Mar. 2016, pp. 291–309. DOI.org (Crossref), doi:10.1109/TR.2015.2459684.
