Quantization, Projection, and Pruning

Compress a deep neural network by performing quantization, projection, or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA® code from this pruned network.
Projecting layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and applying linear projections to the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.
Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from this quantized network.
For C/C++ and CUDA code generation, the software generates code for a deep convolutional neural network by quantizing the weights, biases, and activations of the convolution layers to 8-bit scaled integer data types. The quantization is performed by providing the calibration result file produced by the calibrate function to the codegen (MATLAB Coder) command.
Code generation does not support quantized deep neural networks produced by the quantize function.
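
To make the idea of "8-bit scaled integer data types" concrete, the following Python sketch shows one common scheme: a calibration pass records the range of values, a single scale factor is derived from that range, and each value is rounded to a signed 8-bit integer. This is an illustration of the arithmetic only. The helper names (`calibrate_range`, `choose_scale`, `quantize_values`, `dequantize_values`) are hypothetical and are not toolbox functions, and the symmetric scaling rule is an assumption; the toolbox may choose its scaling differently.

```python
def calibrate_range(values):
    """Collect the dynamic range (min, max) observed in calibration data."""
    return min(values), max(values)


def choose_scale(value_range, num_bits=8):
    """Assumed rule: map the largest magnitude to the largest positive integer."""
    lo, hi = value_range
    max_abs = max(abs(lo), abs(hi))
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    return max_abs / q_max if max_abs > 0 else 1.0


def quantize_values(values, scale, num_bits=8):
    """Map real values to signed integers: q = clamp(round(x / scale))."""
    q_min = -(2 ** (num_bits - 1))
    q_max = 2 ** (num_bits - 1) - 1
    return [max(q_min, min(q_max, round(x / scale))) for x in values]


def dequantize_values(q_values, scale):
    """Recover approximate real values from the stored integers."""
    return [q * scale for q in q_values]


# Hypothetical layer weights used as both calibration data and payload.
weights = [-0.5, -0.25, 0.0, 0.125, 1.0]
scale = choose_scale(calibrate_range(weights))
q_weights = quantize_values(weights, scale)
restored = dequantize_values(q_weights, scale)

# Rounding error per value is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, max_error)
```

Each weight is stored in one byte instead of four (single precision), at the cost of a rounding error bounded by half the scale when no clamping occurs.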

Functions

Pruning

`taylorPrunableNetwork`	Network that can be pruned by using first-order Taylor approximation (Since R2022a)
`forward`	Compute deep learning network output for training (Since R2019b)
`predict`	Compute deep learning network output for inference (Since R2019b)
`updatePrunables`	Remove filters from prunable layers based on importance scores (Since R2022a)
`updateScore`	Compute and accumulate Taylor-based importance scores for pruning (Since R2022a)
`dlnetwork`	Deep learning neural network (Since R2019b)
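
The scoring-then-removal loop that these functions describe can be sketched in a few lines of Python. The sketch assumes one common formulation of the first-order Taylor criterion, in which a filter's importance is the absolute value of the mean of activation times gradient, accumulated over mini-batches. The exact normalization used by `updateScore` is not stated here, and the helper names and data below are hypothetical.

```python
def taylor_scores(activations, gradients):
    """Per-filter score: |mean(activation * gradient)| (assumed formulation)."""
    scores = []
    for act, grad in zip(activations, gradients):
        products = [a * g for a, g in zip(act, grad)]
        scores.append(abs(sum(products) / len(products)))
    return scores


def accumulate(total, batch_scores):
    """Add the scores of one mini-batch to the running totals."""
    return [t + s for t, s in zip(total, batch_scores)]


def filters_to_prune(scores, max_to_prune):
    """Indices of the lowest-scoring filters, in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:max_to_prune])


# Two hypothetical mini-batches for a layer with three filters:
# (activations per filter, gradients of the loss w.r.t. those activations).
batches = [
    ([[1.0, 2.0], [0.5, 0.5], [4.0, 0.0]],
     [[0.5, 0.5], [0.0, 0.25], [1.0, 1.0]]),
    ([[2.0, 0.0], [1.0, 1.0], [0.5, 0.5]],
     [[-1.0, 0.0], [0.125, 0.125], [2.0, 2.0]]),
]

total = [0.0, 0.0, 0.0]
for acts, grads in batches:
    total = accumulate(total, taylor_scores(acts, grads))

pruned = filters_to_prune(total, max_to_prune=1)
print(total, pruned)
```

In this toy data the second filter contributes least to the loss under the assumed criterion, so it is the one selected for removal.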

Projection

`compressNetworkUsingProjection`	Compress neural network using projection (Since R2022b)
`neuronPCA`	Principal component analysis of neuron activations (Since R2022b)
`unpackProjectedLayers`	Unpack projected layers of neural network (Since R2023b)
`ProjectedLayer`	Compressed neural network layer using projection (Since R2023b)
`gruProjectedLayer`	Gated recurrent unit (GRU) projected layer for recurrent neural network (RNN) (Since R2023b)
`lstmProjectedLayer`	Long short-term memory (LSTM) projected layer for recurrent neural network (RNN) (Since R2022b)
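
As a rough illustration of why projection reduces the number of learnable parameters, the Python sketch below replaces the weight matrix of a hypothetical fully connected layer by two smaller factors built from an orthonormal basis `Q`. The basis is simply assumed to be the leading principal components of the layer's input activations; computing the PCA itself is omitted, and all names and values are illustrative rather than the toolbox's implementation.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


# Hypothetical fully connected layer: 6 outputs, 4 inputs.
W = [[i - j for j in range(4)] for i in range(6)]

# Orthonormal basis (4 x 2), assumed to come from PCA of the input activations.
Q = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]

# The projected layer stores two smaller factors instead of W.
W_reduced = matmul(W, Q)  # 6 x 2
Q_t = transpose(Q)        # 2 x 4

# An input that lies in the span of Q is reproduced exactly.
x = [[3.0], [-1.0], [3.0], [-1.0]]
y_full = matmul(W, x)
y_projected = matmul(W_reduced, matmul(Q_t, x))

params_full = len(W) * len(W[0])
params_projected = (len(W_reduced) * len(W_reduced[0])
                    + len(Q_t) * len(Q_t[0]))
print(params_full, params_projected, y_full == y_projected)
```

For inputs outside the span of `Q` the projected output is only an approximation, which is why the basis is derived from data representative of the training data.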

Quantization

`dlquantizer`	Quantize a deep neural network to 8-bit scaled integer data types (Since R2020a)
`dlquantizationOptions`	Options for quantizing a trained deep neural network (Since R2020a)
`calibrate`	Simulate and collect ranges of a deep neural network (Since R2020a)
`quantize`	Quantize deep neural network (Since R2022a)
`validate`	Quantize and validate a deep neural network (Since R2020a)
`quantizationDetails`	Display quantization details for a neural network (Since R2022a)
`estimateNetworkMetrics`	Estimate network metrics for specific layers of a neural network (Since R2022a)
`equalizeLayers`	Equalize layer parameters of deep neural network (Since R2022b)

Apps

Deep Network Quantizer

Quantize deep neural network to 8-bit scaled integer data types (Since R2020a)

Topics

Pruning

Parameter Pruning and Quantization of Image Classification Network
Use parameter pruning and quantization to reduce network size.
Prune Image Classification Network Using Taylor Scores
This example shows how to reduce the size of a deep neural network using Taylor pruning.
Prune Filters in a Detection Network Using Taylor Scores
This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.
Prune and Quantize Convolutional Neural Network for Speech Recognition
This example shows how to compress a convolutional neural network (CNN) to prepare it for deployment on an embedded system.

Projection and Knowledge Distillation

Compress Neural Network Using Projection
This example shows how to compress a neural network using projection and principal component analysis.
Compress Network for Estimating Battery State of Charge
This example shows how to compress a neural network for predicting the state of charge of a battery using projection and principal component analysis. (Since R2023b)
Train Smaller Neural Network Using Knowledge Distillation
This example shows how to reduce the memory footprint of a deep learning network by using knowledge distillation. (Since R2023b)

Quantization

Quantization of Deep Neural Networks
Understand effects of quantization and how to visualize dynamic ranges of network convolution layers.
Quantization Workflow Prerequisites
Products required for the quantization of deep learning networks.
Prepare Data for Quantizing Networks
Supported datastores for quantization workflows.
Quantize Multiple-Input Network Using Image and Feature Data
Quantize Multiple Input Network Using Image and Feature Data

Quantization for GPU Target

Generate INT8 Code for Deep Learning Networks (GPU Coder)
Quantize and generate code for a pretrained convolutional neural network.
Quantize Residual Network Trained for Image Classification and Generate CUDA Code
This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
Quantize Layers in Object Detectors and Generate CUDA Code
This example shows how to generate CUDA® code for an SSD vehicle detector and a YOLO v2 vehicle detector that performs inference computations in 8-bit integers for the convolutional layers.
Quantize Semantic Segmentation Network and Generate CUDA Code
Quantize Convolutional Neural Network Trained for Semantic Segmentation and Generate CUDA Code

Quantization for FPGA Target

Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types.
Classify Images on FPGA Using Quantized Neural Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA.
Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image.

Quantization for CPU Target

Generate int8 Code for Deep Learning Networks (MATLAB Coder)
Quantize and generate code for a pretrained convolutional neural network.
Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
Generate code for deep learning network that performs inference computations in 8-bit integers.

Featured Examples

Prune Image Classification Network Using Taylor Scores

Reduce the size of a deep neural network using Taylor pruning. By using the taylorPrunableNetwork function to remove convolution layer filters, you can reduce the overall network size and increase the inference speed.

Prune Filters in a Detection Network Using Taylor Scores

Reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.

Compress Neural Network Using Projection

Compress a neural network using projection and principal component analysis.

Quantize Residual Network Trained for Image Classification and Generate CUDA Code

Quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.

Prune and Quantize Semantic Segmentation Network

Reduce the memory footprint of a semantic segmentation network and speed up inference by compressing the network using pruning and quantization.

Explore Quantized Semantic Segmentation Network Using Grad-CAM

Compare the predictions of a quantized semantic segmentation network to the original network using the gradient-weighted class activation mapping (Grad-CAM) interpretability method.
