coder.GpuCodeConfig

Configuration parameters for CUDA code generation from MATLAB code

Description

coder.GpuCodeConfig objects specify the code configuration parameters for generating NVIDIA® CUDA® code from MATLAB® code. Use the properties of the coder.GpuCodeConfig object to customize CUDA features, such as kernel launch parameters, NVIDIA code libraries, and CUDA compute capability.

Creation

To create a coder.GpuCodeConfig object, first create a code configuration object by using the coder.gpuConfig function. The GpuConfig property of that configuration object contains a coder.GpuCodeConfig object.
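
As a minimal sketch of the creation step described above, the following lines create a parent configuration object and read its GpuConfig property. The build type "mex" and the variable names cfg and gpuCfg are chosen only for illustration.

```matlab
% Create a parent code configuration object for a MEX build.
% ("mex" is an illustrative choice of build type.)
cfg = coder.gpuConfig("mex");

% The nested GPU-specific settings live in the GpuConfig property.
gpuCfg = cfg.GpuConfig;

% Display the class of the nested object.
disp(class(gpuCfg))
```

The examples on this page set properties through the parent object, in the form cfg.GpuConfig.PropertyName = value.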

Properties


`Enabled` — Option to generate GPU code
`true` (default) | `false`

Option to generate GPU code, specified as a numeric or logical 1 (true) or 0 (false). For more information, see Generate GPU Code.

Example: cfg.GpuConfig.Enabled = true

`MallocMode` — GPU memory allocation
`"discrete"` (default) | `"unified"`

Memory allocation mode to use in the generated CUDA code, specified as "discrete" or "unified". For more information, see Malloc mode.

Example: cfg.GpuConfig.MallocMode = "discrete"

`KernelNamePrefix` — Custom kernel name prefixes
`''` (default) | character vector | string scalar

Custom kernel name prefixes, specified as a character vector or string scalar. For more information, see Kernel name prefix.

Example: cfg.GpuConfig.KernelNamePrefix = "myKernel"

`EnableCUBLAS` — Option to replace math function calls with `cuBLAS` library calls
`true` or `1` (default) | `false` or `0`

Option to replace math function calls with NVIDIA cuBLAS library calls, specified as a numeric or logical 1 (true) or 0 (false). For more information, see Enable cuBLAS.

Example: cfg.GpuConfig.EnableCUBLAS = true

`EnableCUSOLVER` — Option to replace math function calls with `cuSOLVER` library calls
`true` or `1` (default) | `false` or `0`

Option to replace math function calls with NVIDIA cuSOLVER library calls, specified as a numeric or logical 1 (true) or 0 (false). For more information, see Enable cuSOLVER.

Example: cfg.GpuConfig.EnableCUSOLVER = true

`EnableCUFFT` — Option to replace `fft` function calls with `cuFFT` library calls
`true` or `1` (default) | `false` or `0`

Option to replace fft function calls with NVIDIA cuFFT library calls, specified as a numeric or logical 1 (true) or 0 (false). For more information, see Enable cuFFT.

Example: cfg.GpuConfig.EnableCUFFT = true

`SafeBuild` — Option to check for errors in the generated code
`false` or `0` (default) | `true` or `1`

Option to check for errors in generated CUDA code, specified as a numeric or logical 1 (true) or 0 (false). Use this property to check for errors in CUDA API calls and kernel launches. For more information, see Safe build.

Example: cfg.GpuConfig.SafeBuild = true

`ComputeCapability` — Minimum compute capability for generated code
`'Auto'` (default) | `"5.0"` | `"5.2"` | `"5.3"` | `"6.0"` | `"6.1"` | `"6.2"` | `"7.0"` | ...

Minimum compute capability required to run generated CUDA code, specified as one of these values.

"Auto"
"3.2"
"3.5"
"3.7"
"5.0"
"5.2"
"5.3"
"6.0"
"6.1"
"6.2"
"7.0"
"7.2"
"7.5"
"8.0"
"8.6"
"8.7"
"8.9"
"9.0"

For more information, see Minimum compute capability.

Example: cfg.GpuConfig.ComputeCapability = "6.1"

`CustomComputeCapability` — Name of virtual GPU architecture
`''` (default) | character vector | string scalar

Name of the NVIDIA virtual GPU architecture for which to compile the CUDA input files, specified as a character vector or string scalar. For more information, see Custom compute capability.

Example: cfg.GpuConfig.CustomComputeCapability = "-arch=compute_50"

`CompilerFlags` — Additional flags to pass to GPU compiler
`''` (default) | character vector | string scalar

Additional flags to pass to the GPU compiler, specified as a character vector or string scalar. For more information, see Compiler flags.

Example: cfg.GpuConfig.CompilerFlags = "--fmad=false"

`StackLimitPerThread` — Stack limit per GPU thread
`51200` (default) | integer

Stack limit per GPU thread in bytes, specified as an integer. For more information, see Stack limit.

Example: cfg.GpuConfig.StackLimitPerThread = 1024

`MaximumBlocksPerKernel` — Maximum number of blocks created during kernel launch
`0` (default) | integer

Maximum number of blocks created during a kernel launch, specified as an integer. For more information, see Maximum blocks per kernel.

Example: cfg.GpuConfig.MaximumBlocksPerKernel = 1024

`EnableMemoryManager` — Option to use GPU memory manager
`true` or `1` (default) | `false` or `0`

Option to use the GPU memory manager, specified as a numeric or logical 1 (true) or 0 (false). For more information, see Enable GPU memory manager.

Example: cfg.GpuConfig.EnableMemoryManager = true

`SelectCudaDevice` — CUDA device selection
`-1` (default) | `deviceID`

CUDA device selection, specified as the numeric value of the device ID. For more information, see GPU device ID.

Example: cfg.GpuConfig.SelectCudaDevice = 0
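
The per-property examples above can be combined on a single configuration object. The following sketch reuses the example values listed on this page. The particular combination is illustrative and is not a recommended setup.

```matlab
% Parent configuration object; "mex" is an illustrative build type.
cfg = coder.gpuConfig("mex");

% Values below are the example values shown for each property above.
cfg.GpuConfig.KernelNamePrefix    = "myKernel"; % prefix for generated kernel names
cfg.GpuConfig.SafeBuild           = true;       % check CUDA API calls and kernel launches
cfg.GpuConfig.ComputeCapability   = "6.1";      % minimum compute capability
cfg.GpuConfig.StackLimitPerThread = 1024;       % stack limit per GPU thread, in bytes
cfg.GpuConfig.SelectCudaDevice    = 0;          % device ID of the CUDA device to use
```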

Examples


Generate CUDA MEX Function

Create a configuration object to generate a CUDA MEX function from a MATLAB function. Then, use the GpuConfig property of the configuration object to modify the coder.GpuCodeConfig object. In this example, enable the option to replace math function calls with calls to the NVIDIA cuBLAS library.

Write a MATLAB function, VecAdd, that adds two inputs.

```
function [C] = VecAdd(A,B) %#codegen
    coder.gpu.kernelfun();
    C = A + B;
end
```

To generate a MEX function, create a code generation configuration object.

```
cfg = coder.gpuConfig("mex");
```

Enable cuBLAS library calls and the code generation report.

```
cfg.GpuConfig.EnableCUBLAS = true;
cfg.GenerateReport = true;
```

Generate a MEX function in the current folder by using the -config option and the configuration object.

```
% Generate a MEX function and code generation report
codegen -config cfg -args {zeros(512,512,"double"),zeros(512,512,"double")} VecAdd
```
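
As a follow-up sketch, you can call the generated MEX function and compare its result with the same addition performed in MATLAB. This assumes that code generation succeeded and produced a MEX function with the default name VecAdd_mex in the current folder. The variable names and the random test inputs are illustrative.

```matlab
% Inputs matching the size and type passed to -args (512-by-512 double).
A = rand(512,512);
B = rand(512,512);

% Call the generated MEX function (assumed default name: VecAdd_mex).
C = VecAdd_mex(A,B);

% Largest absolute difference from the same sum computed in MATLAB.
maxDiff = max(abs(C - (A + B)), [], "all");
disp(maxDiff)
```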

Limitations

On Windows® platforms, the generated makefiles for standalone targets, such as dynamic libraries, static libraries, and executables, do not set the /MT or /MD compiler flags. These flags direct the Visual Studio® compiler to use the multithread library. By default, Visual Studio uses the /MT flag during compilation. To pass other compiler-specific flags, use the CompilerFlags property. For example, to specify the /MD flag for a configuration object cfg, enter this code:
```
cfg.GpuConfig.CompilerFlags = "-Xcompiler /MD";
```

The nvcc compiler supports a limited set of file suffixes. For example, if an object file name contains version numbers, compilation might fail. In such cases, create symbolic links or specify "-Xlinker" in the CompilerFlags property.

Alternative Functionality

App

You can use the GPU Coder app to configure the code generator and generate CUDA code. For more information, see Generate Code by Using the GPU Coder App.

Version History

Introduced in R2017b


R2026a: `Benchmarking` property will be removed

Setting Benchmarking to true or 1 and generating code generates a warning. To generate and profile CUDA code, use the gpuPerformanceAnalyzer function instead.

R2026a: `MallocThreshold` property will be removed

Setting the MallocThreshold property to a value other than 200 and generating code generates a warning. To limit the amount of stack memory per GPU thread, use the StackLimitPerThread property instead.

R2026a: Default limit of stack memory per GPU thread increased

The default value of the StackLimitPerThread property is 51200. In previous releases, the default value was 1024.
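
If existing code relied on the earlier stack limit, one option is to set the property explicitly rather than depend on the default. This is a minimal sketch. The value 1024 is the previous default stated above, and "mex" is an illustrative build type.

```matlab
cfg = coder.gpuConfig("mex");

% Set the per-thread stack limit (in bytes) explicitly instead of
% relying on the release-dependent default.
cfg.GpuConfig.StackLimitPerThread = 1024;
```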

R2024a: GPU memory manager is enabled by default

In previous releases, the default value of the EnableMemoryManager property was false. Now, the default value has changed to true. Therefore, when you generate CUDA code, the GPU memory manager is enabled by default.

Because of this change, once you generate a CUDA MEX with the default configuration setting, you cannot run this MEX on a different GPU. If you want to run the generated MEX on a different GPU, set the EnableMemoryManager property to false before you generate code.
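
The setting described above can be sketched as follows. The build type "mex" is illustrative. Set the property before you run codegen with this configuration object.

```matlab
cfg = coder.gpuConfig("mex");

% Turn off the GPU memory manager before generating code, so that the
% generated MEX is not subject to the restriction described above.
cfg.GpuConfig.EnableMemoryManager = false;
```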

R2024a: Configuration parameters related to GPU memory manager are removed

In previous releases, the GPU memory manager provided code configuration parameters to manage the allocation and deallocation of memory blocks in the GPU memory pools. These properties have now been removed.

The removed properties are:

BlockAlignment
FreeMode
MinPoolSize
MaxPoolSize

R2024a: Change to default compute capability value in code configuration

The default value of the ComputeCapability property is now 'Auto' instead of '3.5'. When compute capability is set to 'Auto', the code generator detects and uses the compute capability of the GPU device that you have selected for GPU code generation. If no GPU device is available or if the code generator is unable to detect a GPU device, the code generator uses a compute capability value of '5.0'.
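
To avoid depending on automatic detection, you can set the compute capability explicitly. The sketch below first displays the compute capability of the currently selected GPU by using the gpuDevice function, which assumes that Parallel Computing Toolbox is installed and a supported GPU is present. The value "6.1" is only an example taken from the list of supported values on this page.

```matlab
% Display the compute capability of the currently selected GPU.
% (Assumes Parallel Computing Toolbox and a supported GPU.)
dev = gpuDevice;
disp(dev.ComputeCapability)

% Pin the minimum compute capability instead of using 'Auto'.
cfg = coder.gpuConfig("mex");
cfg.GpuConfig.ComputeCapability = "6.1";  % example value; choose one your GPU supports
```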

For Simulink® Coder™, the default compute capability value is now '5.0' instead of '3.5'. To change this default value, modify the Compute capability parameter on the Code Generation > GPU Code pane in the Configuration Parameters dialog box. For more information, see Compute capability (Simulink Coder).

R2021a: `unified` memory allocation mode on host being removed

In a future release, the unified memory allocation (cudaMallocManaged) mode will be removed when targeting NVIDIA GPU devices on the host development computer. You can continue to use unified memory allocation mode when targeting NVIDIA embedded platforms.

When generating CUDA code for the host from MATLAB, set the MallocMode property of the coder.GpuCodeConfig object, which you access through the GpuConfig property of the code configuration object, to 'discrete'.
