Code Generation by Using the GPU Coder App

The easiest way to create CUDA® kernels is to place the coder.gpu.kernelfun pragma into your primary MATLAB® function. The primary function is also known as the top-level or entry-point function. When GPU Coder™ encounters the kernelfun pragma, it attempts to parallelize all the computation within this function and then maps it to the GPU. For more information about GPU kernels, see GPU Programming Paradigm.

Learning Objectives

In this tutorial, you learn how to:

  • Prepare your MATLAB code for CUDA code generation by using the kernelfun pragma.

  • Create and set up a GPU Coder project.

  • Define function input properties.

  • Check for code generation readiness and run-time issues.

  • Specify code generation properties.

  • Generate CUDA C code by using the GPU Coder app.

Tutorial Prerequisites

This tutorial requires the following products:

  • MATLAB

  • MATLAB Coder™

  • GPU Coder

  • C compiler

  • NVIDIA® GPU enabled for CUDA

  • CUDA toolkit and driver

  • Environment variables for the compilers and libraries. For more information, see Environment Variables.

Example: The Mandelbrot Set

Description

You do not have to be familiar with the algorithm in the example to complete the tutorial.

The Mandelbrot set is the region in the complex plane consisting of the values $z_0$ for which the trajectories defined by this equation remain bounded as $k \to \infty$.

$$z_{k+1} = z_k^2 + z_0, \quad k = 0, 1, \ldots$$
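To see what "remain bounded" means for individual values, the following minimal MATLAB sketch (not one of the tutorial files) iterates the recurrence for four starting values and records the first iteration at which |z| exceeds 2. The 50-iteration cap and the variable names are arbitrary choices for illustration.

```matlab
% Iterate z(k+1) = z(k)^2 + z0 for a few starting values z0 and record
% the first iteration at which |z| exceeds 2 (0 means it never escaped).
z0Values = [0, -1, 1, 0.5];
maxIterations = 50;
escapeIter = zeros(size(z0Values));

for m = 1:numel(z0Values)
    z0 = z0Values(m);
    z = z0;                       % trajectory starts at z0
    for k = 1:maxIterations
        z = z^2 + z0;             % one step of the recurrence
        if abs(z) > 2
            escapeIter(m) = k;    % first iteration outside radius 2
            break
        end
    end
end

disp(escapeIter)                  % escapeIter is [0 0 2 4]
```

For z0 = 1 the trajectory is 2, 5, 26, and so on, so it leaves the disk of radius 2 at the second iteration. For z0 = -1 it alternates between 0 and -1 and stays bounded.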

The overall geometry of the Mandelbrot set is shown in the figure. This view does not have the resolution to show the richly detailed structure of the fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot set exhibits an elaborate boundary that reveals progressively finer recursive detail.

Algorithm

For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot set in the valley between the main cardioid and the p/q bulb to its left. A 1000-by-1000 grid of real parts (x) and imaginary parts (y) is created between these two limits, and the Mandelbrot algorithm is iterated at each grid location. A maximum of 500 iterations is enough to render the image at full resolution.

maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];
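To put "highly zoomed" in numbers, this short sketch computes the width of the window defined by xlim and ylim and the spacing between neighboring grid points that linspace produces. The comparison with eps is an added check, not part of the tutorial, showing that double precision still separates neighboring grid points at this zoom level.

```matlab
% Width of the plotting window and spacing between neighboring grid points.
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];

xWidth = diff(xlim);               % about 6.15e-9
yWidth = diff(ylim);               % same width, so the window is square
dx = xWidth/(gridSize - 1);        % spacing produced by linspace, about 6.16e-12
fprintf('window width %.4g, grid spacing %.4g\n', xWidth, dx);

% Spacing of double-precision numbers near the window (about 1.1e-16),
% several orders of magnitude smaller than dx.
fprintf('eps near x: %.3g\n', eps(xlim(1)));
```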

This tutorial uses an implementation of the Mandelbrot set written with standard MATLAB commands that run on the CPU. The implementation is based on the code provided in the Experiments with MATLAB e-book by Cleve Moler. The calculation is vectorized so that every grid location is updated simultaneously.

Tutorial Files

Create a MATLAB function called mandelbrot_count.m with the following lines of code. This code is a baseline vectorized MATLAB implementation of the Mandelbrot set. For every point (xGrid,yGrid) in the grid, it calculates the iteration index count at which the trajectory defined by the equation reaches a distance of 2 from the origin. It then returns the natural logarithm of count, which is used to generate the color-coded plot of the Mandelbrot set. Later in this tutorial, you modify this file to make it suitable for code generation.

function count = mandelbrot_count(maxIterations,xGrid,yGrid)
% mandelbrot computation

z0 = xGrid + 1i*yGrid;
count = ones(size(z0));

z = z0;
for n = 0:maxIterations
    z = z.*z + z0;
    inside = abs(z)<=2;
    count = count + inside;
end
count = log(count);
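In the vectorized code, each statement applies the same scalar recurrence to every grid element at once, and the computation at one grid point never depends on any other point. A minimal sketch to check this on a small, coarse grid (the grid limits and the 50 iterations are arbitrary choices for illustration) runs the statements of mandelbrot_count inline and compares them with a loop that visits one grid point at a time.

```matlab
% Small-grid check: the vectorized update used in mandelbrot_count gives the
% same result as visiting each grid point with a scalar loop.
maxIterations = 50;
[xGrid,yGrid] = meshgrid(linspace(-2,0.5,8), linspace(-1,1,6));

% Vectorized form (same statements as mandelbrot_count)
z0 = xGrid + 1i*yGrid;
count = ones(size(z0));
z = z0;
for n = 0:maxIterations
    z = z.*z + z0;
    count = count + (abs(z) <= 2);
end
countVec = log(count);

% Scalar form: one grid point at a time
countLoop = zeros(size(z0));
for idx = 1:numel(z0)
    zs = z0(idx);
    c = 1;
    for n = 0:maxIterations
        zs = zs*zs + z0(idx);
        c = c + (abs(zs) <= 2);
    end
    countLoop(idx) = log(c);
end

disp(max(abs(countVec(:) - countLoop(:))))
```

Because every grid point is independent, the per-point loop is a natural candidate for parallel execution, which is the property the GPU version relies on.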

Create a MATLAB script called mandelbrot_test.m with the following lines of code. The script generates a 1000 x 1000 grid of real parts (x) and imaginary parts (y) between the limits specified by xlim and ylim. It also calls the mandelbrot_count function and plots the resulting Mandelbrot set.

maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];

x = linspace(xlim(1),xlim(2),gridSize);
y = linspace(ylim(1),ylim(2),gridSize);
[xGrid,yGrid] = meshgrid(x,y);

%% Mandelbrot computation in MATLAB
count = mandelbrot_count(maxIterations,xGrid,yGrid);

% Show
figure(1)
imagesc(x,y,count);
colormap([jet();flipud(jet());0 0 0]);
axis off
title('Mandelbrot set with MATLAB');

Run the Original MATLAB Code

Run the Mandelbrot Example

Before making the MATLAB version of the Mandelbrot set algorithm suitable for code generation, you can test the functionality of the original code.

  1. Change the current MATLAB working folder to the location that contains mandelbrot_count.m and mandelbrot_test.m. GPU Coder places generated code in this folder. Change your current working folder if you do not have full access to this folder.

  2. Run the mandelbrot_test script.

The test script runs and shows the geometry of the Mandelbrot set within the boundary set by the variables xlim and ylim.

Prepare MATLAB Code for Code Generation

Before you generate code with GPU Coder, check for coding issues in the original MATLAB code.

Check for Issues at Design Time

There are two tools that help you detect code generation issues at design time:

  • Code Analyzer tool

  • Code generation readiness tool

The Code Analyzer is a tool incorporated into the MATLAB Editor that continuously checks your code as you enter it. The Code Analyzer reports issues and recommends modifications to maximize performance and maintainability of your code. To identify the warnings and errors specific to code generation from your MATLAB code, add the %#codegen directive to your MATLAB file. For more information, see Code Analyzer preferences.

Note

The Code Analyzer does not detect all code generation issues. After eliminating the errors or warnings that the Code Analyzer detects, compile your code with GPU Coder to determine if the code has other compliance issues.

The code generation readiness tool screens the MATLAB code for features and functions that are not supported for code generation. This tool provides a report that lists issues and recommendations for making the MATLAB code suitable for code generation. You can access the code generation readiness tool in these ways:

  • In the Current Folder browser, by right-clicking the MATLAB file that contains the entry-point function.

  • At the command line, by using the coder.screener function.

  • In the GPU Coder app, which runs the Code Analyzer and the code generation readiness tool after you specify the entry-point files.
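Both design-time checks can also be started from the command line. A minimal sketch, assuming mandelbrot_count.m is in the current folder: checkcode returns the Code Analyzer messages as a structure array, and coder.screener opens the code generation readiness report for the entry-point function.

```matlab
% Code Analyzer messages for the file. Code-generation-specific messages
% are included once the file contains the %#codegen directive.
msgs = checkcode('mandelbrot_count.m');
fprintf('%d Code Analyzer message(s)\n', numel(msgs));

% Code generation readiness report for the entry-point function.
coder.screener('mandelbrot_count')
```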

Check for Issues at Code Generation Time

You can use GPU Coder to check for issues at code generation time. When GPU Coder detects errors or warnings, it generates an error report that describes the issues and provides links to the problematic MATLAB code. For more information, see Code Generation Reports.

Make the MATLAB Code Suitable for Code Generation

To begin the process of making your MATLAB code suitable for code generation, use the file mandelbrot_count.m.

  1. Set your MATLAB current folder to the work folder that contains your files for this tutorial.

  2. In the MATLAB Editor, open mandelbrot_count.m. The Code Analyzer message indicator at the top right corner of the MATLAB Editor is green. The analyzer did not detect errors, warnings, or opportunities for improvement in the code.

  3. After the function declaration, add the %#codegen directive to turn on the error checking that is specific to code generation.

    function count = mandelbrot_count(maxIterations,xGrid,yGrid) %#codegen

    The Code Analyzer message indicator remains green, indicating that it has not detected any code generation issues.

  4. To map the mandelbrot_count function to a CUDA kernel, modify the original MATLAB code by placing the coder.gpu.kernelfun pragma in the body of the function.

    function count = mandelbrot_count(maxIterations,xGrid,yGrid) %#codegen
    % Add kernelfun pragma to trigger kernel creation
    coder.gpu.kernelfun;
    
    % mandelbrot computation
    z0 = xGrid + 1i*yGrid;
    count = ones(size(z0));
    
    z = z0;
    for n = 0:maxIterations
        z = z.*z + z0;
        inside = abs(z)<=2;
        count = count + inside;
    end
    count = log(count);

    If you use the coder.gpu.kernelfun pragma, GPU Coder attempts to map the computations in the function mandelbrot_count to the GPU.

  5. Save the file. You are now ready to compile your code by using the GPU Coder app.

Generate Code by Using the GPU Coder App

Open the GPU Coder App

On the MATLAB toolstrip Apps tab, under Code Generation, click the GPU Coder app icon. You can also open the app by typing gpucoder in the MATLAB Command Window. The app opens the Select source files page.

Select Source Files

  1. On the Select source files page, enter or select the name of the primary function, mandelbrot_count. The primary function is also known as the top-level or entry-point function. The app creates a project with the default name mandelbrot_count.prj in the current folder.

  2. Click Next and go to the Define Input Types step. The app analyzes the function for coding issues and code generation readiness. If the app identifies issues, it opens the Review Code Generation Readiness page where you can review and fix issues. In this example, because the app does not detect issues, it opens the Define Input Types page.

Define Input Types

The code generator must determine the data types of all the variables in the MATLAB files at compile time. Therefore, you must specify the data types of all the input variables. You can specify the input data types in one of these two ways:

  • Provide a test file that calls the project entry-point functions. The GPU Coder app can infer the input argument types by running the test file.

  • Enter the input types directly.

For more information about input specifications, see Input Specification.
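As an illustration of entering the types directly, the following sketch builds the same types that the app infers from mandelbrot_test.m by using coder.typeof. The variable names are illustrative, and the resulting cell array is in the form that the codegen -args option accepts. Passing example values such as {0, zeros(1000), zeros(1000)} describes the same types.

```matlab
% Input types matching what Autodefine Input Types infers from
% mandelbrot_test.m: a double scalar and two 1000-by-1000 double matrices.
gridSize = 1000;
maxIterationsType = coder.typeof(0);                 % double(1x1)
gridType = coder.typeof(0, [gridSize gridSize]);     % double(1000x1000)

% Types in the order of the function arguments (maxIterations, xGrid, yGrid).
ARGS = {maxIterationsType, gridType, gridType};
```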

In this example, to define the properties of the inputs maxIterations, xGrid, and yGrid, specify the test file mandelbrot_test.m:

  1. Enter or select the test file mandelbrot_test.m.

  2. Click Autodefine Input Types.

    The test file mandelbrot_test.m calls the entry-point function, mandelbrot_count, with the expected input types. The app infers that the input maxIterations is double(1x1) and that the inputs xGrid and yGrid are double(1000x1000).

  3. Click Next to go to the Check for Run-Time Issues step.

Check for Run-Time Issues

The Check for Run-Time Issues step generates a MEX file from your entry-point functions, runs the MEX function, and reports issues. This step is optional. However, it is a best practice to perform this step. Using this step, you can detect and fix defects that are harder to diagnose in the generated GPU code.

GPU Coder provides the option to perform GPU-specific checks at this point. When you select this option, GPU Coder generates CUDA C code and a MEX file from your entry-point functions, runs the MEX function, and reports issues. Some of the GPU-specific run-time checks include:

  • Checks for register spills.

  • Stack size conformance checks.

Note

Some MATLAB constructs in your code can cause the Check for Run-Time Issues step to fail the CPU-specific checks but pass the GPU-specific checks.

  1. To open the Check for Run-Time Issues dialog box, click the Check for Issues arrow.

  2. In the Check for Run-Time Issues dialog box, specify a test file or enter code that calls the entry-point function with example inputs. For this example, use the test file mandelbrot_test.m that you used to define the input types.

  3. To enable GPU-specific checks, select the GPU option button. Click Check for Issues.

    The app generates a MEX function. It runs the test script mandelbrot_test replacing calls to mandelbrot_count with calls to the generated MEX. If the app detects issues during the MEX function generation or execution, it provides warning and error messages. You can click these messages to navigate to the problematic code and fix the issue. In this example, the app does not detect issues. The MEX function has the same functionality as the original mandelbrot_count function.

  4. Click Next to go to the Generate Code step.
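Outside the app, a comparable CPU-side check can be sketched with MATLAB Coder functions: generate a MEX function and then rerun the test script with calls to mandelbrot_count redirected to the MEX function. This sketch assumes the tutorial files are in the current folder. It does not perform the GPU-specific checks (register spills, stack size) that the app's GPU option adds.

```matlab
% Generate a MEX function (default name mandelbrot_count_mex) from example
% inputs: a double scalar and two 1000-by-1000 double matrices.
codegen('mandelbrot_count', '-args', {0, zeros(1000), zeros(1000)});

% Rerun the test script, calling mandelbrot_count_mex wherever the script
% calls mandelbrot_count.
coder.runTest('mandelbrot_test', 'mandelbrot_count');
```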

Generate CUDA Code

  1. To open the Generate dialog box, click the Generate arrow.

  2. In the Generate dialog box, you can select the type of build that you want GPU Coder to perform. The available options are listed in this table.

    Build Type        Description
    Source code       CUDA C source code to integrate with an external project.
    MEX               Compiled code to run inside MATLAB.
    Static Library    Binary library for static linking with an external project.
    Dynamic Library   Binary library for dynamic linking with an external project.
    Executable        Standalone program (requires a separate main file written in C).

    For this tutorial, set Build type to MEX(.mex). By generating a MEX output, you can check the correctness of the generated CUDA code from within MATLAB. The MEX build type does not require additional settings like Toolchain and Hardware Board. It also does not provide the option to generate only the source code. GPU Coder can automatically select an available CUDA toolchain as long as the Environment Variables are set properly.

    To view advanced options, select More Settings. In the Compiler Flags option, add --fmad=false. When passed to nvcc, this flag instructs the compiler to disable floating-point multiply-add (FMAD) optimization. Setting this flag helps prevent numerical mismatches in the generated code that arise from architectural differences between the CPU and the GPU. For more information, see Numerical Differences Between CPU and GPU.

    This table describes the settings specific to GPU Coder.

    GPU Coder Configuration Properties

    Each entry lists the UI setting, its value type, and a description.

    Kernel Name Prefix (String)
      Specify a custom name prefix for kernel names in the generated code. For example, entering 'CUDA_' creates kernels named CUDA_kernel1, CUDA_kernel2, and so on. If no prefix is provided, GPU Coder prefixes kernel names with the name of the entry-point function.
      Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and the underscore character (_). GPU Coder removes unsupported characters from the kernel names and appends alpha to prefixes that do not begin with an alphabetic letter.

    Malloc Mode (Enumerated: 'Discrete' | 'Unified')
      Selects the type of GPU memory allocation: discrete or unified.

    Malloc Threshold (Integer)
      Size above which private variables are allocated on the heap instead of the stack.

    Stack Limit (Integer)
      Available stack limit per GPU thread.

    Enable cuSOLVER (Boolean: 'True' | 'False')
      Allows GPU Coder to use cuSOLVER library calls where appropriate.

    Benchmarking (Boolean: 'True' | 'False')
      Generates CUDA code with benchmarking options, such as the cudaEvent API, to accurately time kernels, memcpy calls, and other events.

    Safe Build (Boolean: 'True' | 'False')
      Generates code with error checking for CUDA API and kernel calls.

    Minimum Compute Capability (Enumerated: '3.2' | '3.5' | '3.7' | '5.0' | '5.2' | '5.3' | '6.0' | '6.1')
      Select the minimum compute capability for code generation. The compute capability identifies the features supported by the GPU hardware. Applications use it at run time to determine which hardware features and instructions are available on the present GPU. If you specify a custom compute capability, GPU Coder ignores this setting.

    Custom Compute Capability (String)
      Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled. For example, to specify a virtual architecture, type -arch=compute_50. You can specify a real architecture by using -arch=sm_50. For more information, see the Options for Steering GPU Code Generation topic in the CUDA toolkit documentation.

    Compiler Flags (String)
      Pass additional flags to the GPU compiler. For example, --fmad=false instructs the nvcc compiler to disable contraction of floating-point multiply and add into a single floating-point multiply-add (FMAD) instruction. For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA toolkit documentation.

    SelectCudaDevice (Integer)
      In a multi-GPU environment, such as NVIDIA Drive platforms, specify the CUDA device to target.

  3. Click Generate.

    GPU Coder generates the MEX executable mandelbrot_count_mex in your working folder. The <pwd>\codegen\mex\mandelbrot_count folder contains all the other generated files, including the CUDA source (*.cu) and header files. The GPU Coder app indicates that the code generation succeeded. It displays the source MATLAB files and generated output files on the left side of the page. On the Variables tab, it displays information about the MATLAB source variables. On the Target Build Log tab, it displays the build log, including compiler warnings and errors. By default, the code window displays the CUDA source file mandelbrot_count.cu. To view a different file, click the file name in the Source Code or Output Files pane.

  4. To view the code generation report, click View Report. The report provides links to your MATLAB code and the generated CUDA (*.cu) files. It also provides compile-time information for the variables and expressions in your MATLAB code. This information helps you find sources of errors and warnings and debug code generation issues in your code. For more information, see Code Generation Reports.

    The GPU Kernels section on the Generated Code tab provides a list of kernels created during GPU code generation. The items in this list link to the relevant source code. For example, when you click mandelbrot_count_kernel1, the code section for this kernel is shown in the code browser window.

    After you review the report, you can close the Code Generation Report window. To view the report later, open report.mldatx in the <pwd>\codegen\mex\mandelbrot_count\html folder.

  5. The <pwd>\codegen\mex\mandelbrot_count folder contains the gpu_codegen_info.mat MAT-file, which holds statistics for the generated GPU code. This MAT-file contains the cuda_Kernel variable, which has information about the thread and block sizes, shared and constant memory usage, and input and output arguments of each kernel. The cudaMalloc and cudaMemcpy variables contain information about the size of all the GPU variables and the number of memcpy calls between the host and the device.

  6. In the GPU Coder app, click Next to open the Finish Workflow page.
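The app settings used in this tutorial can also be expressed as a script. This is a sketch, assuming a GPU Coder installation and the tutorial files in the current folder. coder.gpuConfig creates the configuration object; the GpuConfig property names shown (CompilerFlags, KernelNamePrefix, SafeBuild) are our understanding of the command-line counterparts of the UI settings in the table above, so check them against the documentation for your release.

```matlab
% Build type MEX with the nvcc flag --fmad=false, as chosen in the app.
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.CompilerFlags = '--fmad=false';
% Optional settings from the configuration table, for example:
% cfg.GpuConfig.KernelNamePrefix = 'CUDA_';
% cfg.GpuConfig.SafeBuild = true;

% Example inputs define the argument types: double scalar, 1000-by-1000 doubles.
ARGS = {0, zeros(1000), zeros(1000)};

% Generate mandelbrot_count_mex and a code generation report.
codegen('-config', cfg, 'mandelbrot_count', '-args', ARGS, '-report');

% Statistics for the generated GPU code (kernels, cudaMalloc, cudaMemcpy).
info = load(fullfile('codegen', 'mex', 'mandelbrot_count', 'gpu_codegen_info.mat'));
disp(fieldnames(info))
```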

Review the Finish Workflow Page

The Finish Workflow page indicates that the code generation succeeded. It provides a project summary and links to the MATLAB source files, the code generation report, and the generated output binaries. You can save the configuration parameters of the current GPU Coder project as a MATLAB script. See Convert MATLAB Coder Project to MATLAB Script.

Verify Correctness of the Generated Code

To verify the correctness of the generated MEX file, see Verify Correctness of the Generated Code.
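A minimal sketch of such a check, assuming mandelbrot_count.m and the generated mandelbrot_count_mex are on the path: run both on the tutorial grid and report the largest element-wise difference. Small nonzero differences may remain even with --fmad=false; what tolerance is acceptable is your decision.

```matlab
% Rebuild the tutorial grid (same values as mandelbrot_test.m).
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];
x = linspace(xlim(1),xlim(2),gridSize);
y = linspace(ylim(1),ylim(2),gridSize);
[xGrid,yGrid] = meshgrid(x,y);

% MATLAB (CPU) result and generated MEX (GPU) result.
countCPU = mandelbrot_count(maxIterations,xGrid,yGrid);
countGPU = mandelbrot_count_mex(maxIterations,xGrid,yGrid);

% Largest element-wise difference between the two results.
maxAbsDiff = max(abs(countCPU(:) - countGPU(:)));
fprintf('Maximum absolute difference: %g\n', maxAbsDiff);
```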
