The GPU Coder™ Support Package for NVIDIA® GPUs uses the GPU Coder product to generate CUDA® code (kernels) from a MATLAB® algorithm. These kernels run on any CUDA-enabled GPU platform. The support package automates deployment of the generated CUDA code to GPU hardware platforms such as Jetson or DRIVE.
In this tutorial, you learn how to:
Prepare your MATLAB code for CUDA code generation by using the kernelfun pragma.
Create and set up a GPU Coder project.
Change settings to connect to the NVIDIA target board.
Generate and deploy a CUDA executable on the target board.
Run the executable on the board and verify the results.
Before you start this tutorial, familiarize yourself with the GPU Coder app. For more information, see Code Generation by Using the GPU Coder App.
NVIDIA DRIVE or Jetson embedded platform.
Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).
NVIDIA CUDA toolkit installed on the board.
Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards.
GPU Coder for CUDA code generation. For help on getting started with GPU Coder, see Get Started with GPU Coder.
NVIDIA CUDA toolkit on the host.
Environment variables on the host for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware. For setting up the environment variables, see Environment Variables.
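Before starting the build, it can help to confirm that the host and the board are set up as listed above. The following is a minimal sketch, assuming a Jetson board; the board name and credentials are placeholders (not from this tutorial) that you replace with your own values.

```matlab
% Connect to the board. 'jetson-board-name', 'ubuntu', and 'ubuntu' are
% placeholder values for the device address, user name, and password.
hwobj = jetson('jetson-board-name','ubuntu','ubuntu');

% Configure an environment check that targets the board.
envCfg = coder.gpuEnvConfig('jetson');
envCfg.BasicCodegen = 1;        % verify that basic code generation works
envCfg.Quiet = 1;               % suppress verbose output
envCfg.HardwareObject = hwobj;  % run the check against the connected board

% Run the check; the returned structure reports which checks passed.
results = coder.checkGpuInstall(envCfg);
```

If a check fails, review the environment variables on the target and the host before continuing.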
This tutorial uses a simple vector addition example to demonstrate the build and deployment workflow on NVIDIA GPUs. Create a MATLAB function myAdd.m that acts as the entry-point for code generation. Alternatively, you can use the files provided in the Getting Started with the GPU Coder Support Package for NVIDIA GPUs example. The easiest way to generate CUDA kernels for your MATLAB algorithm is to place the coder.gpu.kernelfun pragma in the entry-point function. When GPU Coder encounters the kernelfun pragma, it attempts to parallelize all the computation within the function and then maps it to the GPU.
function out = myAdd(inp1,inp2) %#codegen
coder.gpu.kernelfun();
out = inp1 + inp2;
end
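Because the entry-point function is ordinary MATLAB code, you can check it on the host before generating any CUDA code. The following is a minimal sketch, assuming myAdd.m is on the MATLAB path and GPU Coder is installed so that the pragma resolves. The variable names are illustrative.

```matlab
% Call the entry-point function directly in MATLAB. The pragma only
% affects code generation, so this runs as a regular vector addition.
inp = 0:99;                  % the same range the simulation step uses later
out = myAdd(inp,inp);

% Adding a vector to itself should double every element.
assert(isequal(out,2*inp));
assert(isequal(size(out),[1 100]));
```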
To generate a CUDA executable that can be deployed to an NVIDIA target, create a custom main wrapper file main.cu and its associated header file main.h. The main file calls the entry-point function in the generated code. The main file passes a vector containing the first 100 natural numbers to the entry-point function and writes the results to the myAdd.bin binary file.
To open the GPU Coder app, on the MATLAB toolstrip, in the Apps tab, under Code
Generation, click the GPU Coder app icon. You can also open the app by typing
the MATLAB Command Window.
The app opens the Select source files page. Select
myAdd.m as the entry-point function. Click
In the Define Input Types window, enter
myAdd(1:100,1:100) and click Autodefine Input
Types, then click Next.
You can initiate the Check for Run-Time Issues process or click Next to go to the Generate Code step.
Set the Build type to
the Hardware Board to
Click More Settings. On the Custom Code panel, enter the custom main file main.cu in the Additional source files field. The custom main file and the header file must be in the same location as the entry-point file.
Under the Hardware panel, enter the device address, user name, password, and build folder for the board.
The GPU code configuration object uses the default compute capability value specified in the GPU Code pane. To use the complete set of features supported by your CUDA GPU and to reduce numerical mismatches, set the Compute Capability to a value that matches your GPU specifications.
Close the Settings window and click Generate. The software generates CUDA code and deploys the executable to the folder specified. Click Next and close the app.
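The app settings above map roughly onto a command-line build. The following is a sketch under stated assumptions, not the app's exact configuration: it assumes a Jetson board, and the board name, credentials, build folder, and compute capability are placeholders that you replace with values for your own hardware.

```matlab
% Connect to the board (placeholder address, user name, and password).
hwobj = jetson('jetson-board-name','ubuntu','ubuntu');

% Create a GPU code configuration object for an executable build.
cfg = coder.gpuConfig('exe');

% Select the target hardware and the build folder on the board.
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';   % placeholder folder

% Add the custom main file; main.cu and main.h must be in the same
% folder as myAdd.m.
cfg.CustomSource = fullfile('main.cu');

% Optional: match the compute capability of your GPU. '6.2' is only an
% example value (Jetson TX2); check the specifications of your board.
% cfg.GpuConfig.ComputeCapability = '6.2';

% Generate CUDA code, then build and deploy the executable.
codegen('-config',cfg,'myAdd','-args',{1:100,1:100});
```

The -args value plays the same role as the Define Input Types step: it only fixes the size and type of the inputs (two 1-by-100 double vectors).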
In the MATLAB Command Window, use the runApplication method of the hardware object to start the executable on the target hardware.
hwobj = jetson;
pid = runApplication(hwobj,'myAdd');
### Launching the executable on the target...
Executable launched successfully with process ID 26432.
Displaying the simple runtime log for the executable...
Copy the output binary file myAdd.bin to the MATLAB environment on the host and compare the computed results with the simulation results from MATLAB.
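% Copy the result file from the target to the current folder on the host.
outputFile = [hwobj.workspaceDir '/myAdd.bin'];
getFile(hwobj,outputFile);

% Simulation result from MATLAB.
simOut = myAdd(0:99,0:99);

% Read the copied result binary file from the target in MATLAB.
fId = fopen('myAdd.bin','r');
tOut = fread(fId,'double');   % fread returns a column vector
fclose(fId);

% Compare the row-vector simulation result with the transposed target result.
deviation = simOut - tOut';
fprintf('Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: %f\n', max(abs(deviation(:))));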
Maximum deviation between MATLAB Simulation output and GPU coder output on Target is: 0.000000
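The comparison relies on two details: the binary file holds raw double values, and fread returns them as a column vector, so the result is transposed before it is subtracted from the row-vector simulation output. The following host-only sketch illustrates this with a temporary file that stands in for myAdd.bin; no target board is involved, and the variable names are illustrative.

```matlab
% Expected result of adding 0:99 to itself, as a 1-by-100 row vector.
expected = (0:99) + (0:99);

% Write the values as raw doubles to a temporary file that stands in
% for the myAdd.bin file produced on the target.
binFile = [tempname '.bin'];
fid = fopen(binFile,'w');
fwrite(fid,expected,'double');
fclose(fid);

% Read the values back. fread returns a 100-by-1 column vector.
fid = fopen(binFile,'r');
tOut = fread(fid,'double');
fclose(fid);
delete(binFile);

% Transpose before subtracting so that both operands are row vectors.
% Without the transpose, implicit expansion would produce a 100-by-100
% matrix instead of an element-wise difference.
maxDev = max(abs(expected - tOut.'));
assert(iscolumn(tOut));
assert(maxDev == 0);
```

For computations that involve floating-point rounding, comparing the maximum absolute deviation against a small tolerance is usually more appropriate than testing for exact equality.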