Kernels from Library Calls
GPU Coder™ supports libraries optimized for CUDA® GPUs, such as the cuBLAS, cuSOLVER, cuFFT, Thrust, cuDNN, and TensorRT libraries.
- The cuBLAS library is an implementation of Basic Linear Algebra Subprograms (BLAS) on top of the NVIDIA® CUDA runtime. It allows you to access the computational resources of the NVIDIA GPU.
- The cuSOLVER library is a high-level package based on the cuBLAS and cuSPARSE libraries. It provides features such as common matrix factorization and triangular solve routines for dense matrices, a sparse least-squares solver, and an eigenvalue solver.
- The cuFFT library provides a high-performance implementation of the Fast Fourier Transform (FFT) algorithm on NVIDIA GPUs. The cuBLAS, cuSOLVER, and cuFFT libraries are part of the NVIDIA CUDA Toolkit. 
- Thrust is a C++ template library for CUDA. The Thrust library ships with the CUDA Toolkit and lets you use GPU-accelerated primitives, such as sort, to implement complex high-performance parallel applications.
- The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
- NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime library. For more information, see Code Generation for Deep Learning Networks by Using cuDNN and Code Generation for Deep Learning Networks by Using TensorRT.
GPU Coder does not require a special pragma to generate kernel calls to libraries. During code generation, if you select the Enable cuBLAS option in the GPU Coder app or set config_object.GpuConfig.EnableCUBLAS = true at the command line, GPU Coder replaces some functionality with calls to the cuBLAS library. Similarly, if you select the Enable cuSOLVER option in the GPU Coder app or set config_object.GpuConfig.EnableCUSOLVER = true at the command line, GPU Coder replaces some functionality with calls to the cuSOLVER library. For GPU Coder to replace high-level math functions with library calls, both of the following conditions must be met:
- A GPU-specific library replacement must exist for the function.
- The MATLAB® Coder™ data size thresholds must be satisfied.
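As a minimal sketch, the following shows the command-line form of the Enable cuBLAS and Enable cuSOLVER options described above. It assumes GPU Coder and a supported CUDA toolchain are installed; cfg plays the role of config_object, and matMulSolve is a hypothetical entry-point function. Its two operations correspond to the "Matrix multiply" and "Solve system of linear equations" rows of the table that follows. Whether GPU Coder actually emits cuBLAS or cuSOLVER calls still depends on the data size thresholds, so treat this as an illustration and check the code generation report. The block lists two files, separated by %% comments.

```matlab
%% File 1: matMulSolve.m (hypothetical entry-point function)
function [C, x] = matMulSolve(A, B, b) %#codegen
% Matrix multiply: candidate for replacement with a cuBLAS call.
C = A * B;
% Solve the linear system A*x = b: candidate for replacement with cuSOLVER calls.
x = A \ b;
end

%% File 2: build script (run at the MATLAB command line)
cfg = coder.gpuConfig('mex');          % GPU code configuration object for a MEX target
cfg.GpuConfig.EnableCUBLAS = true;     % allow replacement with cuBLAS calls
cfg.GpuConfig.EnableCUSOLVER = true;   % allow replacement with cuSOLVER calls

% Example inputs define the sizes and types of the generated code.
% 512-by-512 is an arbitrary size, assumed large enough to pass the data size thresholds.
A = rand(512, 'single');
B = rand(512, 'single');
b = rand(512, 1, 'single');

codegen('-config', cfg, 'matMulSolve', '-args', {A, B, b}, '-report');
```

If a library call does not appear in the generated code, the input sizes may be below the thresholds; increasing them is one way to check.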
GPU Coder supports cuFFT, cuSOLVER, and cuBLAS library replacements for the functions listed in the following table. For functions that do not have a CUDA library replacement, GPU Coder maps portable MATLAB implementations of those functions to the GPU.
| MATLAB Function | Description | MATLAB Coder LAPACK Support | cuBLAS, cuSOLVER, cuFFT, Thrust Support | 
|---|---|---|---|
|  | Matrix multiply | Yes | Yes |
|  | Solve system of linear equations | Yes | Yes |
|  | LU matrix factorization | Yes | Yes |
|  | Orthogonal-triangular decomposition | Yes | Partial |
|  | Matrix determinant | Yes | Yes |
|  | Cholesky factorization | Yes | Yes |
|  | Reciprocal condition number | Yes | Yes |
|  | Solve system of linear equations | Yes | Yes |
|  | Eigenvalues and eigenvectors | Yes | No |
|  | Schur decomposition | Yes | No |
|  | Singular value decomposition | Yes | Partial |
|  | Fast Fourier Transform | Yes | Yes |
|  | Inverse Fast Fourier Transform | Yes | Yes |
|  | Sort array elements |  | Yes, using |
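To contrast functions with and without library replacements in the table above, the following sketch calls chol and det, whose rows ("Cholesky factorization", "Matrix determinant") list library support, and eig, whose row ("Eigenvalues and eigenvectors") lists none. Following the paragraph before the table, the eig call is expected to be generated from portable MATLAB code mapped to the GPU rather than from a cuSOLVER call. It assumes GPU Coder and a CUDA toolchain are installed; factorAndEig is a hypothetical entry-point function, and the matrix size is an arbitrary choice. Check the code generation report to see which calls were actually emitted.

```matlab
%% File 1: factorAndEig.m (hypothetical entry-point function)
function [R, d, e] = factorAndEig(A) %#codegen
R = chol(A);   % Cholesky factorization: listed with library support
d = det(A);    % matrix determinant: listed with library support
e = eig(A);    % eigenvalues: listed without library support, so this is
               % expected to use portable MATLAB code mapped to the GPU
end

%% File 2: build script (run at the MATLAB command line)
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.EnableCUBLAS = true;
cfg.GpuConfig.EnableCUSOLVER = true;

% Symmetric positive definite example input, so chol is well defined.
n = 256;
M = rand(n, 'single');
A = M * M' + n * eye(n, 'single');

codegen('-config', cfg, 'factorAndEig', '-args', {A}, '-report');
```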
When you select the Enable cuFFT option in the GPU Coder app or set config_object.GpuConfig.EnableCUFFT = true at the command line, GPU Coder maps fft, ifft, fft2, ifft2, fftn, and ifftn function calls in your MATLAB code to the corresponding cuFFT library calls. For 2-D and higher-dimensional transforms, GPU Coder creates multiple batched 1-D transforms. These batched transforms perform better than single transforms. GPU Coder supports only out-of-place transforms. If Enable cuFFT is not selected, GPU Coder uses C FFTW libraries where available or generates kernels from the portable MATLAB FFT implementation. Both single- and double-precision data types are supported. Input and output can be real or complex-valued, but real-valued transforms are faster. The cuFFT library performs best for input sizes that are a power of 2 or that can be factored into a product of small prime numbers. In general, the smaller the prime factors, the better the performance.
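The following sketch combines the Enable cuFFT option with the size guidance above. nextSmoothLength is a hypothetical helper that runs in MATLAB (it is not compiled) and pads a length up to the next integer whose prime factors are all at most 7; the cutoff of 7 is an assumption chosen to illustrate "small prime numbers", not a GPU Coder or cuFFT setting. paddedFFT is a hypothetical entry-point function, and coder.Constant fixes the transform length at compile time so the output size is static. GPU Coder and a CUDA toolchain are assumed to be installed.

```matlab
%% File 1: nextSmoothLength.m (hypothetical helper, runs in MATLAB only)
function m = nextSmoothLength(n)
% Return the smallest integer m >= n whose prime factors are all <= 7.
% The cutoff of 7 is an illustrative choice of "small prime".
m = n;
while max(factor(m)) > 7
    m = m + 1;
end
end

%% File 2: paddedFFT.m (hypothetical entry-point function)
function Y = paddedFFT(x, nfft) %#codegen
% With EnableCUFFT = true, this fft call is a candidate for a cuFFT call.
% nfft is a compile-time constant, so the output size is fixed.
Y = fft(x, nfft);
end

%% File 3: build script (run at the MATLAB command line)
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.EnableCUFFT = true;      % map fft/ifft calls to cuFFT

x = rand(1009, 1, 'single');           % real single-precision input; 1009 is prime
nfft = nextSmoothLength(numel(x));     % pads the length to 1024 = 2^10

% coder.Constant makes nfft a constant input instead of a runtime argument.
codegen('-config', cfg, 'paddedFFT', '-args', {x, coder.Constant(nfft)}, '-report');
```

For example, nextSmoothLength(1000) returns 1000 because 1000 = 2^3*5^3, while the prime length 1009 is padded to 1024. Zero-padding changes the frequency sampling of the result, so this trade-off suits only applications that can tolerate a longer, zero-padded transform.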
Note
Using CUDA library names such as cufft, cublas, or cudnn as the names of your MATLAB functions results in code generation errors.
See Also
coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.matrixMatrixKernel | coder.gpu.constantMemory | stencilfun | gpucoder.sort