Profile Generated CUDA MEX Functions Using Performance Analyzer
This example shows how to profile generated CUDA® MEX files by using the GPU Performance Analyzer. For more information on the MATLAB® code that this example uses, see Fog Rectification.
Third-Party Prerequisites
- CUDA-enabled NVIDIA® GPU and compatible driver.
Verify GPU Environment
To verify that the compilers and libraries necessary for running this example are set up correctly, use the coder.checkGpuInstall function.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
Define Entry-Point Function
Define the entry-point function fog_rectification that takes a foggy image as input and returns a defogged image.
type fog_rectification.m
function [out] = fog_rectification(input) %#codegen
%
%   Copyright 2017-2023 The MathWorks, Inc.
coder.gpu.kernelfun;
% restoreOut is used to store the output of restoration
restoreOut = zeros(size(input),"double");
% Changing the precision level of input image to double
input = double(input)./255;
%% Dark channel Estimation from input
darkChannel = min(input,[],3);
% diff_im is used as input and output variable for anisotropic 
% diffusion
diff_im = 0.9*darkChannel;
num_iter = 3;
% 2D convolution mask for Anisotropic diffusion
hN = [0.0625 0.1250 0.0625; 0.1250 0.2500 0.1250;
 0.0625 0.1250 0.0625];
hN = double(hN);
%% Refine dark channel using Anisotropic diffusion.
for t = 1:num_iter
    diff_im = conv2(diff_im,hN,"same");
end
%% Reduction with min
diff_im = min(darkChannel,diff_im);
diff_im = 0.6*diff_im ;
%% Parallel element-wise math to compute
%  Restoration with inverse Koschmieder's law
factor = 1.0./(1.0-(diff_im));
restoreOut(:,:,1) = (input(:,:,1)-diff_im).*factor;
restoreOut(:,:,2) = (input(:,:,2)-diff_im).*factor;
restoreOut(:,:,3) = (input(:,:,3)-diff_im).*factor;
restoreOut = uint8(255.*restoreOut);
%%
% Stretching performs the histogram stretching of the image.
% im is the input color image and p is cdf limit.
% out is the contrast stretched image and cdf is the cumulative
% prob. density function and T is the stretching function.
% RGB to grayscale conversion
im_gray = im2gray(restoreOut);
[row,col] = size(im_gray);
% histogram calculation
[count,~] = imhist(im_gray);
prob = count'/(row*col);
% cumulative Sum calculation
cdf = cumsum(prob(:));
% Use gpucoder.reduce to count the cdf entries at or below a given
% probability. This is equivalent to
% "i1 = length(find(cdf <= (p/100)));", but is more GPU friendly.
% lessThanP is the preprocess function that returns 1 if the input
% value from cdf is less than or equal to the defined threshold and
% returns 0 otherwise. gpucoder.reduce then sums the returned values
% to get the final count.
i1 = gpucoder.reduce(cdf,@plus,"preprocess", @lessThanP);
i2 = 255 - gpucoder.reduce(cdf,@plus,"preprocess", @greaterThanP);
o1 = floor(255*.10);
o2 = floor(255*.90);
t1 = (o1/i1)*[0:i1];
t2 = (((o2-o1)/(i2-i1))*[i1+1:i2])-(((o2-o1)/(i2-i1))*i1)+o1;
t3 = (((255-o2)/(255-i2))*[i2+1:255])-(((255-o2)/(255-i2))*i2)+o2;
T = (floor([t1 t2 t3]));
restoreOut(restoreOut == 0) = 1;
u1 = (restoreOut(:,:,1));
u2 = (restoreOut(:,:,2));
u3 = (restoreOut(:,:,3));
% replacing the value from look up table
out1 = T(u1);
out2 = T(u2);
out3 = T(u3);
out = zeros([size(out1),3], "uint8");
out(:,:,1) = uint8(out1);
out(:,:,2) = uint8(out2);
out(:,:,3) = uint8(out3);
end
function out = lessThanP(input)
p = 5/100;
out = uint32(0);
if input <= p
    out = uint32(1);
end
end
function out = greaterThanP(input)
p = 5/100;
out = uint32(0);
if input >= 1 - p
    out = uint32(1);
end
end
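The counting and lookup-table steps in the listing above can be checked on the CPU with plain MATLAB. The sketch below is not part of the shipped example. It assumes a hypothetical uniform 256-bin histogram in place of the imhist output and uses sum over a logical mask in place of gpucoder.reduce, so it illustrates only the arithmetic, not the GPU reduction.

```matlab
% Assumption: a uniform 256-bin histogram stands in for the imhist output.
prob = ones(256,1)/256;
cdf  = cumsum(prob);
p    = 5/100;                  % threshold used by lessThanP and greaterThanP

% Summing 0/1 indicators gives the same counts as the reductions.
i1 = sum(cdf <= p);            % entries at or below the lower threshold
i2 = 255 - sum(cdf >= 1 - p);  % 255 minus entries at or above the upper one

% The find-based form quoted in the code comments yields the same i1.
assert(i1 == length(find(cdf <= p)));

% Build the stretching lookup table the same way the entry-point function does.
o1 = floor(255*.10);
o2 = floor(255*.90);
t1 = (o1/i1)*(0:i1);
t2 = ((o2-o1)/(i2-i1))*(i1+1:i2) - ((o2-o1)/(i2-i1))*i1 + o1;
t3 = ((255-o2)/(255-i2))*(i2+1:255) - ((255-o2)/(255-i2))*i2 + o2;
T  = floor([t1 t2 t3]);

% The three segments hold (i1+1) + (i2-i1) + (255-i2) = 256 entries in total.
assert(numel(T) == 256);
```

Because restoreOut is uint8 and its zeros are replaced by 1 before the lookup, T(u1), T(u2), and T(u3) only ever index entries 1 through 255 of this table.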
Approach 1: Generate and Profile MEX without Instrumentation
Generate a CUDA MEX for the fog_rectification function by running the codegen command. Do not supply any additional options to the codegen command other than -config and -args. The generated code does not contain profiling instrumentation.
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
codegen -config cfg -args {inputImg} fog_rectification.m
Code generation successful: View report
Start the GPU profiler by running the gpuprofile command. Run the generated MEX twice and view the profiling results.
gpuprofile on
fog_rectification_mex(inputImg);
fog_rectification_mex(inputImg);
gpuprofile viewer
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
The GPU Performance Analyzer report shows CPU overhead and GPU activities for the two MEX executions. Because there is no profiling instrumentation, the Functions and Loops rows are empty.

Approach 2: Generate and Profile MEX with Instrumentation
To add profiling instrumentation to the generated MEX, run the codegen command again with the -gpuprofile option.
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
codegen -config cfg -args {inputImg} fog_rectification.m -gpuprofile
Code generation successful: View report
Start the GPU profiler by running the gpuprofile command. Run the generated MEX twice and view the profiling results.
gpuprofile on
fog_rectification_mex(inputImg);
fog_rectification_mex(inputImg);
gpuprofile viewer
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
The GPU Performance Analyzer now shows the Functions and Loops events.

Approach 3: Generate and Profile MEX Using gpuPerformanceAnalyzer Function
You can also generate and profile the MEX by passing a MEX configuration object to the gpuPerformanceAnalyzer function.
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
gpuPerformanceAnalyzer('fog_rectification.m', {inputImg}, Config=cfg);
### Starting GPU code generation
Code generation successful: View report
### GPU code generation finished
### Starting application profiling
### Application profiling finished
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
When you profile the MEX using the gpuPerformanceAnalyzer function, you can also view the generated code and trace the events to the code in the Performance Analyzer report.
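As a hedged sketch, the gpuPerformanceAnalyzer call above can also state how many times the MEX is executed during profiling, which mirrors the two manual runs in Approaches 1 and 2. The NumIterations name-value argument is an assumption that this example does not confirm. Check the gpuPerformanceAnalyzer reference page for your release before relying on it.

```matlab
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
% Assumption: NumIterations sets the number of profiled MEX executions.
gpuPerformanceAnalyzer('fog_rectification.m', {inputImg}, ...
    Config=cfg, NumIterations=2);
```

This call needs GPU Coder and a supported GPU, so it is shown without expected output.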
