MATLAB Answers

Compatibility Matlab & GPU coder Compute Capability 8.6 RTX 3070

Good morning,
I recently bought an RTX 3070 and have been trying to make use of it by generating CUDA code with GPU Coder. The card works, but I have noticed two things. I have MATLAB R2021a, the latest NVIDIA drivers, and all the prerequisites for GPU Coder (as explained in https://es.mathworks.com/help/gpucoder/gs/install-prerequisites.html ), and the "coder.checkGpuInstall" command shows the following (see attached .txt).
(i) When running gpuBench, the results seem to indicate that the single-precision TFLOPS are about half of the card's theoretical value (see Figure 1). In contrast, third-party tools like CUDA-Z (Figure 2) show that the card has about 22 TFLOPS. Does this mean that MATLAB is currently using half the CUDA cores per SM? Am I missing something obvious?
Figure 1: Matlab's GPU bench results
Figure 2: Cuda-Z results
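For reference, the theoretical single-precision figure being compared against can be reproduced with a little arithmetic. This is only a sketch: the SM count, cores per SM, and boost clock below are assumed reference specifications for an RTX 3070 and are not taken from the attached output. A card that boosts above the reference clock would report more, which may be why CUDA-Z shows about 22 TFLOPS.

```matlab
% Sketch: theoretical FP32 peak = 2 FLOPs (one fused multiply-add)
% per CUDA core per clock cycle.
% Assumed reference specs for an RTX 3070 (not from the post):
numSMs        = 46;      % streaming multiprocessors
coresPerSM    = 128;     % FP32 cores per SM on CC 8.6
boostClockGHz = 1.725;   % reference boost clock

% On a live system the first and third values could instead be read
% from d = gpuDevice; d.MultiprocessorCount and d.ClockRateKHz.
peakTFLOPS = 2 * numSMs * coresPerSM * boostClockGHz / 1e3;
fprintf('Theoretical FP32 peak: %.1f TFLOPS\n', peakTFLOPS);
```

With these assumed values the result is about 20.3 TFLOPS, so a gpuBench reading near 10 TFLOPS would indeed be roughly half of the theoretical peak.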
(ii) I was trying to profile the GPU coder generated code by following the steps in https://es.mathworks.com/help/gpucoder/ug/gpucoder-execution-profiling-report.html (in fact, by running the code in "C:\ (...) \Documents\MATLAB\Examples\R2021a\gpucoder\GPUExecutionProfilingOfTheGeneratedCodeExample") and I am getting the following error message:
"
Error using gpucoder.profile (line 41)
Error setting property 'ComputeCapability' of class 'GpuConfig': Invalid value '8.6'.
Allowed values are:
3.2, 3.5, 3.7, 5.0, 5.2, 5.3, 6.0, 6.1, 6.2, 7.0, 7.2, 7.5, 8.0
"
Does this again mean that compute capability 8.6 is not yet supported?
I have tried downloading the MATLAB R2021b prerelease, but unfortunately it does not install properly: the MATLAB files are in the directory, but the launcher does not appear anywhere, and when I launch the .exe from within the files I get an error (unfortunately I don't have it at hand to show you).
Thank you in advance for your help, I hope my question was clear and concise. This is my first question so feedback on how to improve is very welcome.
Best,

Accepted Answer

Nathan Malimban on 19 Jul 2021
Hi Marco,
I can address the second part of your question for now.
Yes, MATLAB R2021a does not support CC 8.6, because the CUDA version that R2021a supports does not support CC 8.6. The error seems to occur because, in the absence of a user-specified CC, the profiler code picks up the default CC from the machine. This may not be the correct default value to choose, however; I can create an internal report so we can look into this further. In the meantime, could you try the following workaround? It specifies the CC explicitly.
gpucoder.profile(designFileName, inputs, 'GpuConfigurationOptions', {'ComputeCapability', '8.0'});
designFileName is the name of your design file, and inputs is a cell array of inputs, as per the example you reference.
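If the same script has to run on several machines, the choice of '8.0' could be made automatically. The following is a minimal sketch, not part of GPU Coder: it takes the allowed values from the R2021a error message above and picks the highest one that does not exceed the device's CC. The variable names are hypothetical, and the device CC is hard-coded here so the snippet runs without a GPU.

```matlab
% Allowed values copied from the R2021a error message.
supportedCC = {'3.2','3.5','3.7','5.0','5.2','5.3','6.0', ...
               '6.1','6.2','7.0','7.2','7.5','8.0'};

% Hard-coded for illustration. On a live system this could be:
%   d = gpuDevice; deviceCC = d.ComputeCapability;
deviceCC = '8.6';

% Keep only the supported values that do not exceed the device's CC.
supportedNum = str2double(supportedCC);
candidates   = supportedNum(supportedNum <= str2double(deviceCC));
if isempty(candidates)
    error('No supported compute capability at or below %s.', deviceCC);
end
targetCC = sprintf('%.1f', max(candidates));
fprintf('Using ComputeCapability %s\n', targetCC);

% Then, as in the workaround above (designFileName and inputs are yours):
% gpucoder.profile(designFileName, inputs, ...
%     'GpuConfigurationOptions', {'ComputeCapability', targetCC});
```

Code built for CC 8.0 should still run on an 8.6 device, since CUDA binaries are generally compatible within the same major compute capability, although it will not use anything specific to 8.6.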
  3 Comments
Marco Irisarri on 20 Jul 2021
Hello Nathan,
Thank you for your reply and for looking into it. I understand. I will wait until MATLAB supports CC 8.6.
Thank you again for the help.
Best,


More Answers (1)

Joss Knight on 22 Jul 2021
Regarding the gpuBench results: no, MATLAB is definitely not using only half the cores! What you are seeing is the raw performance of SGEMM in NVIDIA's cuBLAS library in CUDA 11.0. My understanding is that on compute capability 8.6 devices, cuBLAS is still undergoing considerable optimisation; indeed, we see that confirmed by some improvements when upgrading to CUDA 11.2 (for which you'll have to wait until next year).
However, the performance of MTIMES still does not reach the theoretical maximum and perhaps it never will. If you click on your result for single precision MTIMES in the gpuBench report it will take you to the graph and you'll see that the performance peaks and flattens out at a certain matrix size. It may be that on these devices memory bandwidth starts to become more of a bottleneck for larger sizes. In CUDA-Z the benchmark no doubt simply runs floating point operations inside a kernel without any input or output data at all. This is great for testing raw compute power, not so useful for working out how fast the card is at doing something genuinely useful.
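The curve described above can be reproduced outside gpuBench with a few lines. This is a rough sketch only, not gpuBench's actual code: the matrix sizes are arbitrary, and the FLOP count uses the usual approximation of 2*N^3 operations for an N-by-N multiply. It needs Parallel Computing Toolbox and a supported GPU.

```matlab
% Approximate FLOP count for an n-by-n times n-by-n multiply:
% n^3 multiply-adds, counted as 2 FLOPs each.
flopsPerMtimes = @(n) 2 * n^3;

sizes  = [512 1024 2048 4096 8192];   % arbitrary sizes for illustration
tflops = zeros(size(sizes));

for k = 1:numel(sizes)
    n = sizes(k);
    A = rand(n, 'single', 'gpuArray');
    B = rand(n, 'single', 'gpuArray');
    t = gputimeit(@() A * B);          % seconds per multiply
    tflops(k) = flopsPerMtimes(n) / t / 1e12;
end

disp(table(sizes(:), tflops(:), 'VariableNames', {'N', 'TFLOPS'}));
```

If the TFLOPS column stops growing beyond a certain N, that is the plateau visible in the gpuBench graph.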
We're going to continue investigating this to see if we can get any more information on why cuBLAS performance isn't as good as expected for these cards, and whether there is anything you can do.
  1 Comment
Marco Irisarri on 5 Aug 2021
Dear Joss,
Thank you very much for your answer and sorry for the very late reply. I now understand the situation better thanks to your explanation.
I was surprised at first because the new RTX 3000 series graphics cards (3070, 3080, 3090...) were reaching about half their theoretical performance in the posts I saw in the gpuBench forum (even the GDDR6X 3080 and 3090, I believe), while previous-generation cards were reaching close to their theoretical performance.
From your explanation, however, I understand that there are optimisations still to be done for CC 8.6. I look forward to seeing those optimisations in MATLAB. Thank you for continuing to investigate the issue; I would gladly follow any advice you might have.

