CUDA kernel MaxThreadsPerBlock not constant
Martin Strambach
on 30 Jan 2020
Answered: Edric Ellis
on 3 Feb 2020
I create a CUDA kernel using KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC). I compute the block size from KERN.MaxThreadsPerBlock, but this value seems to vary depending on which function is used to build the kernel. I had presumed that MaxThreadsPerBlock depends only on gpuDevice properties. So far, there seems to be some connection to the number of function parameters. Can someone explain how this value is actually determined, or am I missing something?
I'm using MATLAB R2019b, GCC 8.3, and CUDA Toolkit 10.1 with an NVIDIA V100 (CC 7.0).
2 Comments
Joss Knight
on 2 Feb 2020
I can't work out how you'd see this for the same device. Can you post some reproduction code?
Accepted Answer
Edric Ellis
on 3 Feb 2020
In your comment you mention that you see different values of MaxThreadsPerBlock for different kernels. This is expected. The CUDAKernel object builds on the underlying CUDA Driver API. Different kernel functions have different requirements in terms of shared memory, registers, and other resources, and this affects how many threads per block can be launched. The limit therefore depends on both the kernel function and the device, not on the device alone. This is described (briefly) in the CUDA Driver reference documentation here: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g5e92a1b0d8d1b82cb00dcfb2de15961b (In case that link goes stale: it describes the function cuFuncGetAttribute, which allows you to query the CUDA attribute CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.)
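To see the effect directly, something along these lines may help. It is only a sketch: the file names myKernels.ptx and myKernels.cu, the entry points kernelA and kernelB, and the problem size are made-up placeholders, and it assumes a one-dimensional launch with one thread per element. It reads MaxThreadsPerBlock from each kernel object, sizes the launch from that kernel's own limit, and prints the device-wide limit for comparison.

```matlab
% Hypothetical file and entry-point names; substitute your own.
ptxFile     = 'myKernels.ptx';
cuFile      = 'myKernels.cu';
entryPoints = {'kernelA', 'kernelB'};

numElements = 1e6;  % assumed problem size, one thread per element

for i = 1:numel(entryPoints)
    kern = parallel.gpu.CUDAKernel(ptxFile, cuFile, entryPoints{i});

    % The limit is reported per kernel, so read it from each kernel
    % object rather than reusing one value for the whole device.
    blockSize = kern.MaxThreadsPerBlock;
    kern.ThreadBlockSize = [blockSize, 1, 1];
    kern.GridSize        = [ceil(numElements / blockSize), 1, 1];

    fprintf('%s: MaxThreadsPerBlock = %d\n', entryPoints{i}, blockSize);
end

% Device-wide upper bound, for comparison with the per-kernel values.
dev = gpuDevice;
fprintf('Device: MaxThreadsPerBlock = %d\n', dev.MaxThreadsPerBlock);
```

If the two kernels differ noticeably in register or shared memory use, the per-kernel values printed in the loop can come out lower than the device value, which would match what you are seeing.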