gpuDevice command very slow
I am running CUDA kernels using the Parallel Computing Toolbox in R2012a, and I recently upgraded to a 600-series (Kepler) GPU. To set up the CUDA kernel, we read the maximum threads per block from the device:
    gpu_han = gpuDevice(1);   % select GPU 1 and query its properties
    k = parallel.gpu.CUDAKernel('gpu_tfm_linear_arb.ptx', 'gpu_tfm_linear_arb.cu');
    k.ThreadBlockSize = gpu_han.MaxThreadsPerBlock;   % use the device's per-block thread limit
This now takes a very long time (on the order of 2 minutes). If I instead set ThreadBlockSize manually to the card's maximum (1024 in this case), it executes in 0.1 s.
This used to run quickly with a 400-series card. Any help gratefully received.
Accepted Answer
More Answers (2)
Andrei Pokrovsky
on 15 Sep 2016
Edited: Andrei Pokrovsky on 15 Sep 2016
3 votes
Try setting these environment variables before starting MATLAB:
export CUDA_CACHE_MAXSIZE=2147483647
export CUDA_CACHE_DISABLE=0
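CUDA_CACHE_MAXSIZE sets the maximum size, in bytes, of the driver's JIT compute cache, and CUDA_CACHE_DISABLE=0 keeps that cache enabled, so PTX that the driver compiles for your GPU can be reused on later runs instead of being recompiled every time. This cured the problem on my GTX 1080.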
https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-understand-fat-binaries-jit-caching/
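A minimal sketch for Linux or macOS with bash, assuming MATLAB is launched from a shell that reads a startup file such as ~/.bashrc. The helper add_cuda_cache_settings is hypothetical (not part of the original answer): it appends the two export lines above to a given file only if they are not already there, so running it twice is harmless. The driver reads these variables when CUDA initializes, so they need to be in the environment MATLAB starts from rather than set afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the original answer: append the JIT-cache
# settings to a shell startup file, skipping lines that are already present.
add_cuda_cache_settings() {
  local rc_file="$1"
  local line
  touch "$rc_file"
  # Start on a fresh line if the file does not end with a newline.
  if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
    printf '\n' >> "$rc_file"
  fi
  for line in 'export CUDA_CACHE_MAXSIZE=2147483647' \
              'export CUDA_CACHE_DISABLE=0'; do
    # -q: quiet, -x: match the whole line, -F: fixed string rather than regex
    if ! grep -qxF -- "$line" "$rc_file"; then
      printf '%s\n' "$line" >> "$rc_file"
    fi
  done
}
```

For example, run `add_cuda_cache_settings "$HOME/.bashrc"`, open a new terminal and start MATLAB from it (assuming your shell reads ~/.bashrc). On Windows, set the same two variables as user environment variables instead.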
Anthony
on 17 Jun 2013
0 votes
2 comments
Edric Ellis
on 18 Jun 2013
The cache is not stored where the program lives; this page from NVIDIA has all the gory details, including the default locations:
- on Windows, %APPDATA%\NVIDIA\ComputeCache,
- on MacOS, $HOME/Library/Application\ Support/NVIDIA/ComputeCache,
- on Linux, ~/.nv/ComputeCache
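To see how large the cache has grown (for instance, to compare it with CUDA_CACHE_MAXSIZE), here is a small bash sketch covering the Linux and macOS locations listed above. The helper name cuda_cache_dir is hypothetical. It also honours CUDA_CACHE_PATH, the NVIDIA environment variable that relocates the cache, and returns non-zero on any other system (on Windows, look under %APPDATA%\NVIDIA\ComputeCache instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CUDA JIT compute-cache directory for this OS.
cuda_cache_dir() {
  # CUDA_CACHE_PATH, if set, overrides the default location.
  if [ -n "${CUDA_CACHE_PATH:-}" ]; then
    printf '%s\n' "$CUDA_CACHE_PATH"
    return 0
  fi
  case "$(uname -s)" in
    Linux)  printf '%s\n' "$HOME/.nv/ComputeCache" ;;
    Darwin) printf '%s\n' "$HOME/Library/Application Support/NVIDIA/ComputeCache" ;;
    *)      return 1 ;;  # not one of the locations listed above
  esac
}
```

For example, `du -sh "$(cuda_cache_dir)"` reports how much disk space the cache currently uses.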
Anthony
on 12 Jul 2013