Initializing GPU on multiple workers cause an unknown error
2 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Igor Varfolomeev
am 25 Jun. 2018
Beantwortet: Igor Varfolomeev
am 25 Nov. 2018
I've noticed that the following simple code results in an weird error, if I use R2016b on a machine with two GTX1080Ti and one K2200 :
% start a _new_ Matlab instance first!
parpool(16);
fetchOutputs( parfevalOnAll(@() gather(gpuArray(1)),1) )
The error message I get:
Error using parallel.FevalOnAllFuture/fetchOutputs (line 69)
One or more futures resulted in an error.
Caused by:
Error using parallel.internal.pool.deserialize>@()gather(gpuArray(1))
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
<-- repeated multiple times -->
After that, all GPU functionality gets completely broken:
>> a=gpuArray(1)
Error using gpuArray
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
Even re-starting Matlab won't help. The fix is to clear the CUDA JIT cache folder, "%USERPROFILE%\AppData\Roaming\NVIDIA\ComputeCache".
However, the following "longer pre-initialization" works OK for me:
% start a _new_ Matlab instance first and clear CUDA JIT cache if there was an error.
gpuDevice(1)
gather(gpuArray(1))
parpool();
fetchOutputs( parfevalOnAll(@() gpuDevice(1),1) )
fetchOutputs(parfevalOnAll(@() gather(gpuArray(1)),1))
AFAIU:
- Matlab R2016b that I use here, was designed for CUDA 7.5, and there are no binaries for CUDA Compute Capability 6.1.
- That's why Matlab uses CUDA JIT to recompile a ton (~400 MB) of stuff when user calls any gpu-related function the first time. (Which also causes many " gpuDevice() is slow " questions.
- There's something wrong with that JIT, if combined with parpool (a race condition?).
My system is: Windows 10, CUDA 8.0 (cuda_8.0.61_win10) with patch 2 (cuda_8.0.61.2_windows), nvidia driver r384.94. The CUDA_CACHE_MAXSIZE environment variable is set to 2147483647.
My questions:
- Is my "longer pre-initialization" workaround actually "safe"? Is it a real workaround for those "race condition"? Or is it as good as the original (might be stable on my specific system, but is likely to fail on some other)? Assuming I have to stay with R2016b for now, targeting CUDA 8.0 and Pascal GPU (building a dll).
- Same code works OK in R2017b-R2018a and above. Is that just because they don't use CUDA JIT here? Or is the real underlying issue actually fixed? (I don't have a device with compute capability >6.x at hand, so I'm unable to check that.)R2017a behaves like R2016b here, even though it claims CUDA 8.0 support - it still writes something (but just ~40MB) to CUDA JIT cache, fails in test #1 and works in test #2.
10 Kommentare
Joss Knight
am 4 Jul. 2018
I had a colleague check their dual GTX 1080 system and they saw no issues, with 16b or with the current version with a forced JIT.
Sounds interesting... But this does not give me the same behaviour - the ComputeCache is still almost empty after running those commands - few KB only. It looks like files are being added and instantly erased. Hmm... Could you please advice - am I doing something wrong here? Were you able to make it populate the ComputeCache?
This works for me but ... possibly only when your card's architecture is the maximum supported or higher, because if it were lower there would be no compatible PTX in the libraries. So you'll need to run R2017a or R2017b for your Pascal card.
It would be good to establish why upgrading MATLAB is not an option for you.
Akzeptierte Antwort
Weitere Antworten (0)
Siehe auch
Kategorien
Mehr zu GPU Computing finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!