reset(gpuDevice) does not work

Question

Renu Dhadwal am 18 Aug. 2016

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work

Bearbeitet: hewayda hew am 14 Jan. 2023

When I run the following code for values of n<5000 it runs just fine.

reset(gpuDevice);
n=5000;
a=gpuArray(rand(n));
b=gpuArray(rand(n));
tic
t=a'*a;
c=t\(a*b');
toc

But when I run it for n=5000 i get the error "Error using \ Call to Double LU on GPU failed with error status: unspecified launch failure."

If I try running the program again for any small value of n I get the error

"Error using parallel.gpu.CUDADevice/reset

An unexpected error pccured during CUDA execution. The CUDA error was " all CUDA -capable devices are busy or unavailable"

Also, if I execute the following command

g=gpuDevice;
disp(g.FreeMemory)

I get the answer to be NAN

I am unable to run the reset(gpuDevice) command. It gives the same error as above.

2 Kommentare
Keine anzeigenKeine ausblenden

Walter Roberson am 18 Aug. 2016

Which MATLAB version are you using, and which operating system, and which GPU are you using? Also which gpu driver version do you have installed?

arnold am 20 Aug. 2016

In MATLAB Online öffnen

Hi,

I was just now looking for this error, I have a similar problem on a machine at work. I tried using

class(a)
ans = 
   gpuArray
b = medfilt2(a,[9,9]);  
Error using medfilt2gpumex
Failure in GPU implementation.
unspecified launch failure.
Error in gpuArray/medfilt2 (line 37)
b = medfilt2gpumex(varargin{:});

Filter sizes [7,7] and smaller work but 9 upwards gives this error. After that, the gpuDevice also shows

availableMemory: NaN

From this I can't use the GPU anymore without restarting Matlab. This is too bad since the GPU is 20 times faster doing this kind of calculations.

Setup:

Matlab 2016a
Windows 10 Pro 64 (all updates)
Intel 5960X
64GB RAM
GTX1080 with 372.54 (newest driver).

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Alison Eele am 25 Aug. 2016

4
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work#answer_232727

In MATLAB Online öffnen

I think you are experiencing the symptoms of a kernel execution time-out. If the GPU is connected to a monitor (or in Windows the GPU is running in WDDM mode) then the operating system imposes a kill time out on any operation taking place on the GPU. The intention of this timeout is to allow screen display to continue. When this kill takes place on a MATLAB process using the GPU it disrupts our connection to the GPU and typically requires a restart of MATLAB to fix.

You can find out if a kernel time-out is in place on your GPU by executing the gpuDevice command in MATLAB. One of the properties listed will be:

KernelExecutionTimeout: 0

If this is 0 then there is no execution timeout being applied to that card. If it is 1 then the operating system is imposing a timeout (the exact timeout varies by operating system).

Ways to work around the issue:

If possible do computation in smaller pieces to avoid the timeout.
If there are multiple GPU cards in the computer and the computer is Windows then some NVIDIA cards can be switched from WDDM (display) to TCC (compute) mode using the nvidia-smi utility. TCC cards do not have an execution timeout. You cannot connect a display to a TCC mode card.
In Windows it is possible to lengthen the timeout using registry edits though as with all registry edits this should be done with care. https://msdn.microsoft.com/en-us/Library/Windows/Hardware/ff569918(v=vs.85).aspx

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

arnold am 26 Aug. 2016

Bearbeitet: arnold am 26 Aug. 2016

In MATLAB Online öffnen

Hi Alison,

Thank you, you're right again. After causing the error by using i.e. 'medfilt2', typing 'gpuDevice' gives:

...
KernelExecutionTimeout: 1
...

As far as I can tell by GTX Cards are not TCC capable. Now that's a shame, so adding other dedicated GPUs for the display output would not help to get NVidia GTX cards up and running for for Matlab tasks like that.

I'm no GPU/CUDA expert so I only played with it a little. I'm a bit confused what's happening under the hood and how I could take advantage of what you call 'computation in smaller pieces'. Trying a matrix inversion using

b = inv(gpuArray(rand(10000)));

takes much longer (9s) to finish than it takes for 'medfilt2' to produce the error. So I can only speculate that the matrix inversion is already split up into smaller kernel tasks which take up less computational time than 2s (standard TDR limit) and thus causes no error. medfilt2 on the other hand crashes the card (causes kernel execution time-out) in 2s. So apparently different gpu enabled Matlab functions cause different kernel times by splitting computational tasks up under the hood - at least this is my simplistic explanation for this behaviour. I love Matlab because it usually is high level enough to take care of such things for the user. In this case having to go an rewrite medfilt2 or split up data every time one is unsure of how much kernel time it might take would seem to cause a lot of overhead. Too much for me and my students. We mainly use Matlab to avoid exactly that, utilising its plethora of readily available functionality without having to dig this deep. So unless you could think of a foolproof way to maybe pause GPU computation every 1.999s in order to not trigger this error the only solution I see would be to set TDR to a reasonably long time and hope for the best. Deactivating it entirely doesn't seem wise as I can't see a way to kill the task if it's running indefinitely.

Not everybody can purchase 10000$ cards for the same 10Tflops (single precision) a consumer card can offer.

I really think that it would greatly add to Matlab's usability with GPU compute if some sort of failsafe took care of this, even if it came at a small performance loss. A quick ask around the lab revealed a handful of people who had experienced this in windows (not only with matlab) and then concluded that GPU compute was not high level enough for them. Most (including me) would not go and purchase the 10000$ card without having tested similar code on a lesser card, but that is NVidia's problem. I guess the market is too small (still).

thanks very much for your very helpful input!

Alison Eele am 2 Sep. 2016

Hi Arnold, Renu,

The ability to split up larger computations into smaller pieces is very application specific. Some operations could be effectively tiled across a large matrix but others cannot. Tiled or element wise calculations is something that GPU computing often excels at and would fall into the 'use smaller blocks' option for avoiding the timeout.

As Arnold's experiments indicate the kernel timeout applies to a single kernel level operation. So whilst the total GPU computation time appears above the 2 seconds limit, the smaller kernels called as part of the GPU computation might still be below the limit and you see no problem.

TCC driver mode in Windows is as you identified limited to a few high level cards, normally chosen for their 'suitability' for scientific computation I believe the only GTX cards supported are from the Titan range. I had hoped they had included the Geforce 1080 as standard with the new generation.

arnold am 25 Sep. 2016

Hi Alison,

I think most of my computations should be splittable into parallel tasks as I do a lot of element wise computations of image stacks. Can you hint me in the right direction as to how to split that up, maybe with blockfunctions?

regards Arnold

Melden Sie sich an, um zu kommentieren.

Answer 2

Yahya Zakaria mohamed am 29 Jun. 2017

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work#answer_272286

Thank You. I faced the same problem, I disconnected the second monitor and no error appeared.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Answer 3

Ricardo de Azevedo am 19 Nov. 2019

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work#answer_402312

Bearbeitet: Ricardo de Azevedo am 21 Nov. 2019

In MATLAB Online öffnen

I am facing the same problem now training an RNN and have tried both the TdrDelay to longer and the TdrLevel to 0.

Error:

Error using gpuArray/gatherAn unexpected error occurred during CUDA execution. The CUDA error was:CUDA_ERROR_LAUNCH_FAILED

The weird thing is the network trains for a while and then crashes, I can't really tell what triggers it.

(Using Matlab 2019b and latest NVIDIA drivers 441.20 for GTX 1080 Ti)

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Ricardo de Azevedo am 6 Apr. 2020

I desisted as I had other things to do and couldn't follow up.

Mathworks Support Sent me this:

After conferring with colleagues in development, there are a few steps we can take to narrow down the issue.

If you are able to get a minimum set of data and code that reproduces the issue, that would be the easiest way to see what is causing this error.

Try reducing the 'MiniBatchSize' all the way down to 1 to see if the issue still occurs

Find out where the error actually occurred. One easy way to do this is to run with profiling switched on by calling the following command before running the script:

>> profile on

This should cause the CUDA error to be thrown after the line of code where the issue occurred.

giorgio toscana am 7 Apr. 2020

Hi Ricardo,

I will try them.

If the problem persists i'll contact the support with those info.

Thank you very much for your quick reply.

Melden Sie sich an, um zu kommentieren.

Answer 4

D.W. Moyar am 12 Jan. 2022

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work#answer_873455

I have encountered a scenario where resetting the GPU device clearly does not work, leading to GPU out of memory errors. I was training LSTM networks in a loop. Each loop trained a new LSTM with different features. Every time I ran the program, the loop would run without incident for 20 iterations and then produce a GPU out of memory error. I tried resetting the GPU between loop iterations, deleting the training and target data variables between iterations, and pausing after the GPU reset. None of these efforts worked. The only way I was able to get the loop to run was to save all the variables every 15 iterations, clear the entire workspace, and reload the variables. Perhaps this issue is hardware related? I have a GeForce 1080 Ti graphics card.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Answer 5

hewayda hew am 14 Jan. 2023

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/299970-reset-gpudevice-does-not-work#answer_1148715

Bearbeitet: hewayda hew am 14 Jan. 2023

I have the same problem

Error using gpuArray/gather

Encountered unexpected error during CUDA execution. The CUDA error was:

CUDA_ERROR_LAUNCH_TIMEOUT.

what is the proper solution?

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

reset(gpuDevice) does not work

2 Kommentare
Keine anzeigenKeine ausblenden

Akzeptierte Antwort

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Weitere Antworten (4)

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

reset(gpuDevice) does not work

2 Kommentare Keine anzeigenKeine ausblenden

Akzeptierte Antwort

5 Kommentare 3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Weitere Antworten (4)

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

2 Kommentare
Keine anzeigenKeine ausblenden

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden