GPU memory overhead dependent on fft dimension.

Question

D. Plotnick am 2 Jul. 2018

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/408452-gpu-memory-overhead-dependent-on-fft-dimension

Beantwortet: Joss Knight am 2 Jul. 2018

Hello all, I have a question regarding memory management during Matlab's gpuArray/fft operation. I have a large NxM matrix [N = 10E3,M = 20E3, as an approx] where where I wish to take an fft in the M dimension. Now, for CPU operations I would normally permute the matrix to make the fft operation act in the 1st (column) dimension, for speed.

On the GPU, if I run the fft operation in the 1st dimension, I slam into the memory ceiling of my GPU. However, if I apply it in the row dimension I do not. I assume that this has to do with whether Matlab is doing N asynchronous fft's in the row direction, vs. a single massive matrix operation in the column dimension.

So, 4 questions:

Is my assumption true?
Are GPU operations still faster in the column direction (sort of answered this myself, got 3x speed advantage with below snippet.)
Is there a way to know what the GPU memory need will be for the fft? If so, I can try chunking up the fft based on the GPU memory available.
Is there another implementation that will have the speed of the column operation without the memory issues? I am going to try doing this as an arrayfun just to see.

Code snippet:

 x = gpuArray.rand(10000,10000);
xp = x.';
gputimeit(@() fft(x,[],1))
gputimeit(@() fft(xp,[],2))

Thanks all.

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

D. Plotnick am 2 Jul. 2018

In MATLAB Online öffnen

As I suspected, arrayfun (at least my way of using it) is way slower.

 f = @(i) fft(x(:,i),[],1);
tic
y = arrayfun(f,1:size(x,2),'UniformOutput',false);
wait(g);
y = cat(2,y{:});
toc

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Joss Knight am 2 Jul. 2018

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/408452-gpu-memory-overhead-dependent-on-fft-dimension#answer_327211

In MATLAB Online öffnen

MATLAB uses cufft, so the behaviour is whatever its behaviour is. The implication of the batching API as described by the doc - https://docs.nvidia.com/cuda/cufft/index.html - is that batches that are contiguous result in multiple kernel launches. This will be slower, but more efficient with memory.

Because the amount of memory an FFT needs is so variable and dependent on signal length, it isn't that valuable to know what the size will be for any particular example. If you're curious you can watch the FreeMemory property output from gpuDevice:

gpu = gpuDevice
gpu.FreeMemory

After an FFT the FFT plan is retained so you should see how much memory it took up (as long as it's the first FFT you do in the MATLAB session). For working memory you can assume there will be a copy of the input, possibly two because MATLAB itself will often take a copy of the input in order to ensure your data is not corrupted in the event of an error.

If you can get your signals to be a power of 2 in length (say, 8192) you'll find them much more efficient with memory.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

GPU memory overhead dependent on fft dimension.

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Akzeptierte Antwort

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Weitere Antworten (0)

Siehe auch

Kategorien

Tags

Community Treasure Hunt

GPU memory overhead dependent on fft dimension.

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Akzeptierte Antwort

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Weitere Antworten (0)

Siehe auch

Kategorien

Tags

Community Treasure Hunt

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden