Why is GPU Array slow for matrix multiplication

Question

0 votes

When I run the following code:

TRIALS = 1;
t = tic;
for i = 1:TRIALS
    A = rand(1280,1280,'single','gpuArray');
    for iter = 1:100
        A = A*A;
    end
end
ITERATION_TIME = toc(t)

It consistently takes only 0.0039 seconds. But when I set

TRIALS=5

it takes 4.9892 seconds, roughly a 1000x slowdown for only 5 times the number of iterations.

I realize that gputimeit() can be used to time code on the GPU specifically, but for the overall program to complete, why doesn't the time scale with the number of trials?

I am running this on a Mac Pro 2014 on an NVIDIA GeForce GT 750M 2048 MB, MATLAB R2016b.

1 Comment

Joss Knight on 24 Apr 2017

Your first number is wrong, because the computation hadn't actually finished when you called toc. You need to read the doc page on measuring performance on the GPU.
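A minimal sketch of what the comment above points at, assuming the usual approach of synchronizing with the device before reading the timer. GPU operations in MATLAB are queued asynchronously, so toc can return before the multiplications have run. The variable names dev, B and timePerMultiply are illustrative and not from the original post.

```matlab
% Sketch: time the loop with explicit synchronization so that toc
% measures completed GPU work rather than just queued operations.
TRIALS = 5;
dev = gpuDevice;          % handle to the currently selected GPU
wait(dev);                % drain any pending work before starting the clock
t = tic;
for i = 1:TRIALS
    A = rand(1280, 1280, 'single', 'gpuArray');
    for iter = 1:100
        A = A*A;
    end
end
wait(dev);                % block until all queued GPU operations finish
ITERATION_TIME = toc(t)

% Alternative: gputimeit synchronizes internally and repeats the call
% several times to get a stable estimate for a single operation.
B = rand(1280, 1280, 'single', 'gpuArray');
timePerMultiply = gputimeit(@() B*B)
```

With the second wait(dev) in place, the measured time for TRIALS=1 and TRIALS=5 should be directly comparable, since both include the time the GPU actually spends computing.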


Answer 1

Swathik Kurella Janardhan on 21 Apr 2017

0 votes

I tried your code on Windows 10 with MATLAB R2016b. I see the 1000x performance hit for 5 trials in only a few attempts; in the other attempts the execution time increases as expected, in proportion to the number of trials.

Multiple factors determine a GPU's performance. To measure GPU performance, you can run the benchmarks below:

>>openExample('distcomp/paralleldemo_gpu_benchmark')
>>paralleldemo_gpu_benchmark

This built-in example reports PCI bus speed (send and gather), GPU memory read/write speed, and peak calculation performance for double-precision matrix multiplication.

>>gpuBench

Written by the MathWorks Parallel Computing Team and available on the File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

Its main advantage over the example script is that it runs a variety of tests covering both memory-intensive and compute-intensive tasks, in both single and double precision. It also offers a comparison between a relatively ordinary display card and a reasonable compute card. The reference results are matched to the MATLAB release being run. (gpuBench will soon be updated to cover the missing releases up to R2016b.)

1 Comment

BMWv on 22 Apr 2017

Thanks for the answer! However, I wasn't looking for GPU profiling tools. The example code is a simplification of an actual program that I need to accelerate on the GPU. I'm trying to see how to modify the code for the case where TRIALS is very large, so I need the time to scale well with TRIALS.


Answer 2

Jeffrey Daniels on 30 Jan 2018

0 votes

@BMWv It really depends on your application. Assuming that no trial depends on a previous trial, and that you know the whole matrix before you start your iter loop, you could use pagefun and drop the TRIALS loop entirely. Your matrix would need to start with size (1280,1280,TRIALS). This should be much faster for large TRIALS. See the pagefun documentation for more examples.
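A rough sketch of the pagefun suggestion, assuming the trials are independent and that all TRIALS pages fit in GPU memory at once. The variable N is introduced here for illustration only; the sizes are taken from the question.

```matlab
% Sketch: replace the TRIALS loop with one batched multiply per iteration.
% Assumes independent trials; each page of A holds one trial's matrix.
TRIALS = 5;
N = 1280;
A = rand(N, N, TRIALS, 'single', 'gpuArray');   % one N-by-N page per trial
for iter = 1:100
    % Multiply page k of A by page k of A, for all k, in a single call.
    A = pagefun(@mtimes, A, A);
end
wait(gpuDevice);   % ensure the queued work has finished before timing or reuse
```

Each single-precision 1280-by-1280 page takes about 6.5 MB, so on a 2048 MB card a very large TRIALS would have to be processed in batches of pages rather than in one array.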

0 Comments
