Solving sparse matrix on GPU and memory problems (even with free memory available)

Question

0 Stimmen

Hi there,

I am running a backslash computation A\b where A is an array of type sparse and b is a vector, both stored as gpuArrays. It works fine for small matrices, but the following error is given for a matrix A of order 2e5:

_ Error using \ The GPU failed to allocate memory. To continue, reset the GPU by running 'gpuDevice(1)'. If this problem persists, partition your computations into smaller pieces._

KGR_gpu=gpuArray(KGR);
FR_gpu=gpuArray(FR);
sol=KGR_gpu\FR_gpu;

This same computation works fine when gpuArrays are not used. In fact, the size of A is only 50Mb and the GPU is a Nvidia GTX 1050, 4Gb. Following is the result of a gpuDevice call after this error:

   CUDADevice with properties:
                      Name: 'GeForce GTX 1050'
                     Index: 1
         ComputeCapability: '6.1'
            SupportsDouble: 1
             DriverVersion: 9.2000
            ToolkitVersion: 8
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 4.2950e+09
           AvailableMemory: 3.3949e+09
       MultiprocessorCount: 5
              ClockRateKHz: 1493000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

It seems that this is an issue with the backslash operator since both A and b can be stored on the GPU without problems.

Any thoughts on how to solve large sparse systems on the GPU? Thanks!

Regards, Paulo

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Follow Question

Answer 1

Walter Roberson am 21 Mai 2018

0 Stimmen

The result of \ between two sparse arrays is generally a dense array. For example sprand(1000,1000,.01) \ sprand(1000,1000,.01) gave me a result with fill fraction of 0.997031

2 Kommentare
Keine anzeigen Keine ausblenden

Paulo Ribeiro am 21 Mai 2018

Dear Walter, thanks. But b is a vector. So it's a sparse array A and a vector b, both stored in the GPU. Perhaps this specific GPU is not able to perform these operations using double arrays? I know that single performance works great, but cant be used with sparse.

Walter Roberson am 21 Mai 2018

In MATLAB Online öffnen

>> t = sprand(1000,1000,.01) \ sprand(1000,1,.01); nnz(t)./numel(t)
ans =
     1
>> whos t
  Name         Size            Bytes  Class     Attributes
    t         1000x1             16016  double    sparse

Notice this is twice the storage size that would be required for a non-sparse array with the same number of elements, due to the overhead of storing sparse arrays.

Melden Sie sich an, um zu kommentieren.

Answer 2

Joss Knight am 21 Mai 2018

In MATLAB Online öffnen

0 Stimmen

Hi Paulo. I must admit, I'm not extremely familiar with the behaviour of the matrix factorization we use to implement the sparse direct solve; however it wouldn't surprise me if the result is quite dense. It might be interesting to (on the CPU) look at the density of the QR factors using the qr function, for your particular input. Certainly, when I did it on a random matrix with 10% fill, the Q factor was nearly completely dense and R factor was 50%.

>> A = sprand(1000, 1000, 0.1);
>> [Q,R] = qr(A);
>> nnz(Q)/numel(Q)
ans =
    0.9903
>> nnz(R)/numel(R)
ans =
    0.5005

For LU, both factors are 50% dense.

Obviously random sparse matrices don't properly reflect the structure of real sparse matrices, so your problem would be different. But it's not unreasonable to surmise that the intermediate factors might be very large.

To circumvent such problems a normal approach would be to use an iterative solver like gmres, bicg, pcg, cgs, lsqr etc. It is not uncommon for these to converge quicker than the direct solve can, especially if you can give them a good preconditioner.

9 Kommentare
7 ältere Kommentare anzeigen 7 ältere Kommentare ausblenden

Paulo Ribeiro am 22 Mai 2018

Bearbeitet: Paulo Ribeiro am 22 Mai 2018

In MATLAB Online öffnen

Thanks Joss,

Very nice. I am at this present time trying an iteractive solver combined with the GPU. Would you recommend any additional steps to this code?

L=ichol(KGR); 
sol= pcg(KGR_gpu,FR_gpu,1e-1,1e6,L,L');

For the time being this same code runs whenever matrix KGR and vector FR are not stored at the GPU. But for some reason, it does not work with gpuArrays:

Error using gpuArray/pcg (line 58) When the first input argument is a sparse matrix, the second preconditioner cannot be a matrix. Use functions for both preconditioners, or multiply the precondition matrices into one matrix.

By following the recommendation for a product of the preconditioners it follows:

_ _ _ Undefined function 'itermsg' for input arguments of type 'char'.

Error in gpuArray/pcg>iPcgBuiltinWrapper (line 119) itermsg('pcg',tol,maxit,0,flag,iter,NaN);

Error in gpuArray/pcg (line 63) [varargout{1:nargout}] = iPcgBuiltinWrapper(varargin{:});_____________

Thanks!

Regards,

Paulo

Paulo Ribeiro am 12 Jun. 2018

Thanks Joss,

For this specific case the pcg performs better. No further improvement was achieved using a preconditioner. What still amazes me is the fact that a 0.20Gb matrix can't be solved using the backslash operator in a nominal 4Gb GPU. On the other hand, while monitoring the CPU version of this code (on Windows task manager), I can see that MATLAB bites almost 4Gb of RAM during the backslash operation. That might explain something.

I wonder how far can you get even with high-end GPU's.

Regards,

Paulo

Walter Roberson am 12 Jun. 2018

I seem to recall that some memory is required to marshal the data on the GPU, which uses arrays in a different order than MATLAB uses. I do not recall the details at the moment, but potentially you might not be able to create an output array larger than half of your GPU memory.

Melden Sie sich an, um zu kommentieren.

Solving sparse matrix on GPU and memory problems (even with free memory available)

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Antworten (2)

2 Kommentare
Keine anzeigen Keine ausblenden

9 Kommentare
7 ältere Kommentare anzeigen 7 ältere Kommentare ausblenden

Kategorien

Produkte

Tags

Community Treasure Hunt

Solving sparse matrix on GPU and memory problems (even with free memory available)

0 Kommentare -2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Antworten (2)

2 Kommentare Keine anzeigen Keine ausblenden

9 Kommentare 7 ältere Kommentare anzeigen 7 ältere Kommentare ausblenden

Kategorien

Produkte

Tags

Siehe auch

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

2 Kommentare
Keine anzeigen Keine ausblenden

9 Kommentare
7 ältere Kommentare anzeigen 7 ältere Kommentare ausblenden