mex cuda code for matrix multiplication

12 Ansichten (letzte 30 Tage)
Chieh-Hsun Wu
Chieh-Hsun Wu am 23 Okt. 2023
Beantwortet: Varun am 30 Okt. 2023
Hello,
I made a simple mex cuda code to calculate multiplication of two matrices of size NxN but never get the same results as in matlab command
C = A*B except for B is a diagonal matrix. Just wondering if I was wrong at any point? The code is attached. Thanks.
#include "mex.h"
#include "cuda_runtime.h"
// CUDA kernel for matrix multiplication
__global__ void matrixMultiplication(const double* A, const double* B, double* C, int N) {
int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
if (row < N && col < N) {
double sum = 0;
for (int i = 0; i < N; i++) {
sum += A[row * N + i] * B[i * N + col];
}
C[row * N + col] = sum;
}
}
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {
/* Declare all variables */
const double* A;
const double* B;
double* C;
double* d_A = nullptr;
double* d_B = nullptr;
double* d_C = nullptr;
int N;
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]); // Use prhs[1] for matrix B
N = mxGetN(prhs[0]); // Get the number of columns of matrix A
/* Initialize output array */
plhs[0] = mxCreateDoubleMatrix((mwSize)N, (mwSize)N, mxREAL);
C = mxGetPr(plhs[0]);
// Allocate device memory for matrices A, B, and C
cudaMalloc((void**)&d_A, N * N * sizeof(double));
cudaMalloc((void**)&d_B, N * N * sizeof(double));
cudaMalloc((void**)&d_C, N * N * sizeof(double));
// Copy matrices A and B from host to device
cudaMemcpy(d_A, A, N * N * sizeof(double), cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, N * N * sizeof(double), cudaMemcpyHostToDevice);
// Define grid and block dimensions
dim3 dimGrid(128, 128);
dim3 dimBlock(1, 1);
// dim3 dimGrid((N + 15) / 16, (N + 15) / 16); // Adjust block size as needed
// dim3 dimBlock(16, 16);
// Launch the CUDA kernel
matrixMultiplication << <dimGrid, dimBlock >> > (d_A, d_B, d_C, N);
cudaDeviceSynchronize();
// Copy the result matrix C from device to host
cudaMemcpy(C, d_C, N * N * sizeof(double), cudaMemcpyDeviceToHost);
// Free device memory
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);
}

Akzeptierte Antwort

Varun
Varun am 30 Okt. 2023
Hi Chieh-Hsun,
Looks like you are trying to calculate matrix multiplication of matrices A and B using "mex" CUDA code but the results are not matching when calculated using MATLAB command window as "C=A*B".
There are two orderings in which a 2D matrix can be flatten into 1D matrix:
  1. Row-major wise: In row-major order, you traverse the 2D matrix row by row, collecting the elements one row at a time. You start with the first row, then move to the second row, and so on until you've visited all the rows.
  2. Column-major wise: In column-major order, you traverse the 2D matrix column by column, collecting the elements one column at a time. You start with the first column, then move to the second column, and so on until you've visited all the columns.
For Example, consider the following matrix:
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
Row-major order: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Column-major order: [1, 4, 7, 2, 5, 8, 3, 6, 9]
So, the CUDA code which you have shared uses row-major wise sorting to flatten 2d matrix "A" and "B" into 1d array. It then calculates matrix multiplication on these 1d arrays and stores the result into new 1d array "C", again it is in row-major order.
But the "mexFunction", by default converts these 2d matrices into 1d arrays using column-major sorting.
Here 1d arrays "A" and "B" is in column-major sorting which was expected to be in row-major sorting.
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
And, finally the 1d array "C" is returned to the MATLAB as 2d array and conversion is based on the assumption that "C" is column-major 1d array.
C = mxGetPr(plhs[0]);
To resolve the above issues related to array orderings in the "mex" CUDA code, you can just pass transpose of the matrices "A" and "B" i.e. " A' " and " B' " to the CUDA code and again do transpose of the returned matrix in MATLAB after calling the CUDA code. Please refer to the following code snippets:
C = CUDA_MATRIX_MUL(A', B')
C = C'
%% C is the final answer
Hope it helps.

Weitere Antworten (0)

Kategorien

Mehr zu GPU Computing finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by