Wrong result when calling CBLAS dgemv function in a mex file

7 views (last 30 days)
Code is as follows,
extern "C" {
#include "cblas.h"
#include "mex.h"
}
#include<iostream>
using namespace std;
void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
if(nrhs!=2) {
mexErrMsgTxt("Two input required.");
} else if(nlhs>1){
mexErrMsgTxt("Too many outputs!");
}
mwSignedIndex row0 = mxGetM(prhs[0]);
mwSignedIndex col0 = mxGetN(prhs[0]);
mwSignedIndex row1 = mxGetM(prhs[1]);
mwSignedIndex col1 = mxGetN(prhs[1]);
if(col1!=1){
mexErrMsgTxt("Column vector required!");}
if(col0!=row1){
mexErrMsgTxt("Dimension mismatch!");}
double* A = mxGetPr(prhs[0]);
double* b = mxGetPr(prhs[1]);
plhs[0] = mxCreateDoubleMatrix(row0,1,mxREAL);
double* y = mxGetPr(plhs[0]);
cout << row0 << "," << col0 <<endl;
ptrdiff_t b_stride = 1;
ptrdiff_t y_stride = 1;
cblas_dgemv(CblasColMajor, CblasNoTrans,
row0, col0,
1.0,
A, col0,
b, b_stride,
0.,
y, y_stride);
}
Then, if I call,
mv([1,2;3,4;5,6;7,8],[3;4])
it just give result,
ans =
0
0
0
0
However, it give correct result for 2 by 2 matrice,
mv([1,2;3,4;],[3;4])
ans =
11
25
Very confused, if I call cblas_dgemv in cpp and compile it with Visual Studio, everything just works well.
Any suggestion and help is appreciated!
I compile the Git version of OpenBLAS. Since it works well programming in C++, I believe there is nothing to do with the compiled library.
  2 Comments
Jan
Jan on 26 Mar 2018
Edited: Jan on 26 Mar 2018
It would be easier if you explain, how you compile the code: I'm not using CBLAS, but the standard BLAS libraries shipped with Matlab usually. This does not allow the smart CblasColMajor method, such that I cannot reproduce your problem directly.
Maybe this is the bug:
cblas_dgemv(CblasColMajor, CblasNoTrans,
row0, col0,
1.0,
A, row0, % Not: col0
b, b_stride,
0.,
y, y_stride);

Sign in to comment.

Answers (1)

Jan
Jan on 26 Mar 2018
Edited: Jan on 26 Mar 2018
This does not answer your question directly, but this is a C-Mex function to call DGEMM and DGEMV:
#include "mex.h"
// Management of trailing underscore in BLAS and LAPACK calls:
#if defined(__WINDOWS__) || defined(WIN32) || defined(_WIN32) || defined(_WIN64)
#define BLASCALL(f) f
#else
#define BLASCALL(f) f ## _
#endif
// Integers in BLAS calls:
// #define int_BLAS ptrdiff_t
#define int_BLAS mwSignedIndex
int BLASCALL(dgemm)(char *transa, char *transb, int_BLAS *m, int_BLAS *n,
int_BLAS *k, double *alpha, double *a, int_BLAS *lda,
double *b, int_BLAS *ldb, double *beta, double *c,
int_BLAS *ldc);
int BLASCALL(dgemv)(char *transa, int_BLAS *m, int_BLAS *n,
double *alpha, double *a, int_BLAS *lda,
double *b, int_BLAS *incb, double *beta, double *c,
int_BLAS *incc);
void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
int_BLAS mA, nA, mB, nB, i1 = 1;
double *A, *B, *C;
double d1 = 1.0, d0 = 0.0;
if (nrhs != 2) {
mexErrMsgTxt("Two inputs required.");
} else if (nlhs > 1) {
mexErrMsgTxt("Too many outputs!");
}
mA = (int_BLAS) mxGetM(prhs[0]);
nA = (int_BLAS) mxGetN(prhs[0]);
mB = (int_BLAS) mxGetM(prhs[1]);
nB = (int_BLAS) mxGetN(prhs[1]);
if (nA != mB) {
mexErrMsgTxt("Dimension mismatch!");
}
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
plhs[0] = mxCreateDoubleMatrix(mA, nB, mxREAL);
C = mxGetPr(plhs[0]);
// C = A * B:
if (nB == 1) {
BLASCALL(dgemv) ("N", &mA, &nA,
&d1, A, &mA, B, &i1, &d0, C, &i1);
} else {
BLASCALL(dgemm) ("N", "N", &mA, &nB,
&nA, &d1, A, &mA, B, &mB,
&d0, C, &mA);
}
return;
}
Compile this by:
mex -O DGEMM.c libmwblas.lib -largeArrayDims
For matix input, this has the same speed as calling the matrix multiplication in Matlab directly, while for matrix-vector multiplication Matlab is faster:
A = rand(200, 300);
B = rand(300, 200);
for k = 1:100; C = A*B; end; % Warmup!
tic; for k = 1:1000; C = A*B; end; toc
tic; for k = 1:1000; C = DGEMM(A, B); end; toc
A = rand(200, 300);
b = rand(300, 1);
tic; for k = 1:10000; C = A*b; end; toc
tic; for k = 1:10000; C = DGEMM(A, b); end; toc
Matlab R2016b, Core2Duo, Win7:
Elapsed time is 3.166592 seconds.
Elapsed time is 3.148328 seconds.
Elapsed time is 0.263720 seconds.
Elapsed time is 0.304909 seconds.
By the way:
  • A*B and DGEMM(A,B) use multiple cores.
  • Defining alpha for alpha*A * B let the C-Mex function be 2% faster than Matlab.
  • It took me 20 minutes to find out, that the default -compatibleArrayDims let the BLAS library crash, because it worked sometimes by accident. My conclusion: Linear Algebra functions are implemented efficiently in Matlab and I do not spend time in performing them in C.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by