MATLAB Coder: Matrix-Scalar-Multiplication slower in generated code?
Stefan
on 10 Oct 2018
Commented: Stefan
on 29 Oct 2018
We are generating DLLs and MEX files from MATLAB code and noticed that matrix*scalar operations take 3-10 times longer in the generated code than in the original MATLAB code. Can anyone explain this slowdown?
I wrote two toy functions, one for matrix*scalar and one for matrix*vector. For the latter, execution time was the same in the original and the generated code. The matrix size was 1000x1000 in both cases.
Interestingly, the MEX file calls the BLAS library for matrix*vector but not for matrix*scalar. Could this be the reason?
Toy function for Matrix*Scalar:
function [MatrixOut] = MatrixScalar_Function(MatrixIn,ScalarIn)
    MatrixOut = MatrixIn;
    for index = 1:1000
        MatrixOut = MatrixOut*ScalarIn;
    end
end
Generated C-Code for Matrix*Scalar:
/*
 * MatrixScalar_Function.cpp
 *
 * Code generation for function 'MatrixScalar_Function'
 *
 */

/* Include files */
#include "rt_nonfinite.h"
#include "MatrixScalar_Function.h"
#include "MatrixScalar_Function_data.h"

/* Function Definitions */
void MatrixScalar_Function(const emlrtStack *sp, const real_T MatrixIn[1000000],
                           real_T ScalarIn, real_T MatrixOut[1000000])
{
  int32_T b_index;
  int32_T i0;
  memcpy(&MatrixOut[0], &MatrixIn[0], 1000000U * sizeof(real_T));
  b_index = 0;
  while (b_index < 1000) {
    for (i0 = 0; i0 < 1000000; i0++) {
      MatrixOut[i0] *= ScalarIn;
    }
    b_index++;
    if (*emlrtBreakCheckR2012bFlagVar != 0) {
      emlrtBreakCheckR2012b(sp);
    }
  }
}

/* End of code generation (MatrixScalar_Function.cpp) */
Accepted Answer
James Tursa
on 11 Oct 2018
Edited: James Tursa
on 11 Oct 2018
I'm only seeing about a 5% difference in timing when comparing the BLAS dscal function call to an explicit loop on my R2017b Win64 installation. Certainly not the 3-10 times difference that you are seeing. You might try replacing that loop with a dscal call and see what you get in your case. E.g., replace this
    for (i0 = 0; i0 < 1000000; i0++) {
      MatrixOut[i0] *= ScalarIn;
    }
with something like this
    #include "blas.h"
    :
    int64_T n, incx;  /* or maybe int32_T in your case */
    :
    incx = 1;
    n = 1000000;
    dscal( &n, &ScalarIn, MatrixOut, &incx );
But I do see a big difference in timing when compared to the m-code. My guess is that perhaps the BLAS dscal routine is not multi-threaded and that is why the timing is nearly the same as a manual loop, but MATLAB uses a multi-threaded scalar multiply routine in the background for the m-code.
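For anyone unsure what the dscal arguments above mean, here is a minimal C sketch of the operation the call performs. Note that scale_ref is a hypothetical stand-in written for illustration, not the BLAS implementation; the early return for non-positive n or incx is my assumption about how the reference routine treats those inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (NOT the BLAS source):
 * multiplies n elements of x, spaced incx apart, by alpha, in place. */
static void scale_ref(ptrdiff_t n, double alpha, double *x, ptrdiff_t incx)
{
    ptrdiff_t i;

    /* Assumption: nothing to do for non-positive n or incx. */
    if (n <= 0 || incx <= 0) {
        return;
    }
    assert(x != NULL);

    for (i = 0; i < n; i++) {
        x[i * incx] *= alpha;
    }
}
```

With n = 1000000 and incx = 1 this is the same arithmetic as the generated loop over MatrixOut, so similar timings would be expected whenever dscal runs on a single thread.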
3 Comments
Ryan Livingston
on 23 Oct 2018
Edited: Ryan Livingston
on 23 Oct 2018
MATLAB Coder supports generating OpenMP code for parfor loops.
When I change your for loop to a parfor loop and regenerate the MEX file, I see a performance improvement. Generally the MATLAB execution is still faster, but the numbers are closer.
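To give a rough idea of what an OpenMP version of the scaling loop can look like in hand-written C, here is a small sketch. It is not the code MATLAB Coder emits for parfor, and it parallelizes the element-wise loop rather than the outer repeat loop; the names scale_parallel and matrix_scalar_repeat are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT MATLAB Coder output): scale n doubles in place.
 * The iterations are independent, so OpenMP may split them across threads. */
static void scale_parallel(double *x, int n, double alpha)
{
    int i;

    assert(x != NULL || n <= 0);

#pragma omp parallel for
    for (i = 0; i < n; i++) {
        x[i] *= alpha;
    }
}

/* Mirrors the toy function: apply the scaling `repeats` times in sequence. */
static void matrix_scalar_repeat(double *x, int n, double alpha, int repeats)
{
    int r;

    for (r = 0; r < repeats; r++) {
        scale_parallel(x, n, alpha);
    }
}
```

The pragma only takes effect when the compiler's OpenMP option is enabled (for example -fopenmp with GCC); otherwise it is ignored and the loop runs serially with the same result.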
When generating standalone code (e.g. a DLL), make sure you set the build configuration to Faster Runs to enable C/C++ compiler optimizations.
More info on optimizing for performance with MATLAB Coder is available in the documentation.