Poor performance of mex file
I am trying to speed up some code using MATLAB Coder, but the resulting mex code is actually slower than the m-file. A simplified version of the code that I am trying to speed up is attached below. I am wondering:
- Why is the mex function so much slower than the m-file function?
- Is there any way to improve the performance of my code using MATLAB Coder?
Any help would be much appreciated.
function v_d = choice_model_nop(q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb)
v_d = coder.nullcopy(zeros(nz,nk,nb));
parfor ii = 1:nz
    q_ini_ii = reshape(q(ii,:,:),[],nk);
    v_ini_exp_ii = reshape(v_exp(ii,:,:),[],nk);
    choice = q_ini_ii.*b_grid_choice - k_grid_choice;
    for jj = 1:nk
        for kk = 1:nb
            % dividends at time t
            d = wealth(ii,jj,kk) + choice;
            % choosing optimal consumption
            vec_d = d + v_ini_exp_ii.*(d>0);
            v_d(ii,jj,kk) = max(vec_d,[],'all');
        end
    end
end
end
When I run the above code as an m-file I get:
>> timeit( @() choice_model(q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb))
When I use MATLAB Coder to convert the above code into a mex-file (with JIT disabled so that parallelization can be used, and with all speed options unchecked), I obtain:
timeit( @() choice_model_mex(q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb))
So the mex-function is actually slower. When I used squeeze instead of reshape, the performance of the m-file was unaffected, but the mex-file was twice as slow as with reshape...
In my testing q, v_exp,wealth are 61-by-61-by-61 arrays, b_grid_choice and k_grid_choice are 61-by-61 matrices, and nz,nk,nb are constants equal to 61.
So I am left wondering why the mex code is slower than the m-file. Is it because a large fraction of the time is spent in built-in MATLAB functions? Is there a way to speed it up?
Yair Altman on 11 Apr 2020
Your m-function uses parallelization (parfor), while your mex file does not. This in itself could cause your m-code to run much faster than mex, depending on the parallelization speedup. Moreover, you said that you turned off the speedup options in the Matlab Coder, effectively forcing it to run in "crippled" mode.
To compare apples to apples, you need to (1) disable m-code parallelization by replacing parfor with a regular for loop, and (2) enable all the speedup options in the coder. My guess is that you will then see significant speedup of mex vs. m-code.
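As a rough sketch of step (1), the function from the question could be copied with parfor replaced by a plain for loop. The name choice_model_serial is hypothetical, the empty size arguments to reshape and max are assumed to be [], and plain zeros is used for preallocation so that the file also runs without MATLAB Coder installed:

```matlab
function v_d = choice_model_serial(q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb)
% Serial variant of the question's function (hypothetical name), intended
% only as a baseline for timing the m-file against the generated MEX file.
v_d = zeros(nz,nk,nb);
for ii = 1:nz   % regular for loop instead of parfor
    % 2-D slices for the current ii (assumes nk == nb, as in the question)
    q_ini_ii = reshape(q(ii,:,:),[],nk);
    v_ini_exp_ii = reshape(v_exp(ii,:,:),[],nk);
    choice = q_ini_ii.*b_grid_choice - k_grid_choice;
    for jj = 1:nk
        for kk = 1:nb
            % dividends at time t
            d = wealth(ii,jj,kk) + choice;
            % add the continuation value only where dividends are positive
            vec_d = d + v_ini_exp_ii.*(d>0);
            v_d(ii,jj,kk) = max(vec_d,[],'all');
        end
    end
end
end
```

Timing this with timeit( @() choice_model_serial(q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb)) and timing a MEX file generated from the same serial file should give a like-for-like comparison, since neither side then uses parallel workers.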
In case you want to employ parallelization in both your m-code and mex code, you'll need to use a compiler that supports the OpenMP parallelization pragmas that the Matlab Coder generates (many compilers do NOT support them). Also remember to turn on all the performance options in your C compiler.
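For the parallel case, the build settings can also be set from the command line rather than the Coder app. The script below is only a sketch: it assumes MATLAB Coder is installed, that choice_model_nop.m from the question is on the path, and that a compiler with OpenMP support has been selected with mex -setup. The random test data are hypothetical, and the property names are those of the MEX configuration object (coder.MexCodeConfig) as I understand them, so check them against your release.

```matlab
% Hypothetical inputs with the sizes described in the question.
n = 61;
q = rand(n,n,n); v_exp = rand(n,n,n); wealth = rand(n,n,n);
b_grid_choice = rand(n,n); k_grid_choice = rand(n,n);
args = {q, v_exp, wealth, b_grid_choice, k_grid_choice, n, n, n};

% MEX build configuration with speed-oriented settings.
cfg = coder.config('mex');
cfg.EnableJIT = false;             % generate and compile C/C++ code instead of JIT MEX
cfg.EnableOpenMP = true;           % let parfor use OpenMP if the compiler supports it
cfg.IntegrityChecks = false;       % skip run-time memory integrity checks (trusted inputs only)
cfg.ResponsivenessChecks = false;  % skip Ctrl+C polling
cfg.ExtrinsicCalls = false;        % do not call back into MATLAB

% Generate choice_model_nop_mex using the example inputs as type templates.
codegen -config cfg choice_model_nop -args args

% Time the m-file and the MEX file on the same inputs.
t_m   = timeit(@() choice_model_nop(args{:}));
t_mex = timeit(@() choice_model_nop_mex(args{:}));
fprintf('m-file: %.4f s, MEX: %.4f s\n', t_m, t_mex);
```

If the compiler lacks OpenMP support, the parfor loop in the MEX file would be expected to run serially, so it may be worth looking for OpenMP pragmas in the generated C code before drawing conclusions from the timings.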