Why is subscripted assignment so inefficient?

Why is MATLAB's subscripted assignment so abysmally inefficient with more than 2 subscripts? e.g.,
>> clear, a = []; tic, for j = 1e3:-1:1, for k = 1e3:-1:1, a(j,k,1) = 1; end, end, toc
Elapsed time is 0.193743 seconds.
>> clear, a = []; tic, for j = 1e3:-1:1, for k = 1e3:-1:1, a(j,k) = 1; end, end, toc
Elapsed time is 0.004241 seconds.

1 Comment

Interesting. I thought for a moment that it might be due to the extra multiplication and addition needed each time, but when I rewrite the code in terms of linear indexing, as if it were a 3D array, I do not see nearly that much of a performance drop.
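For reference, a sketch of the linear-indexing variant described above (the 1e3 stride and the column-major index formula are assumed from the loop bounds in the question):

```matlab
% Sketch: emulate a(j,k,1) via a hand-computed linear index, so the extra
% multiply/add per element happens explicitly in the loop.
clear, a = [];
tic
for j = 1e3:-1:1
    for k = 1e3:-1:1
        a(j + (k-1)*1e3) = 1;   % column-major: index = j + (k-1)*nrows
    end
end
toc
```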


Accepted Answer

Matt J
Matt J on 5 Feb 2022
Edited: Matt J on 5 Feb 2022
I would guess it's only because you are using 3 subscripts on an array that has only 2 dimensions. This forces MATLAB to do extra work, namely running a check to make sure that in a(j,k,m) the 3rd subscript is m=1, and simplifying the subscripts from (j,k,1) to (j,k). When a actually does have 3 dimensions, the times are much more comparable:
a = zeros(1e3,1e3,2);
tic
for j = 1e3:-1:1
    for k = 1e3:-1:1
        a(j,k,1) = 1;
    end
end
toc
Elapsed time is 0.019444 seconds.
a = zeros(1e3,1e3);
tic
for j = 1e3:-1:1
    for k = 1e3:-1:1
        a(j,k) = 1;
    end
end
toc
Elapsed time is 0.016890 seconds.

7 Comments

Matt J
Matt J on 5 Feb 2022
Edited: Matt J on 5 Feb 2022
This forces MATLAB to do extra work, namely running a check to make sure that in a(j,k,m) the 3rd subscript is m=1, and simplifying the subscripts from (j,k,1) to (j,k)
Additionally, these extra demands are exacerbated by the fact that you are doing these operations in a loop. The performance difference is negligible when the indexing is vectorized:
clear a
j = 1e3:-1:1;
k = 1e3:-1:1;
tic
a(j,k,1) = 1;
toc
Elapsed time is 0.012739 seconds.
clear a
tic
a(j,k) = 1;
toc
Elapsed time is 0.006345 seconds.
Oddly, the time for the 2d vectorized form is inconsistent: it is common to see bursts where it is much slower than the 3d form. The graph below is not typical, in that it shows the 3d form sometimes being much slower -- that does happen from time to time, but more often in my testing the 3d form was at most only a little slower. It is all pretty irregular.
Oh, and it does make a difference whether you are assigning to (j,k,1) or (j,k,2). I'll show that result in a moment.
N = 50;
t2 = zeros(1,N); t3 = zeros(1,N);
j = 1e3:-1:1;
k = 1e3:-1:1;
for K = 1:N; t2(K) = timeit(@()index2(j,k), 0); end
for K = 1:N; t3(K) = timeit(@()index3(j,k), 0); end
plot(1:N, t2, 1:N, t3)
legend({'2d', '3d'})

function index2(j,k)
    a(j,k) = 1;
end
function index3(j,k)
    a(j,k,1) = 1;
end
This shows that if an array is initialized as 2d and you index into it with 2 subscripts, that is much faster than initializing it as 3d and indexing with 2 subscripts. But once you have a 3d array, it is quite inconsistent whether 2d or 3d indexing is faster.
The next step is to measure whether merely allocating 2d vs 3d is significantly different.
N = 50;
t21 = zeros(1,N); t2 = zeros(1,N); t3 = zeros(1,N);
j = 1e3:-1:1;
k = 1e3:-1:1;
for K = 1:N; t21(K) = timeit(@()index21(j,k), 0); end
for K = 1:N; t2(K) = timeit(@()index2(j,k), 0); end
for K = 1:N; t3(K) = timeit(@()index3(j,k), 0); end
plot(1:N, t21, 1:N, t2, 1:N, t3)
legend({'2d init 2', '2d init 3', '3d'})

function index21(j,k)
    a = zeros(1e3,1e3,1);
    a(j,k) = 1;
end
function index2(j,k)
    a = zeros(1e3,1e3,2);
    a(j,k) = 1;
end
function index3(j,k)
    a = zeros(1e3,1e3,2);
    a(j,k,2) = 1;
end
I believe the implication of these plots is that the time required for the function is fairly linear in the amount of memory. I had expected there to be more overhead.
N = 50;
t2d = zeros(1,N); t2dx2 = zeros(1,N); t2dtwice = zeros(1,N); t3 = zeros(1,N);
for K = 1:N; t2d(K) = timeit(@()init2d(), 0); end
for K = 1:N; t2dx2(K) = timeit(@()init2dx2(), 0); end
for K = 1:N; t2dtwice(K) = timeit(@()init2dtwice(), 0); end
for K = 1:N; t3(K) = timeit(@()init3d(), 0); end
plot(1:N, [t2d; t2dx2; t2dtwice; t3].')
legend({'1000x1000', '1000x2000', '1000x1000,twice', '1000x1000x2'})

function init2d
    a = zeros(1e3,1e3);
end
function init2dx2
    a = zeros(1e3,2e3);
end
function init2dtwice
    a = zeros(1e3,1e3);
    b = zeros(1e3,1e3);
end
function init3d
    a = zeros(1e3,1e3,2);
end
Well this is interesting...
When I was allocating zeros(1000,1000,M), at M = 5 the time dropped to practically zero. That didn't make sense to me at first -- not until I remembered that beyond some size, MATLAB does not allocate backing memory for zeros() until you store into the array (you can even find circumstances where you can allocate a zeros array larger than the amount of available RAM).
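That lazy allocation is easy to observe directly; a minimal sketch (the exact size threshold is undocumented and platform-dependent, so treat the sizes here as illustrative):

```matlab
% zeros() above some size returns memory the OS has merely reserved, so the
% call itself appears nearly free; writing forces the pages to be committed.
tic, a = zeros(1e4,1e4); toc   % often near-instant
tic, a(:) = 1;           toc   % noticeably slower: the pages get touched here
```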
So I switched over to ones(1000,1000,M), and promptly started having problems with execution time exceeding what the Answers facility permits. I reduced N (the number of tries at the same size) from 50 to 30, which put me back into play. But around M = 8 I saw a sharp increase in allocation time. I then reduced 1000,1000,M to 500,500,M and increased M from 10 to 20, and you can see a sharp increase in allocation time around M = 16 -- cutting the rows and columns in half and doubling the number of panes doubled the critical point. But if it were based purely on the amount of memory, the critical point should have gone 4 times higher (to around 32) instead of doubling. This calls for further testing.
N = 30;
M = 20;
t = zeros(N,M);
for m = 1:M; for n = 1:N; t(n,m) = timeit(@() init3d(m)); end; end
whos t
  Name      Size      Bytes  Class     Attributes
  t         30x20      4800  double
semilogy(t);
legend("500x500x" + (1:M))
title('absolute time')
tmean = mean(t,1);
plot(tmean ./ tmean(1))
title('relative mean time')

function init3d(M)
    a = ones(500,500,M);
end
Unfortunately I am running into time limits a lot in exploring the timing of allocations. With linearly increasing allocation sizes, if we assume approximately linear time per allocation, we run into the 1t + 2t + 3t + ... + n*t = t*n*(n+1)/2 problem: exploring the suspected linear relation in a linear manner requires quadratic time. I think I need to switch to exponential increases in memory size to probe more closely.
N = 4;
G = 6;
start = datetime('now');
%timing pushed into a function to allow for potential JIT
t = run_timing(N, G);
stop = datetime('now');
elapsed = seconds(stop - start)
elapsed = 51.7211
plot(t);
xlabel('try #'); ylabel('seconds')
legend((1:G) + "G")
title('absolute time')
tmean = mean(t,1);
plot(tmean)
xlabel('gigabytes'); ylabel('mean seconds')
p = polyfit(1:G, tmean, 1);
fprintf('roughly %g seconds + %g seconds per gigabyte\n', p(2), p(1))
roughly -0.00468463 seconds + 0.616974 seconds per gigabyte

function t = run_timing(N,G)
    t = zeros(N,G);
    for g = 1:G
        for n = 1:N
            tic();
            initgig(g);
            t(n,g) = toc;
        end
    end
end
%gigabyte 1073741824
%megabyte 1048576
function initgig(M)
    a = ones(1,1073741824*M,'uint8');
end
There is a lot of noise in these measurements, and it is difficult to tell what is meaningful or not.
I skip the first few runs in the plot because in nearly all runs there is a bad peak early on.
The second plot here is the more interesting one, showing the log of how the time to allocate increases with amount requested. Noise affects it a lot, so I can only describe trends that I encountered a fair bit:
  • the time to allocate 1 byte starts off a bit higher
  • the time to allocate a quite small number of bytes (but more than 1) falls a bit, then rises a bit
  • there does seem to be a peak at 256 bytes
  • and then it does seem to fall a bit
  • until at 2^14 there seems to be a distinct rise. This corresponds to 16384 bytes; I suspect that either 4096 or 16384 bytes is the size of an entry in the "small block pool"
  • in some of the plots there was a leveling from 2^14 to 2^16; the more fine-grained I draw, the less I see this, so it might have been a statistical artifact
  • from 2^16 onward, the log grows pretty much linearly as log2() of the number of bytes increases. To phrase that differently: from 2^16 onward, the allocation time grows linearly with the number of bytes allocated. The actual boundary might possibly be 2^14
  • so, below 2^14, there appear to be at least two different allocation strategies, but the boundaries between them are a bit difficult to find. If there were only one allocation strategy for that range, using a free block pool with entries of size 2^14, you would expect the time to be constant up to that point, but there do seem to be peaks
  • I seem to recall that 256 bytes is the limit below which, for some constant allocations and some colon expressions, MATLAB keeps a copy around as an optimization, to be handed out when it encounters another request with the same spelling (spacing even being important!). Perhaps that is why we see a blip at 256?
If someone cares to do a study of how small-block allocation is done in MATLAB, I suggest 2^14 as the upper limit to study (and remember to take into account those new hidden copies that MATLAB recently started taking for small constants).
N = 50;
G = 2^24;
start = datetime('now');
%timing pushed into a function to allow for potential JIT
[t, gvals] = run_timing(N, G);
stop = datetime('now');
elapsed = seconds(stop - start);
display(elapsed)
elapsed = 0.5941
figure(1)
%tfilt = filloutliers(t, 'center','movmedian', 3, 1);
startat = 6;
semilogy(startat:N, t(startat:end,:));
xlabel('try #'); ylabel('seconds')
title('absolute time')
%legend("2^{"+string(gvals)+"}", 'NumColumns',4);
figure(2)
tmean = mean(t(startat:end,:),1);
semilogy(gvals, tmean)
xt = 1:ceil(max(gvals));
xticks(xt)
xticklabels("2^{"+string(xt)+"}");
xtickangle(45)
xlabel('bytes')
ylabel('mean seconds')
plin = polyfit(2.^gvals, tmean, 1);
%plog = polyfit(log2(gvals), log2(tmean),1);
fprintf('linear: roughly %g seconds + %g seconds per gigabyte\n', plin(2), plin(1)*2^30);
linear: roughly -4.90643e-08 seconds + 0.0490081 seconds per gigabyte
%fprintf('log: roughly %g seconds + %g seconds per gigabyte\n', 2.^plog(2), plog(1));

function [t,gvals] = run_timing(N,G)
    %wb = waitbar(0, '0G');
    %cleanMe = onCleanup(@()delete(wb));
    gvals = 0:.1:ceil(log2(G));
    ng = length(gvals);
    t = zeros(N,ng);
    for g = 1:ng
        gp = gvals(g);
        gv = round(2.^gp);
        %waitbar(0, wb, "2^{"+string(gp)+"}");
        for n = 1:N
            tic();
            initmem(gv);
            t(n,g) = toc;
            %waitbar(n/N, wb);
        end
    end
end
%gigabyte 1073741824
%megabyte 1048576
function initmem(M)
    ma = ones(1,M,'uint8');
end


More Answers (0)

Version

R2021b
