Why is subscripted assignment so inefficient?

Why is MATLAB's subscripted assignment so abysmally inefficient with more than 2 subscripts? e.g.,
>> clear, a = []; tic, for j = 1e3:-1:1, for k = 1e3:-1:1, a(j,k,1) = 1; end, end, toc
Elapsed time is 0.193743 seconds.
>> clear, a = []; tic, for j = 1e3:-1:1, for k = 1e3:-1:1, a(j,k) = 1; end, end, toc
Elapsed time is 0.004241 seconds.

1 Comment

Interesting. I thought for a moment that it might be due to the extra multiplication and addition needed each time, but when I rewrite the code in terms of linear indexing, as if it were a 3D array, I do not see nearly that much of a performance drop.
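For reference, a sketch of the linear-indexing variant described above (the 1e3 stride and the column-major index formula are assumed from the loop bounds in the question):

```matlab
% Sketch: emulate a(j,k,1) via a hand-computed linear index, so the extra
% multiply/add per element happens explicitly in the loop.
clear, a = [];
tic
for j = 1e3:-1:1
    for k = 1e3:-1:1
        a(j + (k-1)*1e3) = 1;   % column-major: index = j + (k-1)*nrows
    end
end
toc
```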


Accepted Answer

Matt J
Matt J on 5 Feb 2022
Edited: Matt J on 5 Feb 2022
I would guess it's only because you are using 3 subscripts on an array that has only 2 dimensions. This forces MATLAB to do extra work, namely running a check to make sure that in a(j,k,m) the 3rd subscript is m=1, and simplifying the subscripts from (j,k,1) to (j,k). When a actually does have 3 dimensions, the times are much more comparable:
a = zeros(1e3,1e3,2);
tic
for j = 1e3:-1:1
    for k = 1e3:-1:1
        a(j,k,1) = 1;
    end
end
toc
Elapsed time is 0.019444 seconds.
a = zeros(1e3,1e3);
tic
for j = 1e3:-1:1
    for k = 1e3:-1:1
        a(j,k) = 1;
    end
end
toc
Elapsed time is 0.016890 seconds.

7 Comments

Matt J
Matt J on 5 Feb 2022
Edited: Matt J on 5 Feb 2022
This forces MATLAB to do extra work, namely running a check to make sure that in a(j,k,m) the 3rd subscript is m=1, and simplifying the subscripts from (j,k,1) to (j,k)
Additionally, these extra demands are exacerbated by the fact that you are doing these operations in a loop. The performance difference is negligible when the indexing is vectorized:
clear a
j = 1e3:-1:1;
k = 1e3:-1:1;
tic
a(j,k,1) = 1;
toc
Elapsed time is 0.012739 seconds.
clear a
tic
a(j,k) = 1;
toc
Elapsed time is 0.006345 seconds.
Oddly, the time for the 2d vectorized form is inconsistent: it is common to see bursts where it is much slower than the 3d form. The graph below is not typical, in that it shows the 3d form sometimes being much slower -- that does happen from time to time, but more often in my testing the 3d form was at most only a little slower. It is all pretty irregular.
Oh, and it does make a difference whether you are assigning to (j,k,1) or (j,k,2). I'll show that result in a moment.
N = 50;
t2 = zeros(1,N); t3 = zeros(1,N);
j = 1e3:-1:1;
k = 1e3:-1:1;
for K = 1:N; t2(K) = timeit(@()index2(j,k), 0); end
for K = 1:N; t3(K) = timeit(@()index3(j,k), 0); end
plot(1:N, t2, 1:N, t3)
legend({'2d', '3d'})

function index2(j,k)
    a(j,k) = 1;
end
function index3(j,k)
    a(j,k,1) = 1;
end
This shows that if an array is initialized as 2d and you index into it with 2 subscripts, that is much faster than initializing it as 3d and indexing with 2 subscripts. But once you have a 3d array, it is quite inconsistent whether 2d or 3d indexing is faster.
The next step is to measure whether merely allocating 2d vs 3d is significantly different.
N = 50;
t21 = zeros(1,N); t2 = zeros(1,N); t3 = zeros(1,N);
j = 1e3:-1:1;
k = 1e3:-1:1;
for K = 1:N; t21(K) = timeit(@()index21(j,k), 0); end
for K = 1:N; t2(K) = timeit(@()index2(j,k), 0); end
for K = 1:N; t3(K) = timeit(@()index3(j,k), 0); end
plot(1:N, t21, 1:N, t2, 1:N, t3)
legend({'2d init 2', '2d init 3', '3d'})

function index21(j,k)
    a = zeros(1e3,1e3,1);
    a(j,k) = 1;
end
function index2(j,k)
    a = zeros(1e3,1e3,2);
    a(j,k) = 1;
end
function index3(j,k)
    a = zeros(1e3,1e3,2);
    a(j,k,2) = 1;
end
I believe the implication of these plots is that the time required for the function is fairly linear in the amount of memory. I had expected there to be more overhead.
N = 50;
t2d = zeros(1,N); t2dx2 = zeros(1,N); t2dtwice = zeros(1,N); t3 = zeros(1,N);
for K = 1:N; t2d(K) = timeit(@()init2d(), 0); end
for K = 1:N; t2dx2(K) = timeit(@()init2dx2(), 0); end
for K = 1:N; t2dtwice(K) = timeit(@()init2dtwice(), 0); end
for K = 1:N; t3(K) = timeit(@()init3d(), 0); end
plot(1:N, [t2d; t2dx2; t2dtwice; t3].')
legend({'1000x1000', '1000x2000', '1000x1000,twice', '1000x1000x2'})

function init2d
    a = zeros(1e3,1e3);
end
function init2dx2
    a = zeros(1e3,2e3);
end
function init2dtwice
    a = zeros(1e3,1e3);
    b = zeros(1e3,1e3);
end
function init3d
    a = zeros(1e3,1e3,2);
end
Well this is interesting...
When I was allocating zeros(1000,1000,M), at M = 5 the time dropped to practically zero. That didn't make sense to me at first -- not until I remembered that beyond some size, MATLAB does not allocate backing memory for zeros() until you store into the array (you can even find circumstances where you can allocate a zeros array larger than the amount of available RAM).
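That lazy allocation is easy to observe directly; a minimal sketch (the exact size threshold is undocumented and platform-dependent, so treat the sizes here as illustrative):

```matlab
% zeros() above some size returns memory the OS has merely reserved, so the
% call itself appears nearly free; writing forces the pages to be committed.
tic, a = zeros(1e4,1e4); toc   % often near-instant
tic, a(:) = 1;           toc   % noticeably slower: the pages get touched here
```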
So I switched over to ones(1000,1000,M), and promptly started having problems with execution time exceeding what the Answers facility permits. I reduced N (the number of tries at the same size) from 50 to 30, which put me back into play. But around M = 8 I saw a sharp increase in allocation time. I then reduced 1000,1000,M to 500,500,M and increased M from 10 to 20, and you can see a sharp increase in allocation time around M = 16 -- cutting the rows and columns in half and doubling the number of panes doubled the critical point. But if it were based purely on the amount of memory, the critical point should have gone 4 times higher (to around 32) instead of doubling. This calls for further testing.
N = 30;
M = 20;
t = zeros(N,M);
for m = 1:M; for n = 1:N; t(n,m) = timeit(@() init3d(m)); end; end
whos t
  Name      Size      Bytes  Class     Attributes
  t         30x20      4800  double
semilogy(t);
legend("500x500x" + (1:M))
title('absolute time')
tmean = mean(t,1);
plot(tmean ./ tmean(1))
title('relative mean time')

function init3d(M)
    a = ones(500,500,M);
end
Unfortunately I am running into time limits a lot in exploring the timing of allocations. With linearly increasing allocation sizes, if we assume approximately linear time per allocation, we run into the 1t + 2t + 3t + ... + n*t = t*n*(n+1)/2 problem: exploring the suspected linear relation in a linear manner requires quadratic time. I think I need to switch to exponential increases in memory size to probe more closely.
N = 4;
G = 6;
start = datetime('now');
%timing pushed into a function to allow for potential JIT
t = run_timing(N, G);
stop = datetime('now');
elapsed = seconds(stop - start)
elapsed = 51.7211
plot(t);
xlabel('try #'); ylabel('seconds')
legend((1:G) + "G")
title('absolute time')
tmean = mean(t,1);
plot(tmean)
xlabel('gigabytes'); ylabel('mean seconds')
p = polyfit(1:G, tmean, 1);
fprintf('roughly %g seconds + %g seconds per gigabyte\n', p(2), p(1))
roughly -0.00468463 seconds + 0.616974 seconds per gigabyte

function t = run_timing(N,G)
    t = zeros(N,G);
    for g = 1:G
        for n = 1:N
            tic();
            initgig(g);
            t(n,g) = toc;
        end
    end
end
%gigabyte 1073741824
%megabyte 1048576
function initgig(M)
    a = ones(1,1073741824*M,'uint8');
end
There is a lot of noise in these measurements, and it is difficult to tell what is meaningful or not.
I skip the first few runs in the plot because in nearly all runs there is a bad peak early on.
The second plot here is the more interesting one, showing the log of how the time to allocate increases with amount requested. Noise affects it a lot, so I can only describe trends that I encountered a fair bit:
  • the time to allocate 1 byte starts off a bit higher
  • the time to allocate a quite small number of bytes (but more than 1) falls a bit, then rises a bit
  • there does seem to be a peak at 256 bytes
  • and then it does seem to fall a bit
  • until at 2^14 there seems to be a distinct rise. This corresponds to 16384 bytes; I suspect that either 4096 or 16384 bytes is the size of an entry in the "small block pool"
  • in some of the plots there was a leveling from 2^14 to 2^16; the more fine-grained I draw, the less I see this, so it might have been a statistical artifact
  • from 2^16 onward, the log grows pretty much linearly as log2() of the number of bytes increases. To phrase that differently: from 2^16 onward, the allocation time grows linearly with the number of bytes allocated. The actual boundary might possibly be 2^14
  • so, below 2^14, there appear to be at least two different allocation strategies, but the boundaries between them are a bit difficult to find. If there were only one allocation strategy for that range, using a free block pool with entries of size 2^14, you would expect the time to be constant up to that point, but there do seem to be peaks
  • I seem to recall that 256 bytes is the limit below which, for some constant allocations and some colon expressions, MATLAB keeps a copy around as an optimization, to be handed out when it encounters another request with the same spelling (spacing even being important!). Perhaps that is why we see a blip at 256?
If someone cares to do a study of how small-block allocation is done in MATLAB, I suggest 2^14 as the upper limit to study (and remember to take into account those new hidden copies that MATLAB recently started taking for small constants).
N = 50;
G = 2^24;
start = datetime('now');
%timing pushed into a function to allow for potential JIT
[t, gvals] = run_timing(N, G);
stop = datetime('now');
elapsed = seconds(stop - start);
display(elapsed)
elapsed = 0.5941
figure(1)
%tfilt = filloutliers(t, 'center','movmedian', 3, 1);
startat = 6;
semilogy(startat:N, t(startat:end,:));
xlabel('try #'); ylabel('seconds')
title('absolute time')
%legend("2^{"+string(gvals)+"}", 'NumColumns',4);
figure(2)
tmean = mean(t(startat:end,:),1);
semilogy(gvals, tmean)
xt = 1:ceil(max(gvals));
xticks(xt)
xticklabels("2^{"+string(xt)+"}");
xtickangle(45)
xlabel('bytes')
ylabel('mean seconds')
plin = polyfit(2.^gvals, tmean, 1);
%plog = polyfit(log2(gvals), log2(tmean),1);
fprintf('linear: roughly %g seconds + %g seconds per gigabyte\n', plin(2), plin(1)*2^30);
linear: roughly -4.90643e-08 seconds + 0.0490081 seconds per gigabyte
%fprintf('log: roughly %g seconds + %g seconds per gigabyte\n', 2.^plog(2), plog(1));

function [t,gvals] = run_timing(N,G)
    %wb = waitbar(0, '0G');
    %cleanMe = onCleanup(@()delete(wb));
    gvals = 0:.1:ceil(log2(G));
    ng = length(gvals);
    t = zeros(N,ng);
    for g = 1:ng
        gp = gvals(g);
        gv = round(2.^gp);
        %waitbar(0, wb, "2^{"+string(gp)+"}");
        for n = 1:N
            tic();
            initmem(gv);
            t(n,g) = toc;
            %waitbar(n/N, wb);
        end
    end
end
%gigabyte 1073741824
%megabyte 1048576
function initmem(M)
    ma = ones(1,M,'uint8');
end


More Answers (0)

Version

R2021b
