Faster way to initilize arrays via empty matrix multiplication?

Question

Adi Natan am 6 Jan. 2013

7
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication

Bearbeitet: Adam Danz am 15 Aug. 2019

I've stumbled upon the weird way (in my view) that Matlab is dealing with empty matrices. For example, if two empty matrices are multiplied the result is:

    zeros(3,0)*zeros(0,3)
    ans =
     0     0     0
     0     0     0
     0     0     0

Now, this already took me by surprise, however, a quick search got me to the link http://www.mathworks.com/help/matlab/math/empty-matrices-scalars-and-vectors.html , and I got an explanation of the somewhat twisted logic of why this is happening.

However, nothing prepared me for the following observation. I asked myself, how efficient is this type of multiplication vs just using zeros(n) function, say for the purpose of initialization? I've used timeit to answer this:

    f=@() zeros(1000)
    timeit(f)
    ans =
        0.0033

vs:

    g=@() zeros(1000,0)*zeros(0,1000)
    timeit(g)
    ans =
        9.2048e-06

Both have the same outcome of 1000x1000 matrix of zeros of class `double`, but the empty matrix multiplication one is ~350 times faster! (a similar result happens using `tic` and `toc` and a loop)

How can this be? are `timeit` or `tic,toc` bluffing or have I found a faster way to initialize matrices?

(this was done with matlab 2012a, on a win7-64 machine, intel-i5 650 3.2Ghz...)

I have looked more carefully into this peculiarity, and tested on 2 different computers (same matlab ver though 2012a) a code that examine the run time vs the size of matrix n. This is what I get:

The code to generate this used timeit as before, but a loop with tic and toc will look the same. So, for small sizes, zeros(n) is comparable. However, around n=400 there is a jump in performance for the empty matrix multiplication. The code I've used to generate that plot was:

    n=unique(round(logspace(0,4,200)));
    for k=1:length(n)
        f=@() zeros(n(k));
        t1(k)=timeit(f);
        g=@() zeros(n(k),0)*zeros(0,n(k));
        t2(k)=timeit(g);
    end
    loglog(n,t1,'b',n,t2,'r');
    legend('zeros(n)','zeros(n,0)*zeros(0,n)',2);
    xlabel('matrix size (n)'); ylabel('time [sec]');

Are any of you seeing anything similar?

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Adi Natan am 7 Jan. 2013

Thank you for this insight! I tried to incorporate this in several codes of mine, and it does improve efficiency. So though it sound too good to be true, for several cases it seem to work.

Adam Danz am 15 Aug. 2019

Bearbeitet: Adam Danz am 15 Aug. 2019

In MATLAB Online öffnen

[many years later]

I re-ran this in r2019a (64 bit Lenovo ThinkPad, 2.50GHz) and got the opposite results (below). However, upon each iteration I received the following warning that timing of zeros(n(k)) may be inaccurate. No such warning for the 2nd method.

Warning: The measured time for F may be inaccurate because it is running too fast. 
Try measuring something that takes longer. 

Additionally, I timed 100 iterations individually for both methods using Method 1: zeros(10000); and Method 2: zeros(10000,0)*zeros(0,10000); using tic/toc and found that method 1 was MUCH faster. The median speed of method 1 was 9790 times faster than method 2.

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

per isakson am 6 Jan. 2013

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication#answer_70313

Bearbeitet: per isakson am 6 Jan. 2013

In MATLAB Online öffnen

... got me to the link above...

I cannot find the link you refer to.

.

The Windows task manager shows that

Z = zeros(1000,0)*zeros(0,1000);

doesn't allocate memory (R2012a, 64bit, Win7). Or rather the task manager doesn't show any allocation of memory for that assignment.

.

ADDED

I learned two things. There is a small speed advantage of using this way of memory allocation. While experimenting I made it necessary to restart the computer and I lost my desktop configuration.

    N = 1e4;
    clear z1
    tic
    z1 = zeros( N ); 
    for cc = 1 : N
        z1(:,cc)=cc;
    end
    toc
    clear z0
    tic
    z0 = zeros(N,0)*zeros(0,N);
    for cc = 1 : N
        z0(:,cc)=cc;
    end
    toc
    Elapsed time is 0.686084 seconds.
    Elapsed time is 0.532437 seconds.

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Jan am 6 Jan. 2013

This could mean, that the operating system can offer some already allocated memory. This would be faster of course. E.g. the memory obtained from Windows is filled with zeros. The cleaning is done transparently in the background, if possible. See FEX: Uninit

per isakson am 6 Jan. 2013

Bearbeitet: per isakson am 6 Jan. 2013

I assume it means that Matlab in this case use "lazy allocation of memory", i.e does not allocate memory until it is necessary. That's an implementation detail, which TMW don't want us to bother about. That is what Matt J says in his answer.

Melden Sie sich an, um zu kommentieren.

Answer 2

James Tursa am 7 Jan. 2013

3
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication#answer_70351

Bearbeitet: James Tursa am 7 Jan. 2013

In MATLAB Online öffnen

Another (perhaps related) observation is that MATLAB does some pre 0 filling in the background in mex routines. E.g., I have run the following test:

- Allocate a very large block of memory via mxMalloc

- Fill this block with non-0 data

- Remember the block address

- Free the block with mxFree

- Immediately allocate another very large block again with mxMalloc

- Check the address and the contents

I have found that the second mxMalloc call, when returning the same address as the first call, results in the data block being set to 0.

So something is setting this data block to 0 in the background between calls to mxFree and mxMalloc. If one timed the mex routine you may be able to detect the effort to zero out the data block going on in the background. But if something similar is happening at the m-file level (e.g., having a pre-0-filled data block available for allocations) it may be hard to detect in timings at that level. And if the whole thing is in a loop maybe the perceived timing advantage will simply go away (like Matt's example).

EDIT

To complete my mex related comments for this thread, I will mention that there is an undocumented API "fast zeros" function called mxFastZeros that apparently draws memory from a previously allocated and 0'ed out memory block since it is extremely fast, comparable timing to the zeros(m,0)*zeros(0,n) method (for all I know this method may call mxFastZeros in the background). A bare-bones mex routine that implements this is as follows:

// mxFastZeros.c generates a zero 2D double matrix
// Syntax: z = mxFastZeros( ComplexFlag, M, N )
// Where:
// ComplexFlag = 0 (real) or 1 (complex)
// M = row size
// N = column size
// Programmer:  James Tursa
#include "mex.h"
mxArray *mxFastZeros(mxComplexity ComplexFlag, mwSize m, mwSize n);
mxArray *mxCreateSharedDataCopy(mxArray *mx);
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mxArray *mx;
    mxComplexity ComplexFlag;
    mwSize m, n;
      if( nrhs != 3 ) {
          mexErrMsgTxt("Syntax: mxFastZeros(ComplexFlag,M,N)");
      }
      if( nlhs > 1 ) {
          mexErrMsgTxt("Too many outputs.");
      }
      ComplexFlag = mxGetScalar(prhs[0]);
      m = mxGetScalar(prhs[1]);
      n = mxGetScalar(prhs[2]);
      mx = mxFastZeros(ComplexFlag,m,n);
      plhs[0] = mxCreateSharedDataCopy(mx);
      mxDestroyArray(mx);
  }

NOTE For Advanced Programmers: The shared data copy stuff is needed because mxFastZeros produces a normal mxArray (NOT on the garbage collection list), rather than a temporary mxArray (ON the garbage collection list) like the documented API functions. Using the mxCreateSharedDataCopy function allows the mex routine to return a temporary mxArray and prevents the memory leak that would occur if the result of mxFastZeros were returned directly in plhs[0].

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Amro am 13 Aug. 2013

Bearbeitet: Amro am 13 Aug. 2013

I often come across interesting undocumented C functions in the MEX dll "libmx.dll" using a tool like Dependency Walker, but I dont know how to get the exact signature of such functions. I always thought this was not possible with just the DLL (without a complete header file that expose all undocumented functions), short of reverse-engineering the disassembly. Seeing your many contributions in this regard, may I ask you how you find the exact types and order of the arguments and return value? I understand that C++ name mangling expose some of that information, but what about pure C functions like the above mxFastZeros, how did you know the correct syntax?

Melden Sie sich an, um zu kommentieren.

Answer 3

Matt J am 6 Jan. 2013

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication#answer_70332

Bearbeitet: Matt J am 6 Jan. 2013

In MATLAB Online öffnen

nothing prepared me for the following observation. I asked myself, how efficient is this type of multiplication vs just using zeros(n) function, say for the purpose of initialization? I've used timeit to answer this:

You can't trust what you're seeing as a more efficient method of intialization (sadly). All that's happening is that the actual creation of the array is being postponed until the array is used. As you can see in the illustration below, the summation of Z1 takes much longer than Z2, because it includes also the actual instantiation of Z1 for some reason.

    N=2000;
    tic
     Z1=zeros(N,0)*zeros(0,N);
    toc %Elapsed time is 0.000037 seconds.
    tic
     sum(Z1);
    toc Elapsed time is 0.008581 seconds.
    tic;
    Z2=zeros(N);
    toc%Elapsed time is 0.007404 seconds.
    tic
    sum(Z2);
    toc%Elapsed time is 0.002177 seconds.

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Adi Natan am 7 Jan. 2013

I get the same thing, but I also get the same result per isakson gets (see per isakson's answer). So why is it sometimes a bit more efficient ans some other times really not efficient?

Melden Sie sich an, um zu kommentieren.

Answer 4

Walter Roberson am 6 Jan. 2013

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication#answer_70279

On my MacBook Pro (i7 @ 2.6 GHz), the jump in performance I see is at 4096 exactly (i.e., also tested at 4095). Which would seem plausible as a breakpoint at which the libraries would take over.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Answer 5

Daniel Shub am 7 Jan. 2013

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/58055-faster-way-to-initilize-arrays-via-empty-matrix-multiplication#answer_70354

Yair has a nice blog post about this (with some good comments at the end).

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Faster way to initilize arrays via empty matrix multiplication?

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Akzeptierte Antwort

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Weitere Antworten (4)

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Community Treasure Hunt

Faster way to initilize arrays via empty matrix multiplication?

5 Kommentare 3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Akzeptierte Antwort

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Weitere Antworten (4)

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Community Treasure Hunt

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden