OOP-Performance problems in accessing large arrays in class properties

Simon on 7 Nov 2013
Commented: Matt J on 5 Dec 2013
Hi!
I have a problem with accessing arrays that are properties of a class. I attached a sample class. It defines just one property, an array of size (N, 3), where N may be set to different values. The problem is that for large N it takes a long time to set some values somewhere in the matrix (it is preallocated with zeros for testing purposes).
The test procedure is:
clear all
close all
clc
% create instance
T = TimingTest;
% test for array of 1000x3
T.Resize(1e3);
T.DoTesting;
% test for array of 100000x3
T.Resize(1e5);
T.DoTesting;
I cannot see why it takes so long or what I might have done wrong. Hints are very welcome. Thanks in advance!

Answers (3)

Yair Altman on 1 Dec 2013
This is because you are inadvertently reallocating the whole array (300,000 elements for N = 1e5) 1e4 times in the following line:
ctmp(ind, 1:3) = value;
Notice that ctmp was previously set with
ctmp = obj.Arr;
That assignment uses copy-on-write, i.e. it only copies a pointer (reference). In the indexed assignment you modify just 3 values, but because the data is still shared, the entire array (100K x 3 x 8 bytes = 2.4 MB) has to be allocated and copied before it can be modified. You then repeat this 1e4 times.
So the problem is not so much OOP performance as inefficient memory reallocation.
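The copy-on-write effect described above can be reproduced outside of any class. The following is only a minimal sketch: the variable names and the reduced iteration count (nIter = 100 instead of the 1e4 mentioned in the thread) are assumptions made for illustration, and actual timings depend on the MATLAB release and machine.

```matlab
% Sketch: cost of modifying a shared copy versus modifying in place.
N = 1e5;                       % rows, as in the question
nIter = 100;                   % hypothetical; smaller than the 1e4 in the thread
A = zeros(N, 3);
bytesPerCopy = N * 3 * 8;      % doubles take 8 bytes each -> 2,400,000 bytes

tic
for k = 1:nIter
    ctmp = A;                  % copy-on-write: shares A's data, no copy yet
    ctmp(k, 1:3) = 1;          % first write to shared data forces a full copy
end
tShared = toc;

tic
for k = 1:nIter
    A(k, 1:3) = 1;             % A is not shared here, so no full copy is expected
end
tInPlace = toc;

fprintf('copy per iteration: %.4f s, in place: %.4f s\n', tShared, tInPlace);
```

With 1e4 iterations, the first loop would move roughly 1e4 x 2.4 MB = 24 GB through memory, which is consistent with the multi-second timings reported in this thread.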
  2 Comments
Simon on 2 Dec 2013
Thanks for your reply!
It is faster for a non-handle class. Is it correct that in this case there is no reference, the full array is copied, and no extra allocation (1e4 times) is needed? On the other hand, if I call functions inside the class, the full instance is copied every time, which would not be necessary for a handle class?
BTW: I do not know the size of Arr before I use the class and it may change during usage. I came across "containers.Map" objects that seem to be very efficient when growing in size. I don't know why but I like it ;-)
Matt J on 2 Dec 2013
No, you will see the same slow speed even if you use a non-handle class. You need to make sure, however, that you are correctly resizing T, with the modified syntax
T=T.Resize(1e3);
T=T.Resize(1e5);
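The reason for the T = T.Resize(...) syntax is that a method of a value class works on a copy of the object and must return it. A minimal sketch, assuming a hypothetical value class named ValueHolder (the attached TimingTest class is not shown in this thread, so this is not its actual code):

```matlab
% Contents of a hypothetical file ValueHolder.m.
% No "< handle" here, so this is a value class.
classdef ValueHolder
    properties
        Arr = zeros(0, 3);     % N-by-3 array, initially empty
    end
    methods
        function obj = Resize(obj, n)
            % Modifies the local copy of obj and returns it. The caller's
            % variable only changes if the result is assigned back.
            obj.Arr = zeros(n, 3);
        end
    end
end
```

Calling T = ValueHolder; T.Resize(1e3); leaves T.Arr empty because the returned object is discarded, whereas T = T.Resize(1e3); gives T.Arr 1000 rows.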

Matt J on 4 Dec 2013
Edited: Matt J on 4 Dec 2013
But apparently it takes the same time to modify the property directly with obj.Arr(ind, 1:3) = value;
This is interesting (and unfortunate), but I believe I know why it happens. When you modify property data, the entire array-valued property is always first pre-processed and returned via the property's get() method. The indexing operation is then applied to the get method's output and that array is modified. Finally, the modified data is put back into the property, after being processed by the property's set method. In your case, you haven't defined get.Arr() and set.Arr() methods, but default ones are provided in the background.
Because the pre- and post-processing done by set.Arr and get.Arr are limited only by the class designer's imagination, MATLAB cannot simply take specific array elements out and put modified ones back in place. It has no choice but to implement the expression obj.Arr(ind, 1:3) = value with the equivalent of
ctmp = obj.Arr; %get.Arr() called here
ctmp(ind, 1:3) = value;
obj.Arr = ctmp; %set.Arr() called here
Thus a second deep copy of Arr is made before obj.Arr is updated. It's too bad, and hopefully future releases will offer property attributes that let you circumvent set/get. containers.Map objects don't have set/get methods as middlemen (I don't think), so they don't have this problem.
Note, however, that this slow access only occurs for operations that modify properties. If you simply did a subsref operation to access part of obj.Arr
value = obj.Arr(ind, 1:3) ;
you would not see this slow behavior.
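If each indexed assignment into obj.Arr behaves like the get/modify/set sequence above, one way to limit the cost is to pay for it once per batch rather than once per element. The following is only a sketch under that assumption: the class name ArrHolder and the method setRows are hypothetical and are not part of the class attached to the question.

```matlab
% Contents of a hypothetical file ArrHolder.m.
classdef ArrHolder < handle
    properties
        Arr = zeros(0, 3);     % N-by-3 array
    end
    methods
        function obj = ArrHolder(n)
            % Preallocate with zeros, as in the question.
            if nargin > 0
                obj.Arr = zeros(n, 3);
            end
        end
        function setRows(obj, inds, values)
            % One read of the property, many edits on a local variable,
            % and one write back to the property.
            ctmp = obj.Arr;                         % shares data, no copy yet
            for k = 1:numel(inds)
                ctmp(inds(k), 1:3) = values(k, :);  % first write should copy once
            end
            obj.Arr = ctmp;                         % single property assignment
        end
    end
end
```

For example, h = ArrHolder(1e5); h.setRows([2 5], [1 2 3; 4 5 6]); updates two rows with a single round trip through the property. This matches the pattern behind the fast "separate loops" timing that Simon reports further down in the thread. The loop could also be replaced by the vectorized assignment ctmp(inds, 1:3) = values.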
  2 Comments
Simon on 5 Dec 2013
Your explanation sounds reasonable. I agree that reading from a property seems to be fast. Writing to a property seems to involve a deep copy, but the MATLAB help states:
MATLAB® has no default set or get property access methods. Therefore, if you do not define property access methods, MATLAB software does not invoke any methods before assigning or returning property values.
On the other hand, the containers.Map objects I use are properties of the class as well. I updated the class file and attached it. The test procedure is the same as before. The results are
size of array: 100000
index: 19659
value: 2.510839, 6.160447, 4.732888
timing for property array: 17.678382
timing separate loops: 0.017326, 0.010823, 0.043321 -> sum: 0.071470
timing combined loop: 0.080155, 15.576293, 1.911431 -> sum: 17.567879
timing for container (direct access): 0.359804
timing for container (with temp. copy): 0.408796
As you can see, the timing for the container is really fast. It is interesting to note that in the combined loop most of the time is spent storing the value in the temporary array! The second (confusing) observation is that the total time used to modify the container is less than the time used to get and set the temporary variable in the combined loop! This is true regardless of how the container is modified (directly or via a temporary copy of the property). This is interesting because the container map is about 10 times larger, judging by the size of the variable returned after converting it with "struct(container.map)".
The result for me is: storing large arrays as object properties is kind of slow and not recommended. But I have to admit that I use matlab version 7.11.2.1031 (R2010b) Service Pack 2 at the moment. So I don't know how the current release performs.
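For readers unfamiliar with containers.Map, here is a minimal sketch of storing 1-by-3 rows keyed by row index. How the updated class actually uses its maps is not shown in this thread, so the key and value types below are assumptions; the index and values are taken from the timing output above.

```matlab
% Sketch: store 1-by-3 rows in a containers.Map keyed by row index.
m = containers.Map('KeyType', 'double', 'ValueType', 'any');

ind = 19659;                              % index from the timing output above
m(ind) = [2.510839, 6.160447, 4.732888];  % insert or overwrite one entry
row = m(ind);                             % read the entry back

m2 = m;                                   % handle semantics: same underlying map
m2(ind + 1) = [0, 0, 0];                  % the new entry is also visible through m

fprintf('entries: %d\n', m.Count);
```

Because containers.Map is a handle class, assigning it to another variable or storing it in a property does not duplicate the stored data, which may be part of why the container timings above do not grow in the same way as the array timings.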
Matt J on 5 Dec 2013
MATLAB® has no default set or get property access methods.
Puzzling. But even if no set/get methods are called, I can still imagine the developers using the same copy semantics as if there were set/get methods defined. It would make coding simpler, I'd guess.
But I have to admit that I use matlab version 7.11.2.1031 (R2010b) Service Pack 2 at the moment. So I don't know how the current release performs.
I'm seeing the same behavior in R2013a.

Simon on 7 Nov 2013
To answer myself: It seems that this problem occurs only if the class is a handle class ...
  4 Comments
Matt J on 3 Dec 2013
Edited: Matt J on 3 Dec 2013
Neither the property obj.Arr nor ctmp is a handle object (even if obj itself is). Therefore, copying obj.Arr to ctmp and modifying the latter always results in a fresh deep copy of the data.
Simon on 4 Dec 2013
But apparently it takes the same time to modify the property directly with
obj.Arr(ind, 1:3) = value;
and to modify it in a copy of the property with
ctmp = obj.Arr;
ctmp(ind, 1:3) = value;
obj.Arr = ctmp;
This is measured in "timing for property array" vs. "timing combined loop". This is independent of the type (value/handle) of class.
I'm interested in the best (meaning: most efficient) way to handle this. That's why I ask these rather pedantic questions. There are situations where computation time matters. Something else I discovered during my investigations is that the time it takes to store a value increases as the array gets filled. This supports your statement about copying the array. Copying zeros seems to be more efficient. Interestingly enough, I cannot see a comparable slowdown if I use containers.Map objects.
