Deletion of cell array element performance issue - alternative or solutions?

9 Ansichten (letzte 30 Tage)
I have a struct array with 7 equal-sized arrays and one scalar value. One of the arrays is a cell array, as each element can contain a varying number of elements.
I have done performance profiling and found that deleting an element of the cell array is consuming 90% of the effort of my element-removing function. e.g. cell_array(index) = []; The function is called so often that it is consuming the majority of the time for my code.
Why is removing an element from a cell array so much more expensive than from a regular array?
Is there a more efficient way to achieve my goal of removing an element?
Is there an alternative way to store a varying-length element within an object that is otherwise a struct array that is more efficient for deletion operations?
  8 Kommentare
Guillaume
Guillaume am 13 Okt. 2017
For reference: comparison between array and cell arrays (R2017b matlab online)
%Deletion
a = 1:10000;
c = num2cell(a);
tic; for i = numel(a):-1:1, a(randi(i)) = []; end; toc
tic; for i = numel(c):-1:1, c(randi(i)) = []; end; toc
Elapsed time is 0.230962 seconds. %Array
Elapsed time is 0.885017 seconds. %Cell array
So roughly 4 times slower when deleting elements from a cell array
%Shifting
a = 1:10000;
c = num2cell(a);
tic; for step = 1:10000; idx = randi(numel(a)); a = [a(1:idx-1), a(idx+1:end), a(idx)]; end; toc
tic; for step = 1:10000; idx = randi(numel(c)); c = [c(1:idx-1), c(idx+1:end), c(idx)]; end; toc
Elapsed time is 0.271223 seconds. %Array
Elapsed time is 3.486516 seconds. %Cell array
Shockingly slow with a cell array even though it should be the same amount of memory moved. Not sure what is going on.
Stephen23
Stephen23 am 14 Okt. 2017
@Nicholas Bauer: resizing data arrays in a loop is invariably a bad idea, there are plenty of discussions on this forum about why resizing matrices in a loop is slow. You would be much better off simply storing a logical mask of the relevant rows.

Melden Sie sich an, um zu kommentieren.

Antworten (3)

Guillaume
Guillaume am 13 Okt. 2017
Cell array and matrices are very efficient for indexing (O(1) complexity) but innefficient for insertions and deletions (O(n) complexity).
The only container included in matlab that is efficient for insertions and deletions is containers.Map. Depending how it's implemented in matlab (unfortunately this is not documented), time complexity should be O(1) or O(log(n)). However, indexing takes longer.
There are other containers that have different trade-offs between insertion/deletion, indexing, search, etc. But none of these are implemented in matlab. I believe that there's an implementation of linked lists on the FileExchange (insertions/deletions in O(1)).
  4 Kommentare
Nicholas Bauer
Nicholas Bauer am 13 Okt. 2017
I think perhaps Guillaume's #3 is the difference then, freeing memory.
Nicholas Bauer
Nicholas Bauer am 13 Okt. 2017
Nevermind, it isn't-- see comment on question.

Melden Sie sich an, um zu kommentieren.


James Tursa
James Tursa am 16 Okt. 2017
Bearbeitet: James Tursa am 16 Okt. 2017
Keep in mind that deleting an element of a double array means:
- Allocating/copying the data (64-bit doubles) to new memory & free'ing old memory
Whereas deleting an element of a cell array means:
- Allocating/copying the data (32-bit or 64-bit pointers) to new memory & free'ing old memory, PLUS ...
- Checking for any sharing (data, reference, etc) of the cell element being deleted
- Unsharing the cell element if it is sharing in any manner
- Freeing the data of the cell element itself
- Free'ing the mxArray variable header of the cell element
So, because of that mxArray header stuff for the cell array variable pointers, a lot more is going on with the MATLAB memory manager than just allocating/copying the pointers in the data area. And free'ing memory does not necessarily mean that this memory gets released back to the OS. MATLAB may hang onto it in the background, keeping it in reserve for additional anticipated allocations in the future. As Jan points out, the details of all of this memory manager stuff are not published, so really I have no clue what I would expect for timing for this operation.
It would be interesting to see if one could force those deletions to be simply reference counter decrements instead of mxArray header free'ing. E.g., if you weren't concerned about the memory implications, something like this might speed things up:
- Create a new cell array where the elements are all reference copies of the original cell array
- This can easily be done in a mex routine (Maybe also at the m-file level but I would have to think about it first)
- Then delete the cell elements from the original. These deletions would result ONLY in a reference count decrement in the mxArray header ... no headers would need to be free'd so no memory manager stuff needed for this particular operation.
Then maybe the timing would reduce down to what the double array takes. But this is all just a conjecture on my part. Also, the downside is of course that all of the original cell elements remain in memory. But if you care mainly about the speed and not the memory then this approach might work.

Nicholas Bauer
Nicholas Bauer am 13 Okt. 2017
My solution, for the moment, is to not remove any elements from the array, just keep track of validity, the valid and total count of elements, and when needed take the valid subset.

Kategorien

Mehr zu Resizing and Reshaping Matrices finden Sie in Help Center und File Exchange

Produkte

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by