
Performance of table data type

Michael on 30 Oct 2014
Commented: LuisCardona on 28 Jun 2017
Hello!
Is it normal that writing into a table data structure is 1000 times slower than writing into a cell array of the same size? And that reading is 50 times slower?
Try the following code:
% Test: compare cell array and table performance
tic;
A = cell(10000, 50);
fprintf('Time for initializing cell array: %.4f s\n', toc);
tic;
B = cell2table(A);
fprintf('Time for initializing table: %.4f s\n', toc);
tic;
for i = 1 : 2500
    A{i, 7} = 'aaa';
end
fprintf('Time for writing into cell array: %.4f s\n', toc);
tic;
for i = 1 : 2500
    B{i, 7} = {'aaa'};   % the table variable is a cell array, so the RHS is a 1-by-1 cell
end
fprintf('Time for writing into table: %.4f s\n', toc);
tic;
for i = 1 : 2500
    x = A{i, 7};
end
fprintf('Time for reading from cell array: %.4f s\n', toc);
tic;
for i = 1 : 2500
    x = B{i, 7};
end
fprintf('Time for reading from table: %.4f s\n', toc);
2 Comments
Oleg Komarov on 30 Nov 2016
Edited: Oleg Komarov on 1 Dec 2016
While tables do have performance issues, this example is particularly pathological.
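Building the table from a cell array of empty cells ([]) is what causes the problem. Filling the cell array with empty character vectors instead makes the subsequent cell2table() call much faster:
tic;
A = repmat({''},1e4,50);
fprintf('Time for initializing cell array: %.4f s\n', toc);
tic;
B = cell2table(A);
fprintf('Time for initializing table: %.4f s\n', toc);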
Also, dot (named) indexing is preferable to brace indexing, e.g. B.A7(i) instead of B{i,7}.
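A small sketch of the indexing forms discussed above. The table size is arbitrary, and the variable names A1, A2, A3 are set explicitly so the example does not depend on cell2table's default naming. Because each table variable is itself a cell array, B{k,2} and B.A2(k) both return a 1-by-1 cell, while B.A3{k} reaches the character vector inside.

```matlab
% 5-by-3 cell of empty char vectors, converted to a table with named variables
A = repmat({''}, 5, 3);
B = cell2table(A, 'VariableNames', {'A1', 'A2', 'A3'});

k = 2;
B{k, 2} = {'aaa'};        % brace assignment (the slower path discussed above)
B.A3(k) = {'bbb'};        % dot-then-paren assignment into the variable A3
B.A3{k + 1} = 'ccc';      % dot-then-brace: assigns the char vector inside the cell

v1 = B{k, 2};             % 1-by-1 cell {'aaa'}
v2 = B.A2(k);             % the same 1-by-1 cell, obtained via dot indexing
v3 = B.A3{k + 1};         % the char vector 'ccc'
```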
Victor on 26 Jun 2017
Edited: Victor on 26 Jun 2017
I posted a similar question on Stack Overflow, which may be helpful: "Matlab Table / Dataset type optimization"


Answers (6)

Peter Perkins on 30 Oct 2014
Michael, table is currently not as fast as data types like double and cell when you read or write individual values in a long loop. However, it's often possible to vectorize your code and read or write entire variables at once, at which point you probably won't notice a speed difference. You may also find that
B.Var7{i} = 'aaa'
is faster than
B{i, 7} = {'aaa'}
Hope this helps.
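As a hedged illustration of the vectorization advice, the sketch below first writes one table variable element by element in a loop, then assigns whole variables at once. The table layout (variables Value and Label) is a hypothetical example, and no particular timing is implied; the point is the shape of the vectorized assignment.

```matlab
% Hypothetical table with a numeric variable and a cell (text) variable
n = 2500;
T = table(zeros(n, 1), repmat({''}, n, 1), 'VariableNames', {'Value', 'Label'});

% Element-by-element writes inside a loop: each one goes through table indexing
tic;
for i = 1:n
    T.Value(i) = i^2;
end
tLoop = toc;

% Vectorized: compute whole columns, then assign each variable once
tic;
T.Value = ((1:n).').^2;
T.Label(:) = {'aaa'};   % a 1-by-1 cell on the right is expanded to every row
tVec = toc;

fprintf('Loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```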

Michael on 31 Oct 2014
Thank you for the answer. In my case, I have to write single values, so the slow performance of the table data type is very disappointing. I will try B.Var7{i} = 'aaa', as you suggested. But such an (undocumented) difference in behavior is also quite unsatisfying...
1 Comment
Nigel Dyer on 28 Jun 2015
Agreed. The table type appeared to be a perfect solution for what I needed to do. I found this question, registered my profile, and wrote this while waiting for writetable to complete. The previous code, using dlmwrite, took a couple of seconds.



Oleg Komarov on 30 Nov 2016
Edited: Oleg Komarov on 2 Dec 2016
As I already replied in "table performance very slow", I repeat my take here.
I was using table() long before it was introduced into core MATLAB, since it is de facto a port of the dataset() class from the Statistics Toolbox. Long ago I also noticed many limitations in terms of performance and functionality, and I have logged enhancement requests with TMW.
To address the limitations of table() while waiting for the official implementation of my enhancement requests, I created tableutils(). Among other problems, you might be astonished to learn that calling disp() on a big table can literally freeze your PC until the next ice age (and I am not talking about the movies...). This is something I fixed with a buffered disp method.
While my tableutils() does not directly address the problems in subsref/subsasgn, anyone is welcome to contribute to making the table() class better by submitting an issue or a pull request on GitHub.
Addressing some points in the question
  • cell2table() is 50x faster when the cells are initialized with {''} rather than left as []
N = 500;
A = cell(N);              % N-by-N cell array of [] (empty double)
fprintf('cell2table() on empty cells: %.3fs\n', timeit(@()cell2table(A)));
A = repmat({''}, N);      % N-by-N cell array of '' (empty char)
fprintf('cell2table() on {''''} cells: %.3fs\n', timeit(@()cell2table(A)));
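  • It is 5x faster to use dot indexing, i.e. subsasgnDot, than brace indexing, i.e. subsasgnBraces
S = 1000;
% S random (row, col) positions in the N-by-N table, sampled without replacement
% (randperm(n,k) is the core-MATLAB equivalent of randsample(n,k,false))
[row,col] = ind2sub([N N], randperm(N^2, S));
% {} assignment (A is the N-by-N cell of '' from the previous snippet)
B = cell2table(A);
tic
for ii = 1:S
    B{row(ii),col(ii)} = {'aaa'};
end
toc
% . assignment
C = cell2table(A);
vnames = C.Properties.VariableNames;
tic
for ii = 1:S
    C.(vnames{col(ii)})(row(ii)) = {'aaa'};
end
toc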

LuisCardona on 5 May 2016
Tables are the slowest thing I have ever used. Because of their poor performance, I had to rewrite my code to use matrices, encoding the names of my columns as integers.
Stay away from tables!
3 Comments
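A minimal sketch of the workaround described above: a plain double matrix whose columns are addressed through named integer constants. The column names PRICE, QTY and TOTAL are hypothetical. This only works when every column is numeric, which is the trade-off compared with a table.

```matlab
% Hypothetical column codes, defined once and used instead of column names
PRICE = 1; QTY = 2; TOTAL = 3;

n = 1000;
M = zeros(n, 3);              % preallocate the numeric matrix
M(:, PRICE) = 10;
M(:, QTY)   = (1:n).';

% Deliberately a scalar loop, to mirror element-wise code; on a double matrix
% this avoids the table indexing overhead discussed in this thread
for i = 1:n
    M(i, TOTAL) = M(i, PRICE) * M(i, QTY);
end
```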
Victor on 15 Jun 2017
I think the current table data type is an attempt to support more sophisticated, Excel-like functionality, with performance as the trade-off.
The problem is that with matrices you can't always remember which column index holds which variable, and searching a string on every access to a variable is not a good solution.
I have used two ways to keep variable/column names: a structure of vectors of the same length, and a vector of structures (a.k.a. a nonscalar struct array).
Both have drawbacks: you can't get simple row-wise and column-wise access at the same time without a slow conversion to another data structure.
But I think there could be a simpler, optimized version of the table data type for when we just want to combine row-number and column-name indexing on top of ordinary arrays and cell arrays. And if we have only numbers (no cell/string/sparse functionality), it could be even faster.
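The two layouts mentioned above can be sketched as follows; the field names x and y are hypothetical. The scalar struct of vectors gives cheap column access, the struct array gives cheap row access, and going from one to the other (here via struct2table and table2struct) copies all of the data.

```matlab
n = 4;

% 1) Scalar struct of equal-length vectors: whole columns are one field access
S.x = (1:n).';
S.y = 10 * (1:n).';
colY  = S.y;                      % whole column
rowX3 = S.x(3);                   % reading a "row" means indexing each field separately

% 2) Nonscalar struct array: one element per row
R = struct('x', num2cell(S.x), 'y', num2cell(S.y));   % n-by-1 struct array
row3 = R(3);                      % whole row as a scalar struct
colYfromR = [R.y].';              % a column has to be gathered from a comma-separated list

% Converting between the layouts copies the data
T  = struct2table(S);             % n-by-2 table from the struct of columns
R2 = table2struct(T);             % n-by-1 struct array, equal to R
```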
LuisCardona on 28 Jun 2017
Hoi Wong, I wanted to clarify that I was talking about tables in MATLAB, not the concept in general. Thanks for the comment. But I maintain that they are terribly slow in MATLAB.



jbpritts on 24 Nov 2016
I have MATLAB R2016b and can confirm that tables are terribly slow. Unless you really need them for heterogeneous data, avoid them in any performance-critical code. I will have to rewrite a fairly complicated section of code using legacy data structures. MathWorks should address this extreme performance deficiency.

Peter Perkins on 2 Dec 2016
Edited: Peter Perkins on 2 Dec 2016
As posts on this thread have indicated, while tables are often the right data structure for the job, their performance in scalar indexing is not comparable to that of types such as double and struct. While there have been significant performance improvements since the initial release in R2014b (e.g. writetable), and those improvements will continue, tables are best when operations can be vectorized. That's often true even with plain old double matrices. It's also best to pre-allocate a table rather than growing it row by row, and again, that's true even for double matrices.
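A minimal sketch of preallocating a table and filling whole variables, contrasted with growing a table row by row. The variable names A and B and the sizes are illustrative assumptions.

```matlab
n = 1000;

% Preallocate: create the table at its final height, then fill whole variables
T = table(zeros(n, 1), zeros(n, 1), 'VariableNames', {'A', 'B'});
T.A = (1:n).';
T.B = 2 * T.A;

% Growing row by row (kept to a small count here): each vertical concatenation
% builds a new, larger table, which is the pattern to avoid in long loops
m = 10;
T2 = table(zeros(0, 1), zeros(0, 1), 'VariableNames', {'A', 'B'});
for i = 1:m
    T2 = [T2; table(i, 2 * i, 'VariableNames', {'A', 'B'})]; %#ok<AGROW>
end
```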
In situations where code cannot be vectorized, perhaps because the results of one iteration of a loop affect subsequent iterations, it's often possible to encapsulate the body of a loop into a function that you call by passing it a table's variables using dot subscripting, and assign back to a table's variables, rather than completely rewriting code to not use tables. It often looks something like this:
[t.X,t.Y,t.Z] = fun(t.A,t.B,t.C)
where fun is a loop that works on separate arrays. Even when it's not desirable to encapsulate the code in a function body, it's often possible to "hoist" a small number of variables out of a table and into the workspace before a loop, have the loop work on them, and then put them back in the table. In other words, if performance is an issue, consider replacing the bottlenecks with code that uses lower-level data types rather than completely avoiding tables.
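A sketch of the "hoisting" pattern, assuming a hypothetical table t with variables Rate and Balance and a recurrence in which each iteration depends on the previous one, so the loop cannot be vectorized. The variable is read out of the table once, the loop runs on plain double arrays, and the result goes back into the table with a single dot assignment.

```matlab
n = 5;
t = table((1:n).' / 100, zeros(n, 1), 'VariableNames', {'Rate', 'Balance'});

rate = t.Rate;                 % hoist: one dot-index read of the whole variable
bal  = zeros(n, 1);            % plain double array for the loop to work on
bal(1) = 100;
for k = 2:n                    % iteration k depends on the result of iteration k-1
    bal(k) = bal(k-1) * (1 + rate(k));
end
t.Balance = bal;               % put the result back with one dot assignment
```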
2 Comments
Oleg Komarov on 4 Dec 2016
Edited: Oleg Komarov on 4 Dec 2016
Hi Peter, thanks for the suggestion. Is there any particular reason why table.subsasgnBraces() converts the RHS into a table?
A lot of overhead is incurred in that conversion and in the subsequent table methods applied to the table-like RHS.
See, e.g., line 121 of @tabular\subsasgnBraces.m, and line 191 of @tabular\subsasgnParens.m, which calls a MATLAB-coded repmat (because its input is the RHS converted to a table) instead of the built-in repmat.
Peter Perkins on 5 Dec 2016
Your earlier observation that dot-then-parens indexing is faster than braces, for example, B.A7(i) vs B{i,7}, is true. That's one of the "significant performance improvements" I was referring to. It's an ongoing process. Table brace indexing is something we're planning to work on.
