extracting rows from large table

3 views (last 30 days)
I have a large table (23000000x11) and want to extract rows from the table based on values in the 5th column.
for example the data looks like:
66172 8:17 1 2 1 1 1 1 1 6 8
66842 7:17 1 3 0 1 1 1 1 1 7
.
.
.
and I want to create a smaller table with all the rows whose value of the fifth column are 0. This is of course easy to do with for loops and if statements, but takes enormous amounts of time to run. Is there a way using maybe ismember, intersect, or some other means to carry this out. I'm sure there must be. Any suggestions are appreciated.

Accepted Answer

Walter Roberson
Walter Roberson on 30 Sep 2022
mask = YourTable{:, 5} == 0;
subset0 = YourTable(mask, :);
  3 Comments
Walter Roberson
Walter Roberson on 11 Oct 2022
I was curious about the timing.
In my tests, the first 5 or so {:,5} accesses were reliably much slower than the others, so I skip those in the plot.
I must admit that I do not understand why {:,5} would be 5 or 6 times slower than .(5)
format long g
YourTable = array2table(randi(9, 5000, 7), 'VariableNames', {'a', 'b', 'c', 'd', 'e', 'f', 'g'});
N = 50;
skip = 7;
time_brace = zeros(N,1);
time_named = zeros(N,1);
time_dotnum = zeros(N,1);
for K = 1 : N
tic; YourTable.(5); t=toc(); time_dotnum(K) = t;
tic; YourTable{:,5}; t=toc(); time_brace(K) = t;
tic; YourTable.e; t=toc(); time_named(K) = t;
end
subset = [time_brace(skip+1:end), time_named(skip+1:end), time_dotnum(skip+1:end)];
plot(subset);
legend({'\{:,5\}', '.e', '.(5)'})
mean(subset)
ans = 1×3
1.0e+00 * 7.27209302325581e-05 1.4046511627907e-05 1.01627906976744e-05

Sign in to comment.

More Answers (0)

Categories

Find more on Tables in Help Center and File Exchange

Products


Release

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by