Finding cell array row indices based on numeric column values

Piddy on 9 Jan 2018
Commented: Piddy on 10 Jan 2018
I have a large cell array keystrokes of approximately 20000x4 elements. Columns 1 and 3 each contain a char, while columns 2 and 4 each contain a double. For example:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{'l' } {[ 180]} {'e' } {[ 69]}
{'e' } {[300664]} {'|space|'} {[ 125]}
{'|space|'} {[ 62]} {'n' } {[2500]}
I want to find the row indices in keystrokes of occurrences of every unique combination of columns 1 and 3, where the value in column 2 is less than 100000 and the value in column 4 is less than 2000. My current code gives me the error "Undefined operator '<' for input arguments of type 'cell'.", and is shown below.
% Temporarily convert keystroke structure to a table due to unique() apparently not supporting combinations of cellarray columns.
uniqueDigraphsTable = unique(cell2table(keystrokes(:,[1 3])), 'rows');
uniqueDigraphs = table2cell(uniqueDigraphsTable);
for ii = 1:length(uniqueDigraphs)
    % Find rows containing the current unique digraph
    occurrenceIndices = find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ...
        strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
        keystrokes(:,2)<100000 & keystrokes(:,4)<2000);
    ...
end
Using keystrokes{:,4}<2000 gives me this error: "Error using <. Too many input arguments." Is there a simple (and perhaps prettier) way to find the indices?
  1 Comment
Jan on 9 Jan 2018
Please post the input data in a form that can be used by copy and paste. Is keystrokes a nested cell:
keystrokes = { ...
{'l' } {[ 180]} {'e' } {[ 69]}; ...
{'e' } {[300664]} {'|space|'} {[ 125]}; ...
{'|space|'} {[ 62]} {'n' } {[2500]}}
or a cell:
keystrokes = { ...
'l', 180, 'e', 69; ...
'e', 300664, '|space|', 125; ...
'|space|', 62, 'n', 2500}
? Even asking this question takes a lot of typing.


Answers (2)

Guillaume on 9 Jan 2018
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
[keystrokes{:,2}] < 100000 & ...
[keystrokes{:,4}] < 2000)
or
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
cell2mat(keystrokes(:,2)) < 100000 & ...
cell2mat(keystrokes(:,4)) < 2000)
In essence you have to transform your cell columns into numeric matrices.
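To illustrate why both attempts in the question fail: keystrokes(:,2) uses parentheses and therefore returns a cell array, for which < is not defined, while keystrokes{:,4} expands to a comma-separated list, so < receives more than two arguments. A minimal sketch on a hypothetical three-row sample (the variable names col2, col4 and rowOk are only illustrative):

```matlab
% Hypothetical sample in the same layout as keystrokes.
keystrokes = { ...
    'l',       180,    'e',       69;  ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500};

% Parentheses keep the container: this is still a cell, so < is undefined.
containerClass = class(keystrokes(:,2));

% cell2mat unpacks a cell column into a numeric column of the same height.
col2 = cell2mat(keystrokes(:,2));      % 3x1 double
col4 = cell2mat(keystrokes(:,4));      % 3x1 double

% Element-wise comparison now yields one logical value per row.
rowOk = col2 < 100000 & col4 < 2000;
```

Only the first row passes both thresholds in this sample.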
  1 Comment
Piddy on 10 Jan 2018
Thanks a lot! Your cell2mat solution gives the results I'm looking for. The first solution seems to have some sort of looping problem, though. It produces a very large vector whose first elements are the correct indices, but they are followed by indices that exceed the length of the keystrokes array.
For example, when keystrokes is a 24894x4 cell, part of the output for a specific row in uniqueDigraphs looks like this:
K>> length(occurrenceIndices)
ans =
158473
K>> occurrenceIndices(1:15)
ans =
591
677
1090
2247
2578
2912
3227
25485
25571
25984
27141
27472
27806
28121
50379
The first 7 values are correct, but the rest are too large. 24894 + 591 = 25485 though, and 24894 + 677 = 25571 etc.
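A likely explanation for this pattern is implicit expansion (R2016b and later): [keystrokes{:,2}] concatenates horizontally into a 1xN row, while the strcmp results are Nx1 columns, so & produces an NxN logical matrix and find returns linear indices into that matrix. This would account for the offsets of 24894 and for the vector length, since 158473 = 7 x 22639. Under that reading, the first block of indices reflects only the string comparison, so it may agree with the correct result by coincidence. A minimal sketch on hypothetical toy data, using vertcat to keep the numbers in a column:

```matlab
% Hypothetical toy data standing in for the real keystrokes array.
keystrokes = { ...
    'l',       180,    'e',       69;  ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500; ...
    'l',       90,     'e',       40};

% String comparison gives a 4x1 logical column.
isLE = strcmp(keystrokes(:,1), 'l') & strcmp(keystrokes(:,3), 'e');

% [keystrokes{:,2}] is a 1x4 row, so column & row expands to a 4x4 matrix.
expanded = isLE & ([keystrokes{:,2}] < 100000);
expandedSize = size(expanded);

% vertcat stacks the values vertically, so every operand stays 4x1.
col2 = vertcat(keystrokes{:,2});
col4 = vertcat(keystrokes{:,4});
occurrenceIndices = find(isLE & col2 < 100000 & col4 < 2000);
```

Transposing the bracketed expression, as in [keystrokes{:,2}]', should have the same effect as vertcat here.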



Jan on 9 Jan 2018
Edited: Jan on 9 Jan 2018
A cell array is not useful for these comparisons, and converting it to a table only adds another level of indirection. Easier:
% Store strings in one cell string:
Strings = keystrokes(:, [1, 3]);
uStrings = unique(Strings, 'rows');
% Store numbers in a numerical array:
Values = cell2mat(keystrokes(:, [2, 4]));
% Move the check of the values out of the loop for performance:
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
    occurrenceIndices = find(strcmp(Strings(:,1), uStrings{ii, 1}) & ...
        strcmp(Strings(:,2), uStrings{ii, 2}) & ...
        match);
    ...
end
This would be faster if you also use the 2nd and 3rd outputs of unique():
[uStrings, iString, iUniq] = unique(Strings, 'rows');
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
    occurrenceIndices = find(iUniq == ii & match);
    ...
end
  2 Comments
Piddy on 10 Jan 2018
Thank you! There is still an issue though. The following line produces this warning: "The 'rows' input is not supported for cell array inputs."
[uStrings, iString, iUniq] = unique(Strings, 'rows');
Does this tie into your comment asking whether or not keystrokes is a nested cell? I didn't produce the keystrokes variable myself, but I'm fairly sure that it is not nested. I checked using class():
class(keystrokes{1,1})
ans = 'char'
I also think that if it were nested, the example command I showed in my original question would have produced an output like this:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
I could of course be mistaken.
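One way to check this for the whole array rather than a single element is sketched below, on hypothetical sample data; the names hasNestedCells and classesPerCell are only illustrative.

```matlab
% Hypothetical flat sample in the same layout as keystrokes.
keystrokes = {'l', 180, 'e', 69; 'e', 300664, '|space|', 125};

% True if any element is itself a cell, i.e. the array is nested.
hasNestedCells = any(cellfun(@iscell, keystrokes(:)));

% Class of every element, returned as a cell array of char vectors.
classesPerCell = cellfun(@class, keystrokes, 'UniformOutput', false);

% For comparison, a nested variant of the first two entries.
nested = {{'l'}, {180}};
nestedHasCells = any(cellfun(@iscell, nested(:)));
```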
Guillaume on 10 Jan 2018
Annoyingly, unique (and ismember) do not support the 'rows' option with cell arrays, even for a cell array of char arrays. If you have MATLAB R2016b or later, you can convert the cell array of char arrays into a string array, which can be used with unique and the 'rows' option:
unique(string(keystrokes(:, [1 3])), 'rows')
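Putting this together with the third output of unique from Jan's answer, the whole search might look like the following sketch. It assumes R2016b or later for string arrays, uses hypothetical toy data, and stores the result for each digraph in a cell array (uDigraphs and occurrenceIndices are illustrative names).

```matlab
% Hypothetical toy data in the same layout as keystrokes.
keystrokes = { ...
    'l',       180,    'e',       69;  ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500; ...
    'l',       90,     'e',       40};

% Unique digraphs; iUniq(k) is the row of uDigraphs that matches row k.
[uDigraphs, ~, iUniq] = unique(string(keystrokes(:, [1 3])), 'rows');

% Numeric columns as a matrix, thresholds checked once outside the loop.
Values = cell2mat(keystrokes(:, [2 4]));
match = Values(:,1) < 100000 & Values(:,2) < 2000;

% One index vector per unique digraph.
nDigraphs = size(uDigraphs, 1);
occurrenceIndices = cell(nDigraphs, 1);
for ii = 1:nDigraphs
    occurrenceIndices{ii} = find(iUniq == ii & match);
end
```

Digraphs with no row passing both thresholds end up with an empty index vector.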
