Finding cell array row indices based on numeric column values

Question

Piddy on 9 Jan 2018


Direct link to this question:

https://de.mathworks.com/matlabcentral/answers/376288-finding-cell-array-row-indices-based-on-numeric-column-values

Commented: Piddy on 10 Jan 2018

I have a large cell array, keystrokes, of approximate size 20000x4. Columns 1 and 3 each contain a char array, while columns 2 and 4 each contain a double. For example:

>> keystrokes(378:380,:)
ans =
3×4 cell array
    {'l'      }    {[   180]}    {'e'      }    {[  69]}
    {'e'      }    {[300664]}    {'|space|'}    {[ 125]}
    {'|space|'}    {[    62]}    {'n'      }    {[2500]}

I want to find the row indices in keystrokes of the occurrences of every unique combination of columns 1 and 3, where the value in column 2 is less than 100000 and the value in column 4 is less than 2000. My current code, shown below, gives me the error "Undefined operator '<' for input arguments of type 'cell'."

% Temporarily convert the keystrokes cell array to a table, because unique() apparently does not support combinations of cell array columns.
uniqueDigraphsTable = unique(cell2table(keystrokes(:,[1 3])), 'rows');
uniqueDigraphs = table2cell(uniqueDigraphsTable);
for ii = 1:length(uniqueDigraphs)
  % Find rows containing the current unique digraph
  occurrenceIndices = find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ...
                           strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
                           keystrokes(:,2)<100000 & keystrokes(:,4)<2000);
  ...
end

Using keystrokes{:,4}<2000 gives me this error: "Error using <. Too many input arguments." Is there a simple (and perhaps prettier) way to find the indices?
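Both errors come from the two kinds of cell indexing. Here is a minimal sketch, assuming a hypothetical 3-row stand-in for keystrokes built from the excerpt above. Parentheses, as in `keystrokes(:,4)`, return a sub-cell, and `<` is undefined for cells. Braces, as in `keystrokes{:,4}`, return a comma-separated list, so `<` receives one argument per row. Collecting that list with `vertcat` gives a numeric column that `<` accepts.

```matlab
% Hypothetical 3-row stand-in for keystrokes (values from the excerpt above)
keystrokes = { ...
  'l',         180, 'e',        69; ...
  'e',      300664, '|space|', 125; ...
  '|space|',    62, 'n',      2500};

sub = keystrokes(:,4);            % () indexing: a 3x1 cell, so sub < 2000 errors
disp(class(sub))                  % prints: cell

vals = vertcat(keystrokes{:,4});  % {} indexing yields a list; vertcat stacks it
disp(class(vals))                 % prints: double
mask = vals < 2000                % 3x1 logical column: true, true, false
```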

1 Comment

Jan on 9 Jan 2018


Please post input data in a form that can be used by copy & paste. Is keystrokes a nested cell:

keystrokes = { ...
  {'l'      }    {[   180]}    {'e'      }    {[  69]}; ...
  {'e'      }    {[300664]}    {'|space|'}    {[ 125]}; ...
  {'|space|'}    {[    62]}    {'n'      }    {[2500]}}

or a cell:

keystrokes = { ...
  'l',         180, 'e',         69; ...
  'e',      300664, '|space|',  125; ...
  '|space|',    62, 'n',       2500}

? Even asking this question took a lot of typing.
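The question can also be answered programmatically. In the plain layout each element is a char or a double, whereas in the nested layout each element is itself a 1x1 cell. This is a sketch under that assumption. The variables `nested`, `plain` and the helper `isNested` are hypothetical names chosen for illustration.

```matlab
% One row of each candidate layout from the comment above
nested = { {'l'}, {180}, {'e'}, {69} };
plain  = { 'l',   180,   'e',   69  };

% In the nested layout, brace indexing returns yet another cell
isNested = @(c) iscell(c{1,1});
isNested(nested)   % true
isNested(plain)    % false

% Plain layout check: char columns are a cellstr, numeric columns hold scalars
iscellstr(plain(:, [1 3]))                 % true
all(cellfun(@isscalar, plain(:, [2 4])))   % true
```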


Answer 1

Guillaume on 9 Jan 2018


Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/376288-finding-cell-array-row-indices-based-on-numeric-column-values#answer_299340


 find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
      strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
      [keystrokes{:,2}] < 100000 & ...
      [keystrokes{:,4}] < 2000)

or

 find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
      strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
      cell2mat(keystrokes(:,2)) < 100000 & ...
      cell2mat(keystrokes(:,4)) < 2000)

In essence, you have to transform your cell columns into numeric matrices.
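The orientation of the resulting numeric array matters, because it is combined with the Nx1 logical column that strcmp returns. Here is a minimal sketch using a hypothetical 3x1 cell column `c`. `cell2mat` and `vertcat` keep the column shape. Square brackets around a comma-separated list concatenate horizontally, so they produce a row.

```matlab
% Hypothetical 3x1 cell column, like keystrokes(:,2)
c = {180; 300664; 62};

a = cell2mat(c);     % 3x1 double, same shape as the cell
b = vertcat(c{:});   % 3x1 double, explicit vertical concatenation
r = [c{:}];          % 1x3 double: [c{:}] means [c{1}, c{2}, c{3}], a row

disp(size(a))        % 3 1
disp(size(b))        % 3 1
disp(size(r))        % 1 3

% Transposing the row restores the column shape
isequal(r.', a)      % true
```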

1 Comment

Piddy on 10 Jan 2018


Thanks a lot! Your cell2mat solution gives the results I'm looking for. The first solution seems to have some sort of looping problem, though. It produces a very large vector whose first elements are the correct indices, but these are followed by indices that exceed the length of the keystrokes array.

For example, when keystrokes is a 24894x4 cell array, part of its output for a specific row in uniqueDigraphs looks like this:

K>> length(occurrenceIndices)
ans =
      158473
K>> occurrenceIndices(1:15)
ans =
         591
         677
        1090
        2247
        2578
        2912
        3227
       25485
       25571
       25984
       27141
       27472
       27806
       28121
       50379

The first 7 values are correct, but the rest are too large. Note, though, that 24894 + 591 = 25485, 24894 + 677 = 25571, and so on.
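These offsets are consistent with implicit expansion, which applies in MATLAB R2016b and later. `[keystrokes{:,2}]` is a 1xN row, while `strcmp(...)` returns an Nx1 column, so `&` expands them into an NxN logical matrix. `find` then returns linear indices into that matrix, that is, row + (column - 1)*N. The sketch below reproduces the pattern on the 3-row excerpt (N = 3) and shows one fix: transposing the row. The variable names are illustrative.

```matlab
% Hypothetical 3-row stand-in for keystrokes (values from the question)
keystrokes = { ...
  'l',         180, 'e',        69; ...
  'e',      300664, '|space|', 125; ...
  '|space|',    62, 'n',      2500};
N = size(keystrokes, 1);

strMask = strcmp(keystrokes(:,1), 'l');    % 3x1 column: [1; 0; 0]
rowMask = [keystrokes{:,2}] < 100000;      % 1x3 row:    [1 0 1]

bad = find(strMask & rowMask);             % 3x3 matrix, so linear indices
disp(bad.')                                % 1 7, i.e. 1 and 1 + 2*N
[r, col] = ind2sub([N N], bad);            % same row (1), columns 1 and 3

% Fix: transpose the row (or use vertcat/cell2mat) so both operands are Nx1
good = find(strMask & [keystrokes{:,2}].' < 100000);
disp(good)                                 % 1
```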


Answer 2

Jan on 9 Jan 2018


Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/376288-finding-cell-array-row-indices-based-on-numeric-column-values#answer_299339

Edited: Jan on 9 Jan 2018


A cell array is not useful for these comparisons, and converting it to a table adds yet another indirection. Easier:

% Store strings in one cell string:
Strings  = keystrokes(:, [1, 3]);
uStrings = unique(Strings, 'rows'); 
% Store numbers in a numerical array:
Values = cell2mat(keystrokes(:, [2, 4]));
% Move the check of the values out of the loop for performance:
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
  occurrenceIndices = find(strcmp(Strings(:,1), uStrings{ii, 1}) & ...
                           strcmp(Strings(:,2), uStrings{ii, 2}) & ...
                           match);
  ...
end

This would be faster, if you use the 2nd and 3rd output of unique() also:

[uStrings, iString, iUniq] = unique(Strings, 'rows');
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
  occurrenceIndices = find(iUniq == ii & match);
  ...
end

2 Comments

Piddy on 10 Jan 2018


Thank you! There is still an issue, though. The following line produces this warning: "The 'rows' input is not supported for cell array inputs."

[uStrings, iString, iUniq] = unique(Strings, 'rows');

Does this tie into your comment asking whether or not keystrokes is a nested cell? I didn't produce the keystrokes variable myself, but I'm fairly sure that it is not nested. I checked using class():

class(keystrokes{1,1})
ans = 'char'

I also think that if it were nested, the example command I showed in my original question would have produced an output like this:

>> keystrokes(378:380,:)
ans =
3×4 cell array
    {1×1 cell}    {1×1 cell}    {1×1 cell}    {1×1 cell}
    {1×1 cell}    {1×1 cell}    {1×1 cell}    {1×1 cell}
    {1×1 cell}    {1×1 cell}    {1×1 cell}    {1×1 cell}

I could of course be mistaken.

Guillaume on 10 Jan 2018


Annoyingly, unique (and ismember) do not support the 'rows' option with cell arrays, even if it is a cell array of char arrays. If you have MATLAB R2016b or later, you can convert the cell array of char arrays into a string array, which can be used with unique and the 'rows' option:

unique(string(keystrokes(:, [1 3])), 'rows')
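The sketch below combines the pieces of this thread. It assumes R2016b or later (for string arrays) and the plain, non-nested layout. It computes the unique digraphs on a string array with 'rows'. It runs the numeric test once on a numeric array. Finally, it uses the third output of unique to label each row with its digraph number, as in Jan's second loop. The 4-row data (the first two rows are from the excerpt, the last two are made up) and the cell array `indicesPerDigraph` are hypothetical.

```matlab
% Hypothetical small keystrokes array (plain, non-nested layout)
keystrokes = { ...
  'l',         180, 'e',        69; ...
  'e',      300664, '|space|', 125; ...
  'l',          62, 'e',      2500; ...
  'l',          95, 'e',       150};

% Unique (column 1, column 3) pairs; 'rows' works on string arrays
[uDigraphs, ~, iUniq] = unique(string(keystrokes(:, [1 3])), 'rows');

% Numeric threshold test, done once outside the loop
values = cell2mat(keystrokes(:, [2 4]));
match  = values(:,1) < 100000 & values(:,2) < 2000;

% Row indices in keystrokes, per unique digraph, that meet both limits
indicesPerDigraph = cell(size(uDigraphs, 1), 1);
for ii = 1:size(uDigraphs, 1)
  indicesPerDigraph{ii} = find(iUniq == ii & match);
end
```

Here `indicesPerDigraph{ii}` holds the matching rows for the digraph in `uDigraphs(ii,:)`. The loop bound uses `size(..., 1)` rather than `length`, so it always counts rows.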
