MATLAB Answers


Find string that has multiple substrings

Asked by Tyler Murray on 27 Sep 2016
Latest activity: answered by Walter Roberson on 28 Sep 2016
I have a cell array in which each cell contains a string, and I am trying to find all cells that contain both of two substrings (here 'Car' and 'fast'). For example:
A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'}
I am new to cellfun (it doesn't necessarily have to be the solution), but I figured it was the only way to do it. My attempt:
any(~cellfun('isempty',strfind(A,'Car' 'fast')))
The result should be
[1;0;0;0]

2 Answers

Answer by John BG
on 28 Sep 2016
 Accepted Answer

Tyler
1.- Instead of
any(~cellfun('isempty',strfind(A,'Car' 'fast')))
search for one pattern at a time, for example
~cellfun('isempty',strfind(A,'Car'))
and then combine the per-cell results element-wise:
~cellfun('isempty',strfind(A,'Car')) & ~cellfun('isempty',strfind(A,'fast'))
Wrapping these expressions in any() collapses them to a single true/false for the whole array instead of one value per cell.
If you want to supply all the conditions (strings to spot) at once, you still have to combine the results logically yourself, because strfind only searches for one pattern at a time, and that pattern has to be a single character vector.
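As a minimal sketch of "combining them logically" for any number of patterns, assuming the patterns are stored in a cell array (called patterns here, a name chosen only for illustration), you can call strfind once per pattern and AND the per-cell results together. The loop runs over the patterns, not over the strings:

```matlab
% Strings to scan (from the question) and the patterns that must all be present
A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'};
patterns = {'Car', 'fast'};   % hypothetical name for the list of substrings

% Start with "every cell qualifies" and knock out cells that miss any pattern
mask = true(size(A));
for k = 1:numel(patterns)
    % strfind on a cell array returns a cell array of index vectors;
    % an empty entry means this pattern does not occur in that string
    hasPattern = ~cellfun('isempty', strfind(A, patterns{k}));
    mask = mask & hasPattern;   % element-wise AND keeps one value per cell
end

% mask is now the logical column [1; 0; 0; 0]
```

Adding a third substring only means adding it to patterns; the rest of the code is unchanged.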
2.- So, to answer your question, try this:
A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'};
B = {'Car'; 'fast'};
szA1 = numel(A);               % number of strings in the cell you want to scan
szB1 = numel(B);               % number of patterns you want to look for
marker1 = zeros(szB1, szA1);   % marker1(n,k) = 1 when pattern n is found in string k
for n = 1:szB1
    L1 = B{n};                 % current pattern
    for k = 1:szA1
        L2 = A{k};             % current string
        if ~isempty(strfind(L2, L1))
            marker1(n, k) = 1;
        end
    end
end
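Now marker1 records every match (one row per pattern, one column per string), so all that is left to do is to AND each column vertically with
prod(marker1, 1)
which returns
1 0 0 0
(Specifying dimension 1 keeps this working even when B holds a single pattern; all(marker1, 1) gives the same answer as a logical array, and a trailing ' turns the row into the column [1;0;0;0].)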
This is the result you are after, isn't it?
There are more compact ways to write this answer, without for loops, but testing them takes time.
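One possible loop-free version of the same idea, offered as an untested-in-the-thread sketch rather than as John's code: an outer cellfun applies strfind for every pattern at once, building the same pattern-by-string matrix as marker1, and all() then ANDs each column. The names rows, marker and result are illustrative; A and B are redefined so the snippet runs on its own:

```matlab
A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'};
B = {'Car'; 'fast'};

% For each pattern p in B, get a logical row saying which strings in A contain it.
% strfind(A, p) returns a cell the same size as A (a column here); ' makes it a row.
rows = cellfun(@(p) ~cellfun('isempty', strfind(A, p))', B, 'UniformOutput', false);

% Stack the rows into a numel(B)-by-numel(A) matrix, like marker1 in the loop version
marker = vertcat(rows{:});

% A string qualifies only if every pattern was found: AND down each column,
% then transpose to get one value per cell of A
result = all(marker, 1)';   % [1; 0; 0; 0] for this data
```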
Tyler, would you be so kind as to mark my answer as the accepted answer? Thanks in advance.
To any other reader: if you find my answer helpful, would you please click the thumbs-up link? Thanks in advance.
John BG

  1 Comment

Great, thank you!



Answer by Walter Roberson
on 28 Sep 2016

A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'};
targets = {'Car', 'fast'};
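lit_targets = regexptranslate('escape', targets);        % treat each target literally
pattern = [sprintf('(?=.*%s)', lit_targets{:}) '.?'];     % one lookahead per target, e.g. (?=.*Car)(?=.*fast).?
matches = ~cellfun(@isempty, regexp(A, pattern) );        % true where every lookahead succeeds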
This is extendable to any number of strings in targets.
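To illustrate the extension, here is a sketch that wraps the same lookahead pattern in anonymous functions (escapeAll, makePattern and hasAll are hypothetical names, not built-ins) and checks three targets at once. It escapes each target individually through cellfun, which works whether or not regexptranslate is given a cell array:

```matlab
A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'};

% Hypothetical helpers: escape every target, then build one (?=.*target)
% lookahead per target followed by '.?' so the whole pattern can match.
escapeAll   = @(t) cellfun(@(s) regexptranslate('escape', s), t, 'UniformOutput', false);
makePattern = @(t) [sprintf('(?=.*%s)', t{:}) '.?'];
hasAll      = @(A, targets) ~cellfun(@isempty, regexp(A, makePattern(escapeAll(targets))));

m3 = hasAll(A, {'is', 'fast', 'Car'});   % three targets: [1; 0; 0; 0]
m1 = hasAll(A, {'fast'});                % one target:    [1; 0; 1; 1]
```

Because every target gets its own lookahead, the order in which the words appear in the string does not matter.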
The regexptranslate step ensures that everything in targets is matched literally. For example, if a target were 'Car.', the period needs to be treated as a literal period; without this step, regexp would treat it as meaning "any one character".
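The effect of escaping can be seen on a small example. This sketch (variable names are illustrative) compares the raw target 'Car.' with its escaped form when searching the string 'Cars':

```matlab
raw = 'Car.';
lit = regexptranslate('escape', raw);   % 'Car\.' : the period is now escaped

% Unescaped, '.' matches any character, so 'Cars' is (wrongly) reported as a hit
unescapedHit = ~isempty(regexp('Cars', raw, 'once'));    % true
% Escaped, only a literal period matches
escapedHit    = ~isempty(regexp('Cars', lit, 'once'));   % false
escapedHitDot = ~isempty(regexp('Car.', lit, 'once'));   % true
```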
There is another potential approach:
A = {'Car is fast'; 'Car is slow'; 'Train is fast'; 'Plane is fast'};
targets = {'Car', 'fast'};
lit_targets = regexptranslate('escape', targets);
pattern = strjoin(lit_targets, '|');                                   % matches any one target, e.g. Car|fast
matches = cellfun(@length, regexp(A, pattern)) >= length(targets);    % count matches per string
However, this approach has problems when a string contains multiple copies of one of the words. For example, 'This is a test' contains 'is' twice (once inside 'This'), so if you searched for 'is' and 'car', the two matches for 'is' would count as 2 and the code would not notice that 'car' was absent. This approach is therefore not recommended for general use.
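The failure case just described can be reproduced directly. This sketch runs both approaches on 'This is a test' with the targets 'is' and 'car'; the counting version reports a false positive, while the lookahead version correctly rejects the string:

```matlab
A = {'This is a test'};
targets = {'is', 'car'};
lit_targets = cellfun(@(s) regexptranslate('escape', s), targets, 'UniformOutput', false);

% Counting approach: 'is' matches twice ("This" and "is"), so the count reaches 2
countPattern = strjoin(lit_targets, '|');
countMatch = cellfun(@length, regexp(A, countPattern)) >= numel(targets);   % true (wrong)

% Lookahead approach: each target must appear at least once
lookPattern = [sprintf('(?=.*%s)', lit_targets{:}) '.?'];
lookMatch = ~cellfun(@isempty, regexp(A, lookPattern));                    % false (correct)
```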
