Find strings that contain multiple substrings at the same time

Question

1 vote

Hi, I have an array of strings and I would like to identify all those entries that contain two substrings at the same time. For example:

My strings are: 'First Example','Second Example', 'Third Example'

My Substrings are: 'irs','xam'

So I would like to identify the first string as the only one which contains both substrings. I have found a solution, but I am convinced that there must be a more elegant and efficient way of achieving this. My code looks as follows:

clear;clc;
%set variables
rbCode = {'RB_DEP_LI_EQ_EED'; ...
          'RB_DEP_LI_EQ_EED_INV'; ...
          'RB_DEP_LI_EQ_EED_TRS'; ...
          'RB_DEP_LI_EQ_IED'; ...
          'RB_DEP_LI_EQ_IED_INV'; ...
          'RB_DEP_LI_EQ_IED_TRS'; ...
          'RB_DEP_LI_FI_INV'};
rbMarketValue = {100; 80; 20; 70; 40; 30; 20};     
%compare invested market value
strToFind ={'EQ';'_INV'};
%sum up rbMarketValue for all rbCodes that have both 'EQ' and '_INV' in
%their name
a = arrayfun(@(x) strfind(rbCode,char(strToFind(x))),1:size(strToFind),'un',false);
a1 = arrayfun(@(x) logical(~isempty(cell2mat(a{1,1}(x,1)))),1:size(rbCode));
a2 = arrayfun(@(x) logical(~isempty(cell2mat(a{1,2}(x,1)))),1:size(rbCode));
a3 = sum([a1' a2'],2);
a4 = cell2mat(rbMarketValue);
sumDeptInv = sum(a4(a3==2));

Any suggestions on how I can achieve this? Thanks, Sven

2 Comments

dpb on 8 Apr 2015

regular expressions (but I'm such a feeb I'll leave the actual expression to someone who can write it w/o having to read the full text... :) )

Stephen23 on 8 Apr 2015

Is the order of the substrings known and fixed? If they have a fixed order, then this could be solved using a simple regular expression. If the order is not known, which is the case your code currently handles, then this requires either two parses of the strings or some kind of pre-processing.
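To illustrate the fixed-order case: a minimal sketch, assuming 'EQ' always appears somewhere before '_INV' in a code (the thread does not confirm this). The pattern and the variable names `pattern` and `idx` are illustrative only.

```matlab
rbCode = {'RB_DEP_LI_EQ_EED'; ...
          'RB_DEP_LI_EQ_EED_INV'; ...
          'RB_DEP_LI_EQ_EED_TRS'; ...
          'RB_DEP_LI_EQ_IED'; ...
          'RB_DEP_LI_EQ_IED_INV'; ...
          'RB_DEP_LI_EQ_IED_TRS'; ...
          'RB_DEP_LI_FI_INV'};
% Assumption: 'EQ' comes first, then any characters, then '_INV'
pattern = 'EQ.*_INV';
% With 'once', regexp returns one start index (or []) per string
idx = ~cellfun('isempty', regexp(rbCode, pattern, 'once'));
```

A string in which '_INV' comes before 'EQ' would not be matched by this pattern, so it is only suitable when the order really is fixed.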


Answer 1

Stephen23 on 8 Apr 2015

Edited: Stephen23 on 9 Apr 2015

0 votes

Actually, using strfind is often faster than using regexp, and it has the significant advantage that all characters are interpreted literally (rather than as special characters, as with regexp). strfind can be used quite compactly for your task:

rbCode = {'RB_DEP_LI_EQ_EED'; ...
          'RB_DEP_LI_EQ_EED_INV'; ...
          'RB_DEP_LI_EQ_EED_TRS'; ...
          'RB_DEP_LI_EQ_IED'; ...
          'RB_DEP_LI_EQ_IED_INV'; ...
          'RB_DEP_LI_EQ_IED_TRS'; ...
          'RB_DEP_LI_FI_INV'};
strToFind ={'EQ';'_INV'};
fun = @(s)~cellfun('isempty',strfind(rbCode,s));
out = cellfun(fun,strToFind,'UniformOutput',false);
idx = all(horzcat(out{:}),2)
idx =
   0
   1
   0
   0
   1
   0
   0
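For readers on newer releases, the same idea can probably be written with `contains`, which was introduced in R2016b and so was not available when this thread was written. The sketch below is an alternative, not the method posted above. It also carries the index through to the sum asked for in the question, and assumes the market values are stored as a numeric vector rather than a cell array.

```matlab
rbCode = {'RB_DEP_LI_EQ_EED'; ...
          'RB_DEP_LI_EQ_EED_INV'; ...
          'RB_DEP_LI_EQ_EED_TRS'; ...
          'RB_DEP_LI_EQ_IED'; ...
          'RB_DEP_LI_EQ_IED_INV'; ...
          'RB_DEP_LI_EQ_IED_TRS'; ...
          'RB_DEP_LI_FI_INV'};
rbMarketValue = [100; 80; 20; 70; 40; 30; 20];  % assumption: numeric vector
strToFind = {'EQ'; '_INV'};
% One logical column per substring (contains requires R2016b or later)
hits = cellfun(@(s) contains(rbCode, s), strToFind, 'UniformOutput', false);
% A row qualifies only if every substring was found in it
idx = all(horzcat(hits{:}), 2);
sumDeptInv = sum(rbMarketValue(idx));
```

Like strfind, `contains` treats the pattern literally, so no escaping of special characters is needed.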

2 Comments

SpeedyGonzales on 9 Apr 2015

Thanks Stephen, this is quite a nice way of solving this problem.

Jos (10584) on 9 Apr 2015

Edited: Jos (10584) on 9 Apr 2015

Here is a REGEXP way of solving this (but I also prefer the STRFIND solution posted by Stephen!):

tf = cellfun(@(x) numel(x)==2,regexp(rbCode,sprintf('%s|',strToFind{:})))
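One thing to watch with counting matches: a string that contains the same substring twice and the other one not at all would also produce two matches. A minimal sketch of a per-substring check that avoids this, at the cost of one regexp pass per substring, is shown below. The fourth test string is made up for illustration, and `regexptranslate('escape', ...)` is used so that the substrings are treated literally.

```matlab
rbCode = {'RB_DEP_LI_EQ_EED'; ...
          'RB_DEP_LI_EQ_EED_INV'; ...
          'RB_DEP_LI_FI_INV'; ...
          'RB_EQ_LI_EQ_EED'};   % hypothetical entry: 'EQ' twice, no '_INV'
strToFind = {'EQ'; '_INV'};
tf = true(size(rbCode));
for k = 1:numel(strToFind)
    % Escape any regexp metacharacters so the substring matches literally
    pat = regexptranslate('escape', strToFind{k});
    % Keep only the rows where this substring occurs at least once
    tf = tf & ~cellfun('isempty', regexp(rbCode, pat, 'once'));
end
```

Here only the second entry contains both substrings, so only that element of `tf` is true.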


Answer 2

Sven on 8 Apr 2015

Edited: Sven on 8 Apr 2015

0 votes

Hi Sven,

Here's how I would do it, using one call to regexp (you could also use strsplit) and one cellfun. The advantage of building a lookup table shows when your data is very large: you pay for the string comparisons only once, when the table is built. After that you are working with logical masks, so any subsequent query (say, a different pair of codes) is very efficient:

% Set up your problem
rbCode = {'RB_DEP_LI_EQ_EED'; ...
    'RB_DEP_LI_EQ_EED_INV'; ...
    'RB_DEP_LI_EQ_EED_TRS'; ...
    'RB_DEP_LI_EQ_IED'; ...
    'RB_DEP_LI_EQ_IED_INV'; ...
    'RB_DEP_LI_EQ_IED_TRS'; ...
    'RB_DEP_LI_FI_INV'};
rbMarketValues = [100; 80; 20; 70; 40; 30; 20];
strToFind ={'EQ';'INV'};
% Find all codes and build a lookup table showing which are present
rbToks = regexp(rbCode,'_','split');
codes = unique([rbToks{:}]); % You won't need this if you already have a list
codeLookup = cell2mat(cellfun(@(tok)ismember(codes,tok),rbToks,'Un',0));
% Look in your lookup to see which entries have both codes
hasBothMask = all(codeLookup(:,ismember(codes,strToFind)),2);
sumDeptInv = sum(rbMarketValues(hasBothMask))

Did that make sense?

Thanks,

Sven.
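To make the reuse point concrete, here is a small sketch that builds the lookup table once and then queries it for two different pairs of codes. The helper `sumFor` and the result names are hypothetical and only for illustration. Note that this approach matches whole underscore-delimited codes ('INV'), not arbitrary substrings ('_INV').

```matlab
rbCode = {'RB_DEP_LI_EQ_EED'; ...
    'RB_DEP_LI_EQ_EED_INV'; ...
    'RB_DEP_LI_EQ_EED_TRS'; ...
    'RB_DEP_LI_EQ_IED'; ...
    'RB_DEP_LI_EQ_IED_INV'; ...
    'RB_DEP_LI_EQ_IED_TRS'; ...
    'RB_DEP_LI_FI_INV'};
rbMarketValues = [100; 80; 20; 70; 40; 30; 20];
% Build the lookup table once: one row per entry, one column per code
rbToks = regexp(rbCode, '_', 'split');
codes = unique([rbToks{:}]);
codeLookup = cell2mat(cellfun(@(tok) ismember(codes, tok), rbToks, ...
    'UniformOutput', false));
% Hypothetical helper: sum the values of entries containing all wanted codes
sumFor = @(wanted) sum(rbMarketValues( ...
    all(codeLookup(:, ismember(codes, wanted)), 2)));
sumEqInv = sumFor({'EQ'; 'INV'});   % entries with both EQ and INV
sumEqTrs = sumFor({'EQ'; 'TRS'});   % same table, different pair of codes
```

A wanted code that does not occur in `codes` is silently ignored by this column selection, so it may be worth checking `all(ismember(wanted, codes))` before trusting the result.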

1 Comment

SpeedyGonzales on 9 Apr 2015

Thanks Sven! This answer works as well. I guess learning more about regexp would be quite beneficial for me, as it seems to be very flexible and powerful.
