How to search for multiple words anywhere in a sentence?

I want to search for three words, "battery", "power", and "failure". All three must exist in the sentence, in any order, for the cell to be copied.
I tried:
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:); %save rows which didn't contain
But it finds any cell that contains at least one of the three words.
How can I search for the cells that contain all three words, in any order?

2 Comments

Where is per isakson's comment?
Is there any function that does this instead of regexpi?

Answers (3)

the cyclist on 19 Sep 2015

The most straightforward way, it seems to me, is to do the regexp search three times, once for each word, and then copy the cells where all three match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match like you have done.
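A minimal sketch of that idea, assuming a small made-up cell array `D` in place of `alldata(:,126:130)` (the sample data and the names `hasAll` and `rowHasAll` are only illustrative). Each word gets its own simple case-insensitive search, and the results are combined with a logical AND per cell:

```matlab
% Hypothetical sample data standing in for alldata(:,126:130).
D = {'Battery power failure',         42; ...
     'power failure only',            'battery ok'; ...
     'FAILURE of POWER from battery', []};

words = {'battery', 'power', 'failure'};   % all three must be present

isText = cellfun(@ischar, D);              % skip non-text cells
hasAll = isText;                           % start from "every text cell passes"
for k = 1:numel(words)
    % one simple search per word, then AND it into the running mask
    found = ~cellfun('isempty', regexpi(D(isText), words{k}, 'once'));
    hasAll(isText) = hasAll(isText) & found;
end

rowHasAll = any(hasAll, 2);                % rows with at least one full match
```

With this mask, `alldata(rowHasAll,:)` and `alldata(~rowHasAll,:)` would play the roles of `data` and `Notdata` in the question.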

2 Comments

Thanks for your idea, but that would take more time.
Thanks to you all.
I took your advice "to do the regexp search three times, once for each word"
and tried this:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];
But I am still wondering about one case: the three words may not all be in the same cell.
I mean, for example, column 126 contains "battery", column 127 "power", and column 128 "failure".
Overall, though, the code now looks good :)
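To make the concern above concrete, here is a small sketch that computes both variants side by side: `inRow` is true when every word occurs somewhere in the row (which is what the chained filtering above tests), while `sameCell` is true only when a single cell holds all three words. The data and variable names are assumptions made up for illustration.

```matlab
% Hypothetical rows: in row 1 the words are spread over three cells,
% in row 2 they all sit in one cell, and row 3 lacks 'failure'.
D = {'battery',               'power', 'failure'; ...
     'battery power failure', 'n/a',   7; ...
     'battery',               'power', 'ok'};
words = {'battery', 'power', 'failure'};

isText = cellfun(@ischar, D);
inRow  = true(size(D, 1), 1);   % every word found somewhere in the row
inCell = isText;                % every word found in this very cell
for k = 1:numel(words)
    hit = false(size(D));
    hit(isText) = ~cellfun('isempty', regexpi(D(isText), words{k}, 'once'));
    inRow  = inRow  & any(hit, 2);
    inCell = inCell & hit;
end
sameCell = any(inCell, 2);      % rows where one cell has all three words
```

Which of the two masks is the right one depends on whether a "sentence" is a single cell or the whole row.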

per isakson on 19 Sep 2015
Edited: per isakson on 20 Sep 2015
Try this
sentence_1 = 'abc battery def power ghi failure';
typo_str_1 = 'abc battery def power ghi faiXure';
sentence_2 = 'Battery def power ghi failure.';
typo_str_2 = 'abc Xbattery def power ghi failure';
words = {'battery','power','failure'};
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words );
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words );
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words );
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words );
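Each of `is1` to `is4` above is a logical vector with one entry per word, so a sentence qualifies when all entries are true. A possible way to wrap this up, sketched here with a hypothetical helper handle `hasAllWords` that is not part of the original answer:

```matlab
words = {'battery', 'power', 'failure'};

% True only if every word occurs as a whole word, case-insensitively.
hasAllWords = @(s) all(cellfun( ...
    @(w) ~isempty(regexpi(s, ['\<', w, '\>'], 'once')), words));

r1 = hasAllWords('abc battery def power ghi failure');
r2 = hasAllWords('abc battery def power ghi faiXure');   % typo in 'failure'
r3 = hasAllWords('Battery def power ghi failure.');
r4 = hasAllWords('abc Xbattery def power ghi failure');  % 'Xbattery' is not a whole word
```

For a cell array of sentences, `cellfun(hasAllWords, sentences)` would return one logical per sentence.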
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
1 0 0 1 0 0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
function has_all_three = cssm( N )
sentence_1 = 'Abc battery def power ghi failure.';
typo_str_1 = 'Abc battery def power ghi faiXure.';
multistr_1 = 'Abc battery def power ghi battery.';
sentence_2 = 'Battery def failure ghi power jkl.';
typo_str_2 = 'Abc Xbattery def power ghi failure';
multistr_2 = 'Abc power def power ghi power jkl.';
%
test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
%
text_corp = repmat( test_sentences, [N,1] );
tic
cac = regexpi( text_corp, '\<(battery|power|failure)\>', 'match' ); % one group, so \< and \> apply to every alternative
has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
toc
end

12 Comments

Thanks, but that's not what I want.
I have about 57000*6 cells.
per isakson on 19 Sep 2015
Edited: per isakson on 19 Sep 2015
"... but thats not what i want"
Then you need to better explain what you want. And also explain why my hint isn't useful to you.
I only need to modify this line:
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
so that it finds the cells that contain all three words in any order,
while keeping the structure of the rest of the code.
Because he wants a magic solution.
Amr Hashem on 19 Sep 2015
Edited: Amr Hashem on 19 Sep 2015
No, my friend. I don't want a magic solution.
I only want to solve this problem.
I tried:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
exp = {'battery';...
'failure';...
'power'};
idx(idx)=~cellfun('isempty',regexpi(D(idx),exp,'match')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
But it didn't work; it fails with this error:
??? Error using ==> regexpi
Multiple strings and patterns given to regexpi must have
the same quantity.
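As far as I understand it, when both the strings and the patterns are passed as cell arrays, regexpi pairs them element by element, which is why the two arrays must have the same size. A sketch of a workaround is to apply each pattern to all strings in a loop; the sample strings and the names `strs`, `pats`, and `hits` below are made up for illustration:

```matlab
strs = {'battery power failure'; 'power only'; ...
        'failure of battery power'; 'none'};
pats = {'battery'; 'failure'; 'power'};

% regexpi(strs, pats) would fail here: 4 strings versus 3 patterns.
% Instead, apply each pattern to all strings; one column per pattern.
hits = false(numel(strs), numel(pats));
for k = 1:numel(pats)
    hits(:, k) = ~cellfun('isempty', regexpi(strs, pats{k}, 'once'));
end
hasAll = all(hits, 2);   % true where every pattern matched
```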
per isakson on 19 Sep 2015
Edited: per isakson on 19 Sep 2015
The task is: "search for three words "Battery, power, failure" the three must exist in the sentence in any order". Is that correct?
"I have about (57000*6 cell)" How is that cell array related to alldata(:,126:130)? So, with one sentence per cell, you have 0.342 million sentences(?). What is an acceptable execution time?
"I only need to modify this line:" You need to at least explain what you expect the line to do! Why should I guess?
"I only want to solve this problem" What problem? Why only? What makes you think that it is even possible to accomplish the task with code along the lines you propose? I don't think it is possible!
By the way: should "Xbattery" match "battery"?
Thank you, per isakson, for your contribution.
I already use this code to search for one word in the whole file (57000*6 cells), and it works.
Now I want to search for the three words (as I explained above).
By "only modifying this line" (as I mentioned above), I mean this one:
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
I am now asking whether it is possible to modify the code or not. In other words, is there any function I can use to search for multiple strings instead of regexpi?
Thanks in advance.
"I am now asking is it possible to modify the code or not? " I repeat: I don't think it is possible!
per isakson on 20 Sep 2015
Edited: per isakson on 20 Sep 2015
Three words in any order is a tough job for regexp. "to do the regexp search three times, once for each word" is a sound approach, and I cannot understand why you dismissed it.
Cedric on 20 Sep 2015
Edited: Cedric on 20 Sep 2015
I agree with Per, and I would add that it is often more efficient to make multiple calls to REGEXP(I) with simple patterns than to make a single call with a rather complex pattern.
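For comparison only, and not taken from this thread: a single "complex" pattern of the kind mentioned above could be built from lookahead assertions, which MATLAB's regular expressions support. This is a sketch on made-up strings; the trailing `.` is there because, as far as I know, regexp ignores zero-length matches by default. Whether it is faster or slower than three simple calls would need to be timed on the real data.

```matlab
strs = {'Abc battery def power ghi failure.'; ...
        'Battery def failure ghi power jkl.'; ...
        'Abc power def power ghi power jkl.'};

% Each (?=.*word) lookahead must succeed from the start of the string,
% so all three words are required, in any order. The final '.' consumes
% one character so that the overall match is not empty.
pat = '^(?=.*battery)(?=.*power)(?=.*failure).';
hasAll = ~cellfun('isempty', regexpi(strs, pat, 'once'));
```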
I added new code to my answer.
I noticed, thanks.

Amr Hashem on 20 Sep 2015
This works:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];

1 Comment

This can be simplified as developed in my answer. I move it below as a comment:
Here is an alternate solution:
keywords = {'battery', 'power', 'failure'} ;
allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
'V_batterypowerfailure', 'I_atterypowerfailure'; ...
'I_batterypowerfailre', 'V_batterypowerfailure'} ;
ids = 1 : numel( allCells ) ;
for k = 1 : numel( keywords )
isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
ids = ids(isFound) ;
end
validCells = allCells(ids) ;
You'll notice that it works on a pool of cells which reduces with the keyword index (as when a keyword is not found, there is no point in testing the others). I started valid entries of the dummy data set with V_ and invalid entries with I_ to simplify the final check.
If you need a case-insensitive solution, replace
strfind( allCells(ids), keywords{k} )
with
regexpi( allCells(ids), keywords{k}, 'once' )
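Put together, the case-insensitive variant could look like the sketch below. The mixed-case dummy entries are made up for this example, with the same V_/I_ convention as above:

```matlab
keywords = {'battery', 'power', 'failure'};
allCells = {'V_BatteryPowerFailure', 'I_batterypwerfailure'; ...
            'V_batterypowerFAILURE', 'I_atterypowerfailure'};

ids = 1:numel(allCells);          % linear indices of the remaining candidates
for k = 1:numel(keywords)
    % keep only the candidates that also contain keyword k (any case)
    isFound = ~cellfun('isempty', regexpi(allCells(ids), keywords{k}, 'once'));
    ids = ids(isFound);
end
validCells = allCells(ids);
```

Since `ids` holds linear indices into `allCells`, something like `[row, col] = ind2sub(size(allCells), ids)` should recover the row and column of each valid cell if whole rows need to be copied afterwards.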


Asked: 19 Sep 2015

Commented: 22 Sep 2015
