Find repeated expression in array of strings, return logical.

Tobias on 21 Dec 2017
Edited: Stephen23 on 3 Jan 2018
I have data of the type
looking_for = ["apple", "melon"]
in
my_data = ["The apple is red", "The bee was yellow", "I am eating a melon", "The melon is sweet"]
with
timing = [2.5, 5, 10, 18]
I want to find where a regular expression matches in consecutive elements, and then return a logical index marking the first observed repetition.
My approach:
1) Find out whether each string contains one of the regular expressions in looking_for, e.g. melon. I solve this using
idx = cellfun(@(x)( ~isempty(x) ), regexp(my_data, "apple"));
2) Then I transpose my index and multiply it by the timing to get the relevant timings, and remove the zeros (not shown here)
apple_timing = transpose(idx).*timing;
This gives me a variable called apple_timing with a value of 2.5, which is exactly what I want.
I would like a bit of code that returns a variable called repeat_timing. In the case of the melon, this would return 18 - the first observed consecutive repeat of the regular expression melon.
  1 Comment
Jos (10584) on 22 Dec 2017
huh, I don't see apple being repeated in your strings?
And why do you use cellfun and regexp rather than the dedicated string find function CONTAINS which returns a logical array directly?
contains(my_data, looking_for) % → [1 0 1 1]
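Building on that suggestion, here is a minimal sketch of how contains plus logical indexing might replace steps 1 and 2 from the question. It assumes the row-vector sample data from the question and a release with string arrays and contains (R2017a or later for the double-quote syntax); the variable names idx_apple, idx_melon and melon_timing are illustrative and do not appear in the thread.

```matlab
% Sample data from the question (all row vectors).
my_data = ["The apple is red", "The bee was yellow", ...
           "I am eating a melon", "The melon is sweet"];
timing  = [2.5, 5, 10, 18];

% contains returns a logical row vector, one element per string.
idx_apple = contains(my_data, "apple");   % [1 0 0 0]
idx_melon = contains(my_data, "melon");   % [0 0 1 1]

% Logical indexing picks the matching timings directly, so there is
% no need to multiply by the index and strip zeros afterwards.
apple_timing = timing(idx_apple);         % 2.5
melon_timing = timing(idx_melon);         % [10 18]
```

Note that contains matches literal text, so it only stands in for regexp when the patterns contain no regular-expression metacharacters.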


Accepted Answer

Stephen23 on 22 Dec 2017
Edited: Stephen23 on 22 Dec 2017
Here is one solution based around cumsum:
% Data:
LF = {'apple', 'melon'};
MD = {'The apple is red','The bee was yellow','I am eating a melon','The melon is sweet'};
TV = [2.5, 5, 10, 18];
% Locate patterns:
fun = @(p)~cellfun('isempty',strfind(MD,p));
BM = cell2mat(cellfun(fun,LF(:),'uni',0));
CS = cumsum(BM,2);
You can use this to identify the first, second, third, etc. times that a pattern occurs, and find the related timing value:
>> [R1,C1] = find(CS==1 & BM); % First occurrence.
>> LF{R1}
ans = apple
ans = melon
>> TV(C1)
ans =
2.5000 10.0000
>> [R2,C2] = find(CS==2 & BM); % Second occurrence.
>> LF{R2}
ans = melon
>> TV(C2)
ans = 18
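To see why CS==n & BM isolates the n-th occurrence, it may help to write out the intermediate arrays for the sample data. The literal values below are worked out by hand from the sample strings rather than copied from the answer, and the variable name second is only illustrative.

```matlab
% Rows: patterns in LF ('apple', 'melon'); columns: strings in MD.
BM = logical([1 0 0 0;    % 'apple' matches only the first string
              0 0 1 1]);  % 'melon' matches the third and fourth
CS = cumsum(BM, 2);       % running match count along each row
% CS = [1 1 1 1
%       0 0 1 2]
% The count in CS stays at its last value after a match, so CS==n
% alone would also flag later non-matching columns. Combining it
% with BM keeps only the column where the n-th match occurs.
second = CS == 2 & BM;    % true only at row 2, column 4
```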
You can easily automate this for an arbitrary number of matches. Here I locate the first, second, and third occurrences (your sample data contains no third occurrence):
baz = @(n)find(CS==n & BM);
[row,col] = arrayfun(baz,1:3,'uni',0);
typ = cellfun(@(r)LF(r),row,'uni',0);
val = cellfun(@(c)TV(c),col,'uni',0);
giving:
>> typ{:}
ans =
'apple'
'melon'
ans =
'melon'
ans = {}
>> val{:}
ans =
2.5000 10.0000
ans = 18
ans = []
>>
  2 Comments
Tobias on 27 Dec 2017
Edited: Tobias on 29 Dec 2017
Hi Stephen, and thanks for the answer.
However, your code does not seem to address the constraint of finding consecutive repeats. E.g.:
{'The apple is good', 'The apple is red', 'The bee has stripes'}
should lead to one consecutively repeated instance, while
{'The apple is good', 'The bee has stripes', 'The apple is red'}
should lead to none.
Stephen23 on 3 Jan 2018
Edited: Stephen23 on 3 Jan 2018
Ah, if you only want to identify adjacent cells then you do not need cumsum. A simple logical AND of BM with a copy of itself shifted by one column will do the trick:
>> CS = BM & circshift(BM,1,2);
>> CS(:,1) = false;
>> [R1,C1] = find(CS)
R1 = 2
C1 = 4
>> LF{R1}
ans = melon
>> TV(C1)
ans = 18
>>
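As a rough check against the two cases raised in the earlier comment, the adjacent-match idea can be sketched as below. The helpers match and adjacent are hypothetical names introduced here for illustration, and the slicing with a prepended false column is a swapped-in variant of the circshift approach that avoids the wrap-around fix-up; it is not the code from the answer.

```matlab
% Hypothetical helpers (names are illustrative, not from the thread).
LF = {'apple'};
A  = {'The apple is good', 'The apple is red', 'The bee has stripes'};
B  = {'The apple is good', 'The bee has stripes', 'The apple is red'};

% One row per pattern, one column per string (same layout as BM).
match = @(MD) cell2mat(cellfun(@(p) ~cellfun('isempty', strfind(MD, p)), ...
                               LF(:), 'uni', 0));
% True where a match directly follows a match in the previous column.
% Prepending a false column means the first column can never be flagged.
adjacent = @(BM) [false(size(BM,1),1), BM(:,2:end) & BM(:,1:end-1)];

repA = adjacent(match(A));   % [0 1 0]: one consecutive repeat
repB = adjacent(match(B));   % [0 0 0]: no consecutive repeat
```

The flagged column is the second element of each matching pair, which is consistent with the value 18 returned for melon in the sample data.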
