Hi, I have a 100 x 1 cell array of strings like:
18WABO1-12345-0X
18WABO2-12345-0N
18WACE3-12345-00
18WACE4-12345-0R
18WAGUG-12345-0G
18WDUER-12345-0N
I would like to extract the character sequence that always sits between 18W and the first dash, so the result is:
ABO1
ABO2
ACE3
ACE4
AGUG
DUER, etc.
Here is my attempt:
somestring(:)= eic_p;
underscore_indices= strfind(somestring,'18W');
underscore_indices=cell2mat(underscore_indices);
fs_indices = strfind(somestring,'-');
fs_indices=fs_indices';
your_number=cellfun(@(v)v(1),fs_indices);
somestring(:)= somestring';
for i=1:length(fs_indices)
yourNumber= somestring{i}(underscore_indices(i)+2:your_number(i)-1);
% How can I save the result of every iteration? Thanks
end
In the last for loop I get strange output and cannot save all the results, so I never end up with all 205 abbreviations in one variable (yourNumber).
Thanks a lot,

Accepted Answer

per isakson on 18 Oct 2017
Edited: per isakson on 18 Oct 2017

1 vote

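yourNumber is overwritten on every pass through the loop, so only the last value is kept. The first step to fix your code is to preallocate a cell array and store each result in its own cell:
yourNumber = cell( length(fs_indices), 1 );
for i = 1 : length(fs_indices)
    % +3 skips the three characters of '18W' (+2 would include the 'W')
    yourNumber{i} = somestring{i}(underscore_indices(i)+3:your_number(i)-1);
end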
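For reference, a self-contained version of that loop might look like the sketch below. It assumes every entry contains the prefix '18W' and at least one dash; the names cac, abbrev, startIdx and dashIdx are illustrative and do not come from the question.

```matlab
% Sample data (first three entries from the question)
cac = {'18WABO1-12345-0X'; '18WABO2-12345-0N'; '18WACE3-12345-00'};

abbrev = cell(numel(cac), 1);          % preallocate one cell per input string
for k = 1:numel(cac)
    s = cac{k};
    startIdx = strfind(s, '18W');      % start position(s) of the prefix
    dashIdx  = strfind(s, '-');        % positions of all dashes
    % Assumption: prefix and dash are both present in every entry.
    % +3 skips the three characters of '18W'; -1 stops before the dash.
    abbrev{k} = s(startIdx(1)+3 : dashIdx(1)-1);
end
```

Because each result goes into its own cell, nothing is overwritten and abbrev ends up with one abbreviation per input row.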
There are other ways, e.g. with regular expressions
>> str = '18WABO1-12345-0X';
>> regexp( str, '(?<=18W)[^\-]+(?=\-)', 'match' )
ans =
'ABO1'
and
cac = {
'18WABO1-12345-0X'
'18WABO2-12345-0N'
'18WACE3-12345-00'
'18WACE4-12345-0R'
'18WAGUG-12345-0G'
'18WDUER-12345-0N' };
%
out = regexp( cac, '(?<=18W).+?(?=\-)', 'match' );
out = cat( 1, out{:} );
and
>> out
out =
'ABO1'
'ABO2'
'ACE3'
'ACE4'
'AGUG'
'DUER'
and with indexing
>> str = char( cac );
>> str = str( :, 4:7 )
str =
ABO1
ABO2
ACE3
ACE4
AGUG
DUER
>>
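Note that the indexing approach relies on every abbreviation being exactly four characters long. A small sketch of what happens otherwise, using a made-up second entry with a two-character code (cac2, fixed and flex are illustrative names):

```matlab
% Second entry is a hypothetical case with a shorter abbreviation
cac2 = {'18WABO1-12345-0X'; '18WAB-12345-0X'};

fixed = char(cac2);        % char pads the shorter row with trailing blanks
fixed = fixed(:, 4:7);     % fixed columns: second row picks up '-1' as well

% The regular expression stops at the first dash regardless of length
flex = regexp(cac2, '(?<=18W)[^-]+', 'match', 'once');
```

Here the rows of fixed are 'ABO1' and 'AB-1', while flex holds 'ABO1' and 'AB'.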

10 Comments

Cedric on 18 Oct 2017
Edited: Cedric on 18 Oct 2017
There is no need for the look-ahead in your first pattern, and you can add the 'once' option to avoid the call to CAT. And if you need to debug it .. well I'm not really sure .. yet ;)
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
Stephen23 on 18 Oct 2017
Edited: Stephen23 on 18 Oct 2017
I tried this myself and came up with almost the same regular expression, just with the ^ to anchor the match to the start of the string:
regexp(C,'(?<=^18W)[^-]+','once','match')
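To illustrate what the anchor changes, here is a minimal sketch. The second entry is a made-up edge case in which '18W' does not sit at the start of the string; the variable names are illustrative.

```matlab
% Second entry is hypothetical: the prefix appears after two extra characters
C = {'18WABO1-12345-0X'; 'XX18WABO2-12345-0N'};

% Without ^ the look-behind is satisfied wherever '18W' occurs
unanchored = regexp(C, '(?<=18W)[^-]+',  'match', 'once');

% With ^ the prefix must start at the first character of the string
anchored   = regexp(C, '(?<=^18W)[^-]+', 'match', 'once');
```

The unanchored pattern returns an abbreviation for both entries, whereas the anchored one returns an empty result for the second entry, which makes malformed input easier to spot.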
per isakson on 18 Oct 2017
Edited: per isakson on 18 Oct 2017
Yes, the look-ahead is overkill and 'once' will save a microsecond. With long strings 'once' makes a significant difference.
However, regexp with or without 'once' returns a cell array of scalar cell arrays, which in turn contain the strings. cat "flattens" the cell array.
Cedric on 18 Oct 2017
Edited: Cedric on 18 Oct 2017
There is always one more level without the 'once':
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match' )
out2 =
6×1 cell array
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
.. you probably forgot to copy one of the lines (call to CAT) when you copy-pasted your example from the command window.
per isakson on 18 Oct 2017
Edited: per isakson on 18 Oct 2017
"There is always one more level without the 'once':" Yes, that's correct. Now, I'll remember. A pity there isn't a strike-out feature.
One thing still puzzles me
out2 =
6×1 cell array
{'ABO1'}
why are there braces around 'ABO1'? Here on R2016a I get
>> out2
out2 =
'ABO1'
Have The MathWorks changed the display format?
Cedric on 18 Oct 2017
Edited: Cedric on 18 Oct 2017
Wow, you're right, I had never realized, or already forgotten(!) My output is from 2017b, but I was on 2016b until very recently .. I'm wondering if I didn't pay attention or if the update was between 2016a/b (?)
sensation on 19 Oct 2017
Thanks a lot, guys, for your answers! One quick question: could you briefly elaborate on (?<=18W)[^-]+, or tell me where I can find out when to use ?, ^ and/or +? Thanks!
Cedric on 19 Oct 2017
Edited: Cedric on 19 Oct 2017
Look at my comment starting with "Not far" here for a brief summary.
Understanding this, you will understand that
  • (?<=..) is a look-behind and (?<=18W) imposes that what is matched (by the rest of the pattern) is preceded by 18W
  • [^..] defines a set of elements not to match, so [^-] matches all characters but the dash.
  • + is a quantifier that means one or more times the expression that precedes directly (which is [^-])
So the whole thing reads: match one or more characters that are not a dash (which translates into "read everything up to a dash"), preceded by the literal 18W.
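The three pieces can be tried out one at a time. The sketch below uses the first sample string from the question; the variable names a, b and c are just for illustration.

```matlab
str = '18WABO1-12345-0X';

% [^-]+ alone: the first run of non-dash characters, prefix included
a = regexp(str, '[^-]+', 'match', 'once');

% Adding the look-behind forces the match to start right after '18W'
b = regexp(str, '(?<=18W)[^-]+', 'match', 'once');

% Without 'once', every run of non-dash characters is returned
c = regexp(str, '[^-]+', 'match');
```

Here a is '18WABO1', b is 'ABO1', and c is a cell array holding '18WABO1', '12345' and '0X'. The look-behind itself is never part of the returned match.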
Stephen23 on 19 Oct 2017
"where I can find those expressions when I should use ? ^ or/and +."
By reading the documentation ten times:
And then read it another ten times. And practice lots.
Regular expressions are powerful and very useful, but they require practice and attention to detail. Study that page I linked to, and the other pages that it links to as well.
sensation on 19 Oct 2017
Thanks!
