regexprep does not do exactly what I want

Dear all,
I have the following cell array
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};
I want to remove all information before the + and - signs. Therefore I tried the following:
regexprep(Charge,'[^-+].','');
which produces
{'-1'} {'0'} {'1'} {'1'} {'+0'} {'1'} {'0'}
This works well except when only one character precedes the minus sign (as in 'I-1'). In that case, the - sign is also deleted. The - signs must be kept; the + signs are optional.
Any suggestions?
Thanks, Tim

Accepted Answer

Daniel M
Daniel M on 16 Oct 2019
Edited: Daniel M on 16 Oct 2019

0 votes

There's definitely a way to do it using regexprep, but I found this solution first, so hopefully it is sufficient.
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};
c = regexp(Charge,'[-+]\w*','match');
cc = cat(2,c{:}); % flatten the nested 1x1 cells into a single 1x7 cell array

6 Comments

hyble
hyble on 16 Oct 2019
Great stuff, many thanks for this!
Daniel M
regexprep(Charge,'^\w*','')
Guillaume
I'm not convinced that '[-+]\w*' is the right regexp. It restricts the characters that follow the + or - to letters, digits, or _. This restriction may or may not be appropriate.
The regexp that matches exactly your specification would be
regexp(Charge, '[+-].*', 'match', 'once')
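To illustrate the difference between the two patterns, here is a minimal sketch using a hypothetical input with a fractional charge ('Fe+2.5' is an assumption for illustration and is not part of the original data):

```matlab
% Hypothetical input, not from the original cell array.
s = 'Fe+2.5';

% \w matches only letters, digits and _, so the match stops at the '.'
a = regexp(s, '[-+]\w*', 'match', 'once');

% .* matches any characters, so the match runs to the end of the text
b = regexp(s, '[+-].*', 'match', 'once');

disp(a)   % +2
disp(b)   % +2.5
```

For the test data in the question both patterns give the same result, since every charge there is a single digit.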
Your original regexprep did not work at all. It effectively says: remove pairs of characters where the first character is anything but - or + and the second is any character. Looking at 'H3O+1', the first pair is 'H3'; it doesn't start with a + or -, so it is removed. The second pair is 'O+'; again, it doesn't start with a + or -, so it is removed, leaving only '1'. With 'HI+0', the first pair is 'HI', which is removed; the second is '+0', which starts with +, so it is not removed. If you had something like 'H3O+01', it would have removed everything, since the scan would remove 'H3', then 'O+', then '01'.
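The pair-by-pair scan can be checked directly by asking regexp for the matches of the original pattern, which are exactly the chunks that regexprep deletes. This is only a small sketch; the variable name `pairs` is chosen for illustration:

```matlab
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};

% Each match is a two-character chunk whose first character is not - or +.
pairs = regexp(Charge, '[^-+].', 'match');

pairs{4}   % matches for 'I-1':   {'I-'}, so the minus sign is consumed
pairs{6}   % matches for 'H3O+1': {'H3','O+'}, leaving only '1'
```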
A regexprep that would have worked would be:
regexprep(Charge, '[^+]+\+|[^+]+(?=\-)', '')
Daniel M
Daniel M on 16 Oct 2019
Edited: Daniel M on 16 Oct 2019
Thanks Guillaume, I agree that '[-+]\w*' is not robust enough (which is why I voted for Stephen's solution), but it does satisfy his test case.
As for the comment on the regexprep, I'm not sure if you're referring to me or not. I wrote
regexprep(Charge,'^\w*','')
ans =
{'-1'} {'+0'} {'+1'} {'-1'} {'+0'} {'+1'} {'+0'}
which works. Your example however doesn't:
regexprep(Charge, '[^+]+\+|[^+]+(?=\-)', '')
ans =
{'-1'} {'0'} {'1'} {'-1'} {'0'} {'1'} {'0'}
As you can see, it drops the + sign of the charge (the - sign is kept).
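For reference, the original question only requires the - sign to be kept. If dropping the + is actually the desired behaviour, a simpler anchored pattern might do the same job. The pattern below is not from this thread; it is a sketch under that assumption, and `noPlus` is a name chosen for illustration:

```matlab
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};

% Remove everything before the sign, plus the sign itself when it is a '+'.
% A '-' is not matched by either part of the pattern, so it stays.
noPlus = regexprep(Charge, '^[^-+]*\+?', '');
% noPlus is {'-1','0','1','-1','0','1','0'}
```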
Guillaume
Guillaume on 16 Oct 2019
The comment about the regexprep referred to the original question, not your answer.
I wrote most of my comment shortly after you posted your answer but had to dash off to a meeting before posting it. When I finally posted it, it was a bit out of date. Sorry about that.
hyble
hyble on 17 Oct 2019
Many thanks to all of you for helping out here!


More Answers (1)

Stephen23
Stephen23 on 16 Oct 2019
Edited: Stephen23 on 16 Oct 2019

1 vote

>> regexprep(Charge,'^[^-+]*','')
ans =
'-1' '+0' '+1' '-1' '+0' '+1' '+0'
>> regexp(Charge,'[-+].+$','once','match')
ans =
'-1' '+0' '+1' '-1' '+0' '+1' '+0'
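If the charges are needed as numbers rather than text, the signed strings can be passed to str2double, which accepts a leading + or -. This is a minimal sketch of one possible follow-up step, not something requested in the thread; the names `signed` and `q` are chosen for illustration:

```matlab
Charge = {'OH-1','KOH+0','K+1','I-1','HI+0','H3O+1','H2O+0'};

% Strip everything before the first + or -, keeping the sign.
signed = regexprep(Charge, '^[^-+]*', '');

% Convert the cell array of signed strings to a numeric row vector.
q = str2double(signed);
% q is [-1 0 1 -1 0 1 0]
```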

1 Comment

Daniel M
Daniel M on 16 Oct 2019
This will handle edge cases better than my solution above. More robust.

Asked: 16 Oct 2019

Commented: 17 Oct 2019
