Regex: How can I perform positive lookbehind for a specific sequence of characters?
EDIT: Changed 'Negative lookbehind' to 'Positive lookbehind'
Hi,
I am attempting to separate the first name from a list of names using regex. The names have the following format:
<last name>, <title>. <first name> <middle names> (<other name>)
Where <middle names> and (<other name>) are optional.
I'm new to regex and am finding it hard to intuit. It seems to me that I need a positive lookbehind for a '.' followed by a whitespace character in order to capture the first names, but it's not working the way I'd like! See the code below:
load titanic.mat
% Attempt #1 (matches words preceded by a '.' character OR a whitespace character.
% I need it to match a '.' followed by a whitespace... how???)
name_first = regexp(train.Name, '(?<=[\.\s])([A-Z][a-z]+)', 'match')
% Attempt #2 (Captures unwanted '. ' before first names)
name_first2 = regexp(train.Name, '\.\s([A-Z][a-z]+)', 'match')
% Attempt #3 (attempt to capture the 3rd word; doesn't work)
name_first3 = regexp(train.Name, '(\w.*\w){3}', 'match')
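To see why the attempts behave as they do, it helps to run them on a single name. The sketch below uses a made-up sample string in the stated format instead of titanic.mat. A bracket expression such as [\.\s] is a character class that matches exactly one character that is either '.' or whitespace, whereas \.\s without brackets is a sequence. It also shows that 'match' returns the whole matched text (so Attempt #2 includes the consumed '. '), while 'tokens' returns only the parenthesised group.

```matlab
% Made-up sample in the "<last name>, <title>. <first name> ..." format
name = 'Braund, Mr. Owen Harris';

% Attempt #1: [\.\s] is a character class, i.e. ONE character that is either
% '.' or whitespace, so every capitalised word after a plain space qualifies.
classMatch = regexp(name, '(?<=[\.\s])([A-Z][a-z]+)', 'match');
% classMatch is {'Mr', 'Owen', 'Harris'}

% Fix: without brackets, \.\s is a sequence: a '.' immediately followed by
% one whitespace character. The lookbehind checks it but does not consume it.
seqMatch = regexp(name, '(?<=\.\s)([A-Z][a-z]+)', 'match');
% seqMatch is {'Owen'}

% Attempt #2: 'match' returns the whole matched text, including '. '.
% 'tokens' returns only the text captured by the parentheses.
tok = regexp(name, '\.\s([A-Z][a-z]+)', 'tokens', 'once');
% tok is {'Owen'}

% Attempt #3: '.*' is greedy and crosses spaces, so (\w.*\w) is not "one
% word". Splitting into whitespace-free chunks and indexing is simpler
% (assumes the surname is a single word).
words = regexp(name, '\S+', 'match');
thirdWord = words{3};
% thirdWord is 'Owen'
```

The \S+ split is only a quick stand-in for Attempt #3: a multi-word surname would shift the first name to a later position.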
Alternative solutions are great, but ideally I'd like to understand WHY my current code doesn't work (specifically Attempt #1), and how I could make it work using a positive lookbehind for a specific sequence of characters (e.g. return a word preceded by 'abc').
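As a small illustration of the 'abc' case, a lookbehind whose contents are written out as plain characters requires that exact sequence immediately before the match. The sketch below uses an arbitrary test string and contrasts (?<=abc) with the character class (?<=[abc]), which accepts any single one of those letters.

```matlab
% Arbitrary test string: 'foo' and 'bar' follow 'abc'; the last 'foo' follows 'ab'
str = 'xabcfoo abcbar abfoo';

% Sequence: the three characters 'abc', in that order, must precede the match
afterSeq = regexp(str, '(?<=abc)\w+', 'match');
% afterSeq is {'foo', 'bar'}

% Character class: any ONE of 'a', 'b' or 'c' is enough
afterClass = regexp(str, '(?<=[abc])\w+', 'match');
% afterClass is {'bcfoo', 'bcbar', 'bfoo'}
```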
Thanks in advance for your help.
4 Comments
Walter Roberson
on 14 Sep 2021
Edited: Walter Roberson
on 14 Sep 2021
% I need it to match '.' followed by a whitespace... how???
Use
name_first = regexp(train.Name, '(?<=\.\s)([A-Z][a-z]+)', 'match')
But consider making it \s+ instead of \s, so that more than one whitespace character after the '.' is also accepted.
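A sketch of the \s+ suggestion, using a made-up name with two spaces after the title. It puts \s+ in front of a capture group and reads the group with 'tokens', rather than placing \s+ inside the lookbehind; depending on the regex engine, an unbounded lookbehind such as (?<=\.\s+) may be rejected (PCRE, for instance, requires bounded-length lookbehinds), and the token form does not depend on that.

```matlab
% Made-up name with two spaces between the title and the first name
name = 'Braund, Mr.  Owen Harris';

% The single-\s lookbehind no longer matches: 'O' is preceded by two spaces
strictMatch = regexp(name, '(?<=\.\s)[A-Z][a-z]+', 'match');
% strictMatch is empty

% \s+ accepts one or more whitespace characters; the parentheses capture
% only the first name, which 'tokens' returns without the '.  ' prefix
tok = regexp(name, '\.\s+([A-Z][a-z]+)', 'tokens', 'once');
firstName = tok{1};
% firstName is 'Owen'
```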
Also, are you sure you do not need to handle names with apostrophes, like O'Rorke? Are you sure you do not need to handle names with dashes, like Fitz-Williams? Are you sure you do not need to handle surnames with spaces, such as van Horton? That one, incidentally, is also an example of a name that starts with a lower-case letter.
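One way to cope with these cases, sketched below under the assumption that every entry follows the "<last name>, <title>. <first name> ..." format from the question, is to describe the delimiters around each part instead of what a name looks like ([A-Z][a-z]+). Everything before the first comma is the surname, the title runs up to the '.', and the first name is the next run of characters that are neither whitespace nor '('. Named tokens (the 'names' option) return each part as a struct field. The sample strings are invented for illustration.

```matlab
% Invented samples, including an apostrophe, a hyphen, a multi-word surname
% and a lower-case first name
fullNames = {'Braund, Mr. Owen Harris'; ...
             'O''Rorke, Mrs. Anne-Marie (Annie Smith)'; ...
             'van Horton, Dr. jean'};

% last:  everything up to the first comma
% title: everything up to the next '.'
% first: the next run of characters that are not whitespace or '('
pat = '^(?<last>[^,]+),\s*(?<title>[^.]+)\.\s+(?<first>[^\s(]+)';

parts = cellfun(@(s) regexp(s, pat, 'names'), fullNames, 'UniformOutput', false);
% Concatenate the 1x1 structs into a struct array. This assumes every string
% matched the pattern; check for empty results before doing this on real data.
parts = [parts{:}];

firstNames = {parts.first};
% firstNames is {'Owen', 'Anne-Marie', 'jean'}
```

The trade-off is that this pattern no longer checks that the first name is capitalised; it relies on the punctuation of the format instead.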