How to build a pattern expression that does NOT match another pattern?

As an exercise to learn pattern matching, I was trying to make a pattern that matches valid variable names. Here are the rules I'm trying to encode in a pattern:
  1. A valid variable name begins with a letter and contains no more than namelengthmax characters.
  2. Valid variable names can include letters, digits, and underscores.
  3. MATLAB keywords are not valid variable names.
I'm stuck trying to figure out how to make the pattern not match MATLAB keywords. Is this possible as of R2020b? Any workarounds? Here's what I was going for:
function pat = varnamePattern
varchars = lettersPattern(1) + asManyOfPattern( alphanumericsPattern(1) | "_" , 0 , namelengthmax - 1 ); %rules 1 & 2
keywords = pattern(iskeyword); %pattern that matches MATLAB keywords
varname = varchars & ~keywords; %<-- invalid syntax. '&' and '~' not supported.
pat = namedPattern(varname,'varname','A valid MATLAB variable name')
end
As far as I can tell, wildcardPattern, with its optional "Except" parameter, is the only object function that handles "not" rules like #3. Unfortunately, the parameter value cannot be a string or cell array, and wildcardPattern seems meant for individual characters since it's lazy. I may be able to write an expression with regexpPattern, but I have no idea what that regular expression would look like.
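One possible regular expression, sketched under the assumption that regexpPattern accepts MATLAB's standard lookahead syntax (?!...): a negative lookahead rejects any keyword that is followed by a non-word character or the end of the text, and the rest matches a letter followed by up to namelengthmax-1 letters, digits, or underscores. The function name varnameRegexpPattern is hypothetical, and the character classes are deliberately limited to ASCII.

```matlab
function pat = varnameRegexpPattern
%VARNAMEREGEXPPATTERN Sketch: match valid MATLAB variable names with regexpPattern.
%   A negative lookahead (?!...) rejects any keyword that is followed by a
%   non-word character or by the end of the text. The rest of the expression
%   is one ASCII letter plus up to namelengthmax-1 letters, digits, or
%   underscores. The expression is not anchored, so use it with matches().
kwList = iskeyword;
kw = strjoin(kwList(:)', '|');          % keywords joined as alternatives
maxRest = namelengthmax - 1;             % characters allowed after the first
expr = sprintf('(?!(?:%s)(?![a-zA-Z0-9_]))[a-zA-Z][a-zA-Z0-9_]{0,%d}', kw, maxRest);
pat = regexpPattern(expr);
end
```

Because the expression is not anchored, it is meant for matches, which requires the whole text to match. With contains or extract it can also match a valid-looking fragment of a longer word, such as the "abc" in "1abc".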

Accepted Answer

Walter Roberson on 8 Mar 2021
A word is not a valid MATLAB variable name if:
  • it has more than namelengthmax characters, no matter what those characters are;
  • it is one of the keywords;
  • any character is something other than a letter, digit, or underscore; or
  • it starts with a non-letter.
Each of those should be easy to construct. For example, "more than namelengthmax" is the "any character" pattern repeated namelengthmax+1 or more times, and wildcardPattern with "Except" set to letters, digits, and underscore gives the "any character that is not" rule.
  2 Comments
Zachary on 11 Mar 2021 (edited 11 Mar 2021)
I tried out your suggestion:
function pat = notvarnamePattern
%NOTVARNAMEPATTERN Create a pattern that matches anything but a valid MATLAB variable name.
% A valid variable name begins with a letter and contains no more than
% namelengthmax characters. Valid variable names can include letters,
% digits, and underscores. MATLAB keywords are not valid variable names.
%
% SYNTAX
% pat = notvarnamePattern returns a pattern array that matches any text
% that is NOT a valid MATLAB variable name.
%
% OUTPUT ARGUMENTS
% pat: 20x1 pattern array (because there are 20 MATLAB keywords).

%A pattern that matches empty text, text whose first character is not a
%letter, or text containing any character that is not a letter, digit, or underscore.
invalidname = maskedPattern(...
    optionalPattern(wildcardPattern(1, "Except", lettersPattern) + wildcardPattern(0, inf)) |... empty, or 1st character is not a letter
    wildcardPattern(0, inf) + wildcardPattern(1, "Except", alphanumericsPattern | "_") + wildcardPattern(0, inf),... some character is not a letter, digit, or underscore
    'invalidname');
%A pattern that matches names that are too long.
invalidlength = maskedPattern(...
    wildcardPattern(namelengthmax+1, inf),...
    'invalidlength');
%A pattern that matches MATLAB keywords.
keywords = pattern(iskeyword);
%Combine patterns.
pat = invalidname | invalidlength | keywords;
end
I can match variable names with not (~) ...
str = ["a" "1" repelem('a',1,100) "break"];
Li = ~matches(str, notvarnamePattern);
...but the double negative is a bit harder to read. I'm still wondering whether there's a way to make a pattern that matches variable names directly, rather than one that matches everything else.
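For whole-string checks, MATLAB's built-in isvarname applies the same rules (leading letter; only letters, digits, and underscores; at most namelengthmax characters; not a keyword) and returns true for valid names. It is not a pattern, so it cannot be combined with other patterns or used with extract, but it reads positively and makes a convenient reference when testing notvarnamePattern or any other pattern in this thread. A minimal sketch:

```matlab
% Reference check with the built-in isvarname (no patterns involved).
str = ["a" "1" string(repelem('a', 1, 100)) "break"];
isValid = arrayfun(@isvarname, str)   % expected: 1 0 0 0
```

Comparing isequal(~matches(str, notvarnamePattern), isValid) on a larger sample of strings is a quick way to spot cases where a hand-built pattern disagrees with MATLAB's own rules.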
Walter Roberson on 11 Mar 2021
Sure.
Any of these patterns, each followed by the end of the text, where W is (alphanumericsPattern(1) | "_") and N is namelengthmax:
any uppercase letter, or any of 'adhjklmnquvxyz' (the lowercase letters that begin no keyword), followed by up to N-1 copies of W
'b' on its own, or 'b' followed by any W character except 'r', followed by up to N-2 copies of W
'br' on its own, or 'br' followed by any W character except 'e', followed by up to N-3 copies of W
'bre' on its own, or 'bre' followed by any W character except 'a', followed by up to N-4 copies of W
'brea' on its own, or 'brea' followed by any W character except 'k', followed by up to N-5 copies of W
'break' followed by at least 1 and at most N-5 copies of W
'c' on its own, or 'c' followed by any W character except 'a', 'l', or 'o', followed by up to N-2 copies of W
'ca' on its own, or 'ca' followed by any W character except 's' or 't', followed by up to N-3 copies of W
'cas' on its own, or 'cas' followed by any W character except 'e', followed by up to N-4 copies of W
'case' followed by at least 1 and at most N-4 copies of W
'cat' on its own, or 'cat' followed by any W character except 'c', followed by up to N-4 copies of W
'catc' on its own, or 'catc' followed by any W character except 'h', followed by up to N-5 copies of W
'catch' followed by at least 1 and at most N-5 copies of W
'cl' on its own, or 'cl' followed by any W character except 'a', followed by up to N-3 copies of W
'cla' on its own, or 'cla' followed by any W character except 's', followed by up to N-4 copies of W
'clas' on its own, or 'clas' followed by any W character except 's', followed by up to N-5 copies of W
'class' on its own, or 'class' followed by any W character except 'd', followed by up to N-6 copies of W
'classd' on its own, or 'classd' followed by any W character except 'e', followed by up to N-7 copies of W
'classde' on its own, or 'classde' followed by any W character except 'f', followed by up to N-8 copies of W
'classdef' followed by at least 1 and at most N-8 copies of W
and so on.
Not a "not match" in sight.
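The enumeration above can be generated from iskeyword instead of written by hand. The sketch below reuses the function name varnamePattern from the question; the trie-style walk over keyword prefixes is one way to automate the listing and is an assumption, not something stated in the thread. It treats the empty prefix as the root, assumes ASCII names, and spells out every allowed character set explicitly with characterListPattern, so no negated pattern is needed.

```matlab
function pat = varnamePattern
%VARNAMEPATTERN Sketch: build the keyword-prefix enumeration from iskeyword.
%   For each prefix p of a keyword (including the empty prefix), a valid
%   name either is p itself (when p is a nonempty non-keyword) or is p
%   followed by one character that no keyword uses right after p, then any
%   word characters up to namelengthmax in total. Assumes ASCII names.
N = namelengthmax;
letters = ['A':'Z' 'a':'z'];
wordChars = [letters '0':'9' '_'];
W = characterListPattern(wordChars);     % one letter, digit, or underscore
kw = iskeyword;

% Every prefix of every keyword, plus the empty prefix (the trie root).
prefixes = {''};
for k = 1:numel(kw)
    for n = 1:numel(kw{k})
        prefixes{end+1} = kw{k}(1:n); %#ok<AGROW>
    end
end
prefixes = unique(prefixes);

pat = [];
for i = 1:numel(prefixes)
    p = prefixes{i};
    L = numel(p);

    % Characters that some keyword uses immediately after p.
    used = blanks(0);
    for k = 1:numel(kw)
        if numel(kw{k}) > L && (L == 0 || strcmp(kw{k}(1:L), p))
            used(end+1) = kw{k}(L+1); %#ok<AGROW>
        end
    end

    if L == 0
        allowed = setdiff(letters, used);    % a name must start with a letter
    else
        allowed = setdiff(wordChars, used);
    end
    allowed = allowed(:)';                   % keep it a row vector

    % One allowed character, then the rest of the name (N characters in total).
    tail = characterListPattern(allowed) + asManyOfPattern(W, 0, N - L - 1);
    if L == 0
        branch = tail;
    elseif ismember(p, kw)
        branch = string(p) + tail;                   % a keyword needs more characters
    else
        branch = string(p) + optionalPattern(tail);  % a non-keyword prefix may stand alone
    end

    if isempty(pat)
        pat = branch;
    else
        pat = pat | branch;
    end
end
pat = namedPattern(pat, 'varname', 'A valid MATLAB variable name');
end
```

Like Walter's list, this pattern is intended for whole-text matching with matches, for example matches(["a" "Break" "break"], varnamePattern).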

Version: R2020b
