Using regexprep to clean up MATLAB code formatting

DGM on 31 Jan 2022
Commented: DGM on 31 Jan 2022
I was trying to put together something to fix operator spacing in a bunch of old .m files. I've reduced the problem to a simple example: adding spaces around instances of = and ==. I want to ignore matches within quotes, but I realized that transpose operators on the same line break any lookahead/lookbehind quote-counting approach I can think of.
Is there even a good way to deal with this using regex? Is there some sort of code formatting tool that I can use to accomplish this instead?
intext = sprintf(['don''t pad \n█ = █\n█ = █\n %% █=█\n''█=█''\n''█=█'' A.''\nA.'' ''█=█''\n' ...
'add pad\n█=█ A.''\nA.'' █=█\n█=█\n█==█']);
% only operate on uncommented lines
alltext = split(intext,newline);
ncom = cellfun(@isempty,(regexp(alltext,'^\s*%.*','match')));
niq = '(?=([^'']*''[^'']*'')*[^'']*$)'; % not in single quotes
alltext(ncom) = regexprep(alltext(ncom),['(?<=[^~=<>\s])=' niq],' ='); % add a space before = or ==
alltext(ncom) = regexprep(alltext(ncom),['=(?=[^=\s])' niq],'= '); % add a space after = or ==
[split(intext,newline) alltext]
ans = 12×2 cell array
    {'don't pad '  }    {'don't pad '    }
    {'█ = █'       }    {'█ = █'         }
    {'█→=→█'       }    {'█→=→█'         }
    {' % █=█'      }    {' % █=█'        }
    {''█=█''       }    {''█=█''         }
    {''█=█' A.''   }    {''█ = █' A.''   }
    {'A.' '█=█''   }    {'A.' '█=█''     }
    {'add pad'     }    {'add pad'       }
    {'█=█ A.''     }    {'█=█ A.''       }
    {'A.' █=█'     }    {'A.' █ = █'     }
    {'█=█'         }    {'█ = █'         }
    {'█==█'        }    {'█ == █'        }
I'm pretty much an absolute novice with regex, and this tool is likely only going to be used once, so I'm avoiding making the regex more complicated than I can understand well enough to have confidence in it. To that end, I'm simply using masking to ignore commented lines.
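One possible way around the quote-counting problem is to stop counting quotes and scan each line left to right instead, deciding for every ' whether it is a transpose or a delimiter. This is only a rough sketch, not a tested formatter. It assumes that a ' directly after a letter, digit, underscore, closing bracket, dot, or another transpose is a transpose, and that any other ' opens a character vector. It ignores line continuations, block comments, and command syntax. The function names padequals and literalmask are made up for illustration.

```matlab
% Demo lines: doubled quotes are MATLAB escapes for a single quote
demo = {'''x=y'' A.'''; 'A.'' x=y'; 'A.'' ''x=y'''; 'x==y'; 'a~=b'; 'a=b % c=d'};
out = cellfun(@padequals,demo,'UniformOutput',false);
disp([demo out])

function txt = padequals(txt)
% Pad = and == with single spaces unless they sit in a literal or comment.
    mask = literalmask(txt);
    % match = or ==, but not the = in ~=, <=, >=
    [s,e] = regexp(txt,'(?<![~<>=])={1,2}(?!=)');
    for m = numel(s):-1:1            % work backwards so earlier indices stay valid
        if mask(s(m)), continue; end
        op = txt(s(m):e(m));
        if e(m) < numel(txt) && ~isspace(txt(e(m)+1)), op = [op ' ']; end
        if s(m) > 1 && ~isspace(txt(s(m)-1)), op = [' ' op]; end
        txt = [txt(1:s(m)-1) op txt(e(m)+1:end)];
    end
end

function mask = literalmask(txt)
% True for characters inside '...' or "..." literals or a trailing comment.
    operandEnd = ['A':'Z' 'a':'z' '0':'9' '_)]}.'''];  % heuristic, see assumptions
    n = numel(txt);
    mask = false(1,n);
    k = 1;
    while k <= n
        c = txt(k);
        if c == '%'
            mask(k:n) = true;        % rest of the line is a comment
            break
        elseif c == '''' && k > 1 && ismember(txt(k-1),operandEnd)
            k = k + 1;               % transpose, not a delimiter
        elseif c == '''' || c == '"'
            j = k + 1;               % scan for the closing delimiter
            while j <= n
                if txt(j) == c
                    if j < n && txt(j+1) == c
                        j = j + 2;   % doubled delimiter is an escaped quote
                        continue
                    end
                    break
                end
                j = j + 1;
            end
            j = min(j,n);            % tolerate an unterminated literal
            mask(k:j) = true;
            k = j + 1;
        else
            k = k + 1;
        end
    end
end
```

Like the regexprep version, this only adds a space where none exists, so pre-spaced and tab-spaced operators are left alone. Lines where the transpose heuristic might be wrong would still need a manual look.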
7 Comments
DGM on 31 Jan 2022
@Star Strider Yeah. Disregarding the inelegance of the kludge I've made so far, I can deal with the pre-spaced cases. It's the exclusion of operators within quoted substrings that I'm struggling with.
I decided to flag lines containing both quotes and targeted operators so that they can be reviewed. Since I don't have one guaranteed safe way to handle such lines, I can just present the user (me) with the option to quickly select from multiple format attempts with the option to discard all attempts and manually edit the line.
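The flagging step could look roughly like the following. This is a minimal sketch rather than the code actually used: the variable names and the demo lines are invented, and the operator pattern only targets = and ==.

```matlab
% Hypothetical sample of code lines, one per cell
alltext = {'a=b'; 'disp(''x=y'')'; 'c=A'''; '% d=''e'''; 'f = 1'};

% lines containing any single quote (string delimiter or transpose)
hasquote = contains(alltext,'''');
% lines containing = or ==, excluding the = in ~=, <=, >=
hasop = ~cellfun(@isempty,regexp(alltext,'(?<![~<>=])={1,2}(?!=)','once'));
% lines that are entirely comments
iscomment = ~cellfun(@isempty,regexp(alltext,'^\s*%','once'));

% only these lines need a human decision; the rest can be processed blindly
needsreview = hasquote & hasop & ~iscomment;
disp(alltext(needsreview))
```

Lines that are not flagged contain no quotes at all (or are comments), so the plain regexprep calls can be applied to them without the quote-exclusion lookahead.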
After a bit of observation, the vast majority of such cases present identifiable patterns and can be handled programmatically without prompting. The majority of remaining cases can be reviewed with a single keystroke. Out of about 100k lines, it took me about 30 minutes to grind through all the files.
I feel bad about taking the "avoidance for dummies" route, but the last thing I need is another project of the scale that a proper solution would require. Still, I can't say avoidance isn't a learning experience. The lesson here is to do a better job of formatting to begin with.
DGM on 31 Jan 2022
@Stephen For what it's worth, I did check out fparser(). While I never managed to get it to run without dumping errors, it had some useful bits in it. I'm guessing some things just broke since it's been unmaintained for so many years.


Answers (0)

Version

R2019b
