Regexp expression to handle changing format
%dummy data
% t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
% t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501
S=fileread(filename);
myexpression = ['(?<tvar>w*,'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\w*\.*\w*),'...
'(?<HNL>\w*),'...
'(?<codeTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*,'... % <== This line handles the first line of dummy data
'(?<caprTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*\s*\d*,'... % <== This line handles the first line of dummy data
'(?<logAt>\w*\.*\w*']
parts = regexp(filtered,myexpression,'names')
The third- and second-to-last variables (codeTm, caprTm) change format within the data. How can I modify the expression, or add logic, to accept 2 to 3 space-separated values within the variable "codeTm" and 3 to 4 space-separated values within the variable "caprTm"?
2 space-separated values: (000 00:00:00.00)
3 space-separated values: (000 00 00:00:00.00) or (343 19:54:20.684 8)
4 space-separated values: (21 343 19:54:20.684 8)
Thank you for the help. My apologies for making my expression so complicated. I am still learning the ins and outs of expression formats for using regexp to read data.
2 Comments
Stephen23
on 7 Mar 2022
It is not clear why you are using regular expressions for importing this data: READTABLE et al. have options for handling missing field data. Have you considered using the inbuilt data importing functions?
Answers (1)
Stephen23
on 7 Mar 2022
Edited: Stephen23
on 7 Mar 2022
You can easily make a group optional or occur a specific number of times using any suitable quantifier, for example:
(..)? % zero or one time
(..)* % zero or more times
(..){2,4} % two to four times
etc.
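As a rough sketch of how such quantifiers could be applied to the two fields in the question, the snippet below treats each field as a run of space-separated "tokens" and bounds the number of tokens. The variable names (tok, rgxCode, rgxCapr, okCode, okCapr) are made up for this illustration, and the sample strings are copied from the dummy data in the question; this assumes tokens contain only digits, colons, and periods and are separated by single spaces.

```matlab
% Sample field values copied from the question's dummy data.
codeTm = {'000 00:00:00.00', '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8', '21 343 19:54:20.684 8'};

% Assumption: one token is a run of digits, colons, and periods.
tok = '[\d:.]+';

% First token, then 1 to 2 more tokens (2 to 3 in total).
rgxCode = ['^' tok '( ' tok '){1,2}$'];
% First token, then 2 to 3 more tokens (3 to 4 in total).
rgxCapr = ['^' tok '( ' tok '){2,3}$'];

% A non-empty match means the field has an accepted number of tokens.
okCode = ~cellfun(@isempty, regexp(codeTm, rgxCode, 'match', 'once'));
okCapr = ~cellfun(@isempty, regexp(caprTm, rgxCapr, 'match', 'once'));
```

The same bounded groups could be placed inside the named tokens of a larger expression, although the character-set approach described next is shorter.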
However, rather than trying to match specific groups of characters I would use a simpler approach of matching sets of characters. I had to fix several other bugs in your regular expression to get this working, mostly missing backslashes and parentheses.
str = fileread('test.txt')
rgx = ['^\s*(?<tvar>\w*),'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\d*\.?\d*),'...
'(?<HNL>\w*),'...
'(?<codeTm>[ :\w\.]*),'...
'(?<caprTm>[ :\w\.]*),'...
'(?<logAt>\d*\.?\d*)'];
parts = regexp(str,rgx,'names','lineanchors')
parts.codeTm
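To try the expression without a file on disk, the two dummy records from the question can be placed in a character vector, as in this minimal sketch. The in-memory str and the output variables codeTm, caprTm and logAt are assumptions made for the illustration; the expression itself is the one given above.

```matlab
% Two sample records copied from the question's dummy data,
% joined with newlines to stand in for the file contents.
str = sprintf('%s\n%s\n', ...
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501', ...
    't,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501');

rgx = ['^\s*(?<tvar>\w*),'...
    '(?<tmCodeRdr>\w*),'...
    '(?<tmCodLvl>\d*\.?\d*),'...
    '(?<HNL>\w*),'...
    '(?<codeTm>[ :\w\.]*),'...
    '(?<caprTm>[ :\w\.]*),'...
    '(?<logAt>\d*\.?\d*)'];

% One struct element is returned per matched line.
parts = regexp(str, rgx, 'names', 'lineanchors');

% Collect fields across all records; numeric text is converted explicitly.
codeTm = {parts.codeTm};
caprTm = {parts.caprTm};
logAt  = str2double({parts.logAt});
```

Because the two time fields are matched as sets of characters (spaces, colons, word characters, and periods), the number of space-separated values in each field does not matter.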
Personally, though, I would not try to reinvent the wheel for a data file like this; READTABLE is much simpler:
tbl = readtable('test.txt','delimiter',',');
tbl.Properties.VariableNames = {'tvar','tmCodeRdr','tmCodLvl','HNL','codeTm','caprTm','logAt'}