Extracting numbers after a particular string from a cell array
data={'333', 'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03','abvc','400 55 a','AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03'};
For example, in the cell array above, how can I extract all YYYY MM DD HH MM SS values (2021 03 28 00 05 30.00 and 2021 3 28 0 17 30.0)?
The relevant YYYY MM DD HH MM SS values always come after AS [A-Z][0-9][0-9] (for example, AS C37 and AS G17). Can code be written to extract these values following this rule? The original data cell array is 1x400000, so speed is also an important factor.
6 Comments
dpb
on 2 Jul 2021
There is almost certainly code available to read these files; there may even be a MATLAB routine already. Have you looked for what routines are available?
Accepted Answer
dpb
on 2 Jul 2021
Edited: dpb
on 2 Jul 2021
Oh, I see I didn't look far enough down the file: the header ends at record 170 and the data starts at record 171.
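% Skip the 170 header records; the file has no row of variable names
tCOD=readtable('COD0MGXFIN_20210870000_01D_30S_CLK.clk','FileType','text', ...
'HeaderLines',170,'ReadVariableNames',false);
% Name the six date/time columns, then combine them into one datetime variable
tCOD.Properties.VariableNames(3:8)={'Yr','Mn','Day','Hr','Min','Sec'};
tCOD.DateTime=datetime(tCOD{:,{'Yr','Mn','Day','Hr','Min','Sec'}});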
leaves you with
>> [head(tCOD);tail(tCOD)]
ans =
16×12 table
Var1      Var2               Yr    Mn    Day    Hr    Min    Sec    Var9          Var10         Var11    DateTime
______    _____________    ____    __    ___    __    ___    ___    ____    ___________    __________    ____________________
{'AR'}    {'BADG00RUS'}    2021     3     28     0      0      0       2     0.00044149    3.7396e-11    28-Mar-2021 00:00:00
{'AR'}    {'ABMF00GLP'}    2021     3     28     0      0      0       2    -0.00024309     3.739e-11    28-Mar-2021 00:00:00
{'AR'}    {'AJAC00FRA'}    2021     3     28     0      0      0       2    -0.00038427    3.7166e-11    28-Mar-2021 00:00:00
{'AR'}    {'ALIC00AUS'}    2021     3     28     0      0      0       2    -2.4277e-09    3.7381e-11    28-Mar-2021 00:00:00
{'AR'}    {'AMU200ATA'}    2021     3     28     0      0      0       2    -2.9659e-08    3.7474e-11    28-Mar-2021 00:00:00
{'AR'}    {'ANKR00TUR'}    2021     3     28     0      0      0       2     1.9425e-08    3.7349e-11    28-Mar-2021 00:00:00
{'AR'}    {'AREG00PER'}    2021     3     28     0      0      0       2     0.00046999    3.7485e-11    28-Mar-2021 00:00:00
{'AR'}    {'ASCG00SHN'}    2021     3     28     0      0      0       2    -3.5686e-08    3.7378e-11    28-Mar-2021 00:00:00
{'AS'}    {'R16'      }    2021     3     28    23     59     30       1    -1.3127e-05           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R17'      }    2021     3     28    23     59     30       1     0.00041179           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R18'      }    2021     3     28    23     59     30       1     7.1344e-05           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R19'      }    2021     3     28    23     59     30       1    -0.00013759           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R20'      }    2021     3     28    23     59     30       1    -4.6221e-05           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R21'      }    2021     3     28    23     59     30       1    -0.00019777           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R22'      }    2021     3     28    23     59     30       1    -0.00010502           NaN    28-Mar-2021 23:59:30
{'AS'}    {'R24'      }    2021     3     28    23     59     30       1     3.6747e-05           NaN    28-Mar-2021 23:59:30
>>
The AS records at the end of the table have only two variables after the time fields instead of three, hence the NaN elements for Var11.
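Once the table exists, pulling out only the AS records (the ones the question is after) is a logical mask on Var1. A minimal sketch, assuming a small stand-in table `tDemo` built from a few rows of the output above; `tDemo`, `isAS` and `tAS` are names made up for illustration, and with the real file the same indexing would be applied to tCOD:

```matlab
% Stand-in for tCOD with three rows copied from the output above;
% with the real file, tCOD comes from the readtable call instead.
Var1     = {'AR'; 'AS'; 'AS'};
Var2     = {'BADG00RUS'; 'R16'; 'R17'};
DateTime = datetime([2021 3 28  0  0  0; ...
                     2021 3 28 23 59 30; ...
                     2021 3 28 23 59 30]);
Var10    = [0.00044149; -1.3127e-05; 0.00041179];
tDemo    = table(Var1, Var2, DateTime, Var10);

% Logical mask selecting the "AS" records only
isAS = strcmp(tDemo.Var1, 'AS');

% Keep the satellite ID, the time stamp and the clock value
tAS = tDemo(isAS, {'Var2', 'DateTime', 'Var10'});
disp(tAS)
```

The mask is a single vectorized strcmp call, so it should scale to a few hundred thousand rows without an explicit loop.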
You can either scan the file for the location of the "END OF HEADER" record to find the number of header lines to skip, or there is probably sufficient data within the file header to compute where that is. If the COMMENTS are freeform, though, there may not be a fixed number of records there, so it may just take scanning the file first.
Either way, this is much simpler and more straightforward than trying to parse the cell array; that's fraught with difficulty in comparison.
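Scanning for the "END OF HEADER" record could look roughly like the following. This is only a sketch: it writes a tiny stand-in file with made-up header records so that it runs on its own, and `fname`, `nHeader` and `found` are illustrative names. With a real clock file, only the scanning loop would be needed.

```matlab
% Write a small stand-in file; the first two header records are made up,
% only the "END OF HEADER" marker and the data line mimic the real format.
fname = [tempname '.clk'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'made-up header record 1', ...
                     'made-up header record 2', ...
                     'END OF HEADER', ...
                     'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03');
fclose(fid);

% Count lines up to and including the "END OF HEADER" record
fid = fopen(fname, 'r');
nHeader = 0;
found = false;
while true
    tline = fgetl(fid);
    if ~ischar(tline), break, end        % fgetl returns -1 at end of file
    nHeader = nHeader + 1;
    if contains(tline, 'END OF HEADER')
        found = true;
        break
    end
end
fclose(fid);
delete(fname);                           % remove the stand-in file

if ~found
    error('No END OF HEADER record found.');
end
fprintf('Header lines to skip: %d\n', nHeader);
```

The resulting nHeader would then replace the hard-coded 170 as the 'HeaderLines' value in the readtable call.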
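If the records are nevertheless already sitting in a cell array as in the question, the stated rule (AS, then a letter and two digits, then the six time fields) can be written as one regular expression applied to the whole array at once. A minimal sketch, assuming the fields are separated by whitespace and that matching records start with AS; `pat`, `tok`, `hit`, `dv` and `dt` are illustrative names:

```matlab
data = {'333', 'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03', ...
        'abvc', '400 55 a', ...
        'AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03'};

% AS + letter + two digits, followed by six whitespace-separated fields:
% year, month, day, hour, minute and (possibly fractional) seconds
pat = ['^AS [A-Z]\d{2}\s+(\d{4})\s+(\d{1,2})\s+(\d{1,2})' ...
       '\s+(\d{1,2})\s+(\d{1,2})\s+(\d+\.?\d*)'];

% One call over the whole cell array; non-matching cells give empty results
tok = regexp(data, pat, 'tokens', 'once');
hit = ~cellfun('isempty', tok);

% Stack the tokens into an N-by-6 cell and convert to numbers
tok = vertcat(tok{hit});
dv  = str2double(tok);      % N-by-6 date vectors
dt  = datetime(dv);         % N-by-1 datetime array
disp(dt)
```

Both regexp and str2double operate on the full cell array without a loop, which matters for 400000 elements, though actual timing would have to be measured on the real data.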