Extracting numbers after a particular string from a cell array

Question

data={'333', 'AS C37       2021 03 28 00 05 30.000000  1   -0.884071511631E-03','abvc','400 55 a','AS G17  2021  3 28  0 17 30.000000  1    0.416843065644E-03'};

For example in the above cell array, how can I extract all YYYY MM DD HH MM SS (2021 03 28 00 05 30.00 and 2021 3 28 0 17 30.0)?

The relevant YYYY MM DD HH MM SS values always come after AS [A-Z][0-9][0-9] (for example, AS C37 and AS G17). Can code be written to extract these values following this rule? The original data cell array is 1x400000, so speed is also an important factor.

6 Comments
dpb on 2 Jul 2021

Please move Answer to Comment. -- dpb

OK, here's a part of the file -- it's nice and regular within sections so parsing won't be any real problem -- but, what, specifically, do you want/need from the file?

3.04                 C                    M                      RINEX VERSION / TYPE
CCRNXC V5.3          AIUB                 10-APR-21 11:26        PGM / RUN BY / DATE 
Center for Orbit Determination in Europe (CODE)                  COMMENT             
MGEX clock information for day 2021-087                          COMMENT             
Consistent to the middle day of the 3-day long-arc solution      COMMENT             
Clock information consistent with phase and C1W/C2W code data    COMMENT             
Satellite/receiver clock values at intervals of 30/300 sec       COMMENT             
High-rate (30 sec) clock interpolation based on phase data       COMMENT             
Product reference: DOI 10.7892/boris.75882.3                     COMMENT             
   GPS                                                           TIME SYSTEM ID      
    18                                                           LEAP SECONDS GNSS   
C  GPSEST V5.3        IGS14                                      SYS / PCVS APPLIED  
E  GPSEST V5.3        IGS14                                      SYS / PCVS APPLIED  
G  GPSEST V5.3        IGS14                                      SYS / PCVS APPLIED  
J  GPSEST V5.3        IGS14                                      SYS / PCVS APPLIED  
R  GPSEST V5.3        IGS14                                      SYS / PCVS APPLIED  
C  GPSEST V5.3        CODE.OSB @ ftp.aiub.unibe.ch/CODE/         SYS / DCBS APPLIED  
E  GPSEST V5.3        CODE.OSB @ ftp.aiub.unibe.ch/CODE/         SYS / DCBS APPLIED  
G  GPSEST V5.3        CODE.OSB @ ftp.aiub.unibe.ch/CODE/         SYS / DCBS APPLIED  
J  GPSEST V5.3        CODE.OSB @ ftp.aiub.unibe.ch/CODE/         SYS / DCBS APPLIED  
R  GPSEST V5.3        CODE.OSB @ ftp.aiub.unibe.ch/CODE/         SYS / DCBS APPLIED  
     2    AR    AS                                               # / TYPES OF DATA   
COM  CODE MGEX                                                   ANALYSIS CENTER     
     1                                                           # OF CLK REF        
BADG00RUS 12338M002                           0.000000000000E+00 ANALYSIS CLK REF    
   134    IGb14                                                  # OF SOLN STA / TRF 
BADG00RUS 12338M002            -838282106  3865777325  4987624574SOLN STA NAME / NUM 
ABMF00GLP 97103M001            2919785797 -5383744943  1774604878SOLN STA NAME / NUM 
AJAC00FRA 10077M005            4696989194   723994777  4239678729SOLN STA NAME / NUM 
ALIC00AUS 50137M001           -4052052783  4212835969 -2545104517SOLN STA NAME / NUM 
AMU200ATA 66040M002                 57569     -201376 -6359569064SOLN STA NAME / NUM 
ANKR00TUR 20805M002            4121948390  2652187845  4069023877SOLN STA NAME / NUM 
AREG00PER 42202M008            1942816431 -5804077156 -1796884336SOLN STA NAME / NUM 
ASCG00SHN 30602M004            6121151566 -1563978954  -872615291SOLN STA NAME / NUM 
ASPA00USA 50503S006           -6100260188  -996502539 -1567977179SOLN STA NAME / NUM 
AUCK00NZL 50209M001           -5105681573   461563996 -3782180963SOLN STA NAME / NUM 

dpb on 2 Jul 2021

There may well be (probably is, no undoubtedly is) code to read these files available -- they might already have a MATLAB routine, even. Have you looked for what routines are available?

sermet OGUTCU on 2 Jul 2021

I just want to extract all dates YYYY MM DD HH MM SS (such as 2021 03 28 00 05 30.000000) from this cell array.

Answer 1

dpb on 2 Jul 2021

Edited: dpb on 2 Jul 2021

Oh. I see I didn't look far enough down the file -- the header stuff ends at record 170; the other data starts at record 171.

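% Skip the 170 header records; the data records carry no column names.
tCOD=readtable('COD0MGXFIN_20210870000_01D_30S_CLK.clk','FileType','text', ...
                                            'HeaderLines',170,'ReadVariableNames',false);
% Columns 3-8 hold the epoch: year, month, day, hour, minute, second.
tCOD.Properties.VariableNames(3:8)={'Yr','Mn','Day','Hr','Min','Sec'};
% datetime accepts an N-by-6 matrix of date vectors.
tCOD.DateTime=datetime(tCOD{:,{'Yr','Mn','Day','Hr','Min','Sec'}});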

leaves you with

>> [head(tCOD);tail(tCOD)]
ans =
  16×12 table
     Var1         Var2          Yr     Mn    Day    Hr    Min    Sec    Var9       Var10         Var11             DateTime      
    ______    _____________    ____    __    ___    __    ___    ___    ____    ___________    __________    ____________________
    {'AR'}    {'BADG00RUS'}    2021    3     28      0     0      0      2       0.00044149    3.7396e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ABMF00GLP'}    2021    3     28      0     0      0      2      -0.00024309     3.739e-11    28-Mar-2021 00:00:00
    {'AR'}    {'AJAC00FRA'}    2021    3     28      0     0      0      2      -0.00038427    3.7166e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ALIC00AUS'}    2021    3     28      0     0      0      2      -2.4277e-09    3.7381e-11    28-Mar-2021 00:00:00
    {'AR'}    {'AMU200ATA'}    2021    3     28      0     0      0      2      -2.9659e-08    3.7474e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ANKR00TUR'}    2021    3     28      0     0      0      2       1.9425e-08    3.7349e-11    28-Mar-2021 00:00:00
    {'AR'}    {'AREG00PER'}    2021    3     28      0     0      0      2       0.00046999    3.7485e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ASCG00SHN'}    2021    3     28      0     0      0      2      -3.5686e-08    3.7378e-11    28-Mar-2021 00:00:00
    {'AS'}    {'R16'      }    2021    3     28     23    59     30      1      -1.3127e-05           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R17'      }    2021    3     28     23    59     30      1       0.00041179           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R18'      }    2021    3     28     23    59     30      1       7.1344e-05           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R19'      }    2021    3     28     23    59     30      1      -0.00013759           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R20'      }    2021    3     28     23    59     30      1      -4.6221e-05           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R21'      }    2021    3     28     23    59     30      1      -0.00019777           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R22'      }    2021    3     28     23    59     30      1      -0.00010502           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R24'      }    2021    3     28     23    59     30      1       3.6747e-05           NaN    28-Mar-2021 23:59:30
>> 

The AS records at the end of the table have only two (2) variables past the time field instead of the three (3) in the AR records, hence the NaN elements for Var11.
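Since the question only asks for the epochs of the AS records, the table can be filtered on its first two columns. The following is a minimal sketch, not part of the original answer: tDemo is a hypothetical three-row stand-in for tCOD (values copied from the output above), and the satellite pattern is the [A-Z][0-9][0-9] rule stated in the question.

```matlab
% Hypothetical stand-in for tCOD with only the columns needed here.
Var1 = {'AR';'AS';'AS'};
Var2 = {'BADG00RUS';'R16';'R17'};
DateTime = datetime(2021,3,28,[0;23;23],[0;59;59],[0;30;30]);
tDemo = table(Var1,Var2,DateTime);

% Keep rows whose record type is 'AS' and whose name looks like a
% satellite ID: one capital letter followed by two digits.
isSat = strcmp(tDemo.Var1,'AS') & ...
        ~cellfun(@isempty, regexp(tDemo.Var2,'^[A-Z]\d\d$','once'));
satTimes = tDemo.DateTime(isSat);   % datetime column for the AS rows only
```

The same two lines should apply to the full table by replacing tDemo with tCOD, assuming the default variable names shown in the output above.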

You can either scan the file for the location of the "END OF HEADER" record to find the number of headerlines to skip, or there is probably sufficient data within the file header to compute where that is -- although if the COMMENTS are freeform, there may not be a fixed number of records there and so it may just take scanning the file first...
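The scanning option can be sketched as below. This is an illustration under stated assumptions rather than tested code for the real product file: the demo file written to a temporary location is made up, and the variable names (fname, nHeader, found) are hypothetical.

```matlab
% Demo input: a tiny made-up stand-in for a clock file.
fname = [tempname '.clk'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n', ...
    '     3.04           C                                       RINEX VERSION / TYPE', ...
    'free-form text                                              COMMENT', ...
    '                                                            END OF HEADER', ...
    'AS G17  2021  3 28  0 17 30.000000  1    0.416843065644E-03');
fclose(fid);

% Count records up to and including the END OF HEADER record.
fid = fopen(fname,'r');
nHeader = 0;
found = false;
while ~found
    rec = fgetl(fid);
    if ~ischar(rec), break, end      % fgetl returns -1 at end of file
    nHeader = nHeader + 1;
    found = contains(rec,'END OF HEADER');
end
fclose(fid);
if ~found, error('No END OF HEADER record found in %s', fname); end
```

The resulting nHeader can then be passed as the 'HeaderLines' value in the readtable call above in place of the hard-coded 170. Only the header is read line by line, so the cost of the scan is small compared with reading the data section.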

Either way, this is much simpler and more straightforward than trying to parse the cell array...that's fraught with difficulty in comparison.
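For comparison, if the records are already in memory as a cell array, as in the question, the rule stated there can be sketched with regexp. This is a minimal sketch, not the answer's method: it assumes every matching record has exactly six whitespace-separated date/time fields after the satellite ID, and it has not been timed on a 1x400000 array.

```matlab
data={'333', 'AS C37       2021 03 28 00 05 30.000000  1   -0.884071511631E-03','abvc','400 55 a','AS G17  2021  3 28  0 17 30.000000  1    0.416843065644E-03'};

% One token capturing the six date/time fields after 'AS [A-Z][0-9][0-9]'.
expr = '^AS [A-Z]\d\d\s+(\d{4}\s+\d{1,2}\s+\d{1,2}\s+\d{1,2}\s+\d{1,2}\s+[\d.]+)';
tok = regexp(data, expr, 'tokens', 'once');   % empty where a cell does not match
tok = tok(~cellfun(@isempty, tok));           % keep the matching cells only
tok = [tok{:}];                               % flatten to a cell array of char

% Parse all numbers in one sscanf call, six per record, one record per row.
vals = sscanf(strjoin(tok,' '), '%f', [6 Inf]).';
dt = datetime(vals);                          % N-by-6 date vectors to datetime
```

Calling regexp once on the whole cell array and sscanf once on the joined tokens avoids a MATLAB-level loop over the records, which is the usual first step when speed matters.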