read ASCII data with varying lengths of space as delimiters

Question

Pq Yang am 3 Apr. 2020

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/515190-read-ascii-data-with-varying-lengths-of-space-as-delimiters

Bearbeitet: dpb am 4 Apr. 2020

subset.txt

I am having trouble in reading a structured dataset. The columns are separeted by spaces which have different lengths. Besides the varying lengths of space for delimiters, the column widths are varying as well.

I have tried all kinds of appraoches and functions, and searached online, but couldnot find a solution.

Can someone help me? Please.

A subset from a huge dataset is attached.

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

dpb am 3 Apr. 2020

Need clear definition of the format -- does look like fixed-width but not absolutely positive about whether the apparent missing fields are defined or skipped.

One presumes somewhere whoever created these files also built a toolset to read them and published a definitive description.

I'd look for the first first rather than reinvent the wheel and lacking that w/o the second it's going to be a lot of detective work.

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

per isakson am 3 Apr. 2020

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/515190-read-ascii-data-with-varying-lengths-of-space-as-delimiters#answer_423872

Bearbeitet: per isakson am 4 Apr. 2020

In MATLAB Online öffnen

subset.txt is a fixed-width text file. No doubt about that.

Since R2018b (R2017a?) it is straight forward to read fixed-width text file with readtable(). Before that it was a bit tricky to read fixed-width text files. There are a number of threads here in Answers on the topic.

The documentation includes a good description on how to read fixed-width text files. See fixedWidthImportOptions, Import options object for fixed-width text files.

IMO: The error prone part is to determine the widths of the columns. Here is a start.

copy&paste the column header line to the command window
use regexp() to determine the starting position of each column. The column headers start with a letter (the first one is special).

>> chr = '# DTG                LOCATION            NAME  ...'
>> cac = regexp( chr, '\<\w', 'start' );
>> cac
cac =
     3    22    42    90   110   130   150   170   190   210   230   250   270   290   310   330   350

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

dpb am 4 Apr. 2020

Bearbeitet: dpb am 4 Apr. 2020

In MATLAB Online öffnen

Is there only one header line? Maybe the browser is folding the long line that confused the issue...ah! Indeed it is. That makes the file look much more regular than with folded lines. Indeed with the comment character to skip header lines presuming don't need them (altho then the column header record will be skipped too, probably; I've never tested that).

Well, now have downloaded, let's see just how smart TMW was... :)

>> opt=detectImportOptions('subset.txt','FileType','fixedwidth','CommentStyle','#',"NumHeaderLines",20,'ReadVariableNames',1)
opt = 
  FixedWidthImportOptions with properties:
   Format Properties:
                  Whitespace: '\b\t '
                  LineEnding: {'\n'  '\r'  '\r\n'}
                CommentStyle: {'#'}
               EmptyLineRule: 'skip'
                    Encoding: 'windows-1252'
   Replacement Properties:
                 MissingRule: 'fill'
             ImportErrorRule: 'fill'
            ExtraColumnsRule: 'addvars'
            PartialFieldRule: 'keep'
   Variable Import Properties: Set types by name using setvartype
               VariableNames: {'x_DTG', 'Var2', 'LOCATIO' ... and 12 more}
               VariableTypes: {'datetime', 'duration', 'char' ... and 12 more}
              VariableWidths: [10.00 9.00 9.00 21.00 4.00 8.00 2.00 35.00 19.00 16.00 99.00 20.00 20.00 40.00 55.00]
       SelectedVariableNames: {'x_DTG', 'Var2', 'LOCATIO' ... and 12 more}
             VariableOptions: Show all 15 VariableOptions 
	Access VariableOptions sub-properties using setvaropts/getvaropts
       PreserveVariableNames: false
   Location Properties:
                   DataLines: [22.00 Inf]
           VariableNamesLine: 21.00
              RowNamesColumn: 0
           VariableUnitsLine: 0
    VariableDescriptionsLine: 0 
	To display a preview of the table, use preview
>> opt.VariableNames
ans =
  1×15 cell array
  Columns 1 through 10
    {'x_DTG'}    {'Var2'}    {'LOCATIO'}    {'NNAME'}    {'Var5'}    {'Var6'}    {'Var7'}    {'LATITUDE'}    {'LONGITUD'}    {'EALTI'}
  Columns 11 through 15
    {'TUDEU_BOOL_10T_…'}    {'EWP_SEA_10T_D'}    {'RYB_SEA_10TN_'}    {'DRYB_10T_WETB_1…'}    {'DRYB_10U_10U_SE…'}
>> 

Well, kinda' sorta' made a stab at it...will have to physically set the column widths and number variables -- it finds the time string separate from the date so inserts Var2 for it, then thinks the LOCATION column ends at the end of the shorter string data instead of the longer so it truncates the name as LOCATIO and includes the last N in the next column. But one gets an object to play with for starters.

Good trick for automagically guessing column widths, Per, will have to remember that going forward.

OP will still need to either account for the time column independently or to set the width to encompass both and read as string 'cuz can't handle the blank field on input parsing in readtable to convert to datetime.

Melden Sie sich an, um zu kommentieren.

read ASCII data with varying lengths of space as delimiters

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Antworten (1)

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

read ASCII data with varying lengths of space as delimiters

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Antworten (1)

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden