I want to extract data from a text file

Question

Dev on 14 Jul 2014

Direct link to this question: https://de.mathworks.com/matlabcentral/answers/141833-i-want-to-extract-data-from-a-text-file

Commented: dpb on 15 Jul 2014

PIM3dtry.txt

I am attaching a text file. It contains both numeric and text data. I want to build a matrix out of the numbers, starting from the 21st row of the attached file.

I want to make a single row out of every five rows from the 21st row onward. I would also like to delete the two rows (which look like this: 7.10 157.00 0.00 227.35 17.74 Densities) that recur after every five useful rows of data.

Please let me know.

Thanks.

3 comments

Image Analyst on 14 Jul 2014

After you push the "Choose file" button, make sure you click the "Attach File" button.

Dev on 15 Jul 2014

Attached.

Answer 1

Cedric on 15 Jul 2014

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/141833-i-want-to-extract-data-from-a-text-file#answer_145124

Edited: Cedric on 15 Jul 2014

Here is one way

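 content = fileread( 'PIM3dtry.txt' ) ;
 % Capture the text after each 'Densities' keyword, up to the next run of 4 whitespace characters.
 tokens  = regexp( content, 'Densities\s+(.*?)\s{4}', 'tokens' ) ;
 % Convert each captured block to a numeric row vector, then stack the rows into a matrix.
 data    = cellfun( @(c) sscanf(c, '%f')', cat(1, tokens{:}), 'UniformOutput', false ) ;
 data    = cat( 1, data{:} ) ;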

but you'll have to check that it does what you want. An easier way would be to loop over lines with FGETL and use TEXTSCAN or SSCANF on relevant ones.

EDIT: after reading dpb's hints about using the header, here is an update

 content = fileread( 'PIM3dtry.txt' ) ;
 % Number of altitudes, as announced in the header.
 nAltitudes = str2double( regexp( content, 'altitudes =\s*(\d+)', ...
    'tokens', 'once' )) ;
 % Altitude values: everything after 'Altitudes' up to the first blank line.
 tokens = regexp( content, 'Altitudes\s+(.*?)[\r\n] +[\r\n]', ...
    'tokens', 'once' ) ;
 altitudes = sscanf( tokens{1}, '%f' )' ;
 % Density blocks: one row of the output matrix per block.
 tokens = regexp( content, 'Densities\s+(.*?)\s{4}', 'tokens' ) ;
 data   = cellfun( @(c) sscanf(c, '%f')', cat(1, tokens{:}), 'UniformOutput', false ) ;
 data   = cat( 1, data{:} ) ;

An alternative for extracting packets of 50 numbers (the number of altitudes) would be, e.g.:

 % Backslashes are doubled because SPRINTF treats '\' as an escape character.
 pattern   = sprintf( '(?<=Altitudes\\s+)(\\S+\\s+){%d}', nAltitudes ) ;
 matches   = regexp( content, pattern, 'match', 'once' ) ;
 altitudes = sscanf( matches, '%f' ) ;

Yet, keep in mind that the standard approach would be to loop over the file's lines using FGETL and extract data using TEXTSCAN, SSCANF, etc. There would be a little more logic to implement than with the REGEXP-based approach, but at least you would have a full understanding of every bit of the code.
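As a rough illustration of the FGETL/SSCANF loop, here is a minimal sketch. It does not read PIM3dtry.txt; instead it writes a small stand-in file whose layout is an assumption based on the description in the question (a position line, a 'Densities' line, then a fixed number of data lines), scaled down to 2 data lines of 3 values per group. The names fname, linesPerGroup and tline are made up for the example.

```matlab
% Write a small stand-in file (hypothetical layout, scaled down: the real
% file is described as having 5 data lines per group, not 2).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'some header text\n');
fprintf(fid, '  7.10 157.00 0.00 227.35 17.74\n');
fprintf(fid, ' Densities\n');
fprintf(fid, '  1.0 2.0 3.0\n  4.0 5.0 6.0\n');
fprintf(fid, '  7.20 158.00 0.00 227.35 17.74\n');
fprintf(fid, ' Densities\n');
fprintf(fid, '  7.0 8.0 9.0\n  10.0 11.0 12.0\n');
fclose(fid);

linesPerGroup = 2;   % assumption for the stand-in file; 5 in the question
data = [];           % one row per group

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                      % fgetl returns -1 at end of file
    if ~isempty(strfind(tline, 'Densities'))
        % The next linesPerGroup lines hold the values of one group.
        row = [];
        for k = 1:linesPerGroup
            % sscanf returns a column vector; transpose and append.
            row = [row, sscanf(fgetl(fid), '%f')'];
        end
        data = [data; row];
    end
    tline = fgetl(fid);                  % other lines are simply skipped
end
fclose(fid);
delete(fname);

disp(data)
```

The position lines and the 'Densities' lines never reach the output because only the lines that follow a 'Densities' marker are converted. A truncated file would need an extra check on the inner fgetl call.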

3 comments

dpb on 15 Jul 2014

Yeah, Cedric, the header lines make it basically a repeat of the thread "Extracting Data from a Messy Text File" asked by Alison yesterday at

http://www.mathworks.com/matlabcentral/answers/141797-extracting-data-from-messy-text-file

With the added caveat that, since this poster is only concerned with the one group of repeated data, it's one textscan with 'headerlines',20 and then a loop for the rest, with 'headerlines',N for the following groups, repeated until end-of-file. Despite appearances, this file is actually more regular than the one in that thread, which had two different header groups before reaching the repeating pattern.
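A count-based read along these lines could look like the sketch below. Note that it swaps in fscanf with an element count, positioned by searching for the 'Densities' marker, rather than the textscan/'headerlines' calls described above, because the exact number of lines to skip depends on details of the real file that are not shown here. The stand-in file, its 'number of altitudes = 6' header text, and the variable names are assumptions for illustration only.

```matlab
% Write a small stand-in file (hypothetical layout: 6 values per group
% instead of the 50 in the question).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, ' number of altitudes = 6\n');
fprintf(fid, '  7.10 157.00 0.00 227.35 17.74\n');
fprintf(fid, ' Densities\n');
fprintf(fid, '  1.0 2.0 3.0\n  4.0 5.0 6.0\n');
fprintf(fid, '  7.20 158.00 0.00 227.35 17.74\n');
fprintf(fid, ' Densities\n');
fprintf(fid, '  7.0 8.0 9.0\n  10.0 11.0 12.0\n');
fclose(fid);

key       = 'altitudes =';
nPerGroup = NaN;     % filled in from the header (assumed to come first)
groups    = {};

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    idx = strfind(tline, key);
    if ~isempty(idx)
        % Header line: read the integer that follows the key.
        nPerGroup = sscanf(tline(idx(1)+numel(key):end), '%d', 1);
    elseif ~isempty(strfind(tline, 'Densities'))
        % Read exactly nPerGroup values, however many lines they span.
        groups{end+1} = fscanf(fid, '%f', nPerGroup)';
    end
    tline = fgetl(fid);  % also consumes whatever is left of the current line
end
fclose(fid);
delete(fname);

data = cat(1, groups{:});
disp(data)
```

Because the loop looks for the marker text instead of counting lines, it does not matter whether fscanf leaves the file position before or after the trailing newline of the last data line.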

My initial hints were based on a very quick reading in which I didn't notice that not all the data were of interest, so I was planning on parsing all the headers as well.

I've always thought the Fortran way of being able to define an array of

real :: x(50)
read(lun,fmt) X

that will read to satisfy size(x), across records if need be, to be so much better a syntax from a programmer's standpoint than the C fscanf model. I've railed before about the advantages of format over C's pattern strings... :)

Dev on 15 Jul 2014

Thank You very much. Much appreciated.

dpb on 15 Jul 2014

BTW, in the reference thread the array is "growed" dynamically. In your case you can save quite a lot of overhead if you parse the header lines to discover the length of the file and the number of readings per section because you can then compute the total final array size and preallocate to it and fill instead of append.

For small files it's no big deal; as file sizes grow, the performance difference can be dramatic indeed.
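To make the append-versus-preallocate point concrete, here is a minimal sketch. It involves no file at all: readGroup is a made-up stand-in for "read one group of values", and nGroups = 200 is an arbitrary assumption (in practice it would be computed from the header). The timings are only indicative and will vary by machine and MATLAB version.

```matlab
nGroups   = 200;   % hypothetical; would be derived from the header lines
nPerGroup = 50;    % values per group, as in the question

% Stand-in for reading one group from the file: row g holds g, g+1, ...
readGroup = @(g) g + (0:nPerGroup-1);

% Variant 1: grow the array by appending one row per group.
tic
grown = [];
for g = 1:nGroups
    grown = [grown; readGroup(g)];
end
tGrow = toc;

% Variant 2: allocate the final size once, then fill row by row.
tic
filled = zeros(nGroups, nPerGroup);
for g = 1:nGroups
    filled(g, :) = readGroup(g);
end
tFill = toc;

fprintf('append: %.4f s, preallocate: %.4f s\n', tGrow, tFill);
```

Both variants produce the same matrix; only the way the memory is obtained differs.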

And, with all kudos to Cedric, while regexp has its uses, it generally isn't a performance flash. You can try the various alternatives and see; it depends on how often you really have to do this and, again, on how large the files truly are.

Answer 2

dpb on 15 Jul 2014

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/141833-i-want-to-extract-data-from-a-text-file#answer_145123

Actually, there's enough info in the header lines to automate the reading. After you get the first header (system time, etc.), the number of lat/long steps is available, which determines the number of groups, and the number of densities determines the number of values read per group.

This also was clearly a Fortran application-written file (the seemingly spurious "1" and "0" are line printer carriage control characters from days of yore), and it would truly be simpler by far to write a Fortran routine to read it than to write C-style parsing, although that can be done too, with textscan, fscanf and friends...

Why not give the above hints a go as a start, though? I've got family in so really don't have much time to devote at the moment, sorry...

Answer 3

Image Analyst on 14 Jul 2014

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/141833-i-want-to-extract-data-from-a-text-file#answer_145110

Edited: Image Analyst on 15 Jul 2014

Try readtable() or importdata().

After looking at the attached file, it's such a unique format with a lot of things that are different line by line. So you're going to have to write your own reader. Use fopen(), fgetl(), strfind(), and fclose(). Read a line, figure out what kind of line it is with strfind(), then extract the numbers and strings, and continue until all lines have been processed.
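A custom reader of that kind might be sketched as follows, using only fopen(), fgetl(), strfind(), sscanf() and fclose(). It classifies each line as a 'Densities' marker, a data line inside a group, or something else, and it also keeps the five position values from the line just before each marker. The stand-in file, the 6-values-per-group size, and the variable names are assumptions for illustration; the real file has not been checked against this.

```matlab
% Write a small stand-in file (hypothetical layout, scaled down).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'some header text\n');
fprintf(fid, '  7.10 157.00 0.00 227.35 17.74\n');
fprintf(fid, ' Densities\n');
fprintf(fid, '  1.0 2.0 3.0\n  4.0 5.0 6.0\n');
fprintf(fid, '  7.20 158.00 0.00 227.35 17.74\n');
fprintf(fid, ' Densities\n');
fprintf(fid, '  7.0 8.0 9.0\n  10.0 11.0 12.0\n');
fclose(fid);

nPerGroup = 6;                 % assumption for the stand-in; 50 in the question
coords    = zeros(0, 5);       % assumes 5 numbers on each position line
dens      = zeros(0, nPerGroup);
buffer    = [];                % values collected so far for the current group
inGroup   = false;             % true while reading values after a marker
prevLine  = '';

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(strfind(tline, 'Densities'))
        % Marker line: the line just before it holds the position values.
        coords(end+1, :) = sscanf(prevLine, '%f')';
        inGroup = true;
        buffer  = [];
    elseif inGroup
        % Data line: accumulate until the group is complete.
        buffer = [buffer, sscanf(tline, '%f')'];
        if numel(buffer) >= nPerGroup
            dens(end+1, :) = buffer(1:nPerGroup);
            inGroup = false;
        end
    end
    prevLine = tline;
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

disp(coords)
disp(dens)
```

Counting values rather than lines means the reader does not depend on how many numbers are printed per line, only on the total per group.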
