
I want to extract data from a text file

Dev on 14 Jul 2014
Commented: dpb on 15 Jul 2014
I am attaching a text file that contains both numeric and text data. I want to build a matrix from the numbers starting at the 21st row of the attached file.
I want to combine each group of five rows after the 21st row into a single row, and I would like to delete the two rows (which look like this: 7.10 157.00 0.00 227.35 17.74 Densities) that recur after every five useful rows of data.
Please let me know.
Thanks.
  3 Comments
Image Analyst on 14 Jul 2014
After you push the "Choose file" button, make sure you click the "Attach File" button.
Dev on 15 Jul 2014
Attached.


Accepted Answer

Cedric on 15 Jul 2014
Edited: Cedric on 15 Jul 2014
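Here is one way:
% Read the whole file into one character array.
content = fileread( 'PIM3dtry.txt' ) ;
% Capture the text between each 'Densities' label and the next run of 4 whitespace characters.
tokens = regexp( content, 'Densities\s+(.*?)\s{4}', 'tokens' ) ;
% Convert each captured block to a row vector of doubles, then stack the rows.
data = cellfun( @(c) sscanf(c, '%f')', cat(1, tokens{:}), 'UniformOutput', false ) ;
data = cat( 1, data{:} ) ;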
but you'll have to check that it does what you want. An easier way would be to loop over lines with FGETL and use TEXTSCAN or SSCANF on relevant ones.
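To check what the REGEXP snippet above does, it can help to run it on a small string first. The following is only a sketch: the toy `content` is a made-up layout (two `Densities` blocks, each closed by a whitespace-only line), not the real attached file, so the result says nothing about how the pattern behaves on `PIM3dtry.txt`.

```matlab
% Toy stand-in for the file content (hypothetical layout, NOT the real file):
% each block starts with 'Densities', holds numbers over two lines, and is
% followed by a line of four spaces.
content = sprintf( [ 'Densities\n 1.0 2.0 3.0\n 4.0 5.0 6.0\n    \n', ...
                     'Densities\n 7.0 8.0 9.0\n 10.0 11.0 12.0\n    \n' ] ) ;

% Same pattern as in the answer: lazily capture everything after the label
% up to the first run of four whitespace characters.
tokens = regexp( content, 'Densities\s+(.*?)\s{4}', 'tokens' ) ;

% Each token is a 1x1 cell holding one block of text; parse it into a row.
data = cellfun( @(c) sscanf(c, '%f')', cat(1, tokens{:}), ...
                'UniformOutput', false ) ;
data = cat( 1, data{:} ) ;   % one row per 'Densities' block

disp( size( data ) )
```

In this toy case the newline and leading space between the two number lines form a run of only two whitespace characters, so the lazy capture runs on until the separator line. If a block in the real file contains four or more consecutive whitespace characters, the capture would stop early, which is one reason to check the output against the file.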
EDIT: after reading dpb's hints about using the header, here is an update:
content = fileread( 'PIM3dtry.txt' ) ;
% Number of altitudes, as announced in the header.
nAltitudes = str2double( regexp( content, 'altitudes =\s*(\d+)', ...
'tokens', 'once' )) ;
% Altitude values: everything after the 'Altitudes' label up to the first blank line.
tokens = regexp( content, 'Altitudes\s+(.*?)[\r\n] +[\r\n]', ...
'tokens', 'once' ) ;
altitudes = sscanf( tokens{1}, '%f' )' ;
% Densities: one row per block, as before.
tokens = regexp( content, 'Densities\s+(.*?)\s{4}', 'tokens' ) ;
data = cellfun( @(c) sscanf(c, '%f')', cat(1, tokens{:}), 'UniformOutput', false ) ;
data = cat( 1, data{:} ) ;
An alternative for extracting packets of 50 numbers (the number of altitudes) would be, e.g.:
% Backslashes are doubled because SPRINTF processes escape sequences in its format string.
pattern = sprintf( '(?<=Altitudes\\s+)(\\S+\\s+){%d}', nAltitudes ) ;
matches = regexp( content, pattern, 'match', 'once' ) ;
altitudes = sscanf( matches, '%f' ) ;
Yet, keep in mind that the standard approach would be to loop over the file's lines using FGETL and extract data using TEXTSCAN, SSCANF, etc. There would be a little more logic to implement than with the REGEXP-based approach, but at least you would have a full understanding of every bit of the code.
  3 Comments
Dev on 15 Jul 2014
Thank you very much. Much appreciated.
dpb on 15 Jul 2014
BTW, in the reference thread the array is "growed" dynamically. In your case you can save quite a lot of overhead if you parse the header lines to discover the length of the file and the number of readings per section, because you can then compute the final array size, preallocate to it, and fill instead of append.
For small files it's no big deal; as file sizes grow, the performance difference can be dramatic indeed.
And with all kudos to Cedric, while regexp has its uses, it generally isn't a performance flash. You can try the various alternatives and see; it depends on how often you really have to do this and, again, on how large the files truly are.
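The difference between growing and preallocating can be sketched as follows. The sizes here are assumptions for illustration: `nReadings = 50` echoes the 50 altitudes mentioned in the answer, while `nGroups` is made up; in practice both would come from parsing the header lines. The `block` values are stand-ins for one parsed section.

```matlab
% Hypothetical sizes; in practice they would be read from the header.
nGroups   = 4 ;    % number of data sections (assumed)
nReadings = 50 ;   % numbers per section (assumed)

% Growing: the array is reallocated and copied on every append.
grown = [] ;
for k = 1 : nGroups
    block = k * ones( 1, nReadings ) ;   % stand-in for one parsed section
    grown = [ grown ; block ] ;          %#ok<AGROW>
end

% Preallocating: allocate the final size once, then fill rows in place.
filled = zeros( nGroups, nReadings ) ;
for k = 1 : nGroups
    block = k * ones( 1, nReadings ) ;   % stand-in for one parsed section
    filled(k, :) = block ;
end

disp( isequal( grown, filled ) )
```

Both loops build the same matrix; only the second avoids the repeated reallocation, which is where the savings for large files would come from.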


More Answers (2)

dpb on 15 Jul 2014
Actually, there's enough info in the header lines to automate the reading. Once you get the first header (system time, etc.), the number of lat/long steps is available, which determines the number of groups, and the number of densities determines the number of values read per group.
This was also clearly a file written by a Fortran application (the seemingly spurious "1" and "0" are line printer carriage control characters from days of yore), and it would truly be simpler by far to write a Fortran routine to read it than to write C-style parsing, although that can be done too, with textscan, fscanf and friends...
Why not give the above hints a go at a start, though? I've got family in so really don't have much time to devote at the moment, sorry...

Image Analyst on 14 Jul 2014
Edited: Image Analyst on 15 Jul 2014
Try readtable() or importdata().
After looking at the attached file, it's such a unique format with a lot of things that are different line by line. So you're going to have to write your own reader. Use fopen(), fgetl(), strfind(), and fclose(). Read a line, figure out what kind of line it is with strfind(), then extract the numbers and strings, and continue until all lines have been processed.
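A minimal sketch of such a line-by-line reader is shown below. The sample file it writes is a made-up layout (a header line, then `Densities` blocks separated by a blank line), not the real attached file, and the names `rows`, `inBlock`, and `tline` are just illustrative. The real file would need extra cases for its other line types.

```matlab
% Write a small sample file (hypothetical layout, for illustration only).
fname = [ tempname, '.txt' ] ;
fid = fopen( fname, 'w' ) ;
fprintf( fid, 'Some header text\n' ) ;
fprintf( fid, 'Densities\n' ) ;
fprintf( fid, ' 1.0 2.0 3.0\n 4.0 5.0 6.0\n' ) ;
fprintf( fid, '\n' ) ;
fprintf( fid, 'Densities\n' ) ;
fprintf( fid, ' 7.0 8.0 9.0\n 10.0 11.0 12.0\n' ) ;
fclose( fid ) ;

% Read it back one line at a time.
fid = fopen( fname, 'r' ) ;
rows = {} ;          % one cell per data block
inBlock = false ;    % true while inside a 'Densities' block
tline = fgetl( fid ) ;
while ischar( tline )                        % fgetl returns -1 at end of file
    if ~isempty( strfind( tline, 'Densities' ) )
        rows{end+1} = [] ;                   %#ok<SAGROW> start a new block
        inBlock = true ;
    elseif inBlock
        values = sscanf( tline, '%f' )' ;    % numbers on this line, as a row
        if isempty( values )
            inBlock = false ;                % blank or text line closes the block
        else
            rows{end} = [ rows{end}, values ] ;   % append to the current row
        end
    end
    tline = fgetl( fid ) ;
end
fclose( fid ) ;
delete( fname ) ;

data = cat( 1, rows{:} ) ;   % one matrix row per block
disp( data )
```

Here every block ends up as a single row, which is the "several rows joined into one" result the question asks for, under the assumption that all blocks hold the same count of numbers.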
