Elegant way to extract data from text files with an arbitrary format?

Brendan on 1 Dec 2015
Answered: Brendan on 1 Dec 2015
Hi Guys,
I need to process a large number of text files to extract numerical data. The data is fairly complex, as the files have an arbitrary format and contain several different blocks of data. To illustrate:
Boys Names:
Tom Dick Harry...
Animals:
Cat Dog Squirrel Triceratops Shark...
Rectangle Properties:
x0 y0 width height angle
0 1 4 2 30
-1 2 5 1.5 0.5
7 1 4 5 22
3 9 7.5 6 0
Some more data...
The challenge is that the data I need is somewhere in the middle of each file, and I never know where the block (Rectangle Properties in this case) will show up. There could, for example, be a large number of records under the Names or Animals sections, so I have to search for the Rectangle section. To complicate things further, I don't know how many rectangles I need to read in.
The header "Rectangle Properties:" only appears once in each file. The sub-header line "x0 y0 ..." occurs in several places (e.g. under different shapes).
My current approach is:
  • Scan through the file (using fgetl) until I get to the "Rectangle Properties:" header.
  • Skip a line (I don't need the sub-header)
  • Read 5 items of numerical data (sscanf) from each of the subsequent lines until I reach a blank line
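The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name readBlock and its arguments (header, nSkip, nCols) are hypothetical, the header string is taken from the sample listing, and a line that does not contain exactly nCols numbers is treated as the end of the block in the same way as a blank line.

```matlab
% Hypothetical helper (save as readBlock.m): reads the numeric rows of one
% block, identified by its unique header line, from a text file.
function data = readBlock(filename, header, nSkip, nCols)
    fid = fopen(filename, 'r');
    if fid == -1
        error('Could not open %s', filename);
    end
    cleaner = onCleanup(@() fclose(fid));   % close the file even on error

    % Scan forward until the unique block header is found.
    % fgetl returns -1 (not a char) at end of file.
    tline = fgetl(fid);
    while ischar(tline) && ~strcmp(strtrim(tline), header)
        tline = fgetl(fid);
    end
    if ~ischar(tline)
        error('Header "%s" not found in %s', header, filename);
    end

    % Skip the sub-header line(s), which are not needed.
    for k = 1:nSkip
        fgetl(fid);
    end

    % Read rows until a blank line, a line without exactly nCols
    % numbers, or the end of the file.
    data = zeros(0, nCols);
    tline = fgetl(fid);
    while ischar(tline) && ~isempty(strtrim(tline))
        vals = sscanf(tline, '%f');
        if numel(vals) ~= nCols
            break;
        end
        data(end+1, :) = vals.';            % append one row
        tline = fgetl(fid);
    end
end
```

A call might look like rect = readBlock('shapes.txt', 'Rectangle Properties:', 1, 5), where shapes.txt is a placeholder file name. The same function could then be reused for other blocks by changing the header, the number of sub-header lines to skip, and the number of columns.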
This works fine, but I'm wondering if there's a more elegant approach, perhaps using regular expressions or some other technique?
The data files I'm processing are quite large and I need to extract several different blocks of data (e.g. Rectangles, Triangles, Circles). Each block has a unique header but may have one or more sub-header lines, which are not unique. The number of data items in each block varies, and there is no way to know how many items there are when I begin processing the data. This makes it difficult to produce a "one size fits all" function, and the code gets pretty messy.
Any advice would be appreciated!
B
  1 Comment
Walter Roberson on 1 Dec 2015
For the blocks that you need, is the order of blocks fixed?
Is the first line of the file always the same?


Answers (1)

Brendan on 1 Dec 2015
The order of the blocks is always the same, and the blocks are always present in every text file.
The first line of each file varies and, again, there is no way to know what this will be before the file is opened.
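
Since every block has a unique header, one regular-expression variant is to read the whole file into memory once and let regexp capture the run of numeric rows that follows each header. The sketch below is an assumption-laden illustration, not a tested solution for the real files: extractBlock, nSkip and nCols are hypothetical names, each data row is assumed to contain exactly nCols plain numbers separated by spaces or tabs, and fileread is assumed to be acceptable for the file sizes involved.

```matlab
% Hypothetical helper (save as extractBlock.m): extracts one numeric block
% from text that has already been read into memory, e.g. with fileread.
function data = extractBlock(txt, header, nSkip, nCols)
    % One number, e.g. 4, -1, 7.5 or 1e-3.
    num = '[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?';

    % One data row: exactly nCols numbers, then a line break or end of text.
    row = ['[ \t]*' num '(?:[ \t]+' num '){' num2str(nCols - 1) '}' ...
           '[ \t]*(?:\r?\n|$)'];

    % Header line, then nSkip sub-header lines, then capture the rows.
    pat = [regexptranslate('escape', header) '[^\n]*\n' ...
           '(?:[^\n]*\n){' num2str(nSkip) '}' ...
           '((?:' row ')+)'];

    tok = regexp(txt, pat, 'tokens', 'once');
    if isempty(tok)
        data = zeros(0, nCols);             % block not found or empty
        return;
    end

    % sscanf fills column by column, so read nCols-by-N and transpose.
    data = sscanf(tok{1}, '%f', [nCols, Inf]).';
end
```

With txt = fileread('shapes.txt') (a placeholder file name), the rectangles would be obtained with rect = extractBlock(txt, 'Rectangle Properties:', 1, 5), and the other blocks with further calls that only change the header and the counts. Because the file is read once and each header is unique, neither the position of a block nor the number of rows in it has to be known in advance; the trade-off is that the whole file is held in memory, which may matter for very large files.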
