Question about optimizing reading data from text file

Question

0 votes

Hello, thanks for reading this,

I currently have a reader for mesh files. It works, but depending on the size of the file it can take a very long time, and I am hoping to optimize it for speed.

What I do first is read in a text file and change every line into a matrix of characters using the lines:

   cac = textscan( fid, '%[^\n]' );
   fclose(fid);
   A  = char( cac{1} );

where A is my character matrix. I then search through the text for the identifiers that mark the data I need, and record a start index and an end index for each block of data. I basically read this line by line, and at the moment I assume the file will always be formatted in a certain way.

After I have these indices, I use sscanf functions to read the characters as %f or %x numbers and store them into matrices. This is the part where the profiler says it takes the longest to complete.

I posted the MATLAB reader function here: http://pastebin.com/FFtgXzg4, since it is a bit long to post here. My specific questions are: do I have to convert the whole text import into a character matrix, and is there any way I can do this without needing a for loop? The loops using sscanf take a very long time.

It works, but just barely so. I can send a test import file if needed.
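On the first question (whether the whole import has to become a character matrix), here is a minimal sketch of one alternative: keep the lines as the cell array that textscan already returns and locate the section markers with a vectorized comparison. The marker strings 'BEGIN' and 'END' and the sample lines are hypothetical placeholders, since the real identifiers are not shown in this post; in the actual reader, txtLines would be cac{1}.

```matlab
% Sketch: keep the lines as a cell array and find section markers
% without converting to a char matrix and without a for loop.
% 'BEGIN' and 'END' are hypothetical markers; the sample lines are made up.
txtLines = {'header'; 'BEGIN'; '1a 2b 3c4 1 0'; 'ff 10 abc 2 1'; 'END'; 'footer'};

% strcmp on a cell array returns one logical per line.
iStart = find(strcmp(txtLines, 'BEGIN'), 1, 'first') + 1;  % first data line
iEnd   = find(strcmp(txtLines, 'END'),   1, 'first') - 1;  % last data line

% Cell array holding only the data rows between the two markers.
dataLines = txtLines(iStart:iEnd);
```

If the markers are prefixes rather than whole lines, strncmp can be used in the same way. Only the data block then needs to be converted with char(dataLines), which avoids padding every line of the file to the width of the longest one.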

1 Comment

Cedric on 24 May 2013

Could you post, e.g., 20 lines of your data file and define the identifiers that you are referring to?


Answer 1

Jonathan Sullivan on 23 May 2013

Edited: Jonathan Sullivan on 23 May 2013


0 votes

You may want to use fread and regexp.

Without seeing your file, I can't say for sure this will produce the same result, but it should give you a good starting point.

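% Using regexp and fread
fid = fopen(filename,'r');
tic;
% Read the whole file as one char row vector, then split it at newlines
A = regexp(fread(fid,'*char')','\n','split');
% Pad the lines into a rectangular character matrix
A = char( A{:} );
toc
fclose(fid);

% Using textscan (the approach from the question), for comparison
fid = fopen(filename,'r');
tic;
B = textscan(fid,'%[^\n]');
B2 = char(B{1});
toc
fclose(fid);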

1 Comment

Brian on 23 May 2013

It seems that my textscan version runs slightly faster than the regexp/fread combination. There is one last part of the code that is still giving me problems:

When I have my start and end indices, I use sscanf line by line to give me the real data I need. However, some of my character matrices can be very large: sometimes spanning hundreds of thousands of rows (depending on the number of tetrahedra I have).

Is it possible to read this in any kind of intelligent fashion using sscanf line by line, or use it as a vector component, or should I look into exporting the matrix to a formatted text file and re-importing it using textread and hex2dec?

In these areas, I will always have the following combination of characters:

xxx xxx xxxx x x,

where I believe it can be split by a space delimiter. That leaves me with five hexadecimal values per row.
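As a rough sketch of how such a block might be converted without calling sscanf once per row: flatten the block into a single character row vector and let one sscanf call with '%x' read every value, reshaping to five columns. This assumes every row in the block holds exactly five hexadecimal fields; the variable block stands in for a slice of the character matrix such as A(iStart:iEnd,:), and the two sample rows are made up.

```matlab
% Sketch: convert a block of hex rows with ONE sscanf call.
% 'block' stands in for A(iStart:iEnd,:); the sample rows are made up.
block = ['1a 2b 3c4 1 0'; ...
         'ff 10 abc 2 1'];

txt = block.';            % transpose so that (:) walks each row left to right
txt(end+1, :) = ' ';      % append a separator after every original row

% sscanf fills its output column by column, so request 5-by-N and transpose.
vals = sscanf(txt(:).', '%x', [5, Inf]).';   % N-by-5 matrix of doubles
```

For the sample rows, vals is [26 43 964 1 0; 255 16 2748 2 1]. The trailing spaces that char() adds as padding are harmless here, because sscanf skips whitespace between fields.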
