Fastest Way to write data to a text file - fprintf

80 Ansichten (letzte 30 Tage)

Brian am 2 Aug. 2013

1
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/83870-fastest-way-to-write-data-to-a-text-file-fprintf

I am writing a lot of date to a text file one line at a time (1.7 million rows, 4 columns) that is comprised of different data types. I'm wondering if there is a better way to do this than 1 line at a time that might yield much faster results.

Here is what I'm doing now.

ExpSymbols = Char Array
ExpDates = Numeric Array
MyFactor = Numeric Array
FctrName = Char Array 
    ftemp = fopen('FileName','w' );
    for i = 1:length(MyFactor)
        fprintf(ftemp, '%s,%i,%f,%s\r\n',ExpSymbols(i,:), ExpDates(i,1), MyFactor(i,1),[FctrName '_ML']);
    end
    fclose(ftemp);

Thanks in advance,

Brian

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Akzeptierte Antwort

Jan am 2 Aug. 2013

2
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/83870-fastest-way-to-write-data-to-a-text-file-fprintf#answer_93462

In MATLAB Online öffnen

You can try to suppress the flushing by opening the file in the 'W' instead of the 'w':

ftemp = fopen('FileName', 'W');  % uppercase W
Fmt   = ['%s,%i,%f,', FctrName '_ML\r\n'];
for i = 1:length(MyFactor)
  fprintf(ftemp, Fmt, ExpSymbols(i,:), ExpDates(i), MyFactor(i));
end
fclose(ftemp);

9 Kommentare
7 ältere Kommentare anzeigen7 ältere Kommentare ausblenden

Brian am 5 Aug. 2013

Bearbeitet: Brian am 5 Aug. 2013

You're right, saving the variables by themselves is much quicker than writing to a flat file. I changed my code to write to C:\Temp (as you suggested above) and the save took .97 seconds and the load took .33 seconds. The formatted flat file is 62 MB in size and the .mat file is only 15MB or so. I do need a properly formatted file for the other system to read as it can't read .mat files.

All fields need to be in one file but it sounds like you're saying that the writing of mixed data types is what's making the write unnecessarily slow. Can I write one data type at a time to the same file using a loop structure for each data type?

dpb am 5 Aug. 2013

A) Can you offload the formatting from this code to a second one that processes the .mat files and writes the formatted ones? Won't save any overall but moves it to a different place where the bottleneck might not be so evident? For example, you could have a second background process doing that conversion while the primary analyses are done interactively? All depends on the actual workflow as to whether helps or not, of course.

B) Can your target app read the data variables sequentially one after the other instead of all a record at a time as you're currently writing them? If so, sure you can write each w/o any loop at all and it will likely be faster by at least a measurable amount as Jan suggests.

C) You might just see what the text option of save does in comparison for speed--don't know it'll help but what they hey...

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

dpb am 2 Aug. 2013

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/83870-fastest-way-to-write-data-to-a-text-file-fprintf#answer_93454

Bearbeitet: dpb am 3 Aug. 2013

In MATLAB Online öffnen

It's a pita for mixed fields--I don't know of any clean way to mix them in fprintf c

I generally build the string array internally then write the whole thing...

cma=repmat(',',length(dates),1);  % the delimiter column
out=[symb cma num2str(dates) cma factor cma names];
fprintf(fid, '%s\n', out);
fid=fclose(fid);

names is a placeholder for the FactorName that I guess may be a constant? If so, it can be inserted into the format string as Jan assumed; if not needs to be built as the column of commas to concatenate however it should be.

6 Kommentare
4 ältere Kommentare anzeigen4 ältere Kommentare ausblenden

dpb am 4 Aug. 2013

Bearbeitet: dpb am 4 Aug. 2013

Yeah, but why do they have to be formatted instead of stream?

You can't change the other input form for the other routine or just do that in Matlab, too, w/o having to write the files in between?

I did a little test but my machine is very old and memory limited -- before I ran out of memory it appeared to me that the in memory process helped but you can't use num2str except w/ a fixed format because it'll have different numbers of significant digits otherwise.

ADDENDUM--I reverted back to R12 to get a little more memory available for data w/o thrashing disk.

Turns out that at least there num2str() lives up to it's reputation as a performance dog--the loop beat the internal conversion hands down for larger sizes. OTOMH I can't think of another builtin way to generate the columns w/o looping constructs--sprintf() embeds a \n if use it which is ok for display purposes but not for output to file. I guess I don't have any other answer than to see if can use stream i/o instead, sorry. OBTW, the other thing that will help if you still must write it as formatted -- once you do the conversion in memory, then use fwrite to output the data.

Brian am 5 Aug. 2013

Just to convert my two numeric arrays to string takes 55 seconds. This is slower than writing the file with the mixed data types using fprintf and the 'W' argument. I'm still not sure what you are referring to when you talk about "stream." I'm not familiar with that.

dpb am 5 Aug. 2013

Also called "binary". It's unformatted i/o which has the benefits for speed of

a) full precision for float values at minimum number of bytes/entry, b) eliminates the format conversion overhead on both input and output

doc fwrite % and friends

or if could stay in Matlab then

doc save % and load is only slightly higher-level

The possible disadvantage is, of course, you can't just look at a file and read it; but who's going to manually be looking at such large files, anyway?

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.