Fast Export Method

Brian on 11 Jun 2012
I have always used the export command to export a dataset to a comma-delimited text file. When the files get larger (50-100 MB), the export function runs very slowly. Are there other functions that are much faster than the dataset export function?
My dataset is simple (just large): column 1 is text and columns 2 to 4 are numeric.
MyDS = dataset(MyData(:,1),MyData(:,2),MyData(:,3),MyData(:,4));
export(MyDS,'file','R:\Equity_Quant\BrianB\Factor Rotation\BulkData.txt','Delimiter',',');
Thanks much, Brian
per isakson on 12 Jun 2012
The functions save, load, and whos are an alternative:
save( 'my_datasets.mat', 'MyDS' )
save( 'my_datasets.mat', 'MyDS_2', '-append' )
or, even faster,
save( 'my_datasets.mat', 'MyDS', '-v6' )
No guarantee, though. (As far as I can see, dataset does not overload save.)
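A minimal sketch of the round trip, assuming the Statistics Toolbox dataset class from the question is available. The file location is hypothetical, and the default MAT-file format is used rather than -v6, since that option comes without guarantee. whos with '-file' lists what is in the file without loading it, and load with an output argument returns a struct, so nothing in the workspace is overwritten.

```matlab
% Sketch: store a dataset in a MAT-file, list its contents, and load it back.
% Assumes the Statistics Toolbox dataset class; the file location is hypothetical.
MyData = randn( 1e3, 3 );
MyDS   = dataset( MyData(:,1), MyData(:,2), MyData(:,3) );   % unnamed inputs get Var1, Var2, Var3

matFile = fullfile( tempdir, 'my_datasets.mat' );
save( matFile, 'MyDS' );              % default MAT-file format

whos( '-file', matFile )              % lists the variables stored in the file

S = load( matFile, 'MyDS' );          % load into a struct instead of the workspace
fprintf( 'Round trip OK: %d\n', isequal( S.MyDS.Var1, MyDS.Var1 ) );
```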
--- Faster method to export to text file ---
Some data. I use three columns to avoid word wrap.
N = 1e1;
MyData = randn( N, 3 );
The variant with dataset
ds = dataset( MyData(:,1), MyData(:,2), MyData(:,3) );
export( ds, 'file','c:\temp\test_ds.txt', 'Delimiter', ',' )
produces this output
A faster variant
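ms = permute( MyData, [ 2, 1 ] );   % transpose: fprintf reads its arguments column by column, so each column of ms becomes one output row
fid = fopen( 'c:\temp\test_fp.txt', 'w' );
fprintf( fid, '%s,%s,%s\n', 'Var1', 'Var2', 'Var3' );   % header line
fprintf( fid, '%f,%f,%f\n', ms );   % the format is reused until all elements of ms are written
fclose( fid );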
produces this output
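The fprintf variant handles numeric columns only, while the question has a text column followed by three numeric ones. One way to mix them, sketched here with made-up names, values, and header labels, is to interleave everything into a cell array and let fprintf cycle through a format that starts with %s:

```matlab
% Sketch for a layout like the one in the question: column 1 text,
% columns 2-4 numeric. names, vals, and the header labels are made up.
names = { 'AAA'; 'BBB'; 'CCC' };
vals  = [ 1.5 2.25 3; 4 5.5 6.75; 7 8 9.125 ];

% Build a 4-by-N cell array. C{:} expands column by column, so each
% column of C supplies the four fields of one output row.
C = [ names, num2cell( vals ) ].';

fname = fullfile( tempdir, 'test_mixed.txt' );
fid = fopen( fname, 'w' );
if fid == -1
    error( 'Cannot open %s for writing', fname );
end
fprintf( fid, '%s,%s,%s,%s\n', 'Name', 'Var2', 'Var3', 'Var4' );   % header line
fprintf( fid, '%s,%f,%f,%f\n', C{:} );                             % one row per column of C
fclose( fid );
```

Expanding C{:} turns every cell into a separate argument, so for millions of rows this costs memory as well as time, and writing the rows in chunks may be necessary. This has not been benchmarked against the numeric-only variant.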
I've run these two variants with different values for N. The faster variant is at least an order of magnitude faster.
With N=5e6 on my three-year-old vanilla desktop, I get "Elapsed time is 16.478986 seconds." with the faster variant. That is roughly 8 MB/s.
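The MB/s figure is the size of the output file divided by the elapsed time. A sketch that measures both for the same three-column layout; N is reduced so it runs quickly, and the file name is hypothetical:

```matlab
% Sketch: time the fprintf variant and report throughput in MB/s.
% N is kept small here; set N = 5e6 to match the run quoted above.
N = 1e5;
MyData = randn( N, 3 );
fname = fullfile( tempdir, 'test_fp.txt' );

tic
fid = fopen( fname, 'w' );
fprintf( fid, '%s,%s,%s\n', 'Var1', 'Var2', 'Var3' );
fprintf( fid, '%f,%f,%f\n', MyData.' );   % .' transposes, like permute( MyData, [ 2, 1 ] )
fclose( fid );
elapsed = toc;

info = dir( fname );                       % file size in bytes
mbytes = info.bytes / 1e6;
fprintf( 'Wrote %.1f MB in %.2f s: %.1f MB/s\n', mbytes, elapsed, mbytes / elapsed );
```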
How many decimal places do you need in the text file?
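The number of decimals is set by the conversion in the format string, and fewer characters per value means a smaller file to write and to bulk-import. A small illustration with arbitrary values:

```matlab
% Sketch: how the conversion in the format string controls the decimals written.
x = [ pi; -exp(1); 0.5 ];
a = sprintf( '%f\n', x );      % six decimals (the default for %f)
b = sprintf( '%.3f\n', x );    % three decimals: shorter lines, smaller file
c = sprintf( '%.10g\n', x );   % up to 10 significant digits, trailing zeros dropped
disp( [ a, b, c ] )
```

Note that %f always prints six decimals, so very small values collapse to 0.000000. If the text must reproduce the doubles exactly, %.17g (17 significant digits) is enough to round-trip any double, at the cost of longer lines.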
--- fprintf is hard to exceed ---
I added this test
dlmwrite( 'c:\temp\test_dlm.txt', MyData )
With N=5e6 I got the following elapsed times
  1. fprintf (A faster variant): Elapsed time is 16.507707 seconds.
  2. dlmwrite: Elapsed time is 202.905913 seconds. (without header)
  3. dataset.export: Elapsed time is 819.649789 seconds.
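dlmwrite's default number format is not necessarily the same as '%f', so for a like-for-like comparison it helps to pin the precision and the line ending. A sketch of such a comparison, with a reduced N and hypothetical file names; absolute times will differ from the ones above:

```matlab
% Sketch: compare fprintf and dlmwrite writing the same numbers in the same format.
N = 1e5;                       % raise to 5e6 to approach the sizes in the thread
MyData = randn( N, 3 );
f1 = fullfile( tempdir, 'cmp_fprintf.txt' );
f2 = fullfile( tempdir, 'cmp_dlmwrite.txt' );

tic
fid = fopen( f1, 'w' );
fprintf( fid, '%f,%f,%f\n', MyData.' );   % no header, to match dlmwrite
fclose( fid );
t_fprintf = toc;

tic
dlmwrite( f2, MyData, 'delimiter', ',', 'precision', '%f', 'newline', 'unix' );
t_dlmwrite = toc;

fprintf( 'fprintf: %.2f s, dlmwrite: %.2f s\n', t_fprintf, t_dlmwrite );
```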
With plain MATLAB, I don't think there is a faster alternative. It may be possible to do something faster with a MEX function.
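One more plain-MATLAB variant that could be worth timing: format a block of rows in memory with sprintf and write it with one fwrite call per block. This also limits the transposed copy of the data to one chunk at a time. writeCsvChunked is a hypothetical helper, not a MATLAB function, and it is not claimed to beat a single fprintf call; that would need measuring. Save it as writeCsvChunked.m:

```matlab
function writeCsvChunked( fname, M, chunkRows )
% Hypothetical helper: write numeric matrix M as comma-separated text with
% six decimals, formatting chunkRows rows at a time and writing each chunk with fwrite.
fid = fopen( fname, 'w' );
if fid == -1
    error( 'writeCsvChunked:open', 'Cannot open %s', fname );
end
cleaner = onCleanup( @() fclose( fid ) );            % closes the file however the function exits

nCols = size( M, 2 );
fmt = [ repmat( '%f,', 1, nCols - 1 ), '%f\n' ];     % e.g. '%f,%f,%f\n' for three columns

for first = 1:chunkRows:size( M, 1 )
    last = min( first + chunkRows - 1, size( M, 1 ) );
    txt = sprintf( fmt, M( first:last, : ).' );      % transpose only this chunk
    fwrite( fid, txt, 'char' );
end
end
```

For example, writeCsvChunked( 'c:\temp\test_chunk.txt', MyData, 1e5 ) writes 100,000 rows per fwrite call.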
Brian on 13 Jun 2012
The database is on a server separate from my local machine. My job runs some calculations in MATLAB and exports the large files to the database server so that I can do a bulk import with SQL Server; hence the need to export the data. Would fprintf be my fastest option for writing to a text file?
