Asked by Aaron Levy
on 28 Aug 2019

Hello-

tl;dr: What's the best way to save an m x n array from matlab to csv, with column headers that can be any text format (ie, include spaces and [ ]), without losing precision or clipping large values?

I am using Matlab (usually 2018a on Windows 7) to do various data manipulations on localizations from single molecule microscopy. These data tables are in large m x n arrays where each row is a molecule and each column is some characteristic about the molecule (for example, it's position in x y and z, its intensity, etc). These arrays are usually ~ 100,000 x 17 to >1,000,000 x 17.

In order to visualize the data and do some further processing I don't want to do in Matlab, I use another program called ThunderSTORM, a plugin for the image analysis suite FIJI. ThunderSTORM requires a .csv file with specific column headers that denote the measure and the units. Usually the headers are of the form, for example, "x [nm]", "y [nm]", "intensity [photons]", etc.

I had been using dlmwrite to write a header line to csv, then write the data underneath that, which works, except that a) dlmwrite truncates at 4 decimal plates, which is a problem, and b) dlmwrite for some reason is treating numbers greater than 9.9999 * 10^4 as just the base value without the exponent (so, 1.001 * 10^5 imports as 1.001 only). This is a problem for filtering the data as extreme values are either retained or removed.

When I used writetable instead of dlmwrite, I did not have this behavior and values >10^5 imported correctly, so it's not a ThunderSTORM thing. The only problem with using writetable is that the column names must follow the Matlab rules for variable names, so I cannot use "x [nm]" as a header. When ThunderSTORM doesn't get the column headers it expects, it sometimes tries to do mathematical conversions on that data, which I don't want (for example, it thinks the x values are in pixels, and tries to convert them to nm, when they are already in nm).

So my question is, what is the best way to save this data out to .csv, with the appropriate headers, without clipping large data (still not sure why dlmwrite is doing that) or losing precision? I was able to make a solution using fprintf, but it's clunky, and I'm wondering if there is a better way.

% what I want writetable to do, but it doesn't because the ColName variables are not allowed.

ColName = {'id' 'frame' 'x [nm]' 'y [nm]' 'z [nm]' 'sigma1 [nm]' 'sigma2 [nm]' 'intensity [photons]'...

'bkg [photons]' 'peakphot [photons]' 'crlbx [nm]' 'crlby [nm]' 'error [nm]' 'channel' 'area' 'firstrankarea' 'firstrankdensity'};

data = array2table(Tess561,'VariableNames',ColName);

writetable(data,'writetabletest.csv');

% my temporary solution that seems clunky, but does do what I want.

x = mydata(:,1:17);

fid = fopen('ouputfile.csv','w');

fprintf(fid,'id,frame,x [nm],y [nm],z [nm],sigma1 [nm],sigma2 [nm],intensity [photons],bkg [photons],peakphot [photons],CRLBX [nm],CRLBY [nm],uncertainty_xy [nm],channel,area,firstrankarea,firstrankdensity\n');

for i = 1:length(x)

fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));

end

fclose(fid);

回答者: James Tursa
2019 年 8 月 28 日

編集済み: James Tursa
2019 年 8 月 28 日

採用された回答

Why are you using fprintf in a loop? Can't you do it all in one call? E.g., this

for i = 1:length(x)

fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));

end

becomes this (the transpose is to get the x elements in memory in the desired printing order)

fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x.');

P.S., Since you are creating a text file instead of a binary file, you should probably be using 'wt' instead of 'w' in your fopen( ) call.

James Tursa
on 28 Aug 2019

Aaron Levy
2019 年 9 月 1 日

Thanks

I ended up using fprintf with %f (actually what I ended up caring more about were the number of retained decimals places and it not clipping numbers larger than 5 significant digits). It works well.

James Tursa
on 3 Sep 2019

Note that you can specify the number of digits to print. E.g., %15.7f

Sign in to comment.

回答者: Luna
2019 年 8 月 28 日

Hi,

csvwrite is not recommended right now but you can try it.

Aaron Levy
2019 年 8 月 28 日

I'll note that my for loop fprintf solution takes foreeevverrrr for the number of localizaitons I have (~2 million). Like minutes.

I had looked at writematrix, but it doesn't allow headers, and I can't use it in Matlab 2018, which I am stuck with at the moment. The problem with csvwrite is the same as baseline dlmwrite - it will truncate at 5 significant digits, which is I suppose what's happening with dlmwrite described above (ie, 100,000 becomes 1).

I guess I can specify some insane precision in dlmwrite, like precision 20 or something, since sometimes I have values that are like 4000.092583028582. Setting the precision super high does work, but does seem a little excessive. But if that's what I have to do, at least it does work (and is much faster than the loop, unsurprisingly).

ie

fid = fopen(savename,'w');

fprintf(fid,'%s\n',header);

fclose(fid);

dlmwrite(savename,mydatatosave,'-append','precision',20);

Sign in to comment.

Opportunities for recent engineering grads.

Apply Today
## 0 Comments

Sign in to comment.