MATLAB Answers

0

What is the best way to save to .csv without losing precision and with column headers?

Asked by Aaron Levy on 28 Aug 2019
Latest activity Commented on by James Tursa
on 3 Sep 2019
Hello-
tl;dr: What's the best way to save an m x n array from matlab to csv, with column headers that can be any text format (ie, include spaces and [ ]), without losing precision or clipping large values?
I am using Matlab (usually 2018a on Windows 7) to do various data manipulations on localizations from single molecule microscopy. These data tables are in large m x n arrays where each row is a molecule and each column is some characteristic about the molecule (for example, it's position in x y and z, its intensity, etc). These arrays are usually ~ 100,000 x 17 to >1,000,000 x 17.
In order to visualize the data and do some further processing I don't want to do in Matlab, I use another program called ThunderSTORM, a plugin for the image analysis suite FIJI. ThunderSTORM requires a .csv file with specific column headers that denote the measure and the units. Usually the headers are of the form, for example, "x [nm]", "y [nm]", "intensity [photons]", etc.
I had been using dlmwrite to write a header line to csv, then write the data underneath that, which works, except that a) dlmwrite truncates at 4 decimal plates, which is a problem, and b) dlmwrite for some reason is treating numbers greater than 9.9999 * 10^4 as just the base value without the exponent (so, 1.001 * 10^5 imports as 1.001 only). This is a problem for filtering the data as extreme values are either retained or removed.
When I used writetable instead of dlmwrite, I did not have this behavior and values >10^5 imported correctly, so it's not a ThunderSTORM thing. The only problem with using writetable is that the column names must follow the Matlab rules for variable names, so I cannot use "x [nm]" as a header. When ThunderSTORM doesn't get the column headers it expects, it sometimes tries to do mathematical conversions on that data, which I don't want (for example, it thinks the x values are in pixels, and tries to convert them to nm, when they are already in nm).
So my question is, what is the best way to save this data out to .csv, with the appropriate headers, without clipping large data (still not sure why dlmwrite is doing that) or losing precision? I was able to make a solution using fprintf, but it's clunky, and I'm wondering if there is a better way.
% what I want writetable to do, but it doesn't because the ColName variables are not allowed.
ColName = {'id' 'frame' 'x [nm]' 'y [nm]' 'z [nm]' 'sigma1 [nm]' 'sigma2 [nm]' 'intensity [photons]'...
'bkg [photons]' 'peakphot [photons]' 'crlbx [nm]' 'crlby [nm]' 'error [nm]' 'channel' 'area' 'firstrankarea' 'firstrankdensity'};
data = array2table(Tess561,'VariableNames',ColName);
writetable(data,'writetabletest.csv');
% my temporary solution that seems clunky, but does do what I want.
x = mydata(:,1:17);
fid = fopen('ouputfile.csv','w');
fprintf(fid,'id,frame,x [nm],y [nm],z [nm],sigma1 [nm],sigma2 [nm],intensity [photons],bkg [photons],peakphot [photons],CRLBX [nm],CRLBY [nm],uncertainty_xy [nm],channel,area,firstrankarea,firstrankdensity\n');
for i = 1:length(x)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));
end
fclose(fid);

  0 Comments

Sign in to comment.

Tags

2 Answers

回答者: James Tursa
2019 年 8 月 28 日
編集済み: James Tursa
2019 年 8 月 28 日
 採用された回答

Why are you using fprintf in a loop? Can't you do it all in one call? E.g., this
for i = 1:length(x)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));
end
becomes this (the transpose is to get the x elements in memory in the desired printing order)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x.');
P.S., Since you are creating a text file instead of a binary file, you should probably be using 'wt' instead of 'w' in your fopen( ) call.

  4 Comments

Show 1 older comment
In general, when trying to preserve significant digits when printing, I would recommend using some form of 'g' format instead of 'f' format. You might try that.
Aaron Levy 2019 年 9 月 1 日
Thanks
I ended up using fprintf with %f (actually what I ended up caring more about were the number of retained decimals places and it not clipping numbers larger than 5 significant digits). It works well.
Note that you can specify the number of digits to print. E.g., %15.7f

Sign in to comment.


回答者: Luna
2019 年 8 月 28 日

Hi,
Have you tried the functions writematrix or csvwrite ?
csvwrite is not recommended right now but you can try it.

  1 Comment

Aaron Levy 2019 年 8 月 28 日
I'll note that my for loop fprintf solution takes foreeevverrrr for the number of localizaitons I have (~2 million). Like minutes.
I had looked at writematrix, but it doesn't allow headers, and I can't use it in Matlab 2018, which I am stuck with at the moment. The problem with csvwrite is the same as baseline dlmwrite - it will truncate at 5 significant digits, which is I suppose what's happening with dlmwrite described above (ie, 100,000 becomes 1).
I guess I can specify some insane precision in dlmwrite, like precision 20 or something, since sometimes I have values that are like 4000.092583028582. Setting the precision super high does work, but does seem a little excessive. But if that's what I have to do, at least it does work (and is much faster than the loop, unsurprisingly).
ie
fid = fopen(savename,'w');
fprintf(fid,'%s\n',header);
fclose(fid);
dlmwrite(savename,mydatatosave,'-append','precision',20);

Sign in to comment.