# Manipulating large sets of data, asking for advice on a few things, cells and numeric array operations, with performance in mind.

Bob photonics on 26 Mar 2020
I'm currently working with large data sets, which I've saved as MATLAB files; the two biggest files are 9.5 GB and 5.9 GB.
Each file contains a 1x8 cell array (I did this for addressability, to prevent mixing up data from the 8 cells, and specifically to avoid eval).
Each cell contains a 3D double matrix: 1001x2002x201 in one file and 2003x1001x201 in the other (when I process it I chop off 1 row at the end to get 2002).
I'm already running my script on a server (64 cores and plenty of RAM; MATLAB crashed on my laptop because I need more than 12 GB of RAM on Windows). Nonetheless, the script still takes several hours to finish, and I still need to do some extra operations on the data, which is why I'm asking for advice.
For some of the large cell arrays I need to find the maximum value over all 8 cells. Normally I would run a for loop to get the maximum of each cell, store each value in a temporary numeric array, and then call max again on that array. This will certainly work; I'm just wondering whether there's a better, more efficient way.
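To make the loop-based approach concrete, here is a minimal sketch on a small stand-in cell array. The data and the names `cellMax`, `maxLoop` and `maxCellfun` are made up for illustration. It also shows a `cellfun` one-liner that should give the same result. As far as I know, `max(A,[],'all')` only arrived in a later release than R2016a, so `max(c(:))` is used instead.

```matlab
% Small stand-in for the real 1x8 cell array of 3D double matrices
A = cell(1, 8);
for k = 1:8
    A{k} = k * ones(4, 5, 3);   % hypothetical data: cell k holds the value k everywhere
end
A{3}(2, 2, 2) = 42;             % plant a known global maximum

% Loop version: one scalar per cell, then the maximum of those scalars
cellMax = zeros(1, numel(A));
for k = 1:numel(A)
    cellMax(k) = max(A{k}(:));  % (:) treats the 3D matrix as one long column
end
maxLoop = max(cellMax);

% Equivalent one-liner: cellfun returns one scalar per cell
maxCellfun = max(cellfun(@(c) max(c(:)), A));
```

Both versions make a single pass over each cell. The `cellfun` form is mainly shorter, so I would not expect a large speed difference. `max` skips NaN entries by default, which matters here because the matrices are pre-filled with NaN.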
After finding the maximum I need to apply a manipulation to all of this data as well. For a single array I would normally do something like this:
B=A./maxvaluefound;
A(B > a) = A(B > a)*constant;
Now I could put this in a for loop, address each cell, and run it, but I'm not sure how efficient that would be. Do you think there's a better way than a for loop that's not extremely complicated/difficult to implement?
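A minimal sketch of that per-cell loop, using tiny made-up cells and made-up values for `a` and `constant`. It compares `A{k}` directly against `a*maxvaluefound` instead of building the normalised copy `B`. That rewrite is only equivalent when the maximum is positive, which is an assumption here. The logical mask is also computed once and reused.

```matlab
A = {[1 2; 3 4], [5 6; 7 8]};                      % hypothetical stand-in for the 1x8 cell array
maxvaluefound = max(cellfun(@(c) max(c(:)), A));   % global maximum over all cells (8 here)
a = 0.5;                                           % hypothetical relative threshold
constant = 10;                                     % hypothetical scale factor

% A./maxvaluefound > a  is the same test as  A > a*maxvaluefound  when maxvaluefound > 0,
% so the full-size temporary B is not needed
threshold = a * maxvaluefound;

for k = 1:numel(A)
    mask = A{k} > threshold;             % compute the logical mask once per cell
    A{k}(mask) = A{k}(mask) * constant;  % scale only the selected elements
end
```

The loop itself runs only 8 times, so the loop overhead should be negligible next to the element-wise work inside each cell. Skipping `B` saves one full-size temporary per cell.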
There's one more thing I need to do, which is really important. Each cell is a slice (consider it time), and inside each slice are the values for a 3D matrix/plot. I need to interpolate the data so that I get more slices, because I want to create slices/frames/plots for a movie/gif. I'm planning to plot the 3D data using scatter3, with the data represented by color, and to use alpha values to make it see-through so that one can actually see the intensity in the 3D plot. I understand how to use griddata, but apparently it's quite slow, and some of the other methods were hard to understand. So what would be the best way to interpolate these (time) slices efficiently over the different cells in the cell array? Please explain it if you can, preferably with an example.
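To illustrate what I mean by getting more slices: if all cells already sit on the same uniform grid and the slices are equally spaced in time (both are assumptions), the simplest option I can think of is a voxel-wise linear blend between consecutive cells. The sketch below uses tiny made-up data, and `S`, `nSub` and `frames` are hypothetical names.

```matlab
% Hypothetical stand-in: 3 time slices on a common uniform grid
S = {zeros(2, 2, 2), 10 * ones(2, 2, 2), 20 * ones(2, 2, 2)};
nSub = 4;   % hypothetical number of sub-steps between consecutive slices

frames = {};
for k = 1:numel(S) - 1
    for j = 0:nSub - 1
        w = j / nSub;                                 % blend weight, 0 <= w < 1
        frames{end + 1} = (1 - w) * S{k} + w * S{k + 1}; %#ok<AGROW>
    end
end
frames{end + 1} = S{end};                             % include the final original slice
```

With the real matrices, storing every frame would multiply the memory use. It would probably be better to plot each blended frame and write it to the movie inside the inner loop, rather than collecting them in `frames`.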
I've added a picture below with the info for the Linux server I'm running this on. Note that unfortunately I cannot update the MATLAB version; it's R2016a.
I've also attached part of my code to give a better idea of what I'm doing:
if (or(L03==2,L04==2)) % check if this section needs to be executed based on parameters set at top of file
    E_field_650nm_intAll=cell(1,8); %create empty cell array
    parfor ee=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
        E_field_650nm_intAll{ee}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cell 1-8
        % tt=0;
        for qq=1:2:xres
            % tt=1+tt;
            tt=(qq+1)/2;
            T1=griddata(Xsall{ee},Ysall{ee},EfieldsAll{ee}(:,:,qq)',XIT,ZIT,'natural'); %change non-uniform to uniform gridded data
            % T1 = interp2(Xs1,Ys1,Efields(:,:,qq)',XIT,ZIT,'spline');
            E_field_650nm_intAll{ee}(:,:,tt)=T1; %fill up each cell with uniform data
            % clear T1
        end
        % clear qq tt
    end
    clear T1
    clear qq tt
    clear ee
    save('../savelargefile.mat', 'E_field_650nm_intAll', '-v7.3')
end
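One possible way to reduce the griddata cost in the block above, assuming the sample locations `Xsall{ee}` and `Ysall{ee}` do not change with `qq` (that is how the loop uses them): build a `scatteredInterpolant` once and only swap its `Values` for each slice, so the triangulation is reused. The sketch below runs on made-up scattered points, and `x`, `y`, `v1`, `v2`, `XI` and `YI` are hypothetical stand-ins, not my real variables.

```matlab
% Hypothetical scattered sample locations and two value sets defined on them
rng(0);                                  % fixed seed so the sketch is repeatable
x = rand(200, 1);
y = rand(200, 1);
v1 = x + 2 * y;                          % stand-in for slice qq = 1
v2 = 3 * x - y;                          % stand-in for the next slice
[XI, YI] = meshgrid(linspace(0.3, 0.7, 5));   % uniform query grid inside the data

% Build the interpolant once; 'none' returns NaN outside the convex hull
F = scatteredInterpolant(x, y, v1, 'natural', 'none');
T1 = F(XI, YI);

% For the next slice only the values change, the sample locations stay the same
F.Values = v2;
T2 = F(XI, YI);
```

In the real loop, `F` would be created once per cell before the `qq` loop, and `F.Values` would be set from the transposed slice. `scatteredInterpolant` expects column vectors, so matrices would need `(:)`. I have not measured how much this saves on the real data.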
if (L05==2) % check if this section needs to be executed based on parameters set at top of file
    if ~exist('E_field_650nm_intAll','var') % if variable not in workspace load it
        load('../savelargefile.mat', 'E_field_650nm_intAll') % file written by the section above
    end
    CFxLight=cell(1,8); %create empty cell array
    parfor tt=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
        CFxLight{tt}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cell 1-8
        for qq=1:xres
            CFs=Cafluo3D{tt}(1:lxq2,:,qq)'; %transpose matrix for point-wise multiplication
            CFxLight{tt}(:,:,qq)=CFs.*E_field_650nm_intAll{tt}(:,:,qq); %point-wise multiply the two large matrices for each cell and put in new cell array
            % clear CFs
        end
    end
    clear CFs
    clear qq tt
    save('../saveanotherlargefile.mat', 'CFxLight', '-v7.3')
end
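For the second block, the inner `qq` loop might not be needed at all: transposing every page of a 3D matrix can be written as a single `permute` with order `[2 1 3]`, followed by one element-wise multiplication. A minimal sketch on made-up data, where `C` and `E` are hypothetical stand-ins for `Cafluo3D{tt}(1:lxq2,:,:)` and `E_field_650nm_intAll{tt}`:

```matlab
C = reshape(1:24, [2 3 4]);     % hypothetical stand-in for Cafluo3D{tt}(1:lxq2,:,:)
E = 2 * ones(3, 2, 4);          % hypothetical stand-in for E_field_650nm_intAll{tt}

% Loop version, as in the block above: transpose and multiply page by page
P1 = nan(3, 2, 4);
for qq = 1:4
    P1(:, :, qq) = C(:, :, qq)' .* E(:, :, qq);
end

% Loop-free version: swap dimensions 1 and 2 for all pages at once
P2 = permute(C, [2 1 3]) .* E;
```

The `permute` call creates a full-size temporary copy, so this trades memory for fewer loop iterations. Whether it is actually faster on matrices this large is something I would have to time.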
Thank you for reading through my entire post; I certainly appreciate it. I would also greatly appreciate any help/tips/advice.

