# Plot a series of boxplots from one column of a series of cells in a cell array

I have a cell array C (140x1). Each cell contains a 1200x2 double. What I want to ultimately achieve is one plot of multiple boxplots (one boxplot for the second column of each cell - C{i,1}(:,2)) without having to type out an individual line 140 times (because I'll have to do this for many different batches of files).
```matlab
P = 'C:\filepath';
S = dir(fullfile(P,'*.csv'));
N = numel(S);
M = cell(N,1);
C = cell(N,1);   % C holds the 1200x2 data from each file (reading step not shown)
for k = 1:N
    M{k} = [M, C{k,1}(:,2)];   % horzcat error: M is an N-by-1 cell, C{k,1}(:,2) is a 1200-by-1 double
end
```
Error message:
```
Error using horzcat
Dimensions of arrays being concatenated are not consistent. Consider converting
input arrays to the same type before concatenating.
```
I've tried separating the cell array out into 140 variables inside the loop, I've tried reading only one column of the .csv file, and I've tried creating the boxplot inside the loop, but nothing I try seems to work.
My ultimate aim is something that looks like this:
But this was created painfully (manually), and I don't have the time to do this for every batch of files.
Does anyone have any ideas? Thanks

I think the problem is that you are using a cell array instead of a simple matrix. See my example:
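```matlab
C = [];
for k = 1:120
    C1 = k*rand(1200,1);   % example values for group k
    C2 = k*ones(1200,1);   % group label k for each of those values
    C = [C; [C1 C2]];      % append this group's rows
end
boxplot(C(:,1), C(:,2))    % one box per distinct value in column 2
```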
I start with an empty matrix and append rows to it. In the first column I put a series of random numbers, multiplied by the counter k just to make the groups look different; k is the counter you would use to iterate over your data. In the second column I place the value of k, which labels each value as belonging to group k. Append the rows as you calculate them.
Change your variable A from a cell array to a matrix, and store your values in the first column and the group in the second. That should work.
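Applied to the question's 140x1 cell array C, the same values-plus-group layout can be built without growing a matrix in a loop. This is a minimal sketch, assuming column 2 of each cell holds the values to plot; the stand-in data and the names `cols`, `vals`, `nRows` and `groups` are hypothetical. Because each cell contributes its own number of rows, it should also cope with CSV files of different lengths. `boxplot(x, g)` needs the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for the question's 140x1 cell array C.
% Each cell is an n-by-2 double (column 1 = time, column 2 = values);
% n varies per cell to mimic CSV files of different lengths.
nFiles = 140;
C = cell(nFiles, 1);
for k = 1:nFiles
    n = 1150 + randi(50);
    C{k} = [(1:n)', k + randn(n, 1)];
end

% Column 2 of every cell, stacked into one long column vector
cols = cellfun(@(x) x(:, 2), C, 'UniformOutput', false);
vals = vertcat(cols{:});

% Group label k, repeated once for every row of cell k
nRows  = cellfun(@(x) size(x, 1), C);
groups = repelem((1:nFiles)', nRows);

% One box per distinct group value, i.e. one box per cell
boxplot(vals, groups);
set(gca, 'XTickLabel', []);   % 140 tick labels would overlap
```

`repelem` repeats each group number as many times as its cell has rows, so `vals` and `groups` always line up.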

Agata Wozniak on 25 Nov 2019
Works perfectly well for my purpose too. Thanks for a great hint.
great!

Hi
What I would do is loop over your cell array, read the values you want to display (the second column, right?) and place them in a new matrix (not a cell array): the values go in the first column, and the position of the current cell goes in the second. That gives a matrix with 2 columns and 140×1200 rows, where the first column holds the values and the second holds the group used to create the boxplot.
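The loop described above can be sketched as follows, with the two-column matrix preallocated instead of grown row by row. This assumes the layout from the question (a 140x1 cell array, values in column 2); the stand-in data and the names `D`, `total`, `last` and `idx` are hypothetical.

```matlab
% Hypothetical stand-in for the 140x1 cell array C (each cell 1200x2).
nFiles = 140;
C = cell(nFiles, 1);
for k = 1:nFiles
    C{k} = [(1:1200)', k * rand(1200, 1)];
end

% Preallocate the two-column result: [value, group]
total = sum(cellfun(@(x) size(x, 1), C));
D = zeros(total, 2);

last = 0;                          % last row filled so far
for k = 1:nFiles
    v = C{k}(:, 2);                % values to plot from cell k
    idx = last + (1:numel(v));     % rows that cell k will occupy
    D(idx, 1) = v;                 % column 1: the values
    D(idx, 2) = k;                 % column 2: the group (cell position)
    last = last + numel(v);
end

boxplot(D(:, 1), D(:, 2));
set(gca, 'XTickLabel', []);
```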
If you have many categories, but some of them are related, you might consider displaying them as 3D boxplots:
Hope this helps.

Shaula Garibbo on 23 Oct 2019
I can't figure out how to do that either, though.
The closest I get is:
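```matlab
S = dir(fullfile(P,'*.csv'));   % P is the folder path defined earlier
M = numel(S);
C = cell(M,1);
for k = 1:M
    C{k} = csvread(fullfile(P,S(k).name)); % C becomes an Mx1 cell array (each cell contains a 1200x2 double)
end
N = length(C{k,1}(:,2)); % = 1200
A = cell(N,M); % preallocates a 1200xM cell array, not a numeric matrix
for j = 1:M
    A{j} = C{j,1}(:,2); % linear indexing: each whole column goes into A{1}...A{M}, i.e. the first column of A
end
```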
But this just overwrites the first column of A, leaving the rest of the preallocated matrix empty. How do you concatenate the contents of different cells of a cell array?
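If every file has the same number of rows, the second columns can be placed side by side in one numeric matrix, and `boxplot` then draws one box per column. A minimal sketch with hypothetical stand-in data (`cols` and the loop that fills C are illustrative only). Note that `[cols{:}]` needs equal lengths; with unequal lengths it fails with the same "Dimensions of arrays being concatenated are not consistent" error.

```matlab
% Hypothetical stand-in: M cells, each a 1200x2 double (as read by csvread)
M = 140;
C = cell(M, 1);
for k = 1:M
    C{k} = [(1:1200)', randn(1200, 1) + k];
end

% Pull column 2 out of every cell: an Mx1 cell of 1200x1 columns
cols = cellfun(@(x) x(:, 2), C, 'UniformOutput', false);

% Comma-separated list expansion: [cols{:}] is horzcat(cols{1}, ..., cols{M})
A = [cols{:}];   % 1200xM double, one column per file

boxplot(A);      % one box per column
set(gca, 'XTickLabel', []);
```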
Thanks
Shaula Garibbo on 23 Oct 2019
P.S. Thank you for the 3D boxplot suggestion, but I think the data will be easier to present in 2D (there aren't that many categories, just a LOT of recordings of the same measurement).

Shaula Garibbo on 12 Nov 2019
In the end I managed to force something through for my work deadline (see code below). However, my code was very slow, so next time I will try your method (I imagine I will have to attempt this again in a couple of weeks, so I'll get back to you then!).
```matlab
% filepath
P = 'D:\LoVe\LoVe2018\Analysis\daybb';
% structure of all .csv files in filepath
S = dir(fullfile(P,'*.csv'));
% number of .csv files in the folder
M = numel(S);
% preallocating C
C = cell(M,1);
% creating cell array of .csv file content (time = column 1, spl = column 2)
for k = 1:M
    C{k} = csvread(fullfile(P,S(k).name));
end
% extracts spl (column 2) from each cell in C
L = cellfun(@(x) x(:,2),C,'uni',0);
% extracts time (column 1) from each cell in C
t = cellfun(@(x) x(:,1),C,'uni',0);
% getting the sizes of all files
for b = 1:M
    [m(b,1), n(b,1)] = size(t{b,1});
end
% finding the maximum length of the files
maxLen = max(m);
% finding the files that are shorter than the maximum
[row, col] = find(m < maxLen);
% number of shorter files (for the loop)
sizeRowCol = numel(row);
% padding shorter files with NaN so they are all of equal length
for ll = 1:sizeRowCol
    t{row(ll),col(ll)}(end+1:maxLen) = nan;
    L{row(ll),col(ll)}(end+1:maxLen) = nan;
end
% horizontally concatenates all the spl and time values (one column per file)
for j = 1:M
    spl(:,j) = L{j,1}(:,1);
    time(:,j) = t{j,1}(:,1);
end
% removing the first row of spl and time (all zero)
spl(1,:) = [];
time(1,:) = [];
% finding mean spl on each day (the script creates the variable "average")
run('avg_spl_from_csv.m');
% three standard deviations of spl for each file, ignoring NaN
sig = nanstd(spl).*3;
% upper threshold: mean spl + 3 sigma
thresholdmax = average + sig;
% lower threshold: mean spl - 3 sigma
thresholdmin = average - sig;
% setting spl values outside the threshold range to NaN
indices = find(spl <= thresholdmin | spl >= thresholdmax);
spl(indices) = nan;
boxplot(spl);
set(gca,'XTickLabel',[]);
```
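The padding and three-sigma filtering steps in the script above can be written more compactly. This is a sketch, not the poster's code: the stand-in data is hypothetical, and it assumes `average` is the per-file mean of `spl` (in the original it comes from `avg_spl_from_csv.m`, which is not shown). The `'omitnan'` options replace `nanstd`, and the threshold comparison relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Hypothetical stand-in for L: one column of spl values per file, unequal lengths
nFiles = 5;
L = cell(nFiles, 1);
for k = 1:nFiles
    L{k} = 60 + 5 * randn(1000 + 10 * k, 1);
end

% Pad every column with NaN up to the longest file, then place them side by side
maxLen = max(cellfun(@numel, L));
padded = cellfun(@(x) [x; nan(maxLen - numel(x), 1)], L, 'UniformOutput', false);
spl = [padded{:}];                     % maxLen x nFiles, NaN-padded

% Assumption: "average" is the per-file mean of spl
average = mean(spl, 'omitnan');
sig = 3 * std(spl, 0, 'omitnan');      % three standard deviations per file

% Set values at or beyond mean +/- 3 sigma to NaN
spl(spl <= average - sig | spl >= average + sig) = NaN;

boxplot(spl);                          % NaN entries are treated as missing
set(gca, 'XTickLabel', []);
```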
Shaula Garibbo on 12 Nov 2019
I think I did try your method, actually, but every .csv file had a different number of rows, so I ended up panicking about time, brain-dumping every idea I had into a script, and producing that enormous thing (which does work, but I wouldn't want to use it again!!).