MATLAB Answers

How to find the min, max and mean values of 34 timetables stored in a 1 x 34 cell and add them as extra columns to the respective timetables?

15 views (last 30 days)
Karl Dilkington
Karl Dilkington on 30 Apr 2020
Edited: Sindar on 13 May 2020
I have 34 CSV files that each consist of N rows and 3 columns. My code loops through one CSV file at a time (storing them in a 1 x 34 cell) and firstly converts them into a N x 2 timetable, TT{jj}. I then filter this timetable by a timerange (the difference between two dates and times in format dd/MM/uuuu HH:mm) to leave all the values within that timerange which is stored in TT2{jj}. The timetables have 2 columns (date and time, temperature).
I now want to know how to find the min, max and mean values of the temperature column for each timetable and how to add these values to the original timetable to create a N x 5 timetable (date and time, temperature, min temp, max temp, mean temp) which would look like this:
e.g. for jj=1, the final table would look like:
Date/Time Temperature Min Max Mean
21/02/2020 08:00 20 16 20 18.25
21/02/2020 08:03 16
21/02/2020 08:06 18
21/02/2020 08:09 19
etc etc
Then it would loop again for jj=2 etc
The loop to create TT2{jj} is:
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
end
At the end of the loop (above) I export the 34 tables to one spreadsheet that has 34 tabs.
I have no idea how to proceed so any help would be appreciated.

  0 Comments

Sign in to comment.

Answers (3)

Sindar
Sindar on 30 Apr 2020
Is this what you want to do:
  • Load in a given table (say jj=1)
  • create TT2{1}
  • find the min of TT{1}.temperature
  • add a third column to TT{1} containing this min in every row
  • repeat for mean, max
  • repeat for TT{2:34}
If so, then:
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
TT2{jj}.min_T=repelem(min(TT2{jj}{:,2}), size(TT2{jj},1), 1);
TT2{jj}.max_T=repelem(max(TT2{jj}{:,2}), size(TT2{jj},1), 1);
TT2{jj}.mean_T=repelem(mean(TT2{jj}{:,2}), size(TT2{jj},1), 1);
end
If you want to modify T (or likewise TT):
...
T{jj}.min_T=repelem(min(TT2{jj}{:,2}), size(T{jj},1), 1);
...

  4 Comments

Show 1 older comment
Sindar
Sindar on 13 May 2020
This should work, though there may be (vastly) more efficient methods
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
% add the min to the first row of a new column
% this will throw a warning and fill the remaining rows with 0
TT2{jj}{1,'min_T'}=min(TT2{jj}{:,2};
% overwrite defaults (0) with NaN
TT2{jj}{2:end,'min_T'}=missing;
% repeat for max,mean
TT2{jj}{1,'max_T'}=max(TT2{jj}{:,2};
TT2{jj}{2:end,'max_T'}=missing;
TT2{jj}{1,'mean_T'}=mean(TT2{jj}{:,2};
TT2{jj}{2:end,'mean_T'}=missing;
% this will print to, e.g., table1.xls with blanks where the NaNs are
writetimetable(TT2{jj},"table"+jj+".xls")
end
Sindar
Sindar on 13 May 2020
if you only need the min/etc. in the xls file, but not in Matlab, you could print the 2-column table first then add the summary values in a second print statement:
for jj = 1:34
thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory
T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19
TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2
TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR
tmp_table = table(min(TT2{jj}{:,2},max(TT2{jj}{:,2},mean(TT2{jj}{:,2},'VariableNames',{'min';'max';'mean'})
writetimetable(TT2{jj},"table"+jj+".xls")
writetable(tmp_table,"table"+jj+".xls",'Range','C1:D2')
end
Sindar
Sindar on 13 May 2020
Reading the updated question, I see you want the data in different sheets of the same xls file. This sort of syntax will work for whichever method
writetimetable(TT2{jj},"table.xls",'Sheet',jj)

Sign in to comment.


Guillaume
Guillaume on 1 May 2020
The simplest thing would be to add one column to each timetable to indicate the timetable of origin, then concatenate all these timetables into one timetable. Then with just one call to groupsummary, you can get your desired output.
However, if you get the mean, min and max for each timetable, so get one scalar value for each stat per timetable, I'm a bit unclear why you still want to store a datetime. Doesn't it become meaningless?
Anyway:
TT = cell(size(files));
for fileidx = 1:numel(files)
t = readtable(files(fileidx).name, 'Headerlines', 19, 'ReadVariableNames', true);
TT{fileidx} = table2timetable(t(:, [1, 3]))
TT{fileidx}.FileIndex(:) = fileidx; %add column with file number
end
alltimetables = vertcat(TT{:}); %concatenate all in one timetable
alltimetables = alltimetables(TR, :); %keep only desired timerange
ttstats = groupsummary(alltimetables, 'FileIndex', {'mean', 'min', 'max'}) %get mean min max for each FileIndex
Note that if you're using sufficiently recent version of matlab I'd replace the loop by:
TT = cell(size(files));
opts = detectImportOptions(files(1).name, 'NumHeaderLines', 19, 'ReadVariableNames', true);
opts.SelectedVariableNames = [1, 3]; %don't bother reading 2nd column
for fileidx = 1:numel(files)
TT{fileidx} = readtimetable(files(fileidx).name, opts);
TT{fileidx}.FileIndex(:) = fileidx; %add column with file number
end
%rest of code stays the same

  1 Comment

Karl Dilkington
Karl Dilkington on 13 May 2020
Thanks for your answer. I've amended my question to include a bit more detail and clarity on exactly what I need. At the end of my loop I export all 34 timetables to a spreadsheet with 34 tabs so I still need the date/time. I want the min, max and mean values so I don't have to do it manually for each timetable/tab in Excel.
I'm assuming the code to replace the loop is more efficient than what I have?
Thanks for your help.

Sign in to comment.


Peter Perkins
Peter Perkins on 5 May 2020
As others have said, it seems to make little sense to create new variables in each timetable, each of which are a column vector of a constant. Maybe you want something like the following.
First, make something like your data:
n = 3;
tt_list = cell(n,1);
for i = 1:3
X = rand(5,1);
Time = datetime(2020,5,i)+days(rand(5,1));
tt_list{i} = timetable(Time,X);
end
Now get the stats for each timetable, and put those in a table that also includes your cell array of timetables:
t = table(tt_list,zeros(n,1),zeros(n,1),zeros(n,1),'VariableNames',["Data" "Mean" "Min" "Max"]);
for i = 1:n
t.Mean(i) = mean(tt_list{i}.X);
t.Min(i) = min(tt_list{i}.X);
t.Max(i) = max(tt_list{i}.X);
end
From that, you end up with
>> t
t =
3×4 table
Data Mean Min Max
_______________ _______ _______ _______
{5×1 timetable} 0.67375 0.4607 0.94475
{5×1 timetable} 0.56289 0.15039 0.9865
{5×1 timetable} 0.52956 0.26661 0.91785
That's the brute force way. As Guillaume suggests, you might find it convenient to put all your timetables in one longer one. The following gets you essentially yhe same table as above.
tt = vertcat(tt_list{:});
tt.Source = repelem(1:n,5)';
fun = @(x) deal(mean(x),min(x),max(x));
t = rowfun(fun,tt,'GroupingVariable','Source','NumOutputs',3, ...
'OutputFormat','table','OutputVariableNames',["Mean" "Min" "Max"])
>> t =
3×5 table
Source GroupCount Mean Min Max
______ __________ _______ _______ _______
1 5 0.67375 0.4607 0.94475
2 5 0.56289 0.15039 0.9865
3 5 0.52956 0.26661 0.91785
I used rowfun; splitapply or groupsummary would also work.

  1 Comment

Karl Dilkington
Karl Dilkington on 13 May 2020
Thanks for your answer. I've amended my question to include a bit more detail and clarity on exactly what I need. At the end of my loop I export all 34 timetables to a spreadsheet with 34 tabs so I still need the date/time. I want the min, max and mean values so I don't have to do it manually for each timetable/tab in Excel.
Thanks for your help.

Sign in to comment.


Translated by