I need to break down the flow data into groups of data with months for every year.
13 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Harjas
am 20 Jun. 2022
Kommentiert: Cris LaPierre
am 26 Jun. 2022
For example: All the entries for the month of Jan for all the years (suppose 2010-2022) are stored in one group. Similarly, all the entries for the month of Feb for all the years.(suppose 2010-2022) are stored in second group and so on. My end goal is to calculate the percentile of monthly data for all the years.
Date Flow
12/1/2010 100
12/2/2010 200
12/3/2010 150
12/4/2010 176
.....
.....
12/1/2022 145
I tried to create the vlookup function to create the matrix but i wasn't able to. So, I decided to group all months together and then perform operations on them.
0 Kommentare
Akzeptierte Antwort
Cris LaPierre
am 20 Jun. 2022
6 Kommentare
Cris LaPierre
am 25 Jun. 2022
I think the best approach here is to create a second table that is a subset of the full data set, but just contains the date ranges of interest. Then you can use groupsummary to compute the desired stats grouped by year. I don't know what you want to use for percentile, so I use 42.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1043800/Example.xlsx',...
'ReadVariableNames',false)
% Use 'day of year' to identify dates within the desired range for all years
drng = day(datetime(2004,9,27):caldays(1):datetime(2004,10,27),'dayofyear');
d = day(T.Var1,'dayofyear');
% Create subtable
idx= ismember(d,drng);
Tsub = T(idx,:)
% compute stats for each year
Tstats = groupsummary(Tsub,'Var1','year',{@(A) prctile(A,42),'max','min'},'Var2');
Tstats.Properties.VariableNames = ["Tyr","GroupCnt","Tmin","Tmax","Tpct"]
Cris LaPierre
am 26 Jun. 2022
Leap years are going to cause some issues with my previous answer. Unfortunately, I couldn't find a slick way to do this, so here's a brute force approach.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1043800/Example.xlsx',...
'ReadVariableNames',false);
% Define range of interest
startD = datetime(2004,9,27);
stopD = datetime(2004,10,27);
% find data for all years between the indicated month/days
y = unique(year(T.Var1));
idx = zeros(height(T),1);
for i = 1:length(y)
idx = idx + isbetween(T.Var1,datetime(y(i),month(startD),day(startD)),...
datetime(y(i),month(stopD),day(stopD)));
end
Tsub = T(logical(idx),:)
% Compute summary stats
Tstats = groupsummary(Tsub,'Var1','year',{@(A) prctile(A,42),'max','min'},'Var2');
Tstats.Properties.VariableNames = ["Tyr","GroupCnt","Tmin","Tmax","Tpct"]
Weitere Antworten (0)
Siehe auch
Kategorien
Mehr zu Data Preprocessing finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!