Duplicate x,y in table; get min/max of variables to new table

1 Ansicht (letzte 30 Tage)
I have a large dataset, small subset attached. There are duplicate x,y values. I need a cleaned table with "var1_max, var1_min, var2_max, var2_min" etc. for every duplicate "x,y"
I'm just not sure where to start setting up the problem. Thanks for any pointers to get going. Using Matlab R2018a, no fancy add-ons/packages
file_in = 'example.csv';
data_in = readtable(file_in);
%%table_out.Properties.VariableNames = {'id', 'x', 'y', 'var1_max', 'var1_min', ...
%% 'var2_max', 'var2_min', 'var3_max', 'var3_min', 'var4_max', 'var4_min', ...
%% 'var5_max', 'var5_min', 'var6_max', 'var6_min', 'var7_max', 'var7_min'};
  3 Kommentare
Cris LaPierre
Cris LaPierre am 2 Nov. 2019
Not sure I understand completely. I have the same questions a dpb. It looks like you've highlighted groups, but rows 12-14 only have duplicate x values, not x and y.
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
Rows 6-10 do have values there. What should the min/max be?
And finally, 12-14 have same x but different y. How should this be handled?
newbie9
newbie9 am 2 Nov. 2019
Bearbeitet: newbie9 am 2 Nov. 2019
Please disregard highlight on rows 12-14, it is in error.
I think I should add "var1_mid" ... "var7_mid" to handle cases where x,y are unique. Example in row 1:
  • x = 31.74053 | y = 27.18408 | var1_max = NaN | var1_min = NaN | var1_mid = 90 | var2_max = NaN | var2_min = NaN | var2_mid = 54 | ...etc.
Answers to other questions below:
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
  • just the min/max within the group (those that have the same x,y values)
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
  • NaN
Rows 6-10 do have values there. What should the min/max be?
  • x = 31.79403 | y = 27.26837 | var1_max = 79 | var1_min = 68 | var1_mid = NaN | var2_max = 251 | var2_min = 193 | var2_mid = NaN | ...etc.

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Cris LaPierre
Cris LaPierre am 2 Nov. 2019
One potential solution that is simple to implement is to use groupsummary. You can have your data grouped by x and y, and have it return the within-group min, mean, and max for table variables you specify.
However, you prescribe wanting NaN for mid-value if there are min/max values. This won't do that .It will provide values for all 3 statistics. Also note that group summary will only return one row for each group. You can use the join function if you want to merge the two together.
newData = groupsummary(data_in,{'x','y'},{'min','mean','max'},["var1","var2","var3","var4","var5","var6","var7"])
  1 Kommentar
newbie9
newbie9 am 2 Nov. 2019
This is perfect, thank you. The groupsummary returns a GroupCount, so I can just use that such that if > 1 then mean = NaN

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Kategorien

Mehr zu Tables finden Sie in Help Center und File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by