Dividing grouped data into equally sized subgroups
24 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Maximilian Hauschel
am 5 Aug. 2021
Kommentiert: Maximilian Hauschel
am 7 Aug. 2021
Hello community,
for my current project i need to divide the values of a table for each group (parent-group) into 10 eqaully sized subgroups. The values within a parent-group are arranged hierarchically. By creating the subgroups, the hierarchical order has to remain, so that the first subgroup of a parent-group contains the lowest 10% of the values, the second subgroup of a parent group contains the next larger 10%-stack (>10% - 20%) of the values and so on. In the following, i will give an example:
Data = array2table(rand(1000,1));
Data.Properties.VariableNames{1} = 'Value';
ParentGroups = [1:1:10]';
Data.ParentGroups = repelem(ParentGroups,100);
Data = sortrows(Data,{'ParentGroups','Value'});
My first approach was to create the deciles of each group, which separate each parent-group into 10 equally sized parts. The problem here is, that the function i used simply outputs the values of the deciles, but there is no reference to their location in the original data. For creating the deciles, I used the following code:
decileFnc = @(x) prctile(x,[0:10:100]);
deciles = varfun(decileFnc,Data,'GroupingVariables','ParentGroups','InputVariables',1);
deciles.Fun_Value = string(deciles.Fun_Value);
deciles = [deciles(:,1:2),regexp(deciles.Fun_Value, '\s+', 'split')];
doubleFnc = @(x) double(x);
deciles = [deciles(:,1:2),varfun(doubleFnc,deciles,'InputVariables',3:13)];
The last block of code is used to unstack the output of varfun, i just shared it for the sake of completeness.
I'm really hoping for some help here and appreciate every answer.
2 Kommentare
Peter Perkins
am 5 Aug. 2021
Maximillian, it's not clear what you are trying to do. Are you trying to tag each of the 1000 rows in the original data with the within-group decile that its value falls in, but preserve the original order of the data?
By the way, see the splitvars function. I thnk it replaces your last chunk of code.
Akzeptierte Antwort
Weitere Antworten (1)
Peter Perkins
am 6 Aug. 2021
>> x = rand(16,1);
>> g = repmat((1:2)',8,1);
>> t = table(g,x)
t =
16×2 table
g x
_ _______
1 0.81472
2 0.90579
1 0.12699
2 0.91338
1 0.63236
2 0.09754
1 0.2785
2 0.54688
1 0.95751
2 0.96489
1 0.15761
2 0.97059
1 0.95717
2 0.48538
1 0.80028
2 0.14189
Here's a function that takes a vector and returns the quartile bin in which each value falls.
function qbin = fun(x)
q = quantile(x,[0 .25 .5 .75 1]);
qbin = discretize(x,q);
end
Apply that function to each group of rows in the table.
>> tq = rowfun(@fun, t, "GroupingVariables","g")
tq =
16×3 table
g GroupCount Var3
_ __________ ____
1 8 3
1 8 1
1 8 2
1 8 2
1 8 4
1 8 1
1 8 4
1 8 3
2 8 3
2 8 3
2 8 1
2 8 2
2 8 4
2 8 4
2 8 2
2 8 1
rowfun has returned its output table in group order. To attach the quartile bin numbers to the correct rows of the original data, use the second output of sort.
>> [~,ord] = sort(t.g);
>> t.qbin(ord) = tq.Var3
t =
16×3 table
g x qbin
_ _______ ____
1 0.81472 3
2 0.90579 3
1 0.12699 1
2 0.91338 3
1 0.63236 2
2 0.09754 1
1 0.2785 2
2 0.54688 2
1 0.95751 4
2 0.96489 4
1 0.15761 1
2 0.97059 4
1 0.95717 4
2 0.48538 2
1 0.80028 3
2 0.14189 1
To make the result easier to confirm, sort by x within group; notice that the quartile bin numbers are as expected.
sortrows(t,["g" "qbin"])
ans =
16×3 table
g x qbin
_ _______ ____
1 0.12699 1
1 0.15761 1
1 0.63236 2
1 0.2785 2
1 0.81472 3
1 0.80028 3
1 0.95751 4
1 0.95717 4
2 0.09754 1
2 0.14189 1
2 0.54688 2
2 0.48538 2
2 0.90579 3
2 0.91338 3
2 0.96489 4
2 0.97059 4
Siehe auch
Kategorien
Mehr zu Dates and Time finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!