Splitting data array into sub arrays
Francis Chabot
on 19 Jan 2021
Commented: Francis Chabot
on 4 Feb 2021
Hi, I want to split the data in an array into sub-arrays based on the first two digits of each value. The purpose is to group firms by the first two digits of their SIC codes.
Example: 3301 would go into sub-array 33, which would represent a sector.
Any suggestions would be helpful.
Regards,
Frank
2 Comments
dpb
on 19 Jan 2021
I'd need more than one sample number to know the format. If it's always the first two digits of a variable-length number or string, then converting to character and extracting the first two characters is probably as simple a solution as any.
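A minimal sketch of that character-based approach, assuming the codes are positive integers with at least two digits and a MATLAB release with string arrays (R2016b or later). The sample codes are made up for illustration. A purely numeric route that divides by a power of ten based on each code's digit count is shown alongside for comparison.

```matlab
% Made-up codes of different lengths, for illustration only
codes = [3301; 45021; 330; 4602];

% Text route: convert to strings and keep the characters before position 3
firstTwoText = extractBefore(string(codes), 3);
sector = str2double(firstTwoText);

% Numeric route: divide by 10^(number of digits - 2) and truncate
nDigits = floor(log10(codes)) + 1;
sectorNum = floor(codes ./ 10.^(nDigits - 2));

disp([codes sector sectorNum])
```

Both routes should give 33, 45, 33 and 46 for these inputs; the text route breaks (extractBefore errors) if a code has only one digit.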
If it's always a four-digit number like the example, then
mod(v,100)
would work, but it wouldn't if it were a five- or three-digit number.
I need a complete definition of the possible input patterns.
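For reference, mod(v,100) keeps the trailing two digits, whereas fix(v/100) removes them; the latter yields the leading two digits only when every code has exactly four digits. A quick check with hypothetical values:

```matlab
v = [3301; 4502; 45021];

lastTwo = mod(v, 100);   % remainder after division by 100: the trailing two digits
dropTwo = fix(v / 100);  % integer part of v/100: the code with its last two digits removed

disp([v lastTwo dropTwo])
```

For the five-digit value, fix(v/100) gives 450 rather than 45, which is why the four-digit assumption matters.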
Accepted Answer
dpb
on 20 Jan 2021
>> SIC=[3301, 4502, 3306, 4602, 4510].';
>> splitapply(@median,SIC,findgroups(fix(SIC/100)))
ans =
3303.50
4506.00
4602.00
>>
NB: The use of mod above was in error (mod(v,100) returns the last two digits, not the first two); I don't know how I came up with that. As long as the codes are all four digits, the above should be about as easy as it gets.
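The same findgroups/splitapply pattern can also return the sector labels and the sub-arrays themselves, which is what the original question asked for. A minimal sketch, again assuming four-digit codes; collecting each group into a cell array with the anonymous function @(x){x} is one common choice, not the only one.

```matlab
SIC = [3301, 4502, 3306, 4602, 4510].';

% G: group number for each code; sectors: the distinct two-digit sectors, sorted
[G, sectors] = findgroups(fix(SIC/100));

% One statistic per sector, now paired with its label
medians = splitapply(@median, SIC, G);
disp(table(sectors, medians))

% One sub-array per sector, collected in a cell array
subArrays = splitapply(@(x) {x}, SIC, G);
for k = 1:numel(sectors)
    fprintf('Sector %d: %s\n', sectors(k), mat2str(subArrays{k}.'));
end
```

Here subArrays{1} holds the codes of sector 33, subArrays{2} those of sector 45, and so on, in the order given by sectors.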
7 Comments
dpb
on 3 Feb 2021
Edited: dpb
on 3 Feb 2021
I showed you how to do that: use fix(SIC/100) as the grouping variable.
That said, I see that I forgot to include it in the previous code and just used SIC.
rowfun identifies grouping variables by name, so you can't pass a calculation there; you need to create a table variable for the purpose.
>> tSIC.Industry=fix(tSIC.SIC/100); % define industry code for grouping
>> tSIC=tSIC(:,[1 end 2]); % rearrange table for convenience
>> tSIC % show resulting table
tSIC =
2×3 table
SIC Industry Data
____ ________ ____________________
3301 33 12 32 21 92
4502 45 32 45 32 65
>> rowfun(@median,tSIC,'GroupingVariables','Industry', ...
'InputVariables','Data', ...
'OutputVariableNames','Median')
ans =
2×3 table
Industry GroupCount Median
________ __________ ______
33 1 26.5
45 1 38.5
>>
As noted above, you'll need to modify this to handle multiple rows with the same ID by using the anonymous function and the 'all' option to compute the overall group median.
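A sketch of that modification, using a made-up table in which industry 33 has two rows. With grouping, rowfun passes all of a group's Data rows to the function as one matrix; the anonymous function adds the 'all' option to median (available in R2018b and later) so each group yields a single median over every value instead of one median per column.

```matlab
% Made-up table: two firms share industry 33, so that group has two rows
SIC  = [3301; 3306; 4502];
Data = [12 32 21 92; 10 20 30 40; 32 45 32 65];
tSIC = table(SIC, Data);

% The grouping variable must exist as a table variable
tSIC.Industry = fix(tSIC.SIC/100);

% median(x,'all') collapses the group's whole Data matrix to one value
result = rowfun(@(x) median(x, 'all'), tSIC, ...
    'GroupingVariables', 'Industry', ...
    'InputVariables', 'Data', ...
    'OutputVariableNames', 'Median');
disp(result)
```

For industry 33 the eight values sort to 10 12 20 21 30 32 40 92, so the group median is (21+30)/2 = 25.5; industry 45 still gives 38.5.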