Most appropriate data structure for multi-level nested dataset
Dom Janetzko
on 25 Jun 2018
Commented: Dom Janetzko
on 27 Jun 2018
I have 720 Excel files (24 participants × 30 conditions), each with 8 sheets (subconditions). Each sheet contains 14 variables, and each variable is further divided into 8 cells (segments) of double data.
I read them into a nested cell array, using 4 nested for-loops to write each value to the appropriate cell index (see attached file; I had to crop out the last 6 columns in order to upload it, but the rest of the array has the same structure).
However, I now need to analyze the data by aggregating over different levels of this hierarchy (e.g., the mean of segment 2 over all participants and all conditions). With this storage layout, I cannot see an efficient way to perform such an operation, because segment is stored on the lowest level.
I am therefore wondering how my way of storing the data could be improved, or whether there is a way to apply the needed functions across different levels of the hierarchy.
Accepted Answer
Jeff Miller
on 26 Jun 2018
Your best bet may be a much simpler, flat table data structure. Each row in the table would correspond to one combination of distinguishable conditions, and certain columns would label those conditions. From your description, you would need label columns for at least the participant, the condition, and the subcondition, and possibly another column or two labelling the cell (segment), etc. The remaining table columns hold the data values for that combination; in your case there would presumably be 14, one per variable.
You can then select arbitrary sets of rows by specifying the relevant values in the label columns, and do whatever you want with them (e.g., average them).
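A minimal sketch of the flat-table idea, written in Python (standard library only) as a language-neutral illustration rather than MATLAB code. The column names `participant`, `condition`, `subcondition`, `segment`, `var1`, `var2` and all values are hypothetical, and only 2 of the 14 variables are shown. Selecting every row with `segment == 2` and averaging one column reproduces the "mean of segment 2 over all participants and all conditions" from the question.

```python
from statistics import mean

# Hypothetical flat table: one row per combination of participant,
# condition, subcondition and segment. The label columns identify the
# combination; var1/var2 stand in for the 14 data variables.
table = [
    {"participant": 1, "condition": 1, "subcondition": 1, "segment": 1, "var1": 0.5, "var2": 1.0},
    {"participant": 1, "condition": 1, "subcondition": 1, "segment": 2, "var1": 1.5, "var2": 2.0},
    {"participant": 2, "condition": 1, "subcondition": 1, "segment": 1, "var1": 2.5, "var2": 3.0},
    {"participant": 2, "condition": 1, "subcondition": 1, "segment": 2, "var1": 3.5, "var2": 5.0},
]


def select(rows, **labels):
    """Return the rows whose label columns match every given value."""
    return [row for row in rows if all(row[k] == v for k, v in labels.items())]


# Mean of var1 in segment 2, pooled over all participants and conditions.
seg2_mean = mean(row["var1"] for row in select(table, segment=2))
print(seg2_mean)  # 2.5

# Any combination of label values is selected the same way.
p1_seg2 = select(table, participant=1, segment=2)
```

In MATLAB the equivalent on a `table` would presumably be logical indexing on a label column, e.g. `mean(T.var1(T.segment == 2))`, assuming a table `T` with these hypothetical column names.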
You might be able to use a lot of the RawRT functionality, even though that toolkit was written for a very specific sort of data that is probably not what you have.
3 Comments
Jeff Miller
on 27 Jun 2018
> Now I'm just wondering how I can unnest the cell data structure into such a flat format.
If the data are coming from external files, it might actually be easier just to read them in again, creating the appropriate labels for each row.
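For the unnesting question quoted above, here is a minimal sketch, again in Python as a stand-in for MATLAB. It assumes (hypothetically) that the nested data can be indexed as `nested[participant][condition][subcondition][variable][segment]` and emits one row per participant/condition/subcondition/segment, with one column per variable. The same loop structure could instead wrap the file-reading code, as suggested above, so that the labels are created while the files are read in again.

```python
# Hypothetical variable names; the real data have 14 variables.
VAR_NAMES = ["var1", "var2"]


def flatten(nested, var_names):
    """Unnest nested[participant][condition][subcondition][variable][segment]
    into flat rows with 1-based label columns (matching MATLAB indexing)."""
    rows = []
    for p, conditions in enumerate(nested, start=1):
        for c, subconditions in enumerate(conditions, start=1):
            for s, variables in enumerate(subconditions, start=1):
                # All variables are assumed to have the same number of segments.
                for g in range(len(variables[0])):
                    row = {"participant": p, "condition": c,
                           "subcondition": s, "segment": g + 1}
                    for name, values in zip(var_names, variables):
                        row[name] = values[g]
                    rows.append(row)
    return rows


# Toy input: 2 participants x 1 condition x 1 subcondition x 2 variables x 3 segments.
nested = [
    [[[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]]],  # participant 1
    [[[[4.0, 5.0, 6.0], [40.0, 50.0, 60.0]]]],  # participant 2
]

rows = flatten(nested, VAR_NAMES)
for row in rows:
    print(row)
```

In MATLAB, the analogous approach would be to collect the label and value columns as vectors inside the loops and build the result with `table(...)`; that translation is not shown here.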
> In order to perform the analysis, I would probably split it up by variable of interest
There are two advantages to having all the variables in a single table: (a) you only need to create the indicators once, and (b) you have the different variables together in case you want to look at them together (e.g., check for correlations).
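To illustrate advantage (b), a minimal Python sketch: because both variables are columns of the same table, each row already pairs `var1` and `var2` for the same participant/segment, so they can be correlated directly. The column names and values are hypothetical, and the Pearson coefficient is computed by hand only to keep the sketch dependency-free.

```python
from math import sqrt

# Hypothetical rows: both variables live in the same table, so each row
# pairs var1 and var2 for the same participant and segment.
table = [
    {"participant": 1, "segment": 1, "var1": 1.0, "var2": 3.0},
    {"participant": 1, "segment": 2, "var1": 2.0, "var2": 5.0},
    {"participant": 2, "segment": 1, "var1": 3.0, "var2": 7.0},
    {"participant": 2, "segment": 2, "var1": 4.0, "var2": 9.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson([row["var1"] for row in table], [row["var2"] for row in table])
print(round(r, 6))  # var2 = 2 * var1 + 1 here, so r is 1 up to rounding
```

In base MATLAB, `corrcoef(T.var1, T.var2)` on a table `T` with these hypothetical columns would return a 2×2 matrix whose off-diagonal entry is the same coefficient.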
There are also some MATLAB built-ins that apply functions to various subsets of tables (e.g., splitapply), but I haven't used them.
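A minimal Python sketch of what such split-apply operations do: rows are grouped by one or more label columns, and a function (here the mean) is applied to one data column within each group. This is conceptually similar to MATLAB's `findgroups` followed by `splitapply`, but it is not an implementation of those functions; the helper `group_apply`, the column names, and the values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_apply(rows, keys, column, func):
    """Group rows by the label columns in `keys` and apply `func` to the
    values of `column` within each group. Returns {group_key: result}."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[column])
    return {key: func(values) for key, values in sorted(groups.items())}


# Hypothetical table: 2 participants x 2 conditions x 2 segments, one variable.
table = [
    {"participant": p, "condition": c, "segment": g, "var1": 10 * c + g + 0.5 * p}
    for p in (1, 2)
    for c in (1, 2)
    for g in (1, 2)
]

# Mean of var1 per segment, pooled over participants and conditions.
per_segment = group_apply(table, ["segment"], "var1", mean)
print(per_segment)  # {(1,): 16.75, (2,): 17.75}

# Mean of var1 per condition/segment pair, pooled over participants.
per_condition_segment = group_apply(table, ["condition", "segment"], "var1", mean)
```

Changing only the `keys` argument moves the aggregation to a different level of the hierarchy, which is exactly what the nested cell array made awkward.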