Most appropriate data structure for multi-level nested dataset

I have 720 Excel files (24 participants x 30 conditions), each with 8 sheets (subconditions) containing 14 variables, which are further divided into 8 cells of double data.
I read them into a nested cell array using four nested for-loops to write to the appropriate cell indices (see file attached; I had to crop out the last 6 columns in order to upload, but the rest of the array is the same).
However, I now need to analyze the data by aggregating over different levels of this hierarchy (e.g. the mean of segment 2 over all participants and all conditions). With my current data storage I cannot see a way of efficiently performing such an operation, since segment is stored on the lowest level.
Therefore I was wondering how my way of storing the data could be improved, or whether there is a way to apply the needed functions over different levels of the hierarchy.

Accepted Answer

Jeff Miller on 26 Jun 2018
Maybe your best bet is to use a much simpler table data structure. Each row in the table would correspond to one combination of distinguishable conditions, and certain columns would label the conditions. From your description, you would need label columns for at least the participant, the condition, and the subcondition. There might also be another column or two labelling cell, etc. Other table columns hold the data values for each condition; maybe there are 14 in your case.
Then you can select out arbitrary sets of rows by specifying the relevant values in the label columns, and do what you want with them (e.g., average).
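For example, such a table and selection could look like this (all column names and values here are made up for illustration, not taken from the original data):

```matlab
% One row per measured combination; label columns identify it.
participant  = [1; 1; 2; 2];
condition    = [1; 1; 1; 1];
subcondition = [1; 2; 1; 2];
segment      = [2; 2; 2; 2];
value        = [0.5; 0.7; 0.6; 0.8];
T = table(participant, condition, subcondition, segment, value);

% Mean of segment 2 over all participants and conditions,
% using logical indexing on the label column:
m = mean(T.value(T.segment == 2));
```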
You might be able to use a lot of the RawRT functionality, even though that toolkit was written for a very specific sort of data that is probably not what you have.
  3 Comments
Jeff Miller on 27 Jun 2018
> Now I'm just wondering how I can unnest the cell data structure into such a flat format.
If the data are coming from external files, it might actually be easier just to read them in again, creating the appropriate labels for each row.
> In order to perform the analysis, I would probably split it up by variable of interest
There are two advantages to having all the variables in a single table: (a) you only need to create the indicators once, and (b) you have the different variables together in case you want to look at them together (e.g., check for correlations).
There are also some MATLAB built-ins that apply functions to various subsets of tables (e.g., splitapply), but I haven't used them.
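A minimal sketch of that splitapply pattern, on an illustrative table (column names and values are assumptions, not from the original data):

```matlab
% Self-contained toy table: two segments, two observations each.
segment = [1; 1; 2; 2];
value   = [0.5; 0.7; 0.6; 0.8];
T = table(segment, value);

% findgroups assigns a group number to each unique segment;
% splitapply then applies the function within each group.
G = findgroups(T.segment);
segMeans = splitapply(@mean, T.value, G);   % one mean per segment
```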
Dom Janetzko on 27 Jun 2018
> If the data are coming from external files, it might actually be easier just to read them in again, creating the appropriate labels for each row.
They came from 720 external Excel files which I didn't want to read in again, so I was looking for a way to rearrange the data already in the cell array. I actually did find one, using nested for-loops (again) to write to the respective cells. Now my data is nicely ordered.
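A sketch of what such a rearrangement might look like, assuming a four-level nested cell array C{p}{c}{s}{v} whose innermost cells each hold an 8x1 double of segment values (the variable names, the nesting order, and the inner dimensions are all assumptions based on the description above, not the actual attached file):

```matlab
% Flatten the nested cell array into one long table:
% one row per participant/condition/subcondition/variable/segment.
rows = {};
for p = 1:numel(C)
    for c = 1:numel(C{p})
        for s = 1:numel(C{p}{c})
            for v = 1:numel(C{p}{c}{s})
                seg = C{p}{c}{s}{v};            % assumed 8x1 double
                for g = 1:numel(seg)
                    rows(end+1,:) = {p, c, s, v, g, seg(g)}; %#ok<AGROW>
                end
            end
        end
    end
end
T = cell2table(rows, 'VariableNames', ...
    {'participant','condition','subcondition','variable','segment','value'});
```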
> There are two advantages to having all the variables in a single table: (a) you only need to create the indicators once, and (b) you have the different variables together in case you want to look at them together (e.g., check for correlations).
You were right! I stayed with all the variables in one table.
I was actually also looking for a way to transfer the data from MATLAB to R. With your approach I could just write the table to CSV and import it. Your data structure also led me to a neat way of structuring data in R called "tidy data". My analysis now works perfectly. Thanks once again!
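That export step is a one-liner with writetable (the filename here is illustrative):

```matlab
% Write the flat table to CSV; R can then read it with read.csv
% or readr::read_csv into a tidy-data frame.
writetable(T, 'mydata.csv');
```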


More Answers (0)

