Calculations on Dataset Arrays

This example shows how to perform calculations on dataset arrays.

Import the data from the comma-separated text file testScores.csv.

ds = dataset('File','testScores.csv','Delimiter',',')
ds =

LastName            School                Test1    Test2    Test3
{'Jeong'   }        {'XYZ School'}        90       87       93
{'Collins' }        {'XYZ School'}        87       85       83
{'Torres'  }        {'XYZ School'}        86       85       88
{'Phillips'}        {'ABC School'}        75       80       72
{'Ling'    }        {'ABC School'}        89       86       87
{'Ramirez' }        {'ABC School'}        96       92       98
{'Lee'     }        {'XYZ School'}        78       75       77
{'Walker'  }        {'ABC School'}        91       94       92
{'Garcia'  }        {'ABC School'}        86       83       85
{'Chang'   }        {'XYZ School'}        79       76       82

There are 3 test scores for each of 10 students, in wide format.

Average dataset array variables.

Compute the average (mean) test score for each student in the dataset array, and store it in a new variable, TestAvg. Test scores are in columns 3 to 5.

Use double to convert the specified dataset array variables into a numeric array. Then, calculate the mean across the second dimension (across columns) to get the test average for each student.

ds.TestAvg = mean(double(ds(:,3:5)),2);
ds(:,{'LastName','TestAvg'})
ans =

LastName            TestAvg
{'Jeong'   }            90
{'Collins' }            85
{'Torres'  }        86.333
{'Phillips'}        75.667
{'Ling'    }        87.333
{'Ramirez' }        95.333
{'Lee'     }        76.667
{'Walker'  }        92.333
{'Garcia'  }        84.667
{'Chang'   }            79

Summarize the dataset array using a grouping variable.

Compute the mean and maximum average test scores for each school.

stats = grpstats(ds,'School',{'mean','max'},'DataVars','TestAvg')
stats =

School                GroupCount    mean_TestAvg    max_TestAvg
XYZ School    {'XYZ School'}        5               83.4              90
ABC School    {'ABC School'}        5             87.067          95.333

This returns a new dataset array containing the specified summary statistics for each level of the grouping variable, School.

Replace data values.

The denominator for each test score is 100. Convert the test score denominator to 25.

scores = double(ds(:,3:));
newScores = scores*25/100;
ds = replacedata(ds,newScores,3:5)
ds =

LastName            School                Test1    Test2    Test3    TestAvg
{'Jeong'   }        {'XYZ School'}         22.5    21.75    23.25        90
{'Collins' }        {'XYZ School'}        21.75    21.25    20.75        85
{'Torres'  }        {'XYZ School'}         21.5    21.25       22    86.333
{'Phillips'}        {'ABC School'}        18.75       20       18    75.667
{'Ling'    }        {'ABC School'}        22.25     21.5    21.75    87.333
{'Ramirez' }        {'ABC School'}           24       23     24.5    95.333
{'Lee'     }        {'XYZ School'}         19.5    18.75    19.25    76.667
{'Walker'  }        {'ABC School'}        22.75     23.5       23    92.333
{'Garcia'  }        {'ABC School'}         21.5    20.75    21.25    84.667
{'Chang'   }        {'XYZ School'}        19.75       19     20.5        79

The first two lines of code extract the test data and perform the desired calculation. Then, replacedata inserts the new test scores back into the dataset array.

The variable of test score averages, TestAvg, is now the final score for each student.

Change variable name.

Change the variable name to Final.

ds.Properties.VarNames{end} = 'Final';
ds
ds =

LastName            School                Test1    Test2    Test3    Final
{'Jeong'   }        {'XYZ School'}         22.5    21.75    23.25        90
{'Collins' }        {'XYZ School'}        21.75    21.25    20.75        85
{'Torres'  }        {'XYZ School'}         21.5    21.25       22    86.333
{'Phillips'}        {'ABC School'}        18.75       20       18    75.667
{'Ling'    }        {'ABC School'}        22.25     21.5    21.75    87.333
{'Ramirez' }        {'ABC School'}           24       23     24.5    95.333
{'Lee'     }        {'XYZ School'}         19.5    18.75    19.25    76.667
{'Walker'  }        {'ABC School'}        22.75     23.5       23    92.333
{'Garcia'  }        {'ABC School'}         21.5    20.75    21.25    84.667
{'Chang'   }        {'XYZ School'}        19.75       19     20.5        79