Calculations on Dataset Arrays
This example shows how to perform calculations on dataset arrays.
Load sample data.
Import the data from the comma-separated text file
testScores.csv
.
ds = dataset('File','testScores.csv','Delimiter',',')
ds = LastName School Test1 Test2 Test3 {'Jeong' } {'XYZ School'} 90 87 93 {'Collins' } {'XYZ School'} 87 85 83 {'Torres' } {'XYZ School'} 86 85 88 {'Phillips'} {'ABC School'} 75 80 72 {'Ling' } {'ABC School'} 89 86 87 {'Ramirez' } {'ABC School'} 96 92 98 {'Lee' } {'XYZ School'} 78 75 77 {'Walker' } {'ABC School'} 91 94 92 {'Garcia' } {'ABC School'} 86 83 85 {'Chang' } {'XYZ School'} 79 76 82
There are 3 test scores for each of 10 students, in wide format.
Average dataset array variables.
Compute the average (mean) test score for each student in the dataset array,
and store it in a new variable, TestAvg
. Test scores are in
columns 3 to 5.
Use double
to convert the specified dataset array variables
into a numeric array. Then, calculate the mean across the second dimension
(across columns) to get the test average for each student.
ds.TestAvg = mean(double(ds(:,3:5)),2); ds(:,{'LastName','TestAvg'})
ans = LastName TestAvg {'Jeong' } 90 {'Collins' } 85 {'Torres' } 86.333 {'Phillips'} 75.667 {'Ling' } 87.333 {'Ramirez' } 95.333 {'Lee' } 76.667 {'Walker' } 92.333 {'Garcia' } 84.667 {'Chang' } 79
Summarize the dataset array using a grouping variable.
Compute the mean and maximum average test scores for each school.
stats = grpstats(ds,'School',{'mean','max'},'DataVars','TestAvg')
stats = School GroupCount mean_TestAvg max_TestAvg XYZ School {'XYZ School'} 5 83.4 90 ABC School {'ABC School'} 5 87.067 95.333
This returns a new dataset array containing the specified summary statistics
for each level of the grouping variable, School
.
Replace data values.
The denominator for each test score is 100. Convert the test score denominator to 25.
scores = double(ds(:,3:)); newScores = scores*25/100; ds = replacedata(ds,newScores,3:5)
ds = LastName School Test1 Test2 Test3 TestAvg {'Jeong' } {'XYZ School'} 22.5 21.75 23.25 90 {'Collins' } {'XYZ School'} 21.75 21.25 20.75 85 {'Torres' } {'XYZ School'} 21.5 21.25 22 86.333 {'Phillips'} {'ABC School'} 18.75 20 18 75.667 {'Ling' } {'ABC School'} 22.25 21.5 21.75 87.333 {'Ramirez' } {'ABC School'} 24 23 24.5 95.333 {'Lee' } {'XYZ School'} 19.5 18.75 19.25 76.667 {'Walker' } {'ABC School'} 22.75 23.5 23 92.333 {'Garcia' } {'ABC School'} 21.5 20.75 21.25 84.667 {'Chang' } {'XYZ School'} 19.75 19 20.5 79
The first two lines of code extract the test data and perform
the desired calculation. Then, replacedata
inserts
the new test scores back into the dataset array.
The variable of test score averages, TestAvg
,
is now the final score for each student.
Change variable name.
Change the variable name to Final
.
ds.Properties.VarNames{end} = 'Final';
ds
ds = LastName School Test1 Test2 Test3 Final {'Jeong' } {'XYZ School'} 22.5 21.75 23.25 90 {'Collins' } {'XYZ School'} 21.75 21.25 20.75 85 {'Torres' } {'XYZ School'} 21.5 21.25 22 86.333 {'Phillips'} {'ABC School'} 18.75 20 18 75.667 {'Ling' } {'ABC School'} 22.25 21.5 21.75 87.333 {'Ramirez' } {'ABC School'} 24 23 24.5 95.333 {'Lee' } {'XYZ School'} 19.5 18.75 19.25 76.667 {'Walker' } {'ABC School'} 22.75 23.5 23 92.333 {'Garcia' } {'ABC School'} 21.5 20.75 21.25 84.667 {'Chang' } {'XYZ School'} 19.75 19 20.5 79
See Also
dataset
| double
| replacedata
| grpstats
Related Examples
- Stack or Unstack Dataset Arrays
- Access Data in Dataset Array Variables
- Select Subsets of Observations
- Index and Search Dataset Arrays