Main Content

Calculations When Tables Have Both Numeric and Nonnumeric Data

This example shows how to perform calculations on data in tables when they have both numeric and nonnumeric data. After you identify the table variables that contain numeric data, you can access the data in those variables by using either curly braces or dot notation. Then you can perform arithmetic operations or call functions on the numeric data and assign the result back into the table, all in one line of code. You also can use the rowfun function for calculations across the rows of a table and the varfun function for calculations along the variables. If your table has groups of data within it, you can use the groupsummary, rowfun, and varfun functions to perform calculations for each group in the table.

Read Sample Data into Table

Read data from a CSV (comma-separated values) file, testScores.csv, into a table by using the readtable function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, School, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert School to a categorical variable.

scores = readtable("testScores.csv","TextType","string");
scores.School = categorical(scores.School)
scores=10×5 table
     LastName       School      Test1    Test2    Test3
    __________    __________    _____    _____    _____

    "Jeong"       XYZ School     90       87       93  
    "Collins"     XYZ School     87       85       83  
    "Torres"      XYZ School     86       85       88  
    "Phillips"    ABC School     75       80       72  
    "Ling"        ABC School     89       86       87  
    "Ramirez"     ABC School     96       92       98  
    "Lee"         XYZ School     78       75       77  
    "Walker"      ABC School     91       94       92  
    "Garcia"      ABC School     86       83       85  
    "Chang"       XYZ School     79       76       82  

Create Subtable with Numeric Data

One straightforward way to work with the numeric data is to create a subtable that has only the numeric variables. You can create a subtable by indexing into a table using parentheses and specifying rows and variables. The subtable is a new, smaller table that contains only the specified rows and variables from the old table.

For example, create a subtable from scores that has only the test scores. Because the first two variables have nonnumeric data, you can index into this table specifying the other variables.

numericScores = scores(:,3:end)
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

     90       87       93  
     87       85       83  
     86       85       88  
     75       80       72  
     89       86       87  
     96       92       98  
     78       75       77  
     91       94       92  
     86       83       85  
     79       76       82  

Another way to specify variables is to use the vartype function to specify them by data type. This function is useful when you have a large table with many variables that have different data types. It returns a subscript that you can use to specify table variables.

numericVars = vartype("numeric")
numericVars = 
	table vartype subscript:

		Select table variables matching the type 'numeric'

numericScores = scores(:,numericVars)
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

     90       87       93  
     87       85       83  
     86       85       88  
     75       80       72  
     89       86       87  
     96       92       98  
     78       75       77  
     91       94       92  
     86       83       85  
     79       76       82  

Calculate When Subtable Has Only Numeric Data

(Since R2023a) You can perform operations on a table directly, as long as all its variables have data types that support the operations. For more information, see Direct Calculations on Tables and Timetables.

For example, scale the numeric data so the test scores are on a 25-point scale.

numericScores = numericScores .* 0.25
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

     22.5    21.75    23.25
    21.75    21.25    20.75
     21.5    21.25       22
    18.75       20       18
    22.25     21.5    21.75
       24       23     24.5
     19.5    18.75    19.25
    22.75     23.5       23
     21.5    20.75    21.25
    19.75       19     20.5

Before R2023a, you cannot use this syntax. Instead, index into the table using curly braces, or use the Variables affordance to specify all table rows and variables. These syntaxes return the same result as the previous operation and work in all releases.

numericScores{:,:} = numericScores{:,:} .* 0.25
numericScores.Variables = numericScores.Variables .* 0.25

When you use these syntaxes, they extract the table contents and concatenate them into an array, perform the calculation, and assign the results back into the table. The only requirement is that the variables must all have data types that allow them to be concatenated.

  • With curly braces, you can also specify a subset of rows and variables, as in numericScores{1:5,["Test1","Test3"]}.

  • With Variables, you always get all rows and all variables concatenated into an array.

(Since R2023a) You can also call many mathematical and statistical functions on a table directly. For example, subtract the minimum value within each table variable from that variable.

numericScores = numericScores - min(numericScores)
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

    3.75        3     5.25 
       3      2.5     2.75 
    2.75      2.5        4 
       0     1.25        0 
     3.5     2.75     3.75 
    5.25     4.25      6.5 
    0.75        0     1.25 
       4     4.75        5 
    2.75        2     3.25 
       1     0.25      2.5 

Again, before R2023a you cannot use this syntax. Instead, use either of the following syntaxes. They return the same result and work in all releases.

numericScores{:,:} = numericScores{:,:} - min(numericScores{:,:})
numericScores.Variables = numericScores.Variables - min(numericScores.Variables)

Calculate on One Variable in Any Table

In all releases, you can also perform calculations on one variable at a time by using dot notation and variable names. For example, add a correction worth five points to the last set of scores in Test3.

Because the other table variables are unaffected by operations on an individual variable, you can perform this kind of calculation in any table. It does not matter whether the other variables have numeric or nonnumeric data.

numericScores.Test3 = numericScores.Test3 + 5
numericScores=10×3 table
    Test1    Test2    Test3
    _____    _____    _____

    3.75        3     10.25
       3      2.5      7.75
    2.75      2.5         9
       0     1.25         5
     3.5     2.75      8.75
    5.25     4.25      11.5
    0.75        0      6.25
       4     4.75        10
    2.75        2      8.25
       1     0.25       7.5

Calculate Across Rows in Full Table

The full table, scores, has numeric and nonnumeric variables. In all releases, use curly-brace indexing or dot notation to perform calculations on specified rows and variables within tables.

For example, find the mean, minimum, and maximum values of the test scores for each student. Calculate these values across each row. Assign them to scores as new table variables.

One simple, useful way is to extract the data into a matrix, call a function on it, and then assign the output to a new table variable. For example, calculate the mean test scores across each row. Then add them to scores in a new table variable, TestMean. Use curly braces to extract the numeric data from Test1, Test2, and Test3 into a matrix. To calculate the mean across rows, specify the dimension as 2 when you call mean.

vars = ["Test1","Test2","Test3"];
scores.TestMean = mean(scores{:,vars},2)
scores=10×6 table
     LastName       School      Test1    Test2    Test3    TestMean
    __________    __________    _____    _____    _____    ________

    "Jeong"       XYZ School     90       87       93           90 
    "Collins"     XYZ School     87       85       83           85 
    "Torres"      XYZ School     86       85       88       86.333 
    "Phillips"    ABC School     75       80       72       75.667 
    "Ling"        ABC School     89       86       87       87.333 
    "Ramirez"     ABC School     96       92       98       95.333 
    "Lee"         XYZ School     78       75       77       76.667 
    "Walker"      ABC School     91       94       92       92.333 
    "Garcia"      ABC School     86       83       85       84.667 
    "Chang"       XYZ School     79       76       82           79 

Another way to perform calculations across rows is to use the rowfun function. You do not need to extract data from the table when using rowfun. Instead, pass the table and a function to apply to the data as input arguments to rowfun. While the syntax is a little more complex, rowfun can be useful when the function that you apply takes multiple input arguments or returns multiple output arguments.

For example, use the bounds function to find the minimum and maximum test scores. The bounds function returns two output arguments, so apply it to scores by using rowfun. The output of rowfun is a new table that has TestMin and TestMax variables. In this case, also specify "SeparateInputs" as false so that values across each row are combined into a vector before being passed to bounds.

minmaxTest = rowfun(@bounds, ...
                    scores, ...
                    "InputVariables",vars, ...
                    "OutputVariableNames",["TestMin","TestMax"], ...
                    "SeparateInputs",false)
minmaxTest=10×2 table
    TestMin    TestMax
    _______    _______

      87         93   
      83         87   
      85         88   
      72         80   
      86         89   
      92         98   
      75         78   
      91         94   
      83         86   
      76         82   

Concatenate scores and minmaxTest so that these values are in one table.

scores = [scores minmaxTest]
scores=10×8 table
     LastName       School      Test1    Test2    Test3    TestMean    TestMin    TestMax
    __________    __________    _____    _____    _____    ________    _______    _______

    "Jeong"       XYZ School     90       87       93           90       87         93   
    "Collins"     XYZ School     87       85       83           85       83         87   
    "Torres"      XYZ School     86       85       88       86.333       85         88   
    "Phillips"    ABC School     75       80       72       75.667       72         80   
    "Ling"        ABC School     89       86       87       87.333       86         89   
    "Ramirez"     ABC School     96       92       98       95.333       92         98   
    "Lee"         XYZ School     78       75       77       76.667       75         78   
    "Walker"      ABC School     91       94       92       92.333       91         94   
    "Garcia"      ABC School     86       83       85       84.667       83         86   
    "Chang"       XYZ School     79       76       82           79       76         82   

Calculate Along Specified Variables in Full Table

Find the mean score for each test. Calculate these values along the table variables.

The simplest way is to use mean. First use curly braces to extract the numeric data from Test1, Test2, and Test3 into a matrix. Then call mean to calculate the mean of each column of the matrix. The output is a numeric vector where each element is the mean of a table variable.

vars = ["Test1","Test2","Test3"];
meanOfEachTest = mean(scores{:,vars})
meanOfEachTest = 1×3

   85.7000   84.3000   85.7000

Another way to perform calculations along table variables is to use the varfun function. You do not need to extract data from the table when using varfun. Instead, pass the table and a function to apply to the data as input arguments to varfun.

Calculate the mean scores using varfun. The output is a new table with meaningful names for the table variables.

meanOfEachTest = varfun(@mean, ...
                        scores, ...
                        "InputVariables",vars)
meanOfEachTest=1×3 table
    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________

       85.7          84.3          85.7   

Calculate Using Groups of Data Within Full Table

If your table has one or more grouping variables, then you can perform calculations on groups of data within the table. You can use the values in a grouping variable to specify the groups that the rows belong to.

For example, the School variable in scores has two values, ABC School and XYZ School. You can think of these two values as categories that denote groups of data in scores. In this case, you can perform calculations by school.

To apply a function and use grouping variables, you can use the varfun function. You can specify a function, such as mean, and then use varfun to apply it to each table variable that you specify. When you also specify grouping variables, varfun applies the function to each group within each table variable.

Calculate the mean score for each test by school.

vars = ["Test1","Test2","Test3"];
meanScoresBySchool = varfun(@mean, ...
                            scores, ...
                            "InputVariables",vars, ...
                            "GroupingVariables","School")
meanScoresBySchool=2×5 table
      School      GroupCount    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________    __________    __________

    ABC School        5            87.4            87          86.8   
    XYZ School        5              84          81.6          84.6   

Starting in R2018a, you also can use the groupsummary function to perform calculations on groups of data in each table variable.

meanScoresBySchool = groupsummary(scores,"School","mean",vars)
meanScoresBySchool=2×5 table
      School      GroupCount    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________    __________    __________

    ABC School        5            87.4            87          86.8   
    XYZ School        5              84          81.6          84.6   

The syntax for groupsummary is a bit simpler. Also, you can use groupsummary to specify multiple methods at once. For example, find both the minimum and maximum scores of each test by school.

minmaxBySchool = groupsummary(scores,"School",["min","max"],vars)
minmaxBySchool=2×8 table
      School      GroupCount    min_Test1    max_Test1    min_Test2    max_Test2    min_Test3    max_Test3
    __________    __________    _________    _________    _________    _________    _________    _________

    ABC School        5            75           96           80           94           72           98    
    XYZ School        5            78           90           75           87           77           93    

To use all the predefined methods of groupsummary, specify "all" as the method. Calculate all statistics on the mean test score by school.

allStatsBySchool = groupsummary(scores,"School","all","TestMean")
allStatsBySchool=2×14 table
      School      GroupCount    mean_TestMean    sum_TestMean    min_TestMean    max_TestMean    range_TestMean    median_TestMean    mode_TestMean    var_TestMean    std_TestMean    nummissing_TestMean    nnz_TestMean    numunique_TestMean
    __________    __________    _____________    ____________    ____________    ____________    ______________    _______________    _____________    ____________    ____________    ___________________    ____________    __________________

    ABC School        5            87.067           435.33          75.667          95.333           19.667            87.333            75.667           57.967          7.6136                0                  5                  5         
    XYZ School        5              83.4              417          76.667              90           13.333                85            76.667           29.856           5.464                0                  5                  5         

Sometimes you might want to find a particular value in one table variable and then find the corresponding value in another table variable. In such cases use rowfun.

For example, find the student in each school who had the highest mean test score. The attached supporting function, findNameAtMax, returns both the highest score and the name of the student who had that score. Use rowfun to apply findNameAtMax to each group of students. The rowfun function is suitable because findNameAtMax has multiple input arguments (last names and test scores) and also returns multiple output arguments.

maxScoresBySchool = rowfun(@findNameAtMax, ...
                           scores, ...
                           "InputVariables",["LastName","TestMean"], ...
                           "GroupingVariables","School", ...
                           "OutputVariableNames",["max_TestMean","LastName"])
maxScoresBySchool=2×4 table
      School      GroupCount    max_TestMean    LastName 
    __________    __________    ____________    _________

    ABC School        5            95.333       "Ramirez"
    XYZ School        5                90       "Jeong"  

Supporting Function

function [maxValue,lastName] = findNameAtMax(names,values)
    % Return maximum value and the last name 
    % from the row at which the maximum value occurred
    [maxValue,maxIndex] = max(values);
    lastName = names(maxIndex);
end

See Also

| | | | |

Related Topics