stats
Description
Examples
Load popcorn yield data.
load popcorn.mat
The columns of the 6-by-3 matrix popcorn contain popcorn yield observations, in cups, for the brands Gourmet, National, and Generic. The first three rows of popcorn correspond to popcorn popped using an air popper, and the last three rows correspond to popcorn popped in oil.
Create string arrays of factor values for the brand and type of popper using the repmat function.
brand = [repmat("Gourmet",6,1); repmat("National",6,1); repmat("Generic",6,1)];
popperType = repmat(["Air";"Air";"Air";"Oil";"Oil";"Oil"],[3,1]);
factors = {brand,popperType};
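The repmat layout works because popcorn(:) stacks the columns of popcorn in column-major order: the first six elements come from column 1 (Gourmet), the next six from column 2 (National), and the last six from column 3 (Generic), and within each column rows 1-3 are air-popped. The sketch below checks that alignment using only row and column indices, so it needs no data or toolbox; the numeric codes are an illustrative stand-in for the string factors, and the variable names are hypothetical.

```matlab
% Record, for every element of a 6-by-3 matrix, its row and column index.
[rowIdx, colIdx] = ndgrid(1:6, 1:3);

% Linearize exactly as popcorn(:) does: column 1 first, then 2, then 3.
colOfObs = colIdx(:);   % 1 = Gourmet, 2 = National, 3 = Generic
rowOfObs = rowIdx(:);   % rows 1-3 = air popper, rows 4-6 = oil

% Numeric codes that mirror the repmat-based string factors above.
brandCode  = [repmat(1,6,1); repmat(2,6,1); repmat(3,6,1)];
popperCode = repmat([1;1;1;2;2;2], [3, 1]);   % 1 = Air, 2 = Oil

% Each observation's brand and popper code matches its matrix position.
disp(isequal(brandCode, colOfObs))
disp(isequal(popperCode, 1 + (rowOfObs > 3)))
```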
Perform a two-way ANOVA to test the null hypothesis that the mean popcorn yield is not affected by the brand of popcorn, the popper type, or their interaction.
aov = anova(factors,popcorn(:),FactorNames=["Brand","PopperType"],ModelSpecification="interactions")
aov =
2-way anova, constrained (Type III) sums of squares.
Y ~ 1 + Brand*PopperType
SumOfSquares DF MeanSquares F pValue
____________ __ ___________ ____ __________
Brand 15.75 2 7.875 56.7 7.679e-07
PopperType 4.5 1 4.5 32.4 0.00010037
Brand:PopperType 0.083333 2 0.041667 0.3 0.74622
Error 1.6667 12 0.13889
Total 22 17
By default, anova displays a component ANOVA table.
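Each row of the component table can be reproduced from its SumOfSquares and DF entries: MeanSquares = SumOfSquares/DF, F = MeanSquares/MSE, and the p-value is the upper tail of an F-distribution with the term's DF and the error DF. The sketch below uses the rounded values from the display, so its results agree only to display precision. It evaluates the F tail with the base function betainc rather than fcdf, an implementation choice so that the sketch runs without a toolbox.

```matlab
% SumOfSquares and DF from the component table above
% (rows: Brand, PopperType, Brand:PopperType).
ss = [15.75; 4.5; 0.083333];
df = [2; 1; 2];
ssError = 1.6667;
dfError = 12;

ms  = ss ./ df;          % MeanSquares = SumOfSquares/DF
mse = ssError / dfError; % mean squared error (MeanSquares of the Error row)
F   = ms / mse;          % F = MeanSquares/MSE

% Upper-tail probability of F(d1,d2) at f, written with the regularized
% incomplete beta function: P(F > f) = betainc(d2/(d2 + d1*f), d2/2, d1/2)
pValue = betainc(dfError ./ (dfError + df .* F), dfError/2, df/2);

disp([ms F pValue])
```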
Generate a summary ANOVA table.
s = stats(aov,"summary")
s=5×5 table
SumOfSquares DF MeanSquares F pValue
____________ __ ___________ _____ __________
Linear 20.25 3 6.75 48.6 5.4835e-07
NonLinear 0.083333 2 0.041667 0.3 0.74622
Regression 20.333 5 4.0667 29.28 2.5065e-06
Error 1.6667 12 0.13889
Total 22 17 1.2941
The row Linear corresponds to the terms Brand and PopperType in the ANOVA model. The small p-value in the Linear row indicates that Brand and PopperType have a statistically significant combined effect on the popcorn yield. The row NonLinear corresponds to the term Brand:PopperType. The large p-value in the NonLinear row indicates that the interaction term does not have a statistically significant effect on the popcorn yield. The small p-value in the row Regression indicates that the ANOVA model is a better predictor of the response data than the mean of the data.
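The summary rows are tied together by simple bookkeeping: Regression = Total - Error, Linear = Regression - NonLinear, and every F-statistic divides by the full-model MSE. The sketch below reproduces this from the displayed values (so agreement is only to display precision). For comparison, it also computes the Linear F-value you would get by refitting a model with only the linear terms, which illustrates why that refitted value differs from the Linear row.

```matlab
% Values from the component and summary tables above.
ssTotal     = 22;       dfTotal     = 17;
ssError     = 1.6667;   dfError     = 12;
ssNonLinear = 0.083333; dfNonLinear = 2;    % Brand:PopperType

ssRegression = ssTotal - ssError;           % model sum of squares
dfRegression = dfTotal - dfError;
ssLinear     = ssRegression - ssNonLinear;  % Brand and PopperType together
dfLinear     = dfRegression - dfNonLinear;

% Every F in the summary table uses the full-model MSE.
mse = ssError / dfError;
F = [ssLinear/dfLinear, ssNonLinear/dfNonLinear, ssRegression/dfRegression] / mse;

% Refitting with only the linear terms moves the interaction SS into the
% error, giving a different MSE and therefore a different Linear F-value.
mseLinearOnly = (ssError + ssNonLinear) / (dfError + dfNonLinear);
fLinearRefit  = (ssLinear/dfLinear) / mseLinearOnly;

totalMeanSquares = ssTotal / dfTotal;       % sample variance of the yields
disp(F)                                     % Linear, NonLinear, Regression
disp([fLinearRefit totalMeanSquares])
```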
Load the sample car data.
load carsmall
Data for the country of origin, model year, and mileage is stored in the variables Origin, Model_Year, and MPG, respectively.
Perform a two-way ANOVA to test the null hypothesis that mean mileage is not affected by the country of origin or model year.
aov = anova({Origin, Model_Year},MPG,RandomFactors=[1 2],FactorNames=["Origin" "Year"])
aov =
2-way anova, constrained (Type III) sums of squares.
Y ~ 1 + Origin + Year
SumOfSquares DF MeanSquares F pValue
____________ __ ___________ ______ __________
Origin 1078.1 5 215.62 10.675 5.3303e-08
Year 2638.4 2 1319.2 65.312 5.5975e-18
Error 1737 86 20.198
Total 6005.3 93
Display an expected mean squares table for the ANOVA.
[~,ems] = stats(aov)
ems=3×5 table
Type ExpectedMeanSquares MeanSquaresDenominator DFDenominator FDenominator
________ __________________________ ______________________ _____________ ____________
Origin "random" "9.159*V(Origin)+V(Error)" 20.198 86 MS(Error)
Year "random" "29.5014*V(Year)+V(Error)" 20.198 86 MS(Error)
Error "random" "V(Error)"
The formulas for the expected mean squares of the random factors Origin and Year contain terms for their respective variance components. You can use the expected mean squares formulas to compare how much of the expected mean squares is due to the variance in the error and how much is due to the variance components of the random terms.
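One common way to act on these formulas is the ANOVA (method-of-moments) estimator: set each observed mean square equal to its expected mean square and solve for the variance components. The sketch below does this with the displayed values. It is a textbook approach shown for illustration, not necessarily the method that varianceComponent uses, and such estimates can come out negative when a term's mean square is smaller than MS(Error).

```matlab
% Mean squares from the anova display and EMS coefficients from the ems table above.
msOrigin = 215.62;  msYear = 1319.2;  msError = 20.198;
cOrigin  = 9.159;   cYear  = 29.5014;   % coefficients of V(Origin) and V(Year)

% Solve MS(term) = c*V(term) + V(Error) for each variance component.
vError  = msError;
vOrigin = (msOrigin - msError) / cOrigin;
vYear   = (msYear   - msError) / cYear;

% Fraction of each term's mean square attributed to its own variance component
shareOrigin = cOrigin * vOrigin / msOrigin;
shareYear   = cYear   * vYear   / msYear;

disp([vOrigin vYear vError])
disp([shareOrigin shareYear])
```

With these values, roughly 91% of MS(Origin) and 98% of MS(Year) is attributed to the corresponding variance component rather than to the error variance.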
Input Arguments
aov: Analysis of variance results, specified as an anova object. The properties of aov contain the factors and response data that stats uses to compute the statistics in the ANOVA table.
type: Type of ANOVA table, specified as "component" or "summary".
Example: "summary"
Data Types: char | string
sstype: Type of the sum of squares used to perform the ANOVA, specified as "three", "two", "one", or "hierarchical". The stats function ignores sstype unless the ANOVA type is "component". For a model containing main effects but no interactions, the value of sstype affects the computations only when the data is unbalanced.
The sum of squares of a term, $SS_{\text{Term}}$, is defined as the reduction in the sum of squares error (SSE) obtained by adding the term to a model that excludes it. The formula for the sum of squares of a term Term has the form

$$SS_{\text{Term}} = \sum_{i=1}^{n}\bigl(y_i - f_0(X_i)\bigr)^2 - \sum_{i=1}^{n}\bigl(y_i - f_1(X_i)\bigr)^2 = SSE_0 - SSE_1,$$

where $n$ is the number of observations, $y_i$ are the response data, $X_i$ are the factors used to perform the ANOVA, $f_0$ is a model that excludes Term, and $f_1$ is a model that includes Term. Both $f_0$ and $f_1$ are specified by SumOfSquaresType. The variables $SSE_0$ and $SSE_1$ are the sum of squares errors for $f_0$ and $f_1$, respectively. You can specify $f_0$ and $f_1$ using one of the options for SumOfSquaresType described in the following table.
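| Option | Type of Sum of Squares |
|---|---|
| "three" (default) | Type III: $f_1$ is the full ANOVA model specified in the property Formula, and $f_0$ is $f_1$ with Term removed. |
| "two" | Type II: $f_0$ is a model composed of all terms in the ANOVA model specified in the property Formula that do not contain Term, and $f_1$ is $f_0$ plus Term. |
| "one" | Type I: $f_0$ is a model composed of all the terms that precede Term in the ANOVA model specified in the property Formula, and $f_1$ is $f_0$ plus Term. |
| "hierarchical" | $f_0$ and $f_1$ are defined as in Type II, except powers of Term are treated as terms that contain Term. |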
Example: Component="hierarchical"
Data Types: char | string
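The definition "sum of squares of a term = SSE of the model without the term minus SSE of the model with it" can be checked directly with least-squares fits. The sketch below hardcodes the popcorn yields (assumed to match popcorn.mat; they reproduce the sums of squares in the tables above), builds indicator (dummy) columns by hand, and computes Type I and Type II sums of squares for Brand in a main-effects-only model. Because the popcorn design is balanced, the two types agree; dropping one observation makes the design unbalanced, and they then differ. This hand-built regression illustrates the definition and is not the code that stats runs.

```matlab
% Popcorn yields (cups); assumed to match popcorn.mat.
% Rows 1-3: air popper, rows 4-6: oil. Columns: Gourmet, National, Generic.
popcorn = [5.5 4.5 3.5; 5.5 4.5 4.0; 6.0 4.0 3.0; ...
           6.5 5.0 4.0; 7.0 5.5 5.0; 7.0 5.0 4.5];
y         = popcorn(:);
brandIdx  = [ones(6,1); 2*ones(6,1); 3*ones(6,1)];
popperIdx = repmat([1;1;1;2;2;2], 3, 1);

% Indicator (dummy) columns with the first level of each factor dropped
B   = double([brandIdx == 2, brandIdx == 3]);
P   = double(popperIdx == 2);
one = ones(numel(y), 1);

% Residual sum of squares (SSE) of a least-squares fit of v on design X
sseOf = @(X, v) sum((v - X*(X\v)).^2);

% Type I ("one"), terms ordered Brand, PopperType:
% SS(Term) = SSE(preceding terms) - SSE(preceding terms plus Term)
ssBrandI  = sseOf(one, y)     - sseOf([one B], y);
ssPopperI = sseOf([one B], y) - sseOf([one B P], y);

% Type II ("two") for Brand: the smaller model holds every term not containing Brand
ssBrandII = sseOf([one P], y) - sseOf([one B P], y);

% Unbalanced variant: drop observation 1 (Gourmet, air popper)
keep = 2:numel(y);
yu = y(keep);  Bu = B(keep, :);  Pu = P(keep);  oneU = one(keep);
ssBrandIu  = sseOf(oneU, yu)      - sseOf([oneU Bu], yu);
ssBrandIIu = sseOf([oneU Pu], yu) - sseOf([oneU Bu Pu], yu);

disp([ssBrandI ssBrandII ssPopperI])   % balanced: Type I and Type II agree
disp([ssBrandIu ssBrandIIu])           % unbalanced: they differ
```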
Output Arguments
s: ANOVA statistics, returned as a table. The contents of s depend on the ANOVA type specified in type.
If type is "component", then s contains ANOVA statistics for each term in the model except the constant (intercept) term. The table includes these columns for each term:

| Column | Description |
|---|---|
| SumOfSquares | Sum of squares explained by the term, calculated according to sstype. |
| DF | Degrees of freedom. The DF of a numeric variable is 1. The DF of a categorical variable is the number of dummy variables created for the category (number of categories - 1). The DF of the error term is the difference between the DF of the total and the sum of the DF for the model terms. The DF of the total is aov.NumObservations - 1. |
| MeanSquares | Mean squares, defined by MeanSquares = SumOfSquares/DF. MeanSquares for the error term is the mean squared error (MSE). |
| F | F-statistic value to test the null hypothesis that the coefficients of the corresponding term are zero, computed by F = MeanSquares/MSE. When the null hypothesis is true, the F-statistic follows an F-distribution. |
| pValue | p-value of the F-statistic value. |
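As a worked check of these degrees-of-freedom rules, the sketch below reproduces the DF column of the popcorn component table. The rule that an interaction's DF is the product of its factors' DFs is added here (it follows from counting the interaction dummy variables) and is not stated in the table above.

```matlab
% Popcorn example: 3 brands, 2 popper types, 18 observations.
nBrand  = 3;
nPopper = 2;
nObs    = 18;                          % aov.NumObservations

dfBrand  = nBrand - 1;                 % categorical: number of categories - 1
dfPopper = nPopper - 1;
dfInter  = dfBrand * dfPopper;         % one dummy column per pair of non-reference levels
dfTotal  = nObs - 1;
dfError  = dfTotal - (dfBrand + dfPopper + dfInter);

disp([dfBrand dfPopper dfInter dfError dfTotal])
```

The same bookkeeping gives the Error DF in the carsmall example: 93 - 5 - 2 = 86.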
If type is "summary", then s contains summary statistics for groups of terms, one group per row. The summary statistics are calculated using Type I sum of squares. The table includes the same columns as "component" and these rows:

| Row | Description |
|---|---|
| Total | Total statistics. SumOfSquares: total sum of squares, which is the sum of the squared deviations of the response around its mean. DF: sum of the degrees of freedom of Regression and Error. |
| Regression | Statistics for the model as a whole. SumOfSquares: model sum of squares, which is the sum of the squared deviations of the fitted values around the response mean. F and pValue: a test of whether the model as a whole fits significantly better than a degenerate model consisting of only a constant term. |
| Linear | Statistics for linear terms. SumOfSquares: sum of squares for linear terms, which is the difference between the model sum of squares and the sum of squares for nonlinear terms. F and pValue: a test of whether the model with only linear terms fits better than a degenerate model consisting of only a constant term. stats uses the mean squared error of the full model to compute this F-value, so the F-value obtained by dropping the nonlinear terms and repeating the test is, in general, not the same as the value in this row. |
| NonLinear | Statistics for nonlinear terms. SumOfSquares: sum of squares for nonlinear (higher-order or interaction) terms, which is the increase in the residual sum of squares obtained by keeping only the linear terms and dropping all nonlinear terms. F and pValue: a test of whether the full model fits significantly better than a smaller model consisting of only the linear terms. |
| Error | Statistics for error. SumOfSquares: residual sum of squares, which is the sum of the squared residual values. MeanSquares: mean squared error, used to compute the F-statistic values for Regression, Linear, and NonLinear. |

If the data contains replications (multiple observations sharing the same factor values), s also contains rows for LackOfFit and PureError, which break down Error further:

| Row | Description |
|---|---|
| LackOfFit | Lack-of-fit statistics. SumOfSquares: sum of squares due to lack of fit, which is the difference between the residual sum of squares and the replication sum of squares. F and pValue: the F-statistic value is the ratio of the lack-of-fit MeanSquares to the pure error MeanSquares. The ratio provides a test of bias by measuring whether the variation of the residuals is larger than the variation of the replications. A low p-value suggests that adding terms to the model can improve the fit. |
| PureError | Statistics for pure error. SumOfSquares: replication sum of squares, obtained by finding the sets of points with identical predictor values, computing the sum of squared deviations around the mean within each set, and pooling the computed values. MeanSquares: model-free estimate of the pure error variance of the response. |
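To illustrate the LackOfFit and PureError rows, the sketch below fits a main-effects-only model (Brand + PopperType, a hypothetical model not used in the examples above) to the popcorn data and splits its residual sum of squares into pure error and lack of fit by hand. The yields are hardcoded and assumed to match popcorn.mat. If the Brand:PopperType interaction were included, the model would have one parameter per factor combination, so its lack-of-fit sum of squares would be zero.

```matlab
% Popcorn yields (cups); assumed to match popcorn.mat.
popcorn = [5.5 4.5 3.5; 5.5 4.5 4.0; 6.0 4.0 3.0; ...
           6.5 5.0 4.0; 7.0 5.5 5.0; 7.0 5.0 4.5];
y         = popcorn(:);
n         = numel(y);
brandIdx  = [ones(6,1); 2*ones(6,1); 3*ones(6,1)];
popperIdx = repmat([1;1;1;2;2;2], 3, 1);

% Main-effects-only fit: intercept + Brand + PopperType indicator columns
X = double([ones(n,1), brandIdx == 2, brandIdx == 3, popperIdx == 2]);
ssResidual = sum((y - X*(X\y)).^2);
dfResidual = n - size(X, 2);

% Pure error: pool squared deviations around each group's mean, where a
% group is a set of observations with identical factor values.
grp = (brandIdx - 1)*2 + popperIdx;     % 6 groups
groups = unique(grp);
ssPure = 0;
for k = 1:numel(groups)
    yg = y(grp == groups(k));
    ssPure = ssPure + sum((yg - mean(yg)).^2);
end
dfPure = n - numel(groups);

% Lack of fit is the residual variation that the replications do not explain.
ssLOF = ssResidual - ssPure;
dfLOF = dfResidual - dfPure;
F = (ssLOF/dfLOF) / (ssPure/dfPure);
pValue = betainc(dfPure/(dfPure + dfLOF*F), dfPure/2, dfLOF/2);

disp([ssLOF dfLOF ssPure dfPure F pValue])
```

For this balanced design, the lack of fit of the additive model is exactly the omitted interaction, so F and pValue match the NonLinear row of the summary table in the first example.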
ems: Expected mean squares information, returned as a table. ems contains a row for each term and a row for the error. The table has the following variables:

| Variable | Description |
|---|---|
| Type | Indicator of whether the term is fixed or random. |
| ExpectedMeanSquares | Formula for the expected mean squares. |
| MeanSquaresDenominator | Value of the denominator in the calculation of the F-statistic. |
| DFDenominator | Degrees of freedom of the F-statistic denominator. |
| FDenominator | Formula for the denominator in the calculation of the F-statistic. The denominator changes depending on whether aov.Formula has random interaction terms. |

You can use the ems table to determine whether the variance of a random term has a large effect on the expected mean squares.
Data Types: table
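The dependence of FDenominator on random interaction terms can be seen from the textbook expected mean squares of a balanced two-way model in which factors A and B are both random, with $a$ levels of A, $b$ levels of B, $r$ replications per cell, and a random A:B interaction. This is an illustrative assumption: it is not the carsmall model above, and the coefficients stats reports for unbalanced data differ.

$$
\begin{aligned}
\mathrm{E}[MS_A] &= \sigma^2 + r\,\sigma^2_{AB} + rb\,\sigma^2_A \\
\mathrm{E}[MS_{AB}] &= \sigma^2 + r\,\sigma^2_{AB} \\
\mathrm{E}[MS_{\mathrm{Error}}] &= \sigma^2
\end{aligned}
$$

Under the null hypothesis $\sigma^2_A = 0$, $\mathrm{E}[MS_A] = \mathrm{E}[MS_{AB}]$, so the natural test statistic is $F = MS_A/MS_{AB}$ with $a-1$ and $(a-1)(b-1)$ degrees of freedom. If the model has no random interaction term, the $r\sigma^2_{AB}$ terms drop out and the denominator reverts to $MS_{\mathrm{Error}}$, which is what the FDenominator column shows for the carsmall example.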
Version History
Introduced in R2022b
See Also
anova | varianceComponent | N-Way ANOVA | One-Way ANOVA | Two-Way ANOVA