How to perform a linear model for effects of categorical variables on a numeric variable?

Hi all!
I have the following table (table1) with 68 columns and 100 rows containing numerical and categorical data. I'm only interested in the numeric variable 'Height' and the categorical variables 'Sex', 'Age' and 'Treatment'.
load table1
% Let's summarise the content of the variables I'm interested in:
% Sex variable: Female and Male
% Age variable: Prepuberal and Adult
% Treatment variable: Drug and Control
% Height variable: Randomly assigned numerical values
I want to perform a linear model to look for 'Sex', 'Age' and 'Treatment' effects on 'Height'. Note that 'Sex', 'Age' and 'Treatment' are categorical variables, and the response variable 'Height' is numeric.
% Linear model of sex, age and treatment effects on Height
mdl = fitlm(table1,'Height~Age+Treatment+Sex')
Then, I want to perform linear models to look for interaction effects between 'Sex*Age', 'Sex*Treatment', 'Age*Treatment', 'Sex*Age*Treatment' on 'Height'.
mdlsexandage = fitlm(table1,'Height~Sex*Age')
mdlsexandtreatment = fitlm(table1,'Height~Sex*Treatment')
mdlageandtreatment = fitlm(table1,'Height~Age*Treatment')
mdlsexandageandtreatment = fitlm(table1,'Height~Sex*Age*Treatment')
Am I doing this right? How do I interpret the resulting models?
Note that table1 was randomly generated, so I don't expect the p-values to be meaningful. I am only interested in learning how to code the linear models and how to interpret them :)
Thanks for sharing your knowledge, you all are always helpful!

Accepted Answer

Jeff Miller on 14 Oct 2024
Since your predictor variables are all categorical, it sounds like what you are looking for is most commonly called "Analysis of Variance" (ANOVA). The usual way to do that is to just fit the full model with all terms:
mdlsexandageandtreatment = fitlm(table1,'Height~Sex*Age*Treatment')
Fitting this model will give you measures of the height difference between males and females (aka "effect of sex on height"), the height difference between the two age groups (aka "effect of age"), the extent to which the effect of age differs between males and females ("interaction of age and sex"), etc.
You can also get estimates of some of these effects by fitting the simpler models (e.g., Height~Sex+Age+Treatment), but that is usually a mistake because the simpler models treat any omitted effects or interactions as error. If those omitted sources are large, the error term gets inflated, making it harder to see the real effects of the terms in the simpler model.
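To make the point about inflated error concrete, here is a minimal sketch on made-up data. Everything in it is an assumption for illustration: the table tbl, the balanced 96-row design, the seed, and the built-in Sex-by-Age interaction are not taken from table1. It fits the additive and the full model to the same data and prints the error mean square of each.

```matlab
% Synthetic stand-in for table1 (hypothetical data, for illustration only)
rng(0);                                         % fixed seed for reproducible noise
[s, a, t] = ndgrid(1:2, 1:2, 1:2);              % the 8 combinations of the 3 factors
design = repmat([s(:) a(:) t(:)], 12, 1);       % 12 rows per combination, 96 rows
Sex       = categorical(design(:,1), 1:2, {'Female','Male'});
Age       = categorical(design(:,2), 1:2, {'Prepuberal','Adult'});
Treatment = categorical(design(:,3), 1:2, {'Control','Drug'});
% Heights contain a deliberate Sex-by-Age interaction plus unit-variance noise
Height = 150 + 5*(Sex == 'Male') + 10*(Sex == 'Male' & Age == 'Adult') + randn(96, 1);
tbl = table(Sex, Age, Treatment, Height);

mdlAdd  = fitlm(tbl, 'Height ~ Sex + Age + Treatment');   % main effects only
mdlFull = fitlm(tbl, 'Height ~ Sex*Age*Treatment');       % all main effects and interactions

% The additive model has no term for the interaction, so it ends up in the error
fprintf('Additive: %d coefficients, MSE = %.2f\n', mdlAdd.NumCoefficients, mdlAdd.MSE);
fprintf('Full:     %d coefficients, MSE = %.2f\n', mdlFull.NumCoefficients, mdlFull.MSE);
```

Because the interaction here is large compared with the noise, the additive model reports a clearly larger MSE. With purely random heights, as in the question, the two values would be close.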
  3 Comments
Jeff Miller on 16 Oct 2024
@Sara Woods Just to make sure there is no misunderstanding, your interpretations should be based on the p-values that you get from
anova(mdlsexandageandtreatment)
not on the p's that you find inside mdlsexandageandtreatment.Coefficients.
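As a minimal sketch of where the two sets of p-values live, the code below fits the full model to a synthetic stand-in for table1 (the data, the seed and the name tbl are made up for illustration) and extracts both tables. As I understand the default reference coding, the Sex_Male row in Coefficients compares males with females only at the reference levels of Age and Treatment, whereas the Sex row returned by anova assesses the term as a whole, so the two p-values generally differ.

```matlab
% Synthetic stand-in for table1 (hypothetical data, for illustration only)
rng(0);                                         % fixed seed for reproducible noise
[s, a, t] = ndgrid(1:2, 1:2, 1:2);              % the 8 combinations of the 3 factors
design = repmat([s(:) a(:) t(:)], 12, 1);       % 12 rows per combination, 96 rows
Sex       = categorical(design(:,1), 1:2, {'Female','Male'});
Age       = categorical(design(:,2), 1:2, {'Prepuberal','Adult'});
Treatment = categorical(design(:,3), 1:2, {'Control','Drug'});
Height    = 150 + randn(96, 1);                 % pure noise, as in the question
tbl = table(Sex, Age, Treatment, Height);

mdl = fitlm(tbl, 'Height ~ Sex*Age*Treatment');

anovaTbl = anova(mdl);          % one row per model term plus an Error row
coefTbl  = mdl.Coefficients;    % one row per coefficient, including the intercept

disp(anovaTbl(:, {'DF', 'F', 'pValue'}));       % p-values to base the interpretation on
disp(coefTbl(:, {'Estimate', 'pValue'}));       % coefficient-level p-values
```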


More Answers (1)

Hitesh on 14 Oct 2024
Yes, your approach to fitting linear models in MATLAB using the fitlm function is correct. You can use the "disp" function to print a summary of each model, which lists the estimated coefficients together with their standard errors (SE), t-statistics (tStat) and p-values (pValue).
disp(mdlsexandage);
Interpretation:
  • Coefficients: The output includes a coefficient for each non-reference level of the categorical variables. These coefficients represent the change in 'Height' relative to the reference level.
  • P-values: Indicate the statistical significance of each predictor. Since the data is randomly generated, these p-values may not be meaningful in your case.
  • R-squared: Provides a measure of how well the model explains the variability of the response data.
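The role of the reference level can be sketched as follows. This is only an illustration under assumptions: the data are made up, tbl is a hypothetical stand-in for table1, and the model is reduced to two predictors to keep it short. fitlm takes the first category of a categorical variable as the reference, so reordering the categories changes which level the coefficient is expressed against.

```matlab
% Synthetic two-factor stand-in for table1 (hypothetical data, for illustration only)
rng(0);                                         % fixed seed for reproducible noise
[s, a] = ndgrid(1:2, 1:2);                      % the 4 combinations of Sex and Age
design = repmat([s(:) a(:)], 25, 1);            % 25 rows per combination, 100 rows
Sex    = categorical(design(:,1), 1:2, {'Female','Male'});
Age    = categorical(design(:,2), 1:2, {'Prepuberal','Adult'});
Height = 150 + 5*(Sex == 'Male') + randn(100, 1);
tbl = table(Sex, Age, Height);

disp(categories(tbl.Sex));                      % first category listed is the reference
mdl1 = fitlm(tbl, 'Height ~ Sex + Age');
est1 = mdl1.Coefficients{'Sex_Male', 'Estimate'};      % Male relative to Female

% Make Male the reference level and refit the same model
tbl.Sex = reordercats(tbl.Sex, {'Male', 'Female'});
mdl2 = fitlm(tbl, 'Height ~ Sex + Age');
est2 = mdl2.Coefficients{'Sex_Female', 'Estimate'};    % Female relative to Male

fprintf('Sex_Male = %.3f, Sex_Female = %.3f\n', est1, est2);
```

In this additive model the two estimates are the same difference with opposite sign; the fit itself does not change, only the way it is parameterised.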
Models can be compared using metrics such as adjusted R-squared to judge which one fits the data best. If interaction terms are significant, the relationship between the predictors and the response variable is not simply additive: the effect of one predictor depends on the level of another predictor.
For more information about the "disp" function, refer to the MATLAB documentation.
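A minimal sketch of such a comparison, assuming a made-up table tbl in place of table1 (the data, the seed and the built-in Sex-by-Age interaction are illustrative assumptions), loops over the four interaction formulas from the question and collects the adjusted R-squared of each fit:

```matlab
% Synthetic stand-in for table1 (hypothetical data, for illustration only)
rng(0);                                         % fixed seed for reproducible noise
[s, a, t] = ndgrid(1:2, 1:2, 1:2);              % the 8 combinations of the 3 factors
design = repmat([s(:) a(:) t(:)], 12, 1);       % 12 rows per combination, 96 rows
Sex       = categorical(design(:,1), 1:2, {'Female','Male'});
Age       = categorical(design(:,2), 1:2, {'Prepuberal','Adult'});
Treatment = categorical(design(:,3), 1:2, {'Control','Drug'});
Height = 150 + 5*(Sex == 'Male') + 10*(Sex == 'Male' & Age == 'Adult') + randn(96, 1);
tbl = table(Sex, Age, Treatment, Height);

% The four model formulas from the question
formulas = {'Height ~ Sex*Age'; 'Height ~ Sex*Treatment'; ...
            'Height ~ Age*Treatment'; 'Height ~ Sex*Age*Treatment'};

adjR2 = zeros(numel(formulas), 1);
for k = 1:numel(formulas)
    mdl = fitlm(tbl, formulas{k});
    adjR2(k) = mdl.Rsquared.Adjusted;           % R-squared penalised for model size
end

disp(table(formulas, adjR2, 'VariableNames', {'Formula', 'AdjustedR2'}));
```

Since the simulated heights depend only on Sex and Age, the models that contain the Sex*Age terms score highest here. On randomly generated heights all four values would sit near zero.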

Version

R2022b
