What is the reference category in the output of fitlme with categorical variables and three-way interaction terms?

The table below summarizes the output of a linear mixed-effects model with random intercept and slope, fitted to structured panel data ('tbl_early'), where the model is specified as:
lme_PrimaryHU = fitlme(tbl_early, 'logRoL ~ 1 + logLoL + logAnnLioL + Dur + PPI + AvgEffTax_1 + HU + logEP*logAP + EQ*PrimaryHU*relInsLoss_1 + Wstorm*relInsLoss_1 +Storm*PrimaryHU*relInsLoss_1 +(FFR|ID)')
'Dur' has 4 levels, so I understood that the output shows three levels with estimates relative to the fourth, i.e. the reference level ('Dur_one'). From the results one could interpret that Dur_onehalf trades at a discount compared with Dur_one, all else equal.
'HU', 'Storm', 'EQ' and 'Wstorm' are binary variables. They are not mutually exclusive (cross-sectional analysis), and there is no case in the data in which all of them are 0. The question is therefore which of these variables MATLAB chose as the reference case. Note that some of the peril variables are used in two- or three-way interaction terms that appear a bit lower in the table. 'PrimaryHU' is a binary variable that controls for a certain condition which affects the potential effects of relInsLoss or of 'HU', 'Storm', 'EQ' and 'Wstorm' (e.g. 'EQ' alone is positive but not significant at p<0.1, 'EQ*relInsLoss' is negative and still not significant, 'EQ*PrimaryHU' is negative and significant, 'EQ*relInsLoss*PrimaryHU' is positive and significant). All remaining variables are continuous.
Two-way interactions used:
  • 'logAP*logEP'
  • 'Wstorm*relInsLoss'
Three-way interactions used:
  • 'Storm*PrimaryHU*relInsLoss'
  • 'EQ*PrimaryHU*relInsLoss'
Any other interaction terms, and the separate estimates for the underlying variables, result from expanding the interaction terms above.
Many thanks for any help in advance!

Accepted Answer

Peng Li on 10 May 2020
The table you copied isn't the default display from MATLAB, so it's difficult to tell anything from it. It looks like an ANOVA output, since each categorical item (including interaction items) corresponds to only one line.
As you mentioned, for categorical variables the regression output states explicitly which level each row is for, and the level without an output row is the reference level. A dichotomous variable is just a special case of a categorical variable. For example, if you have sex (0/1), the output usually shows sex[1] xx, xx, xx, xx..., which means 0 is used as the reference. The same strategy is used to display interaction items that involve categorical variables.
You also have to make them categorical explicitly, e.g. by tbl.sex = categorical(tbl.sex); otherwise the variable is treated as continuous by default, and 0 is then always the default reference value.
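To illustrate the point about reference levels, here is a minimal sketch on made-up data (the table, the variable names and the effect sizes are all hypothetical, not taken from the question). It assumes the default dummy coding of fitlme, where the first category of a categorical predictor serves as the reference, and uses reordercats to change which category comes first.

```matlab
% Hypothetical toy data; names and numbers are for illustration only.
rng(0);                                    % reproducible random numbers
nGroups = 10;
ID  = repelem((1:nGroups)', 6);            % 10 groups, 6 observations each
n   = numel(ID);
sex = repmat([0; 1], n/2, 1);              % binary 0/1 indicator
b   = 0.5*randn(nGroups, 1);               % group-specific intercepts
y   = 1 + 0.5*sex + b(ID) + 0.3*randn(n, 1);
tbl = table(y, sex, ID);

% Convert to categorical so that fitlme treats the variable as a factor.
tbl.sex = categorical(tbl.sex);
disp(categories(tbl.sex))                  % category order; the first one is the reference

lme1 = fitlme(tbl, 'y ~ 1 + sex + (1|ID)');
disp(lme1.CoefficientNames)                % the reference level has no row of its own

% Put category '1' first so that it becomes the reference instead.
tbl.sex = reordercats(tbl.sex, {'1', '0'});
lme2 = fitlme(tbl, 'y ~ 1 + sex + (1|ID)');
disp(lme2.CoefficientNames)
```

Comparing the two coefficient name lists shows which level was dropped in each fit; the same check should work for each binary peril variable once it has been converted to categorical.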
In the equation you used, FFR doesn't appear as a fixed effect. If you only want a subject-specific intercept, use (1|ID); otherwise, make sure that a random slope for FFR is what you really want.
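A small sketch of the difference between the two random-effects specifications, again on made-up data (FFR here is only a hypothetical stand-in with invented values). Note that (FFR|ID) includes a random intercept as well as a random slope for FFR, and that FFR is also entered as a fixed effect in this example.

```matlab
% Hypothetical toy panel: 15 groups observed at 8 time points each.
rng(2);
nID = 15;  nT = 8;
ID  = repelem((1:nID)', nT);
FFR = repmat(linspace(0, 2, nT)', nID, 1);   % invented stand-in values
b0  = 0.5*randn(nID, 1);                     % group-specific intercepts
b1  = 0.2*randn(nID, 1);                     % group-specific slopes
y   = 1 + b0(ID) + (0.4 + b1(ID)).*FFR + 0.3*randn(nID*nT, 1);
tbl = table(y, FFR, ID);

% Random intercept only.
lmeInt   = fitlme(tbl, 'y ~ 1 + FFR + (1|ID)');
% Random intercept and random slope for FFR.
lmeSlope = fitlme(tbl, 'y ~ 1 + FFR + (FFR|ID)');

% Number of estimated random effects: one per group versus two per group.
disp(numel(randomEffects(lmeInt)))
disp(numel(randomEffects(lmeSlope)))
```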
  10 Comments
Robert Joniec on 19 May 2020
Hi Peng,
sorry for the silence! Your last answer helps to complete the picture. Thank you! Regarding the alpha level, I used 0.1 to decide whether variables stay in the final model. I think what you refer to is well described in Wasserstein, Schirm, & Lazar (2019).
The sample size is 1500-ish; however, the model is quite complex and we see some sensitivity in how results change when the model specification is altered. The relationships between our variables are indeed not trivial, and this has been one of the reasons why we decided that the interaction terms are needed. The remaining sensitivity is (partially) due to what lies behind the data. (It covers 14 points in time, so the annual 'Storm' sub-sample sometimes gets as small as 50, while a good portion of the 50 would then also include 'HU' and 'EQ' in all possible combinations...). Bootstrapping standard errors proved to give limited insight, as the design matrix of the subsamples often is not of full rank if the interaction terms are included (due to the underlying information, of course).
In your last point you mentioned that it is possible to drop the two-way interaction and test again. How would you actually do it? When the term 'Storm*PrimaryHU*relInsLoss_1' is used, MATLAB automatically includes the inherent two-way interaction terms and single variables, so it sounds like a specific exclusion would need to be added to the code?
All the best - Rob
Peng Li on 28 May 2020
Hi Rob,
Sorry that I overlooked this thread. In reply to your last question: you could explicitly drop the interaction item in your equation by adding "- Storm*PrimaryHU*relInsLoss_1", or you can add each item separately. Again, to make this simpler:
y ~ x1*x2
y ~ x1 + x2 + x1:x2
These two are identical; both include the interaction between x1 and x2.
y ~ x1 + x2
y ~ x1*x2 - x1:x2
These two are identical; both omit the interaction item.
y ~ x1*x2*x3 - x1:x2:x3
means all main effects plus the two-way interactions between each pair. The three-way interaction item is dropped by using - x1:x2:x3.
Check this
Hope this helps!
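As a rough sketch of how the formula variants above could be checked, the following fits a full and a reduced model on simulated data (all names and numbers are hypothetical) and lists the fixed-effect terms each formula expands to. It also runs a likelihood ratio test with compare, which is one possible way to test the dropped term, assuming both models are fitted by maximum likelihood (the fitlme default).

```matlab
% Hypothetical simulated data: 20 groups, 10 observations each.
rng(1);
nG = 20;
g  = repelem((1:nG)', 10);
n  = numel(g);
x1 = randn(n, 1);  x2 = randn(n, 1);  x3 = randn(n, 1);
b  = 0.5*randn(nG, 1);                       % group-specific intercepts
y  = 1 + x1 - x2 + 0.5*x3 + 0.3*x1.*x2 + b(g) + randn(n, 1);
tbl = table(y, x1, x2, x3, g);

% Full model: main effects, all two-way terms and the three-way term.
full    = fitlme(tbl, 'y ~ x1*x2*x3 + (1|g)');
% Reduced model: same expansion with the three-way term removed.
reduced = fitlme(tbl, 'y ~ x1*x2*x3 - x1:x2:x3 + (1|g)');

disp(full.CoefficientNames')                 % terms in the full expansion
disp(reduced.CoefficientNames')              % three-way term no longer listed

% Likelihood ratio test of the nested (reduced) model against the full one.
results = compare(reduced, full);
disp(results)
```

Inspecting CoefficientNames before interpreting estimates is a cheap way to confirm which terms a formula actually expanded to.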
