Hi everyone!
I am having troubles understanding how to perform the correct anova test on my data. I have 16 subjects who are being treated with 2 different treatments. For each subject and each treatment measures are taken at 6 different time points (attached is the excel file). Is repeated measures anova the best way to go? If so, what is the most apporpriate command to use?
Here is the code I have so far. The results are kind of strange so I am not sure if it is wrong or I am just interpreting it the wrong way.
Thanks,
Giulia
Q2 = xlsread('Q2.xls');
Treatment = categorical(Q2(:,2));
ID = Q2(:,1);
t = table(Treatment, Q2(:,3),Q2(:,4),Q2(:,5),Q2(:,6),Q2(:,7), Q2(:,8),'VariableNames', {'Treatment', 't1', 't2', 't3', 't4', 't5', 't6'});
Time = [1 2 3 4 5 6]';
rm = fitrm( t, 't1-t6~Treatment', 'WithinDesign', Time);
ranovatbl = ranova(rm);
disp(ranovatbl);
multcompare(rm, 'Time', 'By', 'Treatment')

1 Kommentar

Scott MacKenzie
Scott MacKenzie am 27 Apr. 2021
Bearbeitet: Scott MacKenzie am 27 Apr. 2021
Below is an Anova I just did on your data set (not using MATLAB). What you have is a 2 x 6 within-subjects design. You have two independent variables: Treatment (2 levels) and Time Point (6 levels). What was the dependent variable? In your question, you only say "measures are taken". Measures of what? Here, I'm assuming you measured something like reaction time or task completion time with milliseconds as the units.
Here is a summary of the key results: The effect of treatment on task completion time was not statistically significant, F(1,10) = 1.131, p > .05. The effect of time point on task completion time was statistically significant, F(5,50) = 5.127, p < .001. The Treatment x Time Point interaction effect was also statistically significant, F(5,50) = 6.478, p = .0001. See below for an Anova table for your data and experiment design.
I haven't figure out how to generate a table like this in MATLAB, so I'm using a different tool.
BTW, the attached file has data for 11 participants, not 16 as stated in your question.

Melden Sie sich an, um zu kommentieren.

 Akzeptierte Antwort

Adam Danz
Adam Danz am 3 Nov. 2019
Bearbeitet: Adam Danz am 16 Apr. 2021

5 Stimmen

Is repeated measures anova the best way to go?
Your design is certainly applicable to a repeated measures design but the answer to that question depends on what you're testing.
It looks like you want to perform a repeated measures ANOVA with a covariate. In addition to patient number, the treatment type is the covariate, and the 6 time points is the within-subject repeated measure.
There are 4 null hypotheses to look at:
  • Patient number has no effect on the population mean
  • Treatment type has no effect on the population mean
  • There is no interaction between patient number and treatment type (usually the most important question)
  • There is no 3-way interaction between patient number, treatment type, and treatment time.
Setting up a RM Anova with a convariate in Matlab
Your data are attached in a file named Q2unlocked.xlsx. The following lines set up a RM Anova with a covariate (written in r2019b).
% Read in the table
Q2 = readtable('Q2unlocked.xlsx');
% Replace the 6 measurment headers
Q2.Properties.VariableNames(3:8) = {'t1', 't2', 't3', 't4', 't5', 't6'};
% The within-subjects design
withinDsgn = table((1:6)','VariableNames',{'Time'});
% Run the RM Anova (note the interaction between PATIENT and TREAT!)
rm = fitrm(Q2, 't1-t6~PATIENT*TREAT', 'WithinDesign', withinDsgn);
Also see a similar example provided by Matlab.
Testing for assumptions
Before we look at the results, you must make sure your data are appropriate for a RM Anova by confirming the following assumptions.
  1. Independent observations: You collected the data randomly and without bias
  2. Normality: The measurments have an approximately normal distribution
  3. Sphericity: we'll use the Mauchly test.
% Are the data approximately normal? This should ideally be done for each factor
histogram(reshape(Q2{:,3:end},[],1))
A small rightward tail but not too bad.
% Does the data pass the sphericity test?
rm.mauchly
ans = 1×4 table
W ChiStat DF pValue _______ _______ __ _______ 0.48842 11.537 14 0.64344
Yes: a pValue < 0.05 would fail
RM Anova results
ranova(rm) or rm.ranova produce the results table showing the p-value and then 3 adjusted p-values depending on whether the the response variables have different variances (symmetry assumption).
% Produce ranova table
rm.ranova
ans = 5×8 table
SumSq DF MeanSq F pValue pValueGG pValueHF pValueLB __________ __ __________ ______ ________ ________ ________ ________ (Intercept):Time 3.9654e+06 5 7.9308e+05 2.2798 0.053221 0.071668 0.053221 0.14843 PATIENT:Time 4.0894e+06 5 8.1788e+05 2.3511 0.047012 0.064753 0.047012 0.14259 TREAT:Time 3.2657e+06 5 6.5314e+05 1.8775 0.10606 0.12652 0.10606 0.18747 PATIENT:TREAT:Time 3.9524e+06 5 7.9048e+05 2.2723 0.053916 0.072433 0.053916 0.14905 Error(Time) 3.1309e+07 90 3.4788e+05
Row 1 representing all differences across the within-subjects factors (treatment times). It has a p value of 0.053 which is just beyond the socially accepted threshold of 0.05. A boxplot will show us if this value seems reasonable.
figure()
boxplot(Q2{:,3:end})
xlabel('Treatment times')
ylabel('measurement value')
title('Data pooled between all Patients and treatment factors')
It doesn't look like there's much of an effect of treatment time when factors are combined
Row 2 shows the interaction between Patient and treatment times and is p=0.047
bpdata = [];
for i = 1:max(Q2.PATIENT) %assuming patient numbers are 1:max
bpdata = [bpdata, Q2{Q2.PATIENT==i,3:8},nan(size(unique(Q2.TREAT)))];
end
figure()
boxplot(bpdata)
arrayfun(@xline,7:7:size(bpdata,2))
xlabel('6 treatment times across 11 patients')
ylabel('measurement value')
title('Data pooled between treatment factors')
set(gca,'XTick', [])
Clearly the difference between the 6 treatment times differs across subjects (Treatment type combined)
Row 3 shows the interaction between treatment type and treatment times and is p=0.106.
bpdata = [];
for i = 1:max(Q2.TREAT) %assuming TREAT numbers are 1:max
bpdata = [bpdata, Q2{Q2.TREAT==i,3:8},nan(size(unique(Q2.PATIENT)))];
end
figure()
boxplot(bpdata)
xline(7)
xlabel('6 treatment times across 2 TREAT factors')
ylabel('measurement value')
title('Data pooled between patients')
set(gca,'XTick', [])
Other than the 2nd treatment factor there is no difference between the 2 Treatment types.
Row 4 shows the 3-way interaction between patient, treatment type, and treatment times.

14 Kommentare

Giulia Soresini
Giulia Soresini am 3 Nov. 2019
Thanks a lot!
I tried running the code and it gives me this error:
Unable to use a value of type cell as an index.
Error in HW5_Q2 (line 3)
withinDsgn = table((1:6)','VariableNames',{'Time'});
Any suggestions?
Giulia Soresini
Giulia Soresini am 3 Nov. 2019
No worries,
it worked, I had another variable called table that messed up.
Thanks a lot again
Adam Danz
Adam Danz am 3 Nov. 2019
Bearbeitet: Adam Danz am 3 Nov. 2019
Glad I could help.
I just updated the answer to clean up some of my terminology. Your "TREAT" variable is a covariate so my previous terminology (2-way) was misleading. None of the code was changed.
BTW, never name a variable "table" since that overrides the table() function which is a commonly used function, especially in statistics tests.
deejt
deejt am 16 Apr. 2021
Thank you so much @Adam Danz for this incredible explanation!!
I have one question though, what factors do you mean with this ? thanks!
% Are the data approximately normal? This should ideally be done for each factor
histogram(reshape(Q2{:,3:end},[],1))
Adam Danz
Adam Danz am 16 Apr. 2021
Thanks, @deenujt. The factors are the columns of data in Q2{:,3:end}.
deejt
deejt am 27 Apr. 2021
Thank you @Adam Danz!
I have one more question: if your data does not pass the test of sphericity, do you then have to always the corrected p-values from the rm table? I also calculated the epsilon, but how exactly do I have to combine the results from the epsilon correction to the rm table and the reporting of p- values?
% Produce ranova table: rm.ranova table
rm.mauchly
Scott MacKenzie
Scott MacKenzie am 27 Apr. 2021
Bearbeitet: Scott MacKenzie am 27 Apr. 2021
The Anova table in Adam's reponse can't be correct. There were 2 levels of the Treatment independent variable, so the first df with the F-statistic for the Treatment main effect is 1 (not 5 as shown in Adam's table).
@Scott MacKenzie the repeated measures design in my answer is
rm = fitrm(Q2, 't1-t6~PATIENT*TREAT', 'WithinDesign', withinDsgn);
which contains 6 repeated responses (t1-t6) and two predictors with an interaction term. That's why there are 5 df.
Adam Danz
Adam Danz am 27 Apr. 2021
@deejt, to be honest, I avoid parametric statistics largely for this exact reason and it's becoming a growing trend to use non-parametric tests whenever possible.
See wiki for what to do when sphericity tests fail.
A nonparametric alternative to the rm anova: Friedman test.
For a short list of academic journal articles on why we should show using statistical significance testing, see the end of this answer.
Scott MacKenzie
Scott MacKenzie am 27 Apr. 2021
Bearbeitet: Scott MacKenzie am 28 Apr. 2021
Adam, sorry, but concerning your comment on the degrees of freedom, this is simply wrong. Remember, we are talking about the main effect of Treatment in Giulia's data set. Treatment is the primary independent variable. Determining whether or not it has a signficant effect on the dependent variable is the main reason for doing the analysis of variance in the first place. But, your Anova table provides no insight on this issue.
It is a simple fact that the first degree of freedom in the F-statistic for a main effect is n-1 where n is the number of conditions or levels in the independent variable. Just look in the Anova table I provide above or in any statistics textbook. Treatment is an independent variable with two levels, so its df is 1. But, your Anova table has no line entry with df = 1. The only reasonable conclusion is that there is either no main effect for Treatment given in your table, or, if there is, it is wrong.
This is probably the wrong place for dialog like this. Sorry. I think I'll going to put together a separate question on this point, as I, like Giulia, am interested in using MATLAB for an analysis of variance on data obtained in a factorial experiment.
Adam Danz
Adam Danz am 28 Apr. 2021
Bearbeitet: Adam Danz am 28 Apr. 2021
Hi Scott I'm not a statitics expert and have met very few people who are. Not many people here answer stats questions so your commentary is welcomed.
The repeated measures ANOVA table that Matlab generates (not me) is a produced by ranova and is similar to the examples from the documentation here and here. You can review the Wilkinson Notation here. The model includes a patient term, a treatment group term, and an interaction term between patient and treatment group. The table it produces includes a term representing all differences across the within-subjects factors (time) and terms for all interactions between the within-subject terms and all between-subject model terms, as explained in the ranova documentation.
The df for the Time term should be 5 since there are 6 measurements. The df for treatment*time interaction should also be 5 since there are 2 treatment levels and 5*1=5. But I don't know why the patient*time df is 5 in the table. I'd have to go through the code to figure it out since the documentation on statistical functions in Matlab tends to be scant.
These data could also be reshaped to perform an n-way analysis of variance using anovan which contains the df you were expecting.
% Reshape data
data = readmatrix('Q2unlocked.xlsx');
M = repmat(data(:,1:2),6,1);
M(:,end+1) = repelem((1:6)',size(data,1),1);
rmeas = data(:,3:end);
M(:,end+1) = rmeas(:);
[p, tbl] = anovan(M(:,end),{M(:,1), M(:,2), M(:,3)},'model','interaction',...
'varnames',{'Participants','Treatment','Time'})
Although, I'm sure the model used to generate the table in your comment differed. Several people in the past have asked why anova tables differ between software and those questions rarely get solved. I don't advise anyone to use statistical significance testing to begin with so I only dive in when needed.
Lastly, you bring up a good point that this model may not be the one that answers the OP's question. Perhaps specifying the within-groups model when calling ranova is the better choice. Instead of rm.ranova which is the same as ranova(rm),
ranova(rm,'WithinModel','Time')
which results in
10×8 table
SumSq DF MeanSq F pValue pValueGG pValueHF pValueLB
__________ __ __________ ________ ________ ________ ________ ________
(Intercept) 2.9816e+07 1 2.9816e+07 6.914 0.017011 0.017011 0.017011 0.017011
PATIENT 2.4972e+06 1 2.4972e+06 0.57908 0.45653 0.45653 0.45653 0.45653
TREAT 1.6441e+05 1 1.6441e+05 0.038124 0.84738 0.84738 0.84738 0.84738
PATIENT:TREAT 9.5387e+05 1 9.5387e+05 0.22119 0.64378 0.64378 0.64378 0.64378
Error 7.7623e+07 18 4.3124e+06
(Intercept):Time 2.1085e+07 1 2.1085e+07 5.7044 0.028088 0.028088 0.028088 0.028088
PATIENT:Time 1.8057e+06 1 1.8057e+06 0.48853 0.49352 0.49352 0.49352 0.49352
TREAT:Time 5.4909e+05 1 5.4909e+05 0.14856 0.70443 0.70443 0.70443 0.70443
PATIENT:TREAT:Time 8.4605e+05 1 8.4605e+05 0.2289 0.6381 0.6381 0.6381 0.6381
Error(Time) 6.6531e+07 18 3.6962e+06
Scott MacKenzie
Scott MacKenzie am 28 Apr. 2021
Adam. Thanks for your thoughful comments. Yes, it is sometimes perplexing that different stats packages give different results for the same data. The table you provide using anovan comes very close to matching the Anova table I provided. In fact all the mean squares are identical, which is certainly good. Oddly enough, the F-statistics differ -- not good. In my table, the Treatment effect is not significant F(1,10) = 1.131, p > .05). The F-statistic (1.131) is the ratio MS-treatment / MS-treatment_by_participant. It seems anovan uses a different ratio: MS-treatment / MS-error. I don't believe that's correct, considering that the factor is within-subjects not between-subjects. This also means that the second df would be 50 (from the error MS line), not 10. Again, I don't think that's right. I ran these data through JMP (from SAS) and got
Here, we see the F-statistics agree with my table but not with MATLAB's anovan table. Hmmm, interesting.
Scott MacKenzie
Scott MacKenzie am 29 Apr. 2021
Bearbeitet: Scott MacKenzie am 29 Apr. 2021
Problem solved!
T = readtable('q2_data.xlsx');
T.Properties.VariableNames = {'T1_TP1', 'T1_TP2', 'T1_TP3', 'T1_TP4', 'T1_TP5', 'T1_TP6', ...
'T2_TP1', 'T2_TP2', 'T2_TP3', 'T2_TP4', 'T2_TP5', 'T2_TP6' };
withinDesign = table([1 1 1 1 1 1 2 2 2 2 2 2]',[1:6 1:6]','VariableNames',{'Treatment','TimePeriod'});
withinDesign.Treatment = categorical(withinDesign.Treatment);
withinDesign.TimePeriod = categorical(withinDesign.TimePeriod);
rm = fitrm(T, 'T1_TP1-T2_TP6~1', 'WithinDesign', withinDesign);
ranova(rm, 'WithinModel', 'Treatment*TimePeriod')
This script will generate an Anova table with the exact same SS, MS, df, F-statistics, and p-values as in the table above and in the table in my first comment on this question. For this, I rearranged the data originally posted by Giulia into an 11x12 matrix. See the attached spreadsheet for the rearranged data and some explanatory comments.
Full disclosure. The script above is not mine. It's a slightly modified verion of a script provided by Jeff Miller in response to a question I posted yesterday.
Adam Danz
Adam Danz am 29 Apr. 2021
Thanks for the update, Scott. I saw Jeff's solution earlier today and also noticed the difference in withinDesign. I look forward to examining it closer and perhaps updating my answer when I have a chance to do so, shortly.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Produkte

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by