Regression with several dummy variables

Maria
Maria on 17 Aug 2014
Edited: dpb on 18 Aug 2014
I have a cell type variable with 20000 rows and 700 columns. I present here an example of the first 9 columns:
C1 C2 C3 C4 C5 C6 C7 C8 C9
A={ 0 0 0 13 16 11 17 26 12 %row 1 is irrelevant
12 0 0 1 0 0 0 0 0
13 0 0 0 1 0 0 0 0
16 0 0 0 0 1 0 0 0
18 0 0 0 0 0 1 0 0
26 0 0 1 0 0 0 0 0
41 0 0 0 0 0 0 1 0}
I am trying to perform a regression.
C1 is simply an ID code; C2 is my binary dependent variable y. C3 is a dummy variable x (its elements, 0 or 1, are numbers), whose coefficient β (and, if possible, standard deviation) I want to interpret. From C4 onwards I have dummy variables (here the elements, 0 or 1, are logicals) that I also want to include in my regression to control for certain effects.
I most likely should use the fitlm or regress function, but I have not been successful. Can someone help me? Thank you very much.
  2 Comments
dpb
dpb on 17 Aug 2014
Sounds nearly hopeless given the number of variables, but I guess you won't know until you actually try.
First, are the dummy variables coded to be independent? That is, does the large number of columns come from fewer variables with several levels each, or are they all actually separate effects?
Maria
Maria on 17 Aug 2014
The large numbers did come from fewer variables but of different levels.


Answers (1)

dpb
dpb on 17 Aug 2014
Edited: dpb on 17 Aug 2014
Given the response to the previous question, it should be just
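% Assumes A is the 20000-by-700 cell array described in the question; row 1 is skipped.
y = double(cell2mat(A(2:end,2)));                               % y response variable (C2)
x = [cell2mat(A(2:end,3)), double(cell2mat(A(2:end,4:end)))];   % C3 is numeric, C4 onward are logical
x = [ones(size(x,1),1) x];                                      % predictor variables plus constant term
[b,bint,~,~,stats] = regress(y,x);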
As said, everything depends on what the cross-product matrix X'*X of the design matrix X looks like. (MATLAB does not actually compute X'*X, but its characteristics determine the covariances, estimability, and so on, all of which depend on the chosen codings being independent.)
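The question also asks for the standard deviation of the coefficient, which regress does not return directly. The following is a minimal sketch on made-up toy data (the variables x, c1, c2, and y are hypothetical stand-ins for C3, C4, C5, and C2), assuming the Statistics Toolbox is available. It computes the standard errors from the usual least-squares formula and, as a cross-check, reads them from fitlm.

```matlab
% Toy data (hypothetical): 8 observations, one dummy of interest and two controls
x  = [0 1 0 1 1 0 1 0]';           % dummy whose coefficient is of interest (like C3)
c1 = [1 0 0 1 0 0 1 0]';           % control dummy (like C4)
c2 = [0 1 0 0 1 0 0 1]';           % control dummy (like C5)
y  = [0 1 0 1 1 0 0 1]';           % binary response (like C2)

X = [ones(numel(y),1) x c1 c2];    % design matrix with an explicit constant column

% regress returns the estimates and their 95% confidence intervals
[b, bint] = regress(y, X);

% Standard errors from the ordinary least-squares formula
res    = y - X*b;                              % residuals
dof    = numel(y) - size(X,2);                 % residual degrees of freedom
sigma2 = (res'*res) / dof;                     % residual variance estimate
covb   = sigma2 * ((X'*X) \ eye(size(X,2)));   % coefficient covariance matrix
se     = sqrt(diag(covb));                     % standard error of each coefficient

% fitlm adds the intercept itself, so the ones column is left out here
mdl = fitlm([x c1 c2], y);
disp(mdl.Coefficients)             % table with Estimate, SE, tStat, pValue
```

The second entry of b and se belongs to x, because the constant term is the first column of X. This only works when X has full column rank.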
  6 Comments
Maria
Maria on 18 Aug 2014
It gives this message: Warning: X is rank deficient to within machine precision. > In regress at 84. Do you know why? Thanks
dpb
dpb on 18 Aug 2014
Edited: dpb on 18 Aug 2014
Yep, as suspected given the number of dummy variables: at least one column is the same as another or, more generally, a linear combination of other columns. I'd guess it will be very difficult to find an encoding that avoids the problem.
You can always try
rank(x)
to get an estimate of how many problems you have: the difference size(x,2) - rank(x) is the number of linearly dependent columns.
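As a small illustration of what the rank check reports, here is a sketch on a made-up three-level factor g (hypothetical, not the data from the question). When every level gets its own dummy column and a constant term is also included, the dummies sum to the constant column, so one column is redundant. Dropping one level as the reference category removes that particular dependency.

```matlab
% Hypothetical factor with three levels, fully dummy-coded
g = [1 2 3 1 2 3 1 2]';
D = double([g==1, g==2, g==3]);         % one dummy column per level
X = [ones(numel(g),1) D];               % constant term plus all three dummies

% Number of redundant columns: the three dummies sum to the constant, so this is 1
nDeficient = size(X,2) - rank(X);

% Drop level 1 as the reference category and check again
Xref       = [ones(numel(g),1) D(:,2:end)];
isFullRank = rank(Xref) == size(Xref,2);

fprintf('redundant columns: %d, full rank after dropping a level: %d\n', ...
        nDeficient, isFullRank);
```

With several factors coded this way, each fully coded factor contributes at least one such dependency, which is one plausible reason for the warning here.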
I repeat the final synopsis from my initial answer:
...everything depends on what the cross-product matrix X'*X of the design matrix X looks like. (MATLAB does not actually compute X'*X, but its characteristics determine the covariances, estimability, and so on, all of which depend on the chosen codings being independent.)
It's that last phrase about being independent that's the rub.
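If recoding the factors by hand is impractical, one possible way to locate an independent subset of columns is a QR factorization with column pivoting. This is only a sketch on the same kind of made-up data, not something proposed in the thread; the relative tolerance of 1e-10 is an assumption and may need adjusting for real data.

```matlab
% Hypothetical rank-deficient design: constant plus all dummies of a 3-level factor
g = [1 2 3 1 2 3 1 2]';
X = [ones(numel(g),1), double([g==1, g==2, g==3])];

% Economy-size QR with column pivoting; p is a permutation vector of column indices
[~, R, p] = qr(X, 0);

tol  = 1e-10 * abs(R(1,1));           % assumed relative tolerance
r    = sum(abs(diag(R)) > tol);       % numerical rank estimate
keep = sort(p(1:r));                  % indices of a linearly independent column subset
drop = sort(p(r+1:end));              % columns flagged as redundant

Xind = X(:, keep);                    % reduced design matrix with full column rank
```

Which columns end up in drop depends on the pivoting order, so the coefficients of the reduced model are interpreted relative to whatever columns were removed.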
