Multicollinearity/Regression/PCA and choice of optimal model (2nd try)
Hi there
I have a set of results and 4 "candidate" explanatory variables. Those variables are correlated with each other (only two of them are uncorrelated with one another). What I want to figure out is which one(s) of them is (are) best at explaining the results.
I understand stepwise regression is thrown off by the multicollinearity (I tried to run it and it all went fine until I tried to put the interactions into the mix).
I tried an ANOVA; two of the variables are significant, but I get NaNs when I ask for the interactions.
I tried to run a PCA on all the explanatory variables, but 1) I don't understand how the PCA can ignore the results I am trying to explain, and 2) I don't understand the output I get from pcacov: what do the coefficients in that matrix mean? How am I supposed to rank the variables?
Does that make sense? Thank you very much. P.S.: I also read about the Akaike information criterion, but I am unsure how it would apply here. I hope something simpler could help me, because this feels like trying to crush a fly with a bulldozer.
0 Comments
Accepted Answer
Richard Willey
on 13 Apr 2012
I'm attaching some code that might prove helpful.
I also have a two-part blog post on the same subject that goes into a bit more depth...
%%Introduction to using LASSO
% This demo explains how to start using the lasso functionality introduced
% in R2011b. It is motivated by an example in Tibshirani’s original paper
% on the lasso.
% Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.
% J. Royal Statist. Soc. B, Vol. 58, No. 1, pages 267-288.
% The data set that we’re working with in this demo is a wide
% dataset with correlated variables. This data set includes 8 different
% variables and only 20 observations. 5 out of the 8 variables have
% coefficients of zero. These variables have zero impact on the model. The
% other three variables have nonzero coefficients and do affect the model.
%%Clean up workspace and set random seed
clear all
clc
% Set the random number stream
rng(1981);
%%Creating data set with specific characteristics
% Create eight X variables
% The mean of each variable will be equal to zero
mu = [0 0 0 0 0 0 0 0];
% The variables are correlated with one another
% The covariance matrix is specified as follows
i = 1:8;
matrix = abs(bsxfun(@minus,i',i));
covariance = repmat(.5,8,8).^matrix;
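% This gives an AR(1)-style structure: ones on the diagonal and 0.5^|i-j|
% off the diagonal, so adjacent variables have correlation 0.5, variables
% two apart 0.25, and so on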
% Use these parameters to generate a set of multivariate normal random numbers
X = mvnrnd(mu, covariance, 20);
% Create a hyperplane that describes Y = f(X)
Beta = [3; 1.5; 0; 0; 2; 0; 0; 0];
ds = dataset(Beta);
% Add in a noise vector
Y = X * Beta + 3 * randn(20,1);
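% Only the 1st, 2nd, and 5th variables enter the true model, so a good
% selection procedure should keep those three and drop the other five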
%%Use linear regression to fit the model
b = regress(Y,X);
ds.Linear = b;
%%Use a lasso to fit the model
[B, Stats] = lasso(X, Y, 'CV', 5);
disp(B)
disp(Stats)
%%Create a plot showing MSE versus lambda
lassoPlot(B, Stats, 'PlotType', 'CV')
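% The plot shows the cross-validated MSE at each value of lambda, with the
% minimum-MSE lambda and the one-standard-error lambda marked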
%%Identify a reasonable set of lasso coefficients
% View the regression coefficients associated with Index1SE
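% Stats.Index1SE is the index of the largest lambda whose cross-validated
% MSE is within one standard error of the minimum, which typically yields a
% sparser model than the minimum-MSE lambda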
ds.Lasso = B(:,Stats.Index1SE);
disp(ds)
%%Create a plot showing coefficient values versus L1 norm
lassoPlot(B, Stats)
%%Run a simulation
% Preallocate some variables
MSE = zeros(100,1);
mse = zeros(100,1);
Coeff_Num = zeros(100,1);
Betas = zeros(8,100);
cv_Reg_MSE = zeros(1,100);
for i = 1 : 100
    X = mvnrnd(mu, covariance, 20);
    Y = X * Beta + randn(20,1);
    [B, Stats] = lasso(X, Y, 'CV', 5);
    % Pick a lambda halfway between the minimum-MSE choice and the more
    % conservative one-standard-error choice
    Shrink = Stats.Index1SE - ceil((Stats.Index1SE - Stats.IndexMinMSE)/2);
    % Record which coefficients the lasso kept (nonzero, not merely positive)
    Betas(:,i) = B(:,Shrink) ~= 0;
    Coeff_Num(i) = sum(B(:,Shrink) ~= 0);
    MSE(i) = Stats.MSE(Shrink);
    % Cross-validated MSE for ordinary least squares, for comparison
    regf = @(XTRAIN, ytrain, XTEST)(XTEST*regress(ytrain,XTRAIN));
    cv_Reg_MSE(i) = crossval('mse',X,Y,'predfun',regf, 'kfold', 5);
end
Number_Lasso_Coefficients = mean(Coeff_Num);
disp(Number_Lasso_Coefficients)
MSE_Ratio = median(cv_Reg_MSE)/median(MSE);
disp(MSE_Ratio)
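For the original question, the same approach can be pointed at the four candidate variables directly; unlike stepwise regression, the lasso handles correlated predictors by shrinking coefficients rather than making hard include/exclude decisions at each step. The sketch below is only an illustration, not part of the demo above: it assumes your predictors sit in the columns of a matrix X and your results in a column vector y, and the 10-fold cross-validation is an arbitrary choice; adjust the names and the number of folds to your data.
% Sketch: lasso on the 4 candidate variables plus their pairwise interactions.
% X (n-by-4 predictor matrix) and y (n-by-1 response) are placeholders for your own data.
D = x2fx(X, 'interaction');   % constant, linear, and pairwise interaction columns
D = D(:, 2:end);              % drop the constant column; lasso fits its own intercept
[B, Stats] = lasso(D, y, 'CV', 10);
% Coefficients at the one-standard-error lambda; terms whose coefficient is
% exactly zero have been dropped from the model
selected = find(B(:, Stats.Index1SE) ~= 0);
disp(selected)
% Same diagnostic plots as in the demo above
lassoPlot(B, Stats, 'PlotType', 'CV')
lassoPlot(B, Stats)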
More Answers (0)