# lassoglm

Lasso or elastic net regularization for generalized linear model regression

## Syntax

B = lassoglm(X,Y)
[B,FitInfo] = lassoglm(X,Y)
[B,FitInfo] = lassoglm(X,Y,distr)
[B,FitInfo] = lassoglm(X,Y,distr,Name,Value)

## Description

B = lassoglm(X,Y) returns penalized maximum-likelihood fitted coefficients for a generalized linear model of the response Y to the data matrix X. The values in Y are assumed to have a normal (Gaussian) probability distribution.

[B,FitInfo] = lassoglm(X,Y) also returns FitInfo, a structure containing information about the fits.

[B,FitInfo] = lassoglm(X,Y,distr) fits the model using the probability distribution type for Y as specified in distr.

[B,FitInfo] = lassoglm(X,Y,distr,Name,Value) fits regularized generalized linear regressions with additional options specified by one or more Name,Value pair arguments.

## Input Arguments

X

Numeric matrix with n rows and p columns. Each row represents one observation, and each column represents one predictor (variable).

Y

When distr is not 'binomial', Y is a numeric vector or categorical array of length n, where n is the number of rows of X. Y(i) is the response to row i of X.

When distr is 'binomial', Y is one of the following:

• Numeric vector of length n, where each entry represents success (1) or failure (0)

• Logical vector of length n, where each entry represents success or failure

• Categorical array of length n, where each entry represents success or failure

• Two-column numeric matrix, where the first column contains the number of successes for each observation, and the second column contains the total number of trials

distr

Distributional family for the nonsystematic variation in the responses, a string. Choices:

• 'normal'

• 'binomial'

• 'poisson'

• 'gamma'

• 'inverse gaussian'

By default, lassoglm uses the canonical link function corresponding to distr. Specify another link function using the 'Link' name-value pair.
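The two-column binomial form of Y described above can be sketched as follows. The data, the coefficient values, and the variable names (ntrials, nsuccess) are made up for illustration only.

```matlab
rng('default')                                % for reproducibility
n = 60;
X = randn(n,4);                               % hypothetical predictors
p = 1 ./ (1 + exp(-(X(:,1) - 0.5*X(:,2))));   % assumed true success probabilities
ntrials  = 10*ones(n,1);                      % number of trials per observation
nsuccess = binornd(ntrials,p);                % number of successes per observation

% Two-column form of Y: [number of successes, number of trials]
[B,FitInfo] = lassoglm(X,[nsuccess ntrials],'binomial','NumLambda',10);
```

B has one row per predictor (four here) and one column per Lambda value that lassoglm returns.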

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Alpha'

Scalar value from 0 to 1 (excluding 0) representing the weight of lasso (L1) versus ridge (L2) optimization. Alpha = 1 represents lasso regression, and other values represent elastic net optimization. Alpha close to 0 approaches ridge regression. See Definitions.

Default: 1

'CV'

Method lassoglm uses to estimate deviance:

• K, a positive integer: lassoglm uses K-fold cross validation.

• cvp, a cvpartition object: lassoglm uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lassoglm.

• 'resubstitution': lassoglm uses X and Y to fit the model and to estimate the deviance, without cross validation.

Default: 'resubstitution'
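As a rough illustration of the cvp option, the sketch below builds a 5-fold cvpartition object and passes it as the 'CV' value. The data and the choice of five folds are assumptions made for the example.

```matlab
rng('default')                              % for reproducibility
X = randn(100,5);                           % hypothetical predictors
y = poissrnd(exp(0.5*X(:,1) + 1));          % hypothetical Poisson responses

cvp = cvpartition(100,'KFold',5);           % 5-fold partition of the 100 observations
[B,FitInfo] = lassoglm(X,y,'poisson','CV',cvp,'NumLambda',20);

% With cross validation, FitInfo gains fields such as SE and Lambda1SE
lambda1SE = FitInfo.Lambda1SE;
```

Passing 'CV',5 instead of cvp should request the same kind of 5-fold scheme, with the partition created internally.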

'DFmax'

Maximum number of nonzero coefficients in the model. lassoglm returns results for Lambda values that satisfy this criterion.

Default: Inf

'Lambda'

Vector of nonnegative Lambda values. See Lasso.

• If you do not supply Lambda, lassoglm estimates the largest value of Lambda that gives a nonnull model. In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector.

• If you supply Lambda, lassoglm ignores LambdaRatio and NumLambda.

Default: Geometric sequence of NumLambda values, the largest just sufficient to produce B = 0

'LambdaRatio'

Positive scalar, the ratio of the smallest to the largest Lambda value when you do not explicitly set Lambda.

If you set LambdaRatio = 0, lassoglm generates a default sequence of Lambda values, and replaces the smallest one with 0.

Default: 1e-4
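To make the roles of NumLambda and LambdaRatio concrete, here is a small sketch of a geometric sequence with a given largest value. The value of lambdaMax is hypothetical (lassoglm estimates it from the data), and the formula illustrates geometric spacing only; it is not necessarily the exact internal computation.

```matlab
lambdaMax   = 0.8;     % hypothetical largest Lambda
numLambda   = 5;       % plays the role of NumLambda
lambdaRatio = 1e-4;    % plays the role of LambdaRatio

% Ascending geometric sequence from lambdaMax*lambdaRatio up to lambdaMax
lambda = lambdaMax * lambdaRatio.^((numLambda-1:-1:0)/(numLambda-1));
% lambda is approximately [8e-5 8e-4 8e-3 8e-2 0.8]
```

Each value is ten times the previous one here because the ratio 1e-4 is spread over four equal steps on a log scale.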

'Link'

Specify the mapping between the mean µ of the response and the linear predictor Xb.

| Value | Description |
| --- | --- |
| 'comploglog' | $\log(-\log(1-\mu)) = Xb$ |
| 'identity', default for the distribution 'normal' | $\mu = Xb$ |
| 'log', default for the distribution 'poisson' | $\log(\mu) = Xb$ |
| 'logit', default for the distribution 'binomial' | $\log(\mu/(1-\mu)) = Xb$ |
| 'loglog' | $\log(-\log(\mu)) = Xb$ |
| 'probit' | $\Phi^{-1}(\mu) = Xb$, where Φ is the normal (Gaussian) CDF |
| 'reciprocal', default for the distribution 'gamma' | $\mu^{-1} = Xb$ |
| p (a number), default for the distribution 'inverse gaussian' (with p = -2) | $\mu^{p} = Xb$ |
| Cell array of the form {FL FD FI}, containing three function handles, created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI). Equivalently, a structure of function handles with field Link containing FL, field Derivative containing FD, and field Inverse containing FI. | User-specified link function (see Custom Link Function) |
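The {FL FD FI} cell-array form can be sketched by writing the logit link out by hand. The data below are invented, and the example assumes that a hand-written logit is passed in exactly the same way as any other user-specified link.

```matlab
% Logit link written out as three function handles
FL = @(mu)  log(mu ./ (1 - mu));       % link
FD = @(mu)  1 ./ (mu .* (1 - mu));     % derivative of the link with respect to mu
FI = @(eta) 1 ./ (1 + exp(-eta));      % inverse link

rng('default')                         % for reproducibility
X = randn(100,3);                      % hypothetical predictors
y = binornd(1,FI(X(:,1) - 1));         % hypothetical 0/1 responses

B = lassoglm(X,y,'binomial','Link',{FL FD FI},'NumLambda',10);
```

A quick sanity check for any custom link is that FI(FL(mu)) returns mu and that FD matches a finite-difference slope of FL.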

'MCReps'

Positive integer, the number of Monte Carlo repetitions for cross validation.

• If CV is 'resubstitution' or a cvpartition of type 'resubstitution', MCReps must be 1.

• If CV is a cvpartition of type 'holdout', MCReps must be greater than 1.

Default: 1

'NumLambda'

Positive integer, the number of Lambda values lassoglm uses when you do not set Lambda. lassoglm can return fewer than NumLambda fits if the deviance of the fits drops below a threshold fraction of the null deviance (deviance of the fit without any predictors X).

Default: 100

'Offset'

Numeric vector with the same number of rows as X. lassoglm uses Offset as an additional predictor variable, but keeps its coefficient value fixed at 1.0.
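A common use of an offset is a Poisson model with varying exposure, where the log of the exposure enters the linear predictor with a coefficient of 1. The sketch below assumes made-up exposure values and coefficients.

```matlab
rng('default')                                    % for reproducibility
X = randn(80,3);                                  % hypothetical predictors
exposure = randi([1 10],80,1);                    % hypothetical exposure per observation
y = poissrnd(exposure .* exp(0.3*X(:,2) - 1));    % counts scale with exposure

% log(exposure) is added to the linear predictor with its coefficient fixed at 1
[B,FitInfo] = lassoglm(X,y,'poisson','Offset',log(exposure),'NumLambda',10);
```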

'Options'

Structure that specifies whether to cross validate in parallel, and specifies the random stream or streams. Create the Options structure with statset. Option fields:

• UseParallel: Set to true to compute in parallel. Default is false.

• UseSubstreams: Set to true to compute in parallel in a reproducible fashion. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. Default is false.

• Streams: RandStream object or cell array consisting of one such object. If you do not specify Streams, lassoglm uses the default stream.
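A minimal sketch of building the Options structure with statset is shown below. It keeps UseParallel at false so that it does not depend on a parallel pool, and it assumes that substreams can also be used in a serial run. The data are hypothetical.

```matlab
rng('default')                              % for reproducibility of the data
X = randn(100,5);                           % hypothetical predictors
y = poissrnd(exp(0.5*X(:,1) + 1));          % hypothetical Poisson responses

s = RandStream('mlfg6331_64');              % stream type that allows substreams
opts = statset('UseParallel',false, ...     % set to true to cross validate in parallel
               'UseSubstreams',true, ...
               'Streams',s);

[B,FitInfo] = lassoglm(X,y,'poisson','CV',5,'Options',opts);
```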

'PredictorNames'

Cell array of strings representing names of the predictor variables, in the order in which they appear in X.

Default: {}

'RelTol'

Convergence threshold for the coordinate descent algorithm (see Friedman, Tibshirani, and Hastie [3]). The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than RelTol.

Default: 1e-4

'Standardize'

Boolean value specifying whether lassoglm scales X before fitting the models.

Default: true

'Weights'

Observation weights, a nonnegative vector of length n, where n is the number of rows of X. At least two values must be positive.

Default: 1/n * ones(n,1)

## Output Arguments

B

Fitted coefficients, a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values.

FitInfo

Structure containing information about the model fits.

| Field in FitInfo | Description |
| --- | --- |
| Alpha | Value of Alpha parameter, a scalar. |
| Deviance | Deviance of the fitted model for each value of Lambda, a 1-by-L vector. If cross validation was performed, the values for Deviance represent the estimated expected deviance of the model applied to new data, as calculated by cross validation. Otherwise, Deviance is the deviance of the fitted model applied to the data used to perform the fit. |
| DF | Number of nonzero coefficients in B for each Lambda value, a 1-by-L vector. |
| Intercept | Intercept term β₀ for each linear model, a 1-by-L vector. |
| Lambda | Lambda parameters in ascending order, a 1-by-L vector. |

If you set the CV name-value pair to cross validate, the FitInfo structure contains additional fields.

| Field in FitInfo | Description |
| --- | --- |
| IndexMinDeviance | Index of Lambda with value LambdaMinDeviance, a scalar. |
| Index1SE | Index of Lambda with value Lambda1SE, a scalar. |
| LambdaMinDeviance | Lambda value with minimum expected deviance, as calculated by cross validation, a scalar. |
| Lambda1SE | Largest Lambda such that Deviance is within one standard error of the minimum, a scalar. |
| SE | Standard error of Deviance for each Lambda, as calculated during cross validation, a 1-by-L vector. |
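The relationships among these fields can be checked directly. The sketch below uses simulated Poisson data, and the variable names lamMin, lam1SE, and dfCheck are introduced only for illustration.

```matlab
rng('default')                                        % for reproducibility
X = randn(100,20);                                    % simulated predictors
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));     % simulated Poisson responses
[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

% The index fields point into FitInfo.Lambda and into the columns of B
lamMin = FitInfo.Lambda(FitInfo.IndexMinDeviance);    % same as FitInfo.LambdaMinDeviance
lam1SE = FitInfo.Lambda(FitInfo.Index1SE);            % same as FitInfo.Lambda1SE

% DF counts the nonzero coefficients in each column of B
dfCheck = sum(B ~= 0,1);

% Coefficients and intercept of the sparser model at the one-standard-error point
b1SE = B(:,FitInfo.Index1SE);
b0   = FitInfo.Intercept(FitInfo.Index1SE);
```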

## Examples

### Lasso Regularization of a Generalized Linear Model

Construct data from a Poisson model, and identify the important predictors using lassoglm.

Create data with 20 predictors, and Poisson responses using just three of the predictors, plus a constant.

rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

Construct a cross-validated lasso regularization of a Poisson regression model of the data.

[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

Examine the cross-validation plot to see the effect of the Lambda regularization parameter.

lassoPlot(B,FitInfo,'plottype','CV');

The green circle and dashed line locate the Lambda with minimal cross-validation error. The blue circle and dashed line locate the point with minimal cross-validation error plus one standard error.

Find the nonzero model coefficients corresponding to the two identified points.

minpts = find(B(:,FitInfo.IndexMinDeviance))

minpts =

3
5
6
10
11
15
16

min1pts = find(B(:,FitInfo.Index1SE))

min1pts =

5
10
15

The nonzero coefficients at the minimum-plus-one-standard-error point correspond exactly to the predictors (5, 10, and 15) used to create the data.
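Once a Lambda has been chosen, the fitted model can be used for prediction. The following sketch repeats the fit so that it is self-contained, then predicts with glmval and with the inverse log link written out by hand. Xnew is a hypothetical set of new observations.

```matlab
rng('default')                                        % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));
[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

idx  = FitInfo.Index1SE;
coef = [FitInfo.Intercept(idx); B(:,idx)];   % intercept first, as glmval expects

Xnew = randn(5,20);                          % hypothetical new observations
yhat = glmval(coef,Xnew,'log');              % predicted Poisson means

% The same prediction written out with the inverse of the log link
yhatManual = exp(coef(1) + Xnew*coef(2:end));
```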

## Definitions

### Link Function

A link function f(μ) maps a distribution with mean μ to a linear model with data X and coefficient vector b using the formula

f(μ) = Xb.

Find the formulas for the link functions in the Link name-value pair description. Here, "typical" means a link function that is typically used for the listed distribution.

| Distributional Family | Link Function (typical, {default}) |
| --- | --- |
| 'normal' | {'identity'} |
| 'binomial' | 'comploglog', 'loglog', 'probit', {'logit'} |
| 'poisson' | {'log'} |
| 'gamma' | {'reciprocal'} |
| 'inverse gaussian' | {-2} |

### Lasso

For a nonnegative value of λ, lasso solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta }_{0},\beta \right)+\lambda \sum _{j=1}^{p}|{\beta }_{j}|\right),$

where

• Deviance is the deviance of the model fit to the responses using intercept β₀ and predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized log likelihood.

• N is the number of observations.

• λ is a nonnegative regularization parameter corresponding to one value of Lambda.

• The parameters β₀ and β are a scalar and a p-vector, respectively.

As λ increases, the number of nonzero components of β decreases.

The lasso penalty involves only the L1 norm of β, in contrast to the elastic net penalty, which also includes the squared L2 norm of β.
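As an illustration of the objective above, the sketch below evaluates it for a Poisson model with a log link. It uses the standard Poisson deviance, which is assumed here to be the form that corresponds to distr = 'poisson'. The function handles and data are hypothetical and are not part of lassoglm.

```matlab
% Standard Poisson deviance; the (y == 0) term makes y*log(y) evaluate to 0 when y is 0
poissonDeviance = @(y,mu) 2*sum(y .* log((y + (y == 0)) ./ mu) - (y - mu));

% Penalized objective for a log link: Deviance/N + lambda * (L1 norm of beta)
lassoObjective = @(beta0,beta,X,y,lambda) ...
    poissonDeviance(y,exp(beta0 + X*beta))/numel(y) + lambda*sum(abs(beta));

X = [0 1; 1 0; 1 1];                              % hypothetical data
y = [1; 2; 0];
obj = lassoObjective(0.1,[0.5; -0.2],X,y,0.05);   % objective at one candidate (beta0, beta)
```

For a fixed β₀ and β, raising λ increases the objective by λ times the L1 norm of β, which is what pushes small coefficients to exactly zero.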

### Elastic Net

For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta }_{0},\beta \right)+\lambda {P}_{\alpha }\left(\beta \right)\right),$

where

${P}_{\alpha }\left(\beta \right)=\frac{\left(1-\alpha \right)}{2}{‖\beta ‖}_{2}^{2}+\alpha {‖\beta ‖}_{1}=\sum _{j=1}^{p}\left(\frac{\left(1-\alpha \right)}{2}{\beta }_{j}^{2}+\alpha |{\beta }_{j}|\right).$

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term $P_{\alpha}(\beta)$ interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
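The penalty term can be evaluated directly from its definition. In the sketch below, elasticNetPenalty is a hypothetical helper, not a toolbox function, and the coefficient vector is made up.

```matlab
% Elastic net penalty: sum over j of ((1-alpha)/2)*beta_j^2 + alpha*|beta_j|
elasticNetPenalty = @(beta,alpha) sum((1 - alpha)/2 * beta.^2 + alpha*abs(beta));

beta = [0.5; -2; 0];                       % hypothetical coefficient vector

pLasso = elasticNetPenalty(beta,1);        % alpha = 1: the L1 norm, 2.5
pHalf  = elasticNetPenalty(beta,0.5);      % 0.25*4.25 + 0.5*2.5 = 2.3125
pRidge = elasticNetPenalty(beta,1e-6);     % close to half the squared L2 norm, 2.125
```

With α = 1 the quadratic term vanishes and only the L1 norm remains; as α approaches 0 the value approaches half the squared L2 norm.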

## References

[1] Tibshirani, R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.

[2] Zou, H. and T. Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.

[3] Friedman, J., R. Tibshirani, and T. Hastie. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.

[5] Dobson, A. J. An Introduction to Generalized Linear Models, 2nd edition. Chapman & Hall/CRC Press, New York, 2002.

[6] McCullagh, P., and J. A. Nelder. Generalized Linear Models, 2nd edition. Chapman & Hall/CRC Press, New York, 1989.

[7] Collett, D. Modelling Binary Data, 2nd edition. Chapman & Hall/CRC Press, New York, 2003.