Fit linear regression model
mdl = fitlm(___,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify which variables are categorical, perform robust regression, or use observation weights.
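The following sketch illustrates the three options mentioned above. It assumes the carsmall sample data set that ships with Statistics and Machine Learning Toolbox; the formulas, the variable names mdlCat, mdlRobust, and mdlW, and the weighting rule are illustrative choices, not recommendations.

```matlab
load carsmall                                    % sample data (assumed available)
tbl = table(Weight, Horsepower, Cylinders, MPG);

% Treat the numeric variable Cylinders as a categorical predictor.
mdlCat = fitlm(tbl, 'MPG ~ Weight + Horsepower + Cylinders', ...
    'CategoricalVars', {'Cylinders'});

% Robust regression with the default robust weight function.
mdlRobust = fitlm(tbl, 'MPG ~ Weight + Horsepower', 'RobustOpts', 'on');

% Observation weights: one nonnegative value per row of tbl.
% Doubling the weight of 1982 cars is an arbitrary, hypothetical rule.
w = ones(height(tbl), 1);
w(Model_Year == 82) = 2;
mdlW = fitlm(tbl, 'MPG ~ Weight + Horsepower', 'Weights', w);
```

Each call returns a LinearModel object, so you can compare the fits by displaying the objects or inspecting their properties.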
To access the model properties of the LinearModel object mdl, you can use dot notation. For example, mdl.Residuals returns a table of the raw, Pearson, Studentized, and standardized residual values for the model.
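As a brief illustration of dot notation, the sketch below fits a model to the carsmall sample data (an assumption of this example) and reads a few properties. The variable names res, coefTable, and r2 are hypothetical.

```matlab
load carsmall                                    % sample data (assumed available)
mdl = fitlm([Weight, Horsepower], MPG);          % matrix predictors, vector response

res = mdl.Residuals;                             % table of residuals, one row per observation
disp(res(1:5, :))                                % show the first five rows

coefTable = mdl.Coefficients;                    % estimates, standard errors, t-statistics, p-values
r2 = mdl.Rsquared.Ordinary;                      % ordinary R-squared of the fit
```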
After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
The main fitting algorithm is QR decomposition. For robust fitting, fitlm uses M-estimation to formulate estimating equations and solves them using the method of Iteratively Reweighted Least Squares (IRLS).
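To make the idea of IRLS concrete, the following is a simplified, hand-written loop that uses Huber weights on synthetic data. It is a sketch of the general technique only and is not the code that fitlm runs internally. The tuning constant, the scale estimate, the convergence test, and all variable names are assumptions of this example.

```matlab
rng(0);                                          % reproducible synthetic data
n = 100;
x = linspace(0, 10, n)';
y = 2 + 3*x + randn(n, 1);
y(1:5) = y(1:5) + 30;                            % inject a few outliers

X = [ones(n, 1), x];                             % design matrix with intercept
b = X \ y;                                       % ordinary least-squares start (QR-based solve)
k = 1.345;                                       % Huber tuning constant (assumed)

for iter = 1:50
    r = y - X*b;                                 % current residuals
    s = median(abs(r - median(r))) / 0.6745;     % robust scale estimate from the MAD
    u = abs(r) / (k*s);
    w = 1 ./ max(1, u);                          % Huber weights: 1 for small residuals, 1/u otherwise
    sw = sqrt(w);
    bNew = (X .* sw) \ (y .* sw);                % weighted least-squares step (implicit expansion, R2016b or later)
    converged = norm(bNew - b) <= 1e-8 * max(1, norm(b));
    b = bNew;
    if converged
        break
    end
end

% For comparison, the built-in robust fit with the Huber weight function.
mdl = fitlm(x, y, 'RobustOpts', 'huber');
```

The hand-written estimate b and mdl.Coefficients.Estimate need not agree exactly, because the details of the scale estimate and residual adjustment can differ.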
fitlm treats a categorical predictor as follows:
A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is categorical, then you can check the order of categories by using categories and reorder the categories by using reordercats to customize the reference level. For more details about creating indicator variables, see Automatic Creation of Dummy Variables.
fitlm treats the group of L – 1 indicator variables as a single variable. If you want to treat the indicator variables as distinct predictor variables, create indicator variables manually by using dummyvar. Then use the indicator variables, except the one corresponding to the reference level of the categorical variable, when you fit a model. For the categorical predictor X, if you specify all columns of dummyvar(X) and an intercept term as predictors, then the design matrix becomes rank deficient.
Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor.
Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels.
You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
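The points above can be sketched with the carsmall sample data (assumed available), where Cylinders takes the three values 4, 6, and 8. The variable names Cyl, mdl1, mdl2, mdl3, and Xfull are hypothetical.

```matlab
load carsmall                                    % sample data (assumed available)
Cyl = categorical(Cylinders);                    % L = 3 levels: 4, 6, 8
tbl = table(Weight, Cyl, MPG);

% The first category is the reference level, so L - 1 = 2 indicators appear.
levelsBefore = categories(tbl.Cyl);
mdl1 = fitlm(tbl, 'MPG ~ Weight + Cyl');
disp(mdl1.CoefficientNames)

% Move '8' to the front to use it as the reference level instead.
tbl.Cyl = reordercats(tbl.Cyl, {'8', '4', '6'});
mdl2 = fitlm(tbl, 'MPG ~ Weight + Cyl');
disp(mdl2.CoefficientNames)

% A continuous-by-categorical interaction adds another L - 1 = 2 terms:
% intercept + Weight + 2 indicators + 2 interaction terms.
mdl3 = fitlm(tbl, 'MPG ~ Weight*Cyl');
nCoef = mdl3.NumCoefficients;

% All dummyvar columns plus an intercept are linearly dependent,
% because the indicator columns sum to the intercept column.
D = dummyvar(tbl.Cyl);
Xfull = [ones(size(D, 1), 1), D];
rankFull = rank(Xfull);                          % smaller than size(Xfull, 2)
```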
fitlm considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in tbl, X, and Y to be missing values.
fitlm does not use observations with missing values in the fit.
The ObservationInfo property of a fitted model indicates whether or not fitlm uses each observation in the fit.
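As an illustration, the carsmall sample data (assumed available) contains NaN values in MPG and Horsepower. The sketch below counts how many observations were dropped and how many were used; the variable names info, nMissing, and nUsed are hypothetical.

```matlab
load carsmall                                    % sample data (assumed available)
mdl = fitlm([Weight, Horsepower], MPG);

info = mdl.ObservationInfo;                      % table with one row per observation
nMissing = sum(info.Missing);                    % observations dropped because of missing values
nUsed = sum(info.Subset);                        % observations actually used in the fit

fprintf('%d of %d observations used; %d had missing values.\n', ...
    nUsed, height(info), nMissing);
```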
For reduced computation time on high-dimensional data sets, fit a linear regression model using the fitrlinear function.
To regularize a regression, use fitrlinear, lasso, ridge, or plsregress.
fitrlinear regularizes a regression for high-dimensional data sets using lasso or ridge regression.
lasso removes redundant predictors in linear regression using lasso or elastic net.
ridge regularizes a regression with correlated terms using ridge regression.
plsregress regularizes a regression with correlated terms using partial least squares.
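As a minimal sketch of two of these alternatives, the code below calls lasso and ridge on the carsmall sample data (assumed available). The choice of predictors and the regularization values 0.5 and 1 are arbitrary and serve only to show the calling pattern.

```matlab
load carsmall                                    % sample data (assumed available)
X = [Weight, Horsepower, Displacement, Acceleration];
y = MPG;

keep = ~any(isnan([X, y]), 2);                   % drop rows with missing values first
X = X(keep, :);
y = y(keep);

% Lasso fit at a single, arbitrarily chosen regularization value.
[B, fitInfo] = lasso(X, y, 'Lambda', 0.5);       % B holds one coefficient per predictor
b0 = fitInfo.Intercept;                          % intercept is returned separately

% Ridge fit with an arbitrary ridge parameter of 1. The final argument 0
% returns coefficients on the original data scale, with the intercept first.
bRidge = ridge(y, X, 1, 0);
```

In practice you would typically select the regularization strength by cross-validation rather than fixing it by hand.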
[1] DuMouchel, W. H., and F. L. O'Brien. “Integrating a Robust Option into a Multiple Regression Computing Environment.” Computer Science and Statistics: Proceedings of the 21st Symposium on the Interface. Alexandria, VA: American Statistical Association, 1989.
[2] Holland, P. W., and R. E. Welsch. “Robust Regression Using Iteratively Reweighted Least-Squares.” Communications in Statistics: Theory and Methods, A6, 1977, pp. 813–827.
[3] Huber, P. J. Robust Statistics. Hoboken, NJ: John Wiley & Sons, Inc., 1981.
[4] Street, J. O., R. J. Carroll, and D. Ruppert. “A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares.” The American Statistician. Vol. 42, 1988, pp. 152–154.