FeatureSelectionNCARegression
Feature selection for regression using neighborhood component analysis (NCA)
Description
FeatureSelectionNCARegression contains the data, fitting information, feature weights, and other model parameters of a neighborhood component analysis (NCA) model. fsrnca learns the feature weights using a diagonal adaptation of NCA and returns a FeatureSelectionNCARegression object. The function achieves feature selection by regularizing the feature weights.
                        
Creation
Create a FeatureSelectionNCARegression object using fsrnca.
Properties
NCA Properties
ModelParameters
This property is read-only.
Model parameters used for training the model, specified as a structure.
You can access the fields of ModelParameters using dot
            notation.
For example, for a FeatureSelectionNCARegression object named mdl, you can access the
                LossFunction value using
                mdl.ModelParameters.LossFunction.
Data Types: struct
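The following sketch illustrates the dot-notation access described above. It fits a model on the imports-85 sample data used in the example on this page and inspects ModelParameters. It assumes that rows with NaN values are dropped first. LossFunction="mse" is an illustrative choice, not a recommendation.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85                          % loads matrix X
Predictors = X(:,1:15);                  % continuous predictors
Y = X(:,16);                             % response: car price
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

mdl = fsrnca(Predictors,Y,LossFunction="mse");   % illustrative loss choice

params = mdl.ModelParameters;   % structure of training parameters
disp(fieldnames(params))        % list every stored parameter name
params.LossFunction             % loss function used for training
params.Standardize              % whether predictors were standardized
```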
Lambda
This property is read-only.
Regularization parameter used for training this model, specified as a scalar. For
                n observations, the best Lambda value that
            minimizes the generalization error of the NCA model is expected to be a multiple of
                1/n.
Data Types: double
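Because the best Lambda is expected to be a multiple of 1/n, one way to choose it is to cross-validate over a grid of such multiples. The sketch below makes several assumptions: it uses the imports-85 data from the example on this page, a 5-fold partition, and an illustrative grid of 11 values from 0 to 20/n. It scores each candidate with the mse loss from the loss object function. Each training fold has fewer than n observations, so the grid is only approximately scaled.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);
n = numel(Y);

rng(1)                                   % reproducible partition
cvp = cvpartition(n,"KFold",5);
lambdaVals = linspace(0,20,11)/n;        % illustrative grid of multiples of 1/n
cvLoss = zeros(numel(lambdaVals),1);

for i = 1:numel(lambdaVals)
    foldLoss = zeros(cvp.NumTestSets,1);
    for k = 1:cvp.NumTestSets
        trIdx = training(cvp,k);         % logical index of training rows
        teIdx = test(cvp,k);             % logical index of test rows
        mdlK = fsrnca(Predictors(trIdx,:),Y(trIdx), ...
            FitMethod="exact",Solver="lbfgs",Standardize=true, ...
            Lambda=lambdaVals(i));
        foldLoss(k) = loss(mdlK,Predictors(teIdx,:),Y(teIdx),LossFunction="mse");
    end
    cvLoss(i) = mean(foldLoss);          % average test loss across folds
end

[~,best] = min(cvLoss);
bestLambda = lambdaVals(best);
% Refit on all data with the selected regularization parameter
mdl = fsrnca(Predictors,Y,Standardize=true,Lambda=bestLambda);
```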
FitMethod
This property is read-only.
Name of the fitting method used to fit this model, specified as one of the following:
- 'exact': Perform fitting using all of the data.
- 'none': No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fsrnca.
- 'average': The software divides the data into partitions (subsets), fits each partition using the 'exact' method, and returns the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value argument.
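The following sketch fits with FitMethod="average" and four partitions. It assumes the imports-85 data from the example on this page. The resulting FeatureWeights has one column per partition.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

% Four partitions is an illustrative choice
mdlAvg = fsrnca(Predictors,Y,FitMethod="average",NumPartitions=4);
size(mdlAvg.FeatureWeights)              % p-by-m, here 15-by-4
wMean = mean(mdlAvg.FeatureWeights,2);   % average weight of each predictor across partitions
```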
Solver
This property is read-only.
Name of the solver used to fit this model, specified as one of the following:
- 'lbfgs': Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
- 'sgd': Stochastic gradient descent (SGD) algorithm
- 'minibatch-lbfgs': Stochastic gradient descent with the LBFGS algorithm applied to mini-batches
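Here is a minimal sketch of selecting a solver, assuming the imports-85 data from the example on this page. The MiniBatchSize and PassLimit values are illustrative only.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

% Mini-batches of 50 observations, at most 5 passes (illustrative values)
mdlMB = fsrnca(Predictors,Y,Solver="minibatch-lbfgs", ...
    MiniBatchSize=50,PassLimit=5,Standardize=true);
mdlMB.Solver                             % solver stored in the model
```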
GradientTolerance
This property is read-only.
Relative convergence tolerance on the gradient norm for the 'lbfgs'
            and 'minibatch-lbfgs' solvers, specified as a positive scalar
            value.
Data Types: double
IterationLimit
This property is read-only.
Maximum number of iterations for optimization, specified as a positive integer value.
Data Types: double
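The following sketch sets both the gradient tolerance and the iteration limit for the 'lbfgs' solver and reads them back from GradientTolerance and IterationLimit. The data preparation and the specific values are assumptions for illustration.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

% Tolerance and iteration cap chosen only for illustration
mdlL = fsrnca(Predictors,Y,Solver="lbfgs", ...
    GradientTolerance=1e-4,IterationLimit=50);
[mdlL.GradientTolerance mdlL.IterationLimit]
numIter = max(mdlL.FitInfo.Iteration)    % largest iteration index recorded
```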
PassLimit
This property is read-only.
Maximum number of passes for the 'sgd' and 'minibatch-lbfgs' solvers, specified as a positive integer. Every pass processes all of the observations in the data.
Data Types: double
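This sketch caps the 'sgd' solver at 10 passes through the data. The pass count and mini-batch size are arbitrary illustrative values, and the sketch uses the imports-85 data from the example on this page.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

% At most 10 passes over all observations (illustrative value)
mdlS = fsrnca(Predictors,Y,Solver="sgd",PassLimit=10, ...
    MiniBatchSize=20,Standardize=true);
mdlS.PassLimit
```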
InitialLearningRate
This property is read-only.
Initial learning rate for the 'sgd' and 'minibatch-lbfgs' solvers, specified as a positive real scalar. The learning rate decays over iterations, starting at the value specified for InitialLearningRate.
Use the NumTuningIterations and TuningSubsetSize name-value arguments in the call to fsrnca to control the automatic tuning of the initial learning rate.
Data Types: double
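The sketch below contrasts automatic tuning of the initial learning rate with a fixed starting value. The tuning settings and the value 0.1 are illustrative assumptions. The data is the imports-85 set from the example on this page.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

% Automatic tuning: 10 tuning iterations on a 50-observation subset
mdlAuto = fsrnca(Predictors,Y,Solver="sgd",Standardize=true, ...
    NumTuningIterations=10,TuningSubsetSize=50);
mdlAuto.InitialLearningRate              % starting rate used for the fit

% Fixed starting rate, chosen here only for illustration
mdlFix = fsrnca(Predictors,Y,Solver="sgd",Standardize=true, ...
    InitialLearningRate=0.1);
mdlFix.InitialLearningRate
```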
Verbose
This property is read-only.
Verbosity level indicator, specified as a nonnegative integer. Possible values are:
- 0: No convergence summary
- 1: Convergence summary, including the norm of the gradient and the objective function value
- >1: More convergence information, depending on the fitting algorithm. When you use the 'minibatch-lbfgs' solver and a verbosity level greater than 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.
Data Types: double
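To display the convergence summary while fitting, set Verbose in the call to fsrnca. Here is a minimal sketch, assuming the imports-85 data from the example on this page:

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

mdlV = fsrnca(Predictors,Y,Verbose=1);   % prints a convergence summary while fitting
mdlV.Verbose
```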
InitialFeatureWeights
This property is read-only.
Initial feature weights, specified as a p-by-1 vector of positive
            real scalars, where p is the number of predictors in
                X. For more information about feature weights, see Neighborhood Component Analysis (NCA) Feature Selection.
Data Types: double
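The following sketch shows how InitialFeatureWeights combines with FitMethod="none" to measure the generalization error of a fixed set of weights. It compares that error against a fitted model. The hold-out split, the all-ones starting weights, and the data preparation are assumptions for illustration.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

rng(0)                                   % reproducible split
cvp = cvpartition(numel(Y),"HoldOut",0.3);
Xtr = Predictors(training(cvp),:);  Ytr = Y(training(cvp));
Xte = Predictors(test(cvp),:);      Yte = Y(test(cvp));

p = size(Predictors,2);
w0 = ones(p,1);                          % hypothetical starting weights

% FitMethod="none": keep w0 unchanged
mdl0 = fsrnca(Xtr,Ytr,FitMethod="none",InitialFeatureWeights=w0,Standardize=true);
% FitMethod="exact": start from w0 and optimize
mdl1 = fsrnca(Xtr,Ytr,FitMethod="exact",InitialFeatureWeights=w0,Standardize=true);

errNoFit = loss(mdl0,Xte,Yte)            % test loss with the initial weights
errFit   = loss(mdl1,Xte,Yte)            % test loss with the learned weights
```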
FeatureWeights
This property is read-only.
Feature weights, specified as a p-by-1 numeric vector or a p-by-m numeric matrix, where p is the number of predictor variables after dummy variables are created for categorical variables (for more details, see ExpandedPredictorNames).
If FitMethod is 'average', then FeatureWeights is a p-by-m matrix, where m is the number of partitions specified by the NumPartitions name-value argument in the call to fsrnca.
The absolute value of FeatureWeights(k) is a measure of the importance of predictor k. A FeatureWeights(k) value close to 0 indicates that predictor k does not influence the response in Y. For more information about feature weights, see Neighborhood Component Analysis (NCA) Feature Selection.
Data Types: double
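Weights near zero mark predictors that do not influence the response. A common follow-up is therefore to keep only the predictors whose weight exceeds a fraction of the largest weight. In this sketch, the relative threshold tol = 0.02 is a hypothetical choice, and the data is the imports-85 set from the example on this page.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

mdl = fsrnca(Predictors,Y,Standardize=true);
w = mdl.FeatureWeights;                  % p-by-1 for FitMethod "exact"
tol = 0.02;                              % hypothetical relative threshold
selIdx = find(abs(w) > tol*max(1,max(abs(w))))   % indices of retained predictors
```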
FitInfo
This property is read-only.
Fit information, specified as a structure with the following fields.
| Field Name | Meaning | 
|---|---|
| Iteration | Iteration index | 
| Objective | Regularized objective function for minimization | 
| UnregularizedObjective | Unregularized objective function for minimization | 
| Gradient | Gradient of regularized objective function for minimization | 
- For classification, UnregularizedObjective represents the negative of the leave-one-out accuracy of the NCA classifier on the training data.
- For regression, UnregularizedObjective represents the leave-one-out loss between the true response and the predicted response when using the NCA regression model.
- For the 'lbfgs' solver, Gradient is the final gradient. For the 'sgd' and 'minibatch-lbfgs' solvers, Gradient is the final mini-batch gradient.
- If FitMethod is 'average', then FitInfo is an m-by-1 structure array, where m is the number of partitions specified by the NumPartitions name-value argument.
You can access the fields of FitInfo using dot notation. For example, for a FeatureSelectionNCARegression object named mdl, you can access the Objective field using mdl.FitInfo.Objective.
Data Types: struct
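When FitMethod is 'average', index into the FitInfo structure array before using dot notation. Here is a minimal sketch, assuming the imports-85 data from the example on this page and three partitions chosen for illustration:

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

mdlAvg = fsrnca(Predictors,Y,FitMethod="average",NumPartitions=3);
for j = 1:numel(mdlAvg.FitInfo)
    fi = mdlAvg.FitInfo(j);              % fit information for partition j
    fprintf("Partition %d: final objective %.4g\n",j,fi.Objective(end));
end

% For the 'lbfgs' solver, FitInfo.Gradient holds the final gradient
mdl = fsrnca(Predictors,Y,Solver="lbfgs");
gradNorm = norm(mdl.FitInfo.Gradient)
```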
Other Regression Properties
NumObservations
This property is read-only.
Number of observations in the training data (X and
                Y) after removing NaN or
                Inf values, specified as a scalar.
Data Types: double
Mu
This property is read-only.
Predictor means, specified as a p-by-1 vector for standardized
            training data. In this case, the predict method centers predictor
            matrix X by subtracting the respective element of
                Mu from every column.
If data is not standardized during training, then Mu is
            empty.
Data Types: double
Sigma
This property is read-only.
Predictor standard deviations, specified as a p-by-1 vector for
            standardized training data. In this case, the predict method scales
            predictor matrix X by dividing every column by the respective
            element of Sigma after centering the data using
                Mu.
If data is not standardized during training, then Sigma is
            empty.
Data Types: double
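The following sketch reproduces the centering and scaling described for Mu and Sigma. It also confirms that both properties are empty when the data is not standardized. It assumes the imports-85 data from the example on this page. The transposes turn the p-by-1 vectors into rows, so implicit expansion applies them column by column.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

mdlStd = fsrnca(Predictors,Y,Standardize=true);
mu = mdlStd.Mu;                          % p-by-1 predictor means
sigma = mdlStd.Sigma;                    % p-by-1 predictor standard deviations
Xs = (Predictors - mu')./sigma';         % center each column, then scale it

mdlRaw = fsrnca(Predictors,Y);           % Standardize is false by default
isempty(mdlRaw.Mu) && isempty(mdlRaw.Sigma)
```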
X
This property is read-only.
Predictor values used to train this model, specified as a matrix or a table. Each
            column of X represents one predictor (variable), and each row
            represents one observation. 
Data Types: single | double | table
Y
This property is read-only.
Response values used to train this model, specified as a numeric vector of size n, where n is the number of observations.
Data Types: double
W
This property is read-only.
Observation weights used to train this model, specified as a numeric vector of size n. The sum of observation weights is n.
Data Types: double
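The following sketch checks that supplied observation weights are normalized to sum to n. The weighting scheme, which doubles the weight for cars priced above the median, is hypothetical and chosen only for illustration. The data is the imports-85 set from the example on this page.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

wObs = ones(numel(Y),1);
wObs(Y > median(Y)) = 2;                 % hypothetical: double weight for pricier cars
mdlW = fsrnca(Predictors,Y,ObservationWeights=wObs);
sum(mdlW.W)                              % compare with mdlW.NumObservations
```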
CategoricalPredictors
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers.
                CategoricalPredictors contains index values indicating that the
            corresponding predictors are categorical. The index values are between 1 and
                p, where p is the number of predictors used to
            train the model. If none of the predictors are categorical, then this property is empty
                ([]).
Data Types: single | double
ResponseName
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char
PredictorNames
This property is read-only.
Predictor variable names in order of their appearance in the predictor data,
                        specified as a cell array of unique character vectors. The length of
                                PredictorNames is equal to the number of
                        variables in the training data X used as predictor
                        variables.
Data Types: cell
ExpandedPredictorNames
This property is read-only.
Expanded predictor names, specified as a cell array of unique character vectors.
If the model uses encoding for categorical variables, then
                ExpandedPredictorNames includes the names that describe the
            expanded variables. Otherwise, ExpandedPredictorNames is the same as
                PredictorNames.
Data Types: cell
Object Functions
| Function | Description |
|---|---|
| loss | Evaluate accuracy of learned feature weights on test data |
| predict | Predict responses using neighborhood component analysis (NCA) regression model |
| refit | Refit neighborhood component analysis (NCA) model for regression |
| selectFeatures | Select important features for NCA classification or regression |
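The following sketch exercises predict, loss, and refit on a hold-out split. The 70/30 split, the "mad" comparison, and the refit value of Lambda are assumptions for illustration. The data is the imports-85 set from the example on this page.

```matlab
% Assumes the imports-85 sample data used in the example on this page.
load imports-85
Predictors = X(:,1:15);
Y = X(:,16);
ok = ~any(isnan([Predictors Y]),2);      % keep rows without missing values
Predictors = Predictors(ok,:);
Y = Y(ok);

rng(0)                                   % reproducible split
cvp = cvpartition(numel(Y),"HoldOut",0.3);
Xtr = Predictors(training(cvp),:);  Ytr = Y(training(cvp));
Xte = Predictors(test(cvp),:);      Yte = Y(test(cvp));

mdl = fsrnca(Xtr,Ytr,Standardize=true);
yhat = predict(mdl,Xte);                         % predicted prices for the test rows
lossDefault = loss(mdl,Xte,Yte)                  % default loss function
lossMAD = loss(mdl,Xte,Yte,LossFunction="mad")   % mean absolute deviation

newLambda = 5/numel(Ytr);                        % illustrative, larger regularization value
mdl2 = refit(mdl,Lambda=newLambda);              % refit with the new Lambda
```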
Examples
Load the sample data.
load imports-85
The first 15 columns of X contain the continuous predictor variables, and the 16th column contains the response variable, which is the price of a car. Define the variables for the neighborhood component analysis model.
Predictors = X(:,1:15); Y = X(:,16);
Fit a neighborhood component analysis (NCA) model for regression to detect the relevant features.
mdl = fsrnca(Predictors,Y);
The returned NCA model, mdl, is a FeatureSelectionNCARegression object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.
Plot the feature weights.
plot(mdl.FeatureWeights,"o")
xlabel("Feature Index")
ylabel("Feature Weight")
grid on

The weights of the irrelevant features are close to zero. Specifying Verbose=1 in the call to fsrnca displays the optimization information on the command line. You can also visualize the optimization process by plotting the objective function versus the iteration number.
plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,"o-")
grid on
xlabel("Iteration Number")
ylabel("Objective")

The ModelParameters property is a structure that contains more information about the model. You can access the fields of this property using dot notation. For example, check whether the data was standardized.
mdl.ModelParameters.Standardize
ans = logical
   0
A value of 0 (false) means that the data was not standardized before fitting the NCA model. When the predictors are on very different scales, you can standardize them by specifying the Standardize=true name-value argument in the call to fsrnca.
Version History
Introduced in R2016b
See Also
predict | fsrnca | refit | loss | selectFeatures