Feature Selection using ReliefF function in Regression Learner App

XT on 13 Oct 2022
Answered: Drew on 6 Nov 2023

I want to feed the features selected by the ReliefF function into a regression model. Rt is the response and the other variables are predictors. I have uploaded sample data, which includes 'TWOcff1' and 'TWOcharmm1'. When I use TWOcharmm1, the model trains. When I use TWOcff1, an error occurs. TWOcff1 and TWOcharmm1 have almost the same variables; only a few of them differ in value. The response is the same, just in a different order.

1 Comment
XT on 14 Oct 2022
I also found that when I changed the order of the samples in TWOcharmm1, the error occurred.


Answers (2)

Rohit on 24 Feb 2023
This seems to be a problem that arises when selecting features with the RReliefF function while some predictors (columns) in the data have constant values. This is a known bug, and developers at MathWorks are working on it. As a workaround, you can normalize the data before using RReliefF and training the model.
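To check whether constant predictors are present before ranking, something like the following may help. It is only a sketch on toy data (the matrix X and its values are made up for illustration, not taken from the attached files), and it drops constant columns before calling relieff, which is a different workaround from the normalization suggested above. It assumes Statistics and Machine Learning Toolbox is installed.

```matlab
% Toy data standing in for the attached tables (hypothetical values).
rng(1);
n = 30;
X = [randn(n, 2), 7 * ones(n, 1)];    % third predictor is constant
Rt = 2 * X(:, 1) + 0.1 * randn(n, 1); % numeric response

% Flag predictors with zero variance.
isConstant = var(X, 0, 1) == 0;

% Rank only the non-constant predictors.
% With a numeric response, relieff runs the regression variant (RReliefF).
keep = find(~isConstant);
[idx, weights] = relieff(X(:, keep), Rt, 10);  % 10 nearest neighbors

% Map the ranking back to the original column numbers.
rankedPredictors = keep(idx);
disp(rankedPredictors)
```

Note that this only checks the full data set. Inside Regression Learner, ranking is also done on the training set of each cross-validation fold, so a predictor that varies in the full data can still be constant within a fold.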

Drew on 6 Nov 2023
This issue has been fixed in R2023b, which is available now.
The issue occurred when ranking features with RReliefF while a training set contained a constant predictor. This can happen, for example, in the training set for one of the cross-validation folds, even if it does not happen when building a model on the full training set. In R2023b, this condition is handled correctly. To see in more detail how R2023b normalizes features before ranking them with RReliefF, take these steps:
(1) Load the attached TWOcff1 data into Regression Learner
(2) Do feature ranking using RReliefF. Choose to keep 269/270 features
(3) Build the default tree model
(4) Select the tree model that was just built, then use the "Generate Function" option in Regression Learner to generate code for training the model and doing cross validation. In the snippet of generated code seen below, one can see the normalization steps that are taken prior to feature ranking with RReliefF. This code ensures that columns with zero variance are set to values of zero (rather than NaN), after the default "zscore" normalization using the "normalize" function.
% Feature Ranking and Selection
% Replace Inf/-Inf values with NaN to prepare data for normalization
trainingPredictors = standardizeMissing(trainingPredictors, {Inf, -Inf});
% Normalize data for feature ranking
isZeroVarianceColumn = varfun(@(x) isnumeric(x) && (var(x) == 0), trainingPredictors, 'OutputFormat', 'uniform');
predictorMatrix = normalize(trainingPredictors, "DataVariable", ~foldIsCategoricalPredictor);
predictorMatrix{:, isZeroVarianceColumn} = 0;
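The effect of the zero-variance handling in the generated code above can be seen on a small table. This is an illustrative sketch only: the table T and its variable names x1 and x2 are made up, and it leaves out the categorical-predictor handling from the generated code.

```matlab
% Toy table: x1 varies, x2 is constant (hypothetical data).
T = table([1; 2; 3; 4], [5; 5; 5; 5], 'VariableNames', {'x1', 'x2'});

% Same zero-variance test as in the generated code.
isZeroVarianceColumn = varfun(@(x) isnumeric(x) && (var(x) == 0), T, ...
    'OutputFormat', 'uniform');

% Default normalization is z-score; a constant column becomes 0/0 = NaN.
rawNormalized = normalize(T);
disp(rawNormalized.x2)

% Overwrite the zero-variance columns with zeros instead of NaN.
predictorMatrix = rawNormalized;
predictorMatrix{:, isZeroVarianceColumn} = 0;
disp(predictorMatrix.x2)
```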
Note that the feature normalization used before feature ranking (if any, it depends on the feature ranking technique) is independent of the feature normalization that occurs (if any) before the model training.
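The fold-specific failure described in this answer, and the dependence on sample order that XT observed, can be illustrated with a toy example. This is a sketch under stated assumptions, not the app's internal logic: the data is synthetic, the partition comes from cvpartition (Statistics and Machine Learning Toolbox), and the third column is built so that it varies in exactly one row.

```matlab
% Synthetic predictors: column 3 is zero everywhere except in row 1.
rng(0);
n = 20;
X = randn(n, 3);
X(:, 3) = 0;
X(1, 3) = 1;

% On the full data, no column is constant.
fullIsConstant = var(X, 0, 1) == 0;

% In the fold whose test set holds row 1, column 3 is constant in training.
cvp = cvpartition(n, 'KFold', 5);
foldHasConstant = false(cvp.NumTestSets, 1);
for k = 1:cvp.NumTestSets
    trainIdx = training(cvp, k);
    isConstant = var(X(trainIdx, :), 0, 1) == 0;
    foldHasConstant(k) = any(isConstant);
    fprintf('Fold %d: constant columns in training set: %s\n', ...
        k, mat2str(find(isConstant)));
end
```

Which fold is affected depends on how the rows are assigned to folds, which may explain why reordering the samples can make the error appear or disappear.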
