Genetic Algorithm: the ga function in MATLAB and feature selection

Question

I am using a genetic algorithm for feature selection, with the built-in ga function of MATLAB.

In that function, where do I get the selected features from? What value is stored in x, and what is it used for? Do I have to include other steps for feature selection? What should nvars be? Is it the number of features input to the genetic algorithm? Kindly clarify. Thanks in advance.

Answer 1

Walter Roberson on 13 Apr 2022

The function fun that you pass in is responsible for accepting trial vectors of model parameters and evaluating the "cost" associated with those particular parameters. The return value, x, is the vector of model parameters that resulted in the lowest "cost" found.
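The pattern can be sketched as follows. This is only a minimal illustration: it assumes Global Optimization Toolbox is installed, and the quadratic cost is a made-up example (not from the question) whose minimum is at [1, -2].

```matlab
% Minimal sketch: ga minimizes fun over row vectors of length nvars.
% Requires Global Optimization Toolbox. The cost is a hypothetical example.
rng(0);                                    % make the stochastic search repeatable
fun   = @(x) (x(1) - 1)^2 + (x(2) + 2)^2;  % ga passes ONE row vector x
nvars = 2;                                 % length of that vector
opts  = optimoptions('ga', 'Display', 'off');

% Empty [] placeholders: no linear constraints, bounds, or nonlinear constraints.
[x, fval] = ga(fun, nvars, [], [], [], [], [], [], [], opts);

% x is the best trial vector found; fval is the cost at x.
fprintf('best x = [%g %g], cost = %g\n', x(1), x(2), fval);
```

Because ga is stochastic, the returned x is the best vector found in that run, not a guaranteed global minimum.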

In the case of feature selection, the trial vector could include a set of integer decision variables, restricted to 0 or 1, with 0 meaning that the corresponding feature is not selected and 1 meaning that it is selected. To select no more than N features, you could add a linear constraint that the sum of those decision variables is <= N.

Because the return value is the trial vector that resulted in the lowest cost, with integer decision variables like those, that section of the output vector tells you which features were selected (1) or not (0).

The return value from ga() does not inherently tell you about selected features: you have to arrange your function so that the set of selected features can be calculated from its input vector, and then, once you get the best parameters out, use them to determine which features were selected.
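A minimal sketch of the binary decision variable approach, assuming Global Optimization Toolbox. The synthetic data, the holdout split, and the least-squares cost are all assumptions made for illustration; substitute your own model and error measure.

```matlab
% Sketch: binary-mask feature selection with ga.
% Data, split, and cost are hypothetical stand-ins for a real model.
rng(0);
m = 200;  n = 8;                           % observations, candidate features
X = randn(m, n);
y = 3*X(:,2) - 2*X(:,5) + 0.1*randn(m, 1); % only features 2 and 5 matter
train = 1:150;  test = 151:m;              % simple holdout split

% Design matrix: intercept plus the columns switched on by the mask.
design = @(rows, mask) [ones(numel(rows), 1), X(rows, mask > 0.5)];
% Cost: holdout mean squared error of a least-squares fit on those columns.
cost = @(mask) mean((y(test) - design(test, mask) * ...
                     (design(train, mask) \ y(train))).^2);

N      = 3;                                % select at most N features
A      = ones(1, n);   b  = N;             % linear constraint: sum(mask) <= N
lb     = zeros(1, n);  ub = ones(1, n);    % each variable lies in [0, 1] ...
intcon = 1:n;                              % ... and is integer, so 0 or 1
opts   = optimoptions('ga', 'Display', 'off');

[mask, err] = ga(cost, n, A, b, [], [], lb, ub, [], intcon, opts);
selected = find(mask > 0.5);               % indices of the chosen features
disp(selected);
```

The key point is that the cost function reads the mask it is given; the selected features are then recovered from the returned vector with find.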

Another approach, instead of binary decision variables, would be to use a vector of integer-constrained variables, each between 1 and the number of features, effectively listing which features are selected.
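The index encoding might look like the sketch below, again assuming Global Optimization Toolbox and using hypothetical synthetic data and a least-squares cost. Note that nothing stops ga from proposing the same index twice, so this sketch collapses duplicates with unique.

```matlab
% Sketch: feature selection by integer indices rather than a 0/1 mask.
% Data and cost are hypothetical.
rng(0);
m = 200;  n = 8;
X = randn(m, n);
y = 3*X(:,2) - 2*X(:,5) + 0.1*randn(m, 1); % only features 2 and 5 matter
train = 1:150;  test = 151:m;

k = 2;                                     % number of index variables

% Duplicated indices are collapsed, so fewer than k features may be used.
design = @(rows, idx) [ones(numel(rows), 1), X(rows, unique(round(idx)))];
cost = @(idx) mean((y(test) - design(test, idx) * ...
                    (design(train, idx) \ y(train))).^2);

lb     = ones(1, k);                       % smallest feature index
ub     = n * ones(1, k);                   % largest feature index
intcon = 1:k;                              % every variable is an integer
opts   = optimoptions('ga', 'Display', 'off');

[idx, err] = ga(cost, k, [], [], [], [], lb, ub, [], intcon, opts);
selected = unique(round(idx));             % the selected feature numbers
disp(selected);
```

Here nvars is k, the number of features to pick, not the total number of features.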

nvars is the total number of model parameters that are to be varied. The output will be of the length indicated by nvars. This is not necessarily the same as the number of features, since you might have extra variables not being used as decision variables, or you might have chosen to encode by feature number instead of by binary decision variables.
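As an illustration of nvars differing from the number of features, the sketch below appends one continuous variable to a binary mask, so nvars is n + 1. The extra variable is a hypothetical ridge penalty lambda; the data, the cost, and the bound of 10 on lambda are assumptions, and Global Optimization Toolbox is required.

```matlab
% Sketch: n mask variables plus one extra continuous variable, so nvars = n + 1.
% The ridge penalty "lambda" is a hypothetical extra model parameter.
rng(0);
m = 200;  n = 8;
X = randn(m, n);
y = 3*X(:,2) - 2*X(:,5) + 0.1*randn(m, 1);
train = 1:150;  test = 151:m;

nvars = n + 1;                             % x = [mask(1:n), lambda]

design = @(rows, x) [ones(numel(rows), 1), X(rows, x(1:n) > 0.5)];
% Ridge solution; for brevity the intercept is penalized as well.
ridge  = @(Z, t, lambda) (Z'*Z + lambda*eye(size(Z, 2))) \ (Z'*t);
cost   = @(x) mean((y(test) - design(test, x) * ...
                    ridge(design(train, x), y(train), x(end))).^2);

lb     = [zeros(1, n), 0];                 % mask bits >= 0, lambda >= 0
ub     = [ones(1, n), 10];                 % mask bits <= 1, lambda <= 10
intcon = 1:n;                              % only the mask bits are integer
opts   = optimoptions('ga', 'Display', 'off');

x = ga(cost, nvars, [], [], [], [], lb, ub, [], intcon, opts);
mask   = x(1:n) > 0.5;                     % which features were selected
lambda = x(end);                           % the tuned extra parameter
disp(find(mask));  disp(lambda);
```

The output has length nvars, and it is up to your code to split it back into the selection part and the extra parameter.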

Walter Roberson on 14 Apr 2022

fun=@(t,p)mse(mdl,testdata, testlabel) ;

That function expects up to two inputs.

It then ignores the inputs, and calculates mse for the exact same values of mdl, testdata, and testlabel each time. Nothing useful will be done with that as it is exactly the same calculation each time it is attempted.

Note also that ga() passes only one parameter to fun: the vector of trial values. That does not cause an error here, because the function ignores whatever inputs it is passed.
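The difference can be seen without running ga at all. In this small sketch the data and the toy "predictor" (the sum of the selected columns) are made up for illustration; constFun mimics a handle that ignores its arguments, while maskFun uses the trial vector.

```matlab
% Sketch with made-up data: a handle that ignores its inputs returns the same
% cost for every trial vector, so ga has nothing to optimize.
X = [1 2; 3 4; 5 6];                       % 3 observations, 2 features
y = [2; 4; 6];

% Ignores t and p, like the handle discussed above.
constFun = @(t, p) mean((y - X(:, 1)).^2);
% Uses its single input: predicts y by the sum of the selected columns.
maskFun  = @(mask) mean((y - sum(X(:, mask > 0.5), 2)).^2);

% ga would call each handle with ONE argument, the trial vector.
fprintf('constFun: %g vs %g\n', constFun([1 0]), constFun([0 1]));
fprintf('maskFun:  %g vs %g\n', maskFun([1 0]),  maskFun([0 1]));
```

Only a cost that changes with the trial vector gives ga any information about which selection is better.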

The ga() that you are doing does not help select features.

There are three major ways to do feature selection here:

1. Interpret the vector of trial inputs as a binary vector that indicates which features to include. In this case, set the number of variables to the number of features, set each lower bound to 0 and each upper bound to 1, and mark all variables as integer. The 1 values in the output vector show which features are selected.
2. Interpret the vector of trial inputs as a vector of integer indices that indicate which features are active. In this case, set the number of variables to the number of features you want to select, set each lower bound to 1 and each upper bound to the number of features, and mark all variables as integer. The output vector is the vector of indices of the selected features.
3. Interpret the vector of trial inputs as a vector of weights, one per feature. In this case, set the number of variables to the number of features, set each lower bound to 0 and each upper bound to 1, and do not mark the variables as integer. The output vector is the vector of weights; sort it in descending order and take the highest-weighted entries as your features.

All three of these would require some adjustment to your function that calculates mse.
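The weight-based option could be sketched as below. Everything model-specific here is an assumption: the synthetic data, the choice of a Gaussian-kernel weighted average as the predictor (picked because its error actually depends on feature scaling), the limit of 30 generations, and keeping the top 2 weights. Global Optimization Toolbox is required.

```matlab
% Sketch: continuous feature weights in [0, 1], ranked after the run.
% The kernel-regression cost is a hypothetical stand-in for a real model.
rng(0);
m = 120;  n = 6;
X = randn(m, n);
y = 3*X(:,2) - 2*X(:,5) + 0.1*randn(m, 1);
train = 1:90;  test = 91:m;

% Squared distances between test and training rows after scaling
% feature j by w(j); the result is numel(test)-by-numel(train).
sqdist = @(w) sum((permute(X(test, :)  .* w, [1 3 2]) - ...
                   permute(X(train, :) .* w, [3 1 2])).^2, 3);
kern = @(w) exp(-sqdist(w));               % Gaussian kernel weights
% Cost: holdout MSE of the kernel-weighted average of training responses.
cost = @(w) mean((y(test) - (kern(w) * y(train)) ./ sum(kern(w), 2)).^2);

lb   = zeros(1, n);  ub = ones(1, n);      % weights in [0, 1], not integer
opts = optimoptions('ga', 'Display', 'off', 'MaxGenerations', 30);

w = ga(cost, n, [], [], [], [], lb, ub, [], opts);

[~, order] = sort(w, 'descend');           % rank features by weight
top = order(1:2);                          % keep the highest-weighted ones
disp(top);
```

How many of the top-ranked features to keep is a separate decision that ga does not make for you in this encoding.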

Little Flower on 6 Jul 2022

@Walter Roberson

Thank you. I have another question: which test is suitable for finding the statistical significance between features?

Little Flower on 6 Jul 2022

I have a dataset in the form of an (m,n) matrix, where m is the number of observations and n is the number of features. These m samples fall into c classes, for example c=4. My question is: can I run a statistical significance test with the whole matrix as input, or do I have to separate it by class? Which method is preferred in my case: Student's t-test, ANOVA, or some other test?
