MATLAB Answers

Questions on classification learner App

Hello everyone.
I have started using the Classification Learner app and have some questions. I will use MATLAB's ovarian cancer data set as an example to illustrate them.
1) If the response class is missing for an observation (e.g. the response comes from histology and histology was not performed for that observation, but the predictor data is available), is it preferable to assign the observation to an extra class (e.g. 'unknown'), or is it better not to use the observation at all?
2) When enabling PCA to reduce the dimensionality of the observations (in the ovarian cancer data set, PCA reduces the number of predictors from 4000 to 215 and is using 7 of the 215 features), can we know which features (columns of obs in the ovarian cancer data set) are the ones that PCA has kept?
3) When exporting a trained model to make predictions for new data, and PCA was used during training, what extra arguments do we need when calling newPredictions = myExportedModel.predictFcn(newData) so that the function knows PCA was used when training myExportedModel?
Many thanks in advance for your help!
Regards, Ioannis

Accepted Answer

Nagarjuna Manchineni on 26 May 2017
1. It depends on the classifier and on the data and application you are working with. For example, if you are training a binary classifier that predicts whether cancer is present or not, adding a third category ('unknown') is not going to help, so it is usually better to leave those observations out of training. If instead you are trying to group all the data into clusters, then marking them as NaN or 'Unknown' can help.
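If you decide to leave the unlabeled observations out, one way to do it before opening the app might look like the sketch below. The toy matrix X, the labels y, and the use of an empty character vector to mark a missing histology result are all assumptions made up for illustration; they are not taken from the ovarian cancer data.

```matlab
% Hypothetical toy data: 6 observations, 3 predictors, one missing label.
X = [ 1  2  3;
      4  5  6;
      7  8  9;
     10 11 12;
     13 14 15;
     16 17 18];
y = {'Cancer'; 'Normal'; ''; 'Cancer'; 'Normal'; 'Cancer'};  % '' marks a missing response

% Logical index of the rows whose response is known.
hasLabel = ~cellfun(@isempty, y);

% Training set: only the labeled rows.
XLabeled = X(hasLabel, :);
yLabeled = y(hasLabel);

% Keep the unlabeled rows aside; a trained model can still score them later.
XUnlabeled = X(~hasLabel, :);

fprintf('%d labeled, %d unlabeled observations\n', nnz(hasLabel), nnz(~hasLabel));
```

The unlabeled rows are not thrown away here, only kept out of training, so you can still pass them to the exported model for prediction afterwards.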
2. Principal component analysis is a quantitatively rigorous method for reducing the dimensionality of the data. It generates a new set of variables, called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other, so there is no redundant information, and as a whole they form an orthogonal basis for the space of the data. For example, in the cancer data set, if you start with x predictors, PCA reduces them to y (<= x) components. These components are not a subset of the original columns; they are new columns derived from all of the predictors, so PCA does not "keep" particular features. If you want to see the coefficients that define the 7 retained components in the trained classifier, you can use the following command:
>> trainedClassifier.PCACoefficients
To see how to use the trained classifier, use the following command. It displays a description of how this particular model should be used and how to predict the response from new input data:
>> trainedClassifier.HowToPredict
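To get a feel for what a coefficient matrix like this tells you, here is a minimal sketch using the pca function from Statistics and Machine Learning Toolbox on made-up data. The data, the sizes, and the variable names are assumptions for illustration only. Each row of coeff corresponds to one original predictor and each column to one component, so sorting a column by absolute value shows which original predictors weigh most in that component.

```matlab
rng(0);                                        % reproducible toy data
X = randn(50, 6);                              % hypothetical: 50 observations, 6 predictors
X(:, 2) = 5 * X(:, 1) + 0.1 * randn(50, 1);    % give predictor 2 by far the largest variance

% Keep the first 2 principal components.
[coeff, ~, ~, ~, explained] = pca(X, 'NumComponents', 2);

% coeff is 6-by-2: row i holds the weight of original predictor i in each component.
disp(size(coeff));

% Rank the original predictors by the size of their weight in component 1.
[~, order] = sort(abs(coeff(:, 1)), 'descend');
disp(order');                                  % predictor 2 should come first for this toy data

% Percentage of the total variance explained by the first two components.
disp(explained(1:2)');
```

Keep in mind that every component still mixes all of the predictors; the ranking only tells you which ones contribute most, not which ones were "kept".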
3. The exported model knows whether PCA was used, so it handles the transformation itself; you just need to pass the observations you want to classify, with no extra arguments. If you want to confirm how a classifier trained with PCA expects its input, check the 'HowToPredict' field mentioned above.
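If you are curious what "handling the conversion" amounts to, the sketch below reproduces the idea by hand with the pca function from Statistics and Machine Learning Toolbox: new observations are centered with the training means and multiplied by the training coefficients. The data and variable names are made up for illustration, and this is only an assumption about the kind of transformation the exported model applies, not its actual code.

```matlab
rng(1);                                  % reproducible toy data
XTrain = randn(40, 5);                   % hypothetical training predictors
XNew   = randn(3, 5);                    % hypothetical new observations, same 5 columns

% Fit PCA on the training data only; mu holds the training column means.
[coeff, scoreTrain, ~, ~, ~, mu] = pca(XTrain, 'NumComponents', 2);

% Project the new observations with the training means and coefficients.
% (Implicit expansion of mu needs R2016b or later; use bsxfun on older releases.)
scoreNew = (XNew - mu) * coeff;          % 3-by-2 matrix of component scores

% Sanity check: projecting the training data the same way reproduces its scores.
scoreCheck = (XTrain - mu) * coeff;
maxDiff = max(abs(scoreCheck(:) - scoreTrain(:)));
fprintf('Largest difference from pca scores: %g\n', maxDiff);
```

With an exported model you do not need to do any of this yourself; the point is only that the new data must have the same predictors, in the same format, as the training data.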
See the MATLAB documentation on PCA for more details.
