How to deal with imbalanced dataset classification by support vector machine

I have a dataset that is heavily skewed toward one class. Training a support vector machine (SVM) with either fitcsvm.m or fitcecoc.m does not give satisfactory results: the accuracy for the class with more samples is above 90%, but for the class with far fewer samples it is barely 70%. Is there any way to improve the SVM training? Or are there other methods for handling training on unbalanced data?

Accepted Answer

Aditya Mittal on 21 Apr 2020
Hi,
There are several ways to balance the dataset before fitting the classifier, which can lead to better results:
  • Undersampling: remove redundant or repeated samples from the majority class and keep only a useful subset, so that the class sizes become more balanced.
  • Oversampling: collect more samples for the minority class, or replicate some of its existing samples to increase its size.
  • Synthetic data generation: generate synthetic samples for the minority class to balance the data, for example with the SMOTE method. A File Exchange implementation of SMOTE is available at the following link:
  • https://www.mathworks.com/matlabcentral/fileexchange/38830-smote-synthetic-minority-over-sampling-technique
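As a rough illustration of the oversampling idea, the sketch below replicates minority-class rows until both classes have the same size and then trains an SVM with fitcsvm. The data are synthetic and the variable names (Xtr, Xbal, and so on) are made up for this example. The resampling is applied to the training partition only, which is an assumption about good practice rather than something stated above: it keeps the held-out data untouched so that the evaluation stays honest.

```matlab
% Synthetic two-class data with a 9:1 imbalance (illustrative only)
rng(0);                                    % make the example reproducible
X = [randn(900,2); randn(100,2) + 2];      % 900 majority, 100 minority rows
Y = [zeros(900,1); ones(100,1)];           % 0 = majority, 1 = minority

% Split off a stratified hold-out set BEFORE any resampling
cvp = cvpartition(Y, 'HoldOut', 0.3);
Xtr = X(training(cvp),:);   Ytr = Y(training(cvp));
Xte = X(test(cvp),:);       Yte = Y(test(cvp));

% Random oversampling: draw minority rows with replacement until balanced
idxMin = find(Ytr == 1);
idxMaj = find(Ytr == 0);
nExtra = numel(idxMaj) - numel(idxMin);
extra  = idxMin(randi(numel(idxMin), nExtra, 1));
Xbal   = [Xtr; Xtr(extra,:)];
Ybal   = [Ytr; Ytr(extra)];

% Train on the balanced training set, predict on the untouched hold-out set
mdl  = fitcsvm(Xbal, Ybal, 'KernelFunction', 'rbf', 'Standardize', true);
pred = predict(mdl, Xte);
```

Random undersampling follows the same pattern in reverse: keep all of idxMin and a random subset of idxMaj of the same size, for example idxMaj(randperm(numel(idxMaj), numel(idxMin))).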
The results vary from problem to problem. Also, accuracy is not always the best performance metric for imbalanced data, so you should also try other metrics that give better insight:
  • Confusion matrix
  • Precision
  • Recall
  • F1 score
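To make these metrics concrete, here is a small sketch that derives them from a confusion matrix using confusionmat. The label vectors are invented for illustration, and class 1 is assumed to be the minority (positive) class.

```matlab
% Hypothetical true and predicted labels: 8 majority (0), 2 minority (1)
Ytrue = [0 0 0 0 0 0 0 0 1 1]';
Ypred = [0 0 0 0 0 0 0 1 1 0]';

% Rows are true classes, columns are predicted classes, in the order [0 1]
C = confusionmat(Ytrue, Ypred, 'Order', [0 1]);

TP = C(2,2);    % minority samples predicted as minority
FP = C(1,2);    % majority samples predicted as minority
FN = C(2,1);    % minority samples predicted as majority

accuracy  = sum(diag(C)) / sum(C(:));
precision = TP / (TP + FP);
recall    = TP / (TP + FN);
f1        = 2 * precision * recall / (precision + recall);
```

In this toy case the accuracy is 0.8 while precision, recall, and F1 for the minority class are all 0.5, which shows how accuracy alone can hide poor minority-class performance.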
You can also try fitting other machine learning models, such as hybrid or ensemble algorithms (e.g. AdaBoost) or deep learning models, which may give better results.
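As one possible way to try the ensemble route, the sketch below fits a boosted tree ensemble with fitcensemble using the 'AdaBoostM1' method and checks the recall on the minority class. The data are synthetic, and the number of learning cycles and the tree size are arbitrary choices for illustration, not tuned recommendations.

```matlab
% Synthetic two-class data with a 9:1 imbalance (illustrative only)
rng(1);
X = [randn(900,2); randn(100,2) + 1.5];
Y = [zeros(900,1); ones(100,1)];           % 0 = majority, 1 = minority

% Stratified hold-out split
cvp = cvpartition(Y, 'HoldOut', 0.3);
Xtr = X(training(cvp),:);   Ytr = Y(training(cvp));
Xte = X(test(cvp),:);       Yte = Y(test(cvp));

% AdaBoost (binary) with shallow decision trees as weak learners
ens = fitcensemble(Xtr, Ytr, 'Method', 'AdaBoostM1', ...
    'NumLearningCycles', 100, ...
    'Learners', templateTree('MaxNumSplits', 10));

% Evaluate on the hold-out set with a per-class view, not just accuracy
pred = predict(ens, Xte);
C = confusionmat(Yte, pred, 'Order', [0 1]);
recallMinority = C(2,2) / sum(C(2,:));
```

Comparing recallMinority against the same quantity from the SVM on the same hold-out set gives a like-for-like view of whether the ensemble actually helps the minority class.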
  4 Comments
Kenta on 11 Jul 2020
The answer from Dr. Aditya Mittal is very informative. The example of oversampling is posted here. I hope it helps you.
Esmeralda Ruiz Pujadas on 22 Mar 2023
You cannot use those methods directly, because you would be touching the validation data. And SVM is different from deep learning: you cannot specify the validation set directly in SVM....
