Dealing with imbalanced data in a neural network

I want to use a deep learning network for a classification problem. My data is imbalanced, meaning one of the classes has fewer training examples than the others.
I know one option is to remove training data from the other classes, but I wonder whether there is another solution. For example, can the cost layer be modified so that the cost of misclassifying a specific class is larger? Thanks,

Accepted Answer

Greg Heath on 12 Jun 2018

1 vote

There are many ways to deal with unbalanced classes when no more real data is available. Over the decades I have used the following:
1. Use the summary statistics of the small classes to simulate more data.
2. Design multiple nets using the smaller classes and subsets of the larger classes, then combine the answers.
3. Use a cost matrix to enhance the influence of the small subsets and/or reduce the influence of the larger subsets.
4. A combination of the above.
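One common way to realise option 3 in a network's loss is to weight each example's cross-entropy by a per-class weight. The following is only a minimal sketch in plain Python, not the toolbox's API: the function names are made up for illustration, and the weighting rule (total count divided by number of classes times class count) is one heuristic among several.

```python
import math

def inverse_frequency_weights(labels, num_classes):
    """Weight class k by N / (K * count_k), so rarer classes weigh more."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
        weight_sum += weights[y]
    return total / weight_sum

# Toy data (hypothetical): class 0 has 8 examples, class 1 has only 2.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels, num_classes=2)

# A lazy classifier that always predicts 90% for the majority class.
probs = [[0.9, 0.1]] * len(labels)

plain = weighted_cross_entropy(probs, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, labels, weights)
print(weights, plain, weighted)
```

With the weights applied, the lazy majority-class predictor is penalised more heavily than under the unweighted loss, which is the effect the cost matrix is meant to have.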
The basis of these techniques can be understood by examining the following term in the Bayesian risk:
Cij * Pi * p(x|i)
which involves the class-conditional probability density p(x|i), the a priori probability Pi, and the classification cost Cij.
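To make the role of the cost concrete, here is a small sketch of a minimum-risk decision. It assumes the convention that Cij is the cost of assigning a sample whose true class is i to class j, and that the risk of choosing class j is the sum over i of cost times prior times class-conditional density (the posterior is proportional to prior times density). The numbers and the function name are hypothetical.

```python
def min_risk_class(cost, priors, likelihoods):
    """Pick the class j that minimises sum_i cost[i][j] * priors[i] * likelihoods[i].

    cost[i][j]     : cost of deciding class j when the true class is i (assumed convention)
    priors[i]      : a priori probability of class i
    likelihoods[i] : class-conditional density of the observed x under class i
    """
    k = len(priors)
    risks = [
        sum(cost[i][j] * priors[i] * likelihoods[i] for i in range(k))
        for j in range(k)
    ]
    best = min(range(k), key=lambda j: risks[j])
    return best, risks

# Hypothetical two-class case: class 1 is rare (prior 0.1).
priors = [0.9, 0.1]
likelihoods = [0.2, 0.6]  # the observed x is more typical of the rare class

equal_cost = [[0, 1], [1, 0]]    # every error costs the same
skewed_cost = [[0, 1], [10, 0]]  # missing the rare class costs 10 times more

print(min_risk_class(equal_cost, priors, likelihoods))
print(min_risk_class(skewed_cost, priors, likelihoods))
```

With equal costs the large prior of class 0 wins even though x looks more like class 1; raising the cost of missing class 1 flips the decision. Changing the priors (by resampling) or the densities (by simulating data) acts on the other two factors of the same term.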
Hope this helps.
Thank you for formally accepting my answer
Greg

3 Comments

Tally on 14 Jun 2018 (edited 14 Jun 2018)
Thanks Greg.
Regarding option 3 (use a cost matrix), is it possible to do this with the MATLAB Neural Network Toolbox? The toolbox is very convenient and lets me define layers easily, but those layers seem like black boxes that cannot be modified. I can define the loss using the built-in softmaxLayer and classificationLayer, but I don't see how to modify it so that different classes get different costs. Does the toolbox allow a custom loss function?
Ariel Liebman on 13 Apr 2020
I am also trying to find out how to change the classification cost matrix for a MATLAB shallow NN. I saw in another post that you mentioned answering this on Usenet, but I don't know what's going on with Usenet these days. It seems very complicated to get on and search for something! I haven't used it for 15 years. It is much harder now :-)
Kenta on 11 Jul 2020
For imbalanced datasets, over-sampling is also effective. The demo is posted below. I hope it helps you.
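The demo Kenta refers to is not reproduced on this page. As a separate illustration of the idea, here is a minimal sketch of random over-sampling in plain Python: minority-class examples are duplicated at random until every class matches the largest one. The function name and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples until all classes match the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy data (hypothetical): 6 examples of class "a", 2 of class "b".
samples = [1, 2, 3, 4, 5, 6, 100, 101]
labels = ["a"] * 6 + ["b"] * 2

new_x, new_y = random_oversample(samples, labels)
print(Counter(new_y))
```

Over-sampling should be applied to the training set only, after the validation and test data have been split off, so that duplicated examples do not leak into the evaluation.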

Asked: 12 Jun 2018
Last commented: 11 Jul 2020
