I am training a neural network for classification on several classes with some class imbalance. I am using the error weight functionality of train to compensate for the imbalance through "cost-sensitive" learning.
There is documentation for how the error weights are applied with the mse cost function. I would like to know how they are applied when using the crossentropy cost function. Is it implemented roughly as
sum_{s=1}^{N} sum_{i=1}^{K} w(s,i) * ( -t(s,i) * log(y(s,i)) )
where w(s,i) is the weight for sample s and class i, N is the number of samples, and K is the number of classes? (In my case the weights are the same across samples; only the per-class weights differ, in proportion to each class's frequency in the training set. Also, I'm just using a simple feedforward network with one hidden layer, trained with trainscg.)
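If it is implemented as the weighted sum above, the computation would look something like this sketch. This is just NumPy to make the formula concrete (function and variable names are mine, not the toolbox's), not a claim about MATLAB's actual internals:

```python
import numpy as np

def weighted_crossentropy(y, t, w):
    """sum over samples s and classes i of w(s,i) * (-t(s,i) * log(y(s,i))).
    y: network outputs, t: one-hot targets, w: error weights; all (N, K)."""
    return np.sum(w * (-t * np.log(y)))

# Toy example: N = 2 samples, K = 2 classes, class-dependent weights only
t = np.array([[1.0, 0.0],
              [0.0, 1.0]])           # one-hot targets
y = np.array([[0.8, 0.2],
              [0.3, 0.7]])           # network outputs
w = np.tile([2.0, 1.0], (2, 1))      # up-weight class 0, same for every sample
loss = weighted_crossentropy(y, t, w)
```

With these numbers the loss reduces to 2*(-log 0.8) + 1*(-log 0.7), i.e. only the target class's term survives in each row, scaled by that class's weight.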
Do these error weights simply modify the network's outputs, or do they actually modify the training (backpropagation) algorithm? For reference to what I mean, see M. Kukar and I. Kononenko, "Cost-sensitive learning with neural networks", 1998. Specifically, I am asking whether MATLAB uses something like the "Minimization of the misclassification costs" method from that paper, or one of the other methods such as "Adapting the output of the network" or "Cost-sensitive classification".
Finally, and partly related: since we can set weights for individual samples, could an ensemble method be built in an AdaBoost-like fashion using these error weights that are given to the train function?
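In principle that only requires recomputing per-sample weights between rounds and passing them back in as error weights. A single AdaBoost-style reweighting step could look like this sketch (pure NumPy; `adaboost_update` is a name I made up, and whether feeding the resulting weights to train reproduces true boosting depends on how the toolbox applies them internally):

```python
import numpy as np

def adaboost_update(pred, target, w):
    """One AdaBoost-style reweighting step.
    pred, target: (N,) predicted / true class labels.
    w: (N,) sample weights summing to 1.
    Returns (alpha, new_w): the learner's vote weight and the updated,
    renormalized sample weights for the next training round."""
    miss = (pred != target).astype(float)
    err = np.sum(w * miss)                      # weighted training error
    alpha = 0.5 * np.log((1.0 - err) / err)     # learner vote (assumes 0 < err < 1)
    new_w = w * np.exp(alpha * (2.0 * miss - 1.0))  # up-weight mistakes, down-weight hits
    return alpha, new_w / np.sum(new_w)

# Toy round: 4 samples, uniform weights, one misclassified
pred   = np.array([0, 1, 1, 0])
target = np.array([0, 1, 0, 0])
w0 = np.full(4, 0.25)
alpha, w1 = adaboost_update(pred, target, w0)
```

After this update the single misclassified sample carries half of the total weight, which is exactly the emphasis you would want the next network in the ensemble to see.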