Why is my neural network performing worse as the number of hidden layers increases?
bitslice
on 2 Aug 2015
Commented: Greg Heath on 5 Aug 2015
Hello, I am currently using the MATLAB Neural Network Toolbox to experiment with the Iris dataset. I am training with the "trainlm" algorithm, and I decided to see what would happen if I trained with 1 to 20 hidden layers. I was not expecting any change in the classification error, but when I do this, I get the following output:
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/147648/image.jpeg)
I have been looking for an explanation, but I cannot see why the classification error begins to jump around, or why it increases at all, as the number of hidden layers increases.
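Roughly, the experiment looks like this (a minimal sketch; the width of 10 neurons per hidden layer is an illustrative choice, and performFcn is set to 'mse' because, as far as I know, the Jacobian-based trainlm cannot use patternnet's default crossentropy loss):
[ input, target ] = iris_dataset; % 4x150 inputs, 3x150 one-hot targets
err = zeros(1, 20);
for L = 1:20
    net = patternnet(10 * ones(1, L)); % L hidden layers of 10 neurons each
    net.trainFcn = 'trainlm'; % Levenberg-Marquardt
    net.performFcn = 'mse'; % trainlm needs a Jacobian-capable loss; depending
                            % on release it may also need a non-softmax output
    net.trainParam.showWindow = false;
    net = train(net, input, target);
    y = net(input);
    err(L) = 100 * mean(vec2ind(y) ~= vec2ind(target)); % percent misclassified
end
plot(1:20, err)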
Thank you.
0 Comments
Accepted Answer
Greg Heath
on 2 Aug 2015
The ultimate goal is to obtain a net that performs well on non-training data that comes from the same or a similar source as the training data. This is called GENERALIZATION.
Frequent causes of failure are
1. Not enough weights to adequately characterize the training data
2. Training data does not adequately characterize the salient features of non-training data because of measurement error, transcription error, noise, interference, or insufficient sample size and variability
3. Fewer training equations than unknown weights.
4. Random weight initialization
Various techniques used to mitigate these causes are
1. Remove bad data and outliers (plots help)
2. Use enough training data to sufficiently characterize non-training data.
3. Use enough weights to adequately characterize the training data
4. Use more training equations than unknown weights. The stability of solutions w.r.t. noise and errors increases as the equations-to-weights ratio increases.
5. Use the best of multiple random initialization & data-division designs
6. K-fold Cross-validation
7. Validation Stopping
8. Regularization (items 7 and 8 are sketched below)
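A minimal sketch of items 7 and 8, using patternnet defaults where possible (the regularization value 0.1 is only an illustrative assumption, not a recommendation):
net = patternnet(10); % 10 hidden nodes, illustrative
net.divideFcn = 'dividerand'; % random trn/val/tst division (the default)
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio = 0.15; % the validation subset drives early stopping
net.divideParam.testRatio = 0.15;
net.trainParam.max_fail = 6; % stop after 6 straight validation failures (the default)
net.performParam.regularization = 0.1; % blend mean squared weights into the loss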
For the iris_dataset
[ input, target ] = iris_dataset; % load Fisher's iris data
[ I N ] = size(input) % [ 4 150 ]
[ O N ] = size(target) % [ 3 150 ]
Assuming the default 0.7/0.15/0.15 trn/val/tst data division, the number of training equations is approximately
Ntrneq = 0.7*N*O % 315
Assuming the default I-H-O node topology, the number of unknown weights is
Nw = (I+1)*H+(H+1)*O = (I+O+1)*H + O
Obviously, Nw <= Ntrneq when H <= Hub (the upper bound), where
Hub = floor( (Ntrneq-O)/(I+O+1)) % 39
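For example, at the largest size tried in the question,
H = 20;
Nw = (I+1)*H + (H+1)*O % 163, well below Ntrneq = 315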
Expecting decent solutions for H <= 20 seems reasonable. However, to mitigate the effects of random weight initialization and random data division, design 10 nets for each value of H, as sketched below.
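A minimal sketch of that design, assuming patternnet with its default trainscg training and dividerand division (names such as Hvec, Ntrials, and bestErr are illustrative only):
[ input, target ] = iris_dataset;
Hvec = 1:20; % candidate numbers of hidden nodes
Ntrials = 10; % nets designed per candidate H
bestErr = inf(size(Hvec));
for k = 1:numel(Hvec)
    for trial = 1:Ntrials
        net = patternnet(Hvec(k)); % fresh random weights and data division
        net.trainParam.showWindow = false;
        [ net, tr ] = train(net, input, target);
        ytst = net(input(:, tr.testInd)); % judge on the test subset only
        ttst = target(:, tr.testInd);
        e = mean(vec2ind(ytst) ~= vec2ind(ttst)); % test misclassification rate
        bestErr(k) = min(bestErr(k), e); % keep the best of the Ntrials designs
    end
end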
I have posted zillions of examples in both the NEWSGROUP and ANSWERS. I use patternnet for classification.
Hope this helps.
Thank you for formally accepting my answer
Greg
5 Comments
More Answers (1)
Walter Roberson
on 2 Aug 2015
Each layer is initialized randomly. If you do not provide enough data to train the effects of the randomness out, then the cumulative randomness of the layers carries through to the result.
3 Comments
Walter Roberson
on 2 Aug 2015
Greg Heath has written several times about the amount of data that one should use, but I cannot think of good keywords to search for at the moment.