Common causes of NaN training loss:
- Improper weight initialization
- Too high a learning rate
- Batch normalization issues (e.g., zero variance within a mini-batch)
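The learning-rate cause can be reproduced in a few lines. The sketch below (plain Python, toy linear regression; all names are illustrative) fits `y = 3x` by gradient descent on the MSE: a small step size converges, while an overly large one makes the weight blow up until the arithmetic overflows and the loss becomes inf/NaN.

```python
import math
import random

def train(lr, steps=500):
    """Fit y = 3x by gradient descent on MSE; return the final loss."""
    random.seed(0)
    xs = [random.gauss(0, 1) for _ in range(100)]
    ys = [3.0 * x for x in xs]
    w = 0.0
    for _ in range(steps):
        # dMSE/dw = mean(2 * (w*x - y) * x)
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    # use d*d instead of d**2 so overflow yields inf/NaN, not an exception
    return sum((w * x - y) * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

print(train(0.1))  # modest step size: loss shrinks toward zero
print(train(5.0))  # oversized step size: weight diverges, loss ends up inf/NaN
```

The divergent run mirrors what a deep learning framework reports as "mini-batch loss = NaN": nothing in the model is broken, the updates simply overshoot and grow without bound until they overflow.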
- https://www.mathworks.com/matlabcentral/answers/337587-how-to-avoid-nan-in-the-mini-batch-loss-from-traning-convolutional-neural-network
- https://www.mathworks.com/matlabcentral/answers/1917165-training-loss-is-nan-deep-learning
- https://www.mathworks.com/matlabcentral/answers/92319-how-are-nan-values-in-the-input-data-for-a-neural-network-taken-into-account-while-training-the-netw
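The last link concerns NaN values already present in the input data, which propagate straight into the loss. A minimal pure-Python screen for such rows (the helper name is illustrative, not from any library):

```python
import math

def drop_nonfinite_rows(rows):
    """Keep only rows whose values are all finite (no NaN, no inf)."""
    return [r for r in rows if all(math.isfinite(v) for v in r)]

data = [
    [1.0, 2.0],
    [float("nan"), 3.0],   # NaN feature: would poison the loss
    [4.0, float("inf")],   # inf feature: same problem
    [5.0, 6.0],
]
clean = drop_nonfinite_rows(data)
# clean contains only the first and last rows
```

Depending on the task, imputing missing values (mean, median, interpolation) may be preferable to dropping rows, but either way the data should be checked for NaN/inf before training starts.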