Because of limited GPU memory, I can't train my deep learning network with a mini-batch of 16 samples.
Can I instead compute the gradients for two mini-batches of 8 samples each, and then update the network once using the gradients from both mini-batches?
Suppose I compute the gradients of the network with
[gradients,state,loss] = dlfeval(@modelGradient,dlNet,xTrain,yTrain);
After two mini-batches, I have gradients1, gradients2, state1, state2, loss1, and loss2.
My first thought is that the combined gradient should be the mean of gradients1 and gradients2.
But how should I compute the state values? Should they also be the mean of state1 and state2? Thank you.
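A short derivation of why averaging the gradients should work. It assumes that modelGradient returns the mean loss over the samples in its mini-batch and that the two mini-batches B_1 and B_2 are disjoint and each hold 8 samples. It also assumes that each sample's loss \ell_i does not depend on the other samples in its mini-batch. That last assumption does not hold exactly for layers such as batch normalization in training mode, because they normalize with statistics computed from only 8 samples.

$$
L_k = \frac{1}{8}\sum_{i \in B_k} \ell_i \quad (k = 1, 2), \qquad
L = \frac{1}{16}\sum_{i \in B_1 \cup B_2} \ell_i = \frac{L_1 + L_2}{2}
\;\;\Longrightarrow\;\;
\nabla L = \frac{\nabla L_1 + \nabla L_2}{2}
$$

Under these assumptions, the averaged gradient equals the gradient of the 16-sample loss. If gradients1 and gradients2 are the learnable-parameter tables returned by dlfeval, the per-parameter average can be formed with dlupdate, for example `gradients = dlupdate(@(g1,g2) (g1 + g2)/2, gradients1, gradients2);`. Treat this call as a sketch rather than a tested recipe.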