Training network with a large validation set running out of memory

6 views (last 30 days)
Jordan Houri
Jordan Houri on 14 Sep 2018
Commented: Brian Derstine on 5 Nov 2021
I am training U-Net on a Windows 7 machine with two NVIDIA Titan X GPUs on a data set of 14787 images, of which I dedicate the standard 15% (2218 images) to the validation set. However, during the first validation pass I can see the computer's physical memory shoot up by several GB, which then freezes the computer. The network trains fine when I reduce the validation set to just 20 images, and barely makes it through with 100.
I have no idea what is going wrong. Would it be possible to modify the trainNetwork function (or any of its child functions) so that I can maintain a large validation set, but only have it read a random subset of, say, 40 of the validation images per validation cycle?
I have tried to implement this by modifying the predict() function in ValidationReporter.m to take a random subset of 40 indices from the variable "Data", but this didn't seem to help.
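For reference, here is a minimal sketch of the kind of subsetting I am after, done on the datastores before calling trainNetwork rather than inside ValidationReporter.m. The variable names (valImds, valPxds, classes, labelIDs) are placeholders for my own datastores, not working code:

% Placeholder names: valImds is an imageDatastore of validation images,
% valPxds a pixelLabelDatastore of the matching label images.
rng(0);                                        % keep the subset reproducible
idx = randperm(numel(valImds.Files), 40);      % 40 random validation images

smallValImds = imageDatastore(valImds.Files(idx));
smallValPxds = pixelLabelDatastore(valPxds.Files(idx), classes, labelIDs);

% Pair the two subsets (e.g. with pixelLabelImageDatastore in newer
% releases) and pass the result as 'ValidationData' in trainingOptions.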
2 Comments
Emmanuel Koumandakis
Emmanuel Koumandakis on 21 Apr 2020
I am having the same issue and I see no one has posted a solution. I'm using MATLAB R2020a on Ubuntu with 2 GTX 1080 Ti cards and 32 GB of RAM, training DeepLab v3+ on a fairly small dataset. The RAM usage shoots up after every validation and isn't completely freed afterwards, resulting in 'Out of memory' errors after a few epochs.
Brian Derstine
Brian Derstine on 26 Jul 2021
You're not the only one who has noticed this bug. Setting the validation set very small or running without validation seems to be the only workaround: https://www.mathworks.com/matlabcentral/answers/570004-out-of-memory-error-when-using-validation-while-training-a-dagnetwork?s_tid=answers_rc1-1_p1_Topic

Sign in to comment.

Answers (1)

Joss Knight
Joss Knight on 16 Sep 2018
Validation uses the same MiniBatchSize as training to break your data up into chunks, so you might have some luck if you reduce your MiniBatchSize.
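For illustration, a minimal sketch of passing a smaller MiniBatchSize (together with the validation data) to trainingOptions; the value 2, the 'sgdm' solver, and the variable names trainData, valData and layers are assumptions, not taken from the thread:

% valData: a datastore of validation images and labels (e.g. the random
% subset sketched in the question); trainData and layers are placeholders.
opts = trainingOptions('sgdm', ...
    'MiniBatchSize', 2, ...                   % smaller batches cut peak memory during validation too
    'ValidationData', valData, ...
    'ValidationFrequency', 50, ...
    'ExecutionEnvironment', 'multi-gpu');     % two Titan X GPUs as in the question
net = trainNetwork(trainData, layers, opts);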
1 Comment
Brian Derstine
Brian Derstine on 5 Nov 2021
Here's the response I got to a recent support ticket: "You are correct. In MATLAB R2021a, there is a bug in the Neural Network Toolbox where, depending on the workflow, if the validation data is large, you may run out of memory on the GPU. This has been reported in image segmentation and LSTM workflows.
The workaround is to reduce the validation data set size or train without validation data. Reducing the "miniBatchSize" does not fix this issue.
A patch for this bug was made in MATLAB R2021b. You may want to consider using this version of MATLAB to avoid encountering this issue."

Sign in to comment.

Version

R2018a
