Improve GPU utilization during regression deep learning
I'm having trouble improving GPU utilization on what I think is a fairly straightforward deep learning example, and I wonder whether I'm doing something clearly wrong. I'm not an expert in this field, so I'm not sure what information is most relevant to provide.
I'm using a 3090 GPU. The neural net architecture is a few fully-connected layers, each with ~100 neurons. The input is a featureInputLayer with 3 inputs and ~20k points, going to one regression output.
The relatively sparse training options are as follows:
options = trainingOptions("adam", ...
    MaxEpochs=500, ...
    Shuffle="every-epoch", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=128);
However, when I train the network, I only reach ~10% GPU utilization. I assume I'm being bottlenecked by some other step of the process.
My ultimate goal is to train the model hundreds of times, each with a different choice of initial data. So although my input data is relatively small (which perhaps is causing the bottleneck?), I'm hoping to find some way to parallelize multiple trainings on the same GPU. Is this possible, or is there something else I've clearly overlooked when it comes to improving utilization?
1 Comment
Joss Knight
on 12 Apr 2023
What is your data? What does the MATLAB Profiler say about where time is being spent? Have you tried to maximize the MiniBatchSize to improve throughput?
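As a rough sketch of the profiling step asked about here: the block below wraps a short training run in the MATLAB Profiler. The synthetic X and Y, the two hidden layers, and the reduced MaxEpochs are placeholder assumptions standing in for the real data and network; it also assumes a release where trainNetwork accepts numeric feature arrays.

```matlab
% Synthetic stand-in for the real data (assumption): 20k observations,
% 3 features, 1 regression response.
X = rand(20000, 3, "single");
Y = sum(X, 2);

% Small fully connected regression network, similar in spirit to the one described.
layers = [
    featureInputLayer(3)
    fullyConnectedLayer(100)
    reluLayer
    fullyConnectedLayer(100)
    reluLayer
    fullyConnectedLayer(1)
    regressionLayer];

options = trainingOptions("adam", ...
    MaxEpochs=5, ...                 % short run, only to see where time goes
    Shuffle="every-epoch", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=128, ...
    Verbose=false);

profile on
net = trainNetwork(X, Y, layers, options);
profile off
profile viewer   % opens the report; sort by self time to find the hot spots
```

If most of the time shows up outside the GPU computations (for example in per-iteration bookkeeping), that would point to overhead rather than the GPU itself as the limit.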
Answers (1)
Aishwarya Shukla
on 2 May 2023
Hi @Adam Shaw
It's hard to say exactly what's causing the low GPU utilization without more information, but here are a few potential issues to consider:
- Batch size: With a mini-batch size of 128, it's possible that your GPU is underutilized because the batches are too small to fully occupy the GPU. You could try increasing the batch size to see if that improves GPU utilization.
- Data loading: If your data loading process is slow, the GPU may sit idle waiting for data during training, leading to low utilization. Consider pre-loading your data into memory or onto the GPU so that loading is not the limiting step; with ~20k points and 3 inputs, the whole data set should fit easily.
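- Model complexity: Your neural network may be too small to fully utilize the GPU. A few fully-connected layers of ~100 neurons give a 3090 very little work per iteration, so fixed per-iteration overhead can dominate. Adding layers or neurons would likely raise utilization, but that only makes sense if the problem actually needs a larger model.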
- Other system constraints: It's possible that your GPU is being bottlenecked by other system constraints, such as CPU or memory bandwidth. You can monitor these metrics during training to see if they are limiting GPU utilization.
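To make the batch-size point concrete, here is a small sketch of how the number of iterations per epoch changes with MiniBatchSize for ~20k observations. The candidate sizes 1024 and 4096 are arbitrary illustrations, not recommendations; the count uses floor because trainNetwork drops the final incomplete mini-batch of each epoch.

```matlab
numObservations = 20000;   % approximate data set size from the question

% Fewer, larger mini-batches mean fewer small GPU calls per epoch.
for miniBatchSize = [128 1024 4096]
    itersPerEpoch = floor(numObservations / miniBatchSize);
    fprintf("MiniBatchSize=%d -> %d iterations per epoch\n", ...
        miniBatchSize, itersPerEpoch);
end

% Same options as in the question, with a larger (illustrative) mini-batch.
options = trainingOptions("adam", ...
    MaxEpochs=500, ...
    Shuffle="every-epoch", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=4096);
```

A larger mini-batch also changes the optimization behaviour, so the learning rate and number of epochs may need retuning after such a change.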
Regarding parallel training, it is possible to train multiple models simultaneously on the same GPU, typically by running each training in its own process. Note that Python tools such as PyTorch's DistributedDataParallel or TensorFlow's MirroredStrategy address a different case: they spread the training of a single model across multiple GPUs. Keep in mind that training multiple models on the same GPU will increase memory usage, potentially leading to out-of-memory errors or slower individual trainings.
1 Comment
Joss Knight
on 7 May 2023
Or perhaps, since you're using MATLAB and not Python, use MATLAB to train multiple models, as described in our documentation.
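One way this could look in MATLAB is a parfor loop in which each worker trains its own network. This is only a sketch and not necessarily the approach the linked documentation takes. It assumes Parallel Computing Toolbox is available; numRuns, numWorkers, the synthetic data, and the reduced MaxEpochs are hypothetical placeholders. Workers in a pool can share a single GPU, but each one takes its own share of GPU memory.

```matlab
numRuns    = 8;   % hypothetical; the question mentions hundreds of runs
numWorkers = 4;   % hypothetical; tune to GPU memory and CPU core count

% Start a pool only if none is running yet.
if isempty(gcp("nocreate"))
    parpool(numWorkers);
end

layers = [
    featureInputLayer(3)
    fullyConnectedLayer(100)
    reluLayer
    fullyConnectedLayer(100)
    reluLayer
    fullyConnectedLayer(1)
    regressionLayer];

options = trainingOptions("adam", ...
    MaxEpochs=5, ...                    % shortened for the sketch
    Shuffle="every-epoch", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=128, ...
    ExecutionEnvironment="auto", ...    % uses the GPU when one is available
    Verbose=false);

nets = cell(numRuns, 1);
parfor k = 1:numRuns
    % Placeholder for "a different choice of initial data" in each run.
    X = rand(20000, 3, "single");
    Y = sum(X, 2);
    nets{k} = trainNetwork(X, Y, layers, options);
end
```

With a network this small, several workers sharing one GPU may raise overall utilization, since each training on its own leaves the device mostly idle; whether it does in practice is worth checking with the profiler or a GPU monitor.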