Issue with StepNorm Going to Zero on RTX 4060 Ti During MLP Training
2 views (last 30 days)
I am using the "trainnet" function to train a relatively shallow MLP. To accelerate training, I run it on two different GPUs: an RTX 2070 Super and an RTX 4060 Ti.
The RTX 2070 Super produces the expected output for all iterations. On the RTX 4060 Ti, however, training terminates early because the StepNorm value approaches zero. I suspect this is related to the difference in memory bandwidth (a 256-bit bus on the RTX 2070 Super versus 128-bit on the RTX 4060 Ti), which might affect numerical precision during parallel computation.
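For reference, my setup looks roughly like the following sketch. The layer sizes, data, and training options here are illustrative placeholders, not my exact configuration:

```matlab
% Minimal sketch of the training setup (sizes and options are illustrative).
layers = [
    featureInputLayer(10)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(1)];

X = rand(1000, 10);    % 1000 samples, 10 features (placeholder data)
Y = rand(1000, 1);

options = trainingOptions("adam", ...
    ExecutionEnvironment="gpu", ...   % train on the currently selected GPU
    MaxEpochs=50, ...
    Verbose=true);

net = trainnet(X, Y, layers, "mse", options);
```

The same script runs on both cards; only the selected GPU differs between runs.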
When I checked the SingleDoubleRatio, I found that:
- RTX 2070 Super: 32
- RTX 4060 Ti: 64
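I read the values off the device objects; the device indices below depend on the system:

```matlab
% List SingleDoubleRatio for every GPU in the machine.
for idx = 1:gpuDeviceCount
    d = gpuDevice(idx);
    fprintf("%s: SingleDoubleRatio = %d\n", d.Name, d.SingleDoubleRatio);
end
```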
According to the MATLAB documentation, SingleDoubleRatio is the ratio of the device's single-precision to double-precision performance, so a value of 32 indicates relatively stronger double-precision throughput than a value of 64. I attempted to manually enforce double precision on the GPU but was unsuccessful.
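One of my unsuccessful attempts looked roughly like this: selecting the RTX 4060 Ti and casting the data to double before training. I am not sure whether trainnet actually keeps the computation in double precision internally; the device index and network here are again illustrative:

```matlab
% Sketch of an unsuccessful attempt at forcing double precision.
gpuDevice(2);    % select the RTX 4060 Ti (index is system-specific)

X = double(rand(1000, 10));    % cast placeholder data to double
Y = double(rand(1000, 1));

layers = [
    featureInputLayer(10)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(1)];

options = trainingOptions("adam", ExecutionEnvironment="gpu");
net = trainnet(X, Y, layers, "mse", options);   % StepNorm still collapses
```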
How can I resolve this issue and ensure stable training on the RTX 4060 Ti? Any insights would be greatly appreciated.
Thank you!
0 comments
Answers (0)