Issue with StepNorm Going to Zero on RTX 4060 Ti During MLP Training

Keonwook Kim on 11 Mar 2025
I am using the "trainnet" function to train a relatively shallow MLP network. To accelerate processing, I use two different GPUs: an RTX 2070 Super and an RTX 4060 Ti.
The RTX 2070 Super produces the expected output for all iterations. On the RTX 4060 Ti, however, training terminates early because the StepNorm value approaches zero. I suspect this is related to the difference in memory bus width (256-bit for the RTX 2070 Super versus 128-bit for the RTX 4060 Ti), which might affect numerical precision during parallel computations.
When I checked the SingleDoubleRatio, I found that:
  • RTX 2070 Super: 32
  • RTX 4060 Ti: 64
According to the MATLAB documentation, SingleDoubleRatio is the ratio of single- to double-precision floating-point units, so a value of 32 indicates relatively more double-precision capability than a value of 64. I tried to manually enforce precision control on the GPU but was unsuccessful.
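For reference, the check described above could be reproduced with something like the following. This is only a minimal sketch: it assumes Parallel Computing Toolbox is installed, and the variable names (info, names, ratio, capability) are illustrative rather than taken from my actual script.

```matlab
% Sketch: collect the properties discussed above for every GPU MATLAB can see.
% Assumes Parallel Computing Toolbox; all variable names are illustrative.
n = gpuDeviceCount;
names = strings(n, 1);
capability = strings(n, 1);
ratio = zeros(n, 1);
for idx = 1:n
    % Note: gpuDevice(idx) also selects the device and clears any
    % gpuArray data currently held on it.
    d = gpuDevice(idx);
    names(idx) = d.Name;
    capability(idx) = d.ComputeCapability;
    ratio(idx) = d.SingleDoubleRatio;
end
info = table((1:n)', names, capability, ratio, ...
    'VariableNames', ["Index", "Name", "ComputeCapability", "SingleDoubleRatio"]);
disp(info)
```

After the loop, the device with the highest index remains selected, so a call such as gpuDevice(1) would be needed to switch back before training.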
How can I resolve this issue and ensure stable training on the RTX 4060 Ti? Any insights would be greatly appreciated.
Thank you!

Answers (0)
