TD3 training: actions always output at the boundary values

28 views (last 30 days)

After training with the TD3 algorithm, the actions always output boundary values (or are otherwise completely wrong), regardless of whether the reward curve converged during training. My state values are in the range 0 to 20000, and the action bounds are 0 to 15000. What is going wrong: is my custom environment implemented incorrectly, or is it something else? Do I need to normalize the inputs and outputs?

Answers (1)

UDAYA PEDDIRAJU
on 14 Mar 2024

Hi 泽宇,

Regarding your issue with the TD3 algorithm, where the actions always saturate at the boundary values regardless of whether the reward curve converges, it is worth investigating a few potential factors:

- Action bounds: Make sure the action bounds are defined correctly. If the bounds are too restrictive, the agent may struggle to learn effective actions.
- Normalization: Normalizing inputs and outputs can significantly improve training stability. With state values as large as 20000, feeding unnormalized observations into the networks can saturate the actor's output layer, which is a common cause of actions pinned at the bounds. Consider scaling both states and actions to a common range such as [-1, 1] or [0, 1].
- Custom environment: Verify that your custom environment is implemented correctly. Double-check the reward function, the state representation, and the action space.
- Exploration noise: TD3 relies on exploration noise during training. Make sure the noise magnitude is appropriate for the scale of your actions.

For details, you can refer to the TD3 documentation: https://www.mathworks.com/help/reinforcement-learning/ug/td3-agents.html
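To make the normalization and noise points concrete, here is a minimal, framework-agnostic sketch in Python. The ranges (states 0–20000, actions 0–15000) are taken from the question; the function names and the noise scale are hypothetical illustrations, not part of any toolbox API:

```python
import random

# Assumed ranges from the question; adjust to your environment.
STATE_LO, STATE_HI = 0.0, 20000.0
ACT_LO, ACT_HI = 0.0, 15000.0

def normalize_state(s):
    """Map a raw state from [0, 20000] to [-1, 1] before it reaches the actor/critic."""
    return 2.0 * (s - STATE_LO) / (STATE_HI - STATE_LO) - 1.0

def denormalize_action(a):
    """Map a normalized actor output in [-1, 1] (e.g. from tanh) back to [0, 15000]."""
    return ACT_LO + (a + 1.0) / 2.0 * (ACT_HI - ACT_LO)

def noisy_action(a_norm, sigma=0.1):
    """Add Gaussian exploration noise in the normalized space, clipped to [-1, 1]."""
    return min(1.0, max(-1.0, a_norm + random.gauss(0.0, sigma)))
```

With this scaling, a tanh output layer covers the full action range without hand-tuning, and the exploration noise sigma can be chosen relative to [-1, 1] instead of the physical units (sigma = 0.1 here corresponds to roughly 750 in raw action units). In the MATLAB toolbox, the equivalent idea is typically expressed with a scalingLayer on the actor output rather than explicit helper functions.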