Modifying control actions to safe ones before storing them in the experience buffer during SAC agent training
3 views (last 30 days)
Ahmed R. Sayed
on 18 Jan 2022
Answered: Ahmed R. Sayed
on 21 Sep 2022
Hello everyone,
I am implementing a safe off-policy DRL SAC algorithm. An iterative convex optimization algorithm projects actions into a safe region; however, this projection is applied inside the environment. As a result, the existing rlSACAgent still stores the unsafe actions in its experience buffer, and the agent never learns from the modified (safe) actions. Consequently, the iterative algorithm keeps being fed unlearned actions and takes longer to converge. My question is:
How can I store the modified actions in the experience buffer instead of the unsafe ones?
Illustrative figure:
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/866360/image.png)
Many thanks for your help.
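For reference, the pattern I am trying to achieve could be sketched as a custom training loop instead of the built-in train function, assuming the R2022a custom-loop API (getAction, learn) is available; projectToSafeSet is a hypothetical placeholder for my iterative convex projection:

```matlab
% Minimal sketch of a custom training loop that stores the SAFE action.
% Assumptions (not verified against a specific release):
%  - agent is an rlSACAgent, env is a MATLAB RL environment
%  - learn(agent,exp) appends exp to the agent's internal replay buffer
%    and performs a learning step (custom-loop workflow, R2022a+)
%  - projectToSafeSet is user-supplied (hypothetical name)
obs = reset(env);
for t = 1:maxSteps
    rawAction  = getAction(agent, {obs});          % possibly unsafe action from the SAC policy
    safeAction = projectToSafeSet(rawAction{1});   % iterative convex projection (user code)
    [nextObs, reward, isDone] = step(env, safeAction);

    % Build the experience with the MODIFIED action, so the replay
    % buffer never sees the unsafe one.
    exp.Observation     = {obs};
    exp.Action          = {safeAction};
    exp.Reward          = reward;
    exp.NextObservation = {nextObs};
    exp.IsDone          = isDone;

    learn(agent, exp);   % buffer append + gradient update on the safe action

    obs = nextObs;
    if isDone
        break;
    end
end
```

This sidesteps the built-in train loop entirely, which is what forces the raw policy output into the buffer in the first place.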
Accepted Answer
More Answers (0)