How can I simulate Direct ADP?
I want to simulate the article "On-Line Learning Control by Association and Reinforcement", but I have a problem obtaining the optimal weights for the critic neural network. The critic error in this article is e_c = J(t) - (J(t-1) - r(t)), and at the beginning the critic weights are initialized randomly.

My question is this: at the beginning we don't have any J(t-1), and we also know that J(t) and r(t) are positive functions. If we set J(t-1) = 0, the error becomes e_c = J(t) + r(t), so minimizing it drives J(t) toward -r(t). That is a negative number, which is wrong.
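To make the arithmetic above concrete, here is a minimal sketch of the critic error and one gradient step on E_c = 0.5 * e_c^2. It is not the article's implementation: it assumes a linear critic J(x) = w . x in place of the neural network, a fixed learning rate, and treats J(t-1) as a constant target when differentiating. The function names (critic_output, critic_error, critic_step) are hypothetical.

```python
# Sketch of the critic error from the question, under stated assumptions:
# - a linear critic J(x) = w . x stands in for the critic neural network
# - fixed learning rate lr
# - J(t-1) is treated as a constant target (only J(t) depends on w)


def critic_output(w, x):
    """Hypothetical linear critic: J = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def critic_error(j_t, j_prev, r_t):
    """e_c = J(t) - (J(t-1) - r(t)), as written in the question."""
    return j_t - (j_prev - r_t)


def critic_step(w, x_t, j_prev, r_t, lr=0.1):
    """One gradient-descent step on E_c = 0.5 * e_c**2 with respect to w.

    For a linear critic dJ(t)/dw_i = x_i, so dE_c/dw_i = e_c * x_i.
    Returns the updated weights and the error computed before the update.
    """
    j_t = critic_output(w, x_t)
    e_c = critic_error(j_t, j_prev, r_t)
    new_w = [wi - lr * e_c * xi for wi, xi in zip(w, x_t)]
    return new_w, e_c


if __name__ == "__main__":
    # The situation described in the question: J(t-1) pinned to 0,
    # constant positive reward r = 1, constant input x = [1.0].
    w = [0.5]
    x = [1.0]
    for _ in range(200):
        w, e_c = critic_step(w, x, j_prev=0.0, r_t=1.0)
    print(round(critic_output(w, x), 4))  # prints -1.0, i.e. J -> -r(t)
```

With J(t-1) fixed at 0, each step maps w to 0.9w - 0.1, whose fixed point is w = -1, so the critic output settles at -r(t) exactly as described in the question. The linear critic is used only so this fixed point can be checked by hand.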
