About algorithm pseudocode 5.2 and 5.3 #21
Libertax-coder started this conversation in General
Replies: 1 comment 1 reply
-
If the policy is not used within an episode, the policy improvement step can be moved outside the loop.
-
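I'd like to ask: in these two pseudocode algorithms, shouldn't the policy improvement step be placed outside the second for loop?
The second for loop iterates over one complete episode, so updating the policy inside that loop doesn't seem to accomplish anything. Shouldn't policy improvement be performed once, after the whole episode has been traversed?
As the book puts it: "Then, the policy can be improved in an episode-by-episode fashion."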
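A minimal Python sketch of the two placements, assuming a hypothetical 5-state chain environment (`step`, `N_STATES`, `improve`, and every other name below are illustrative, not taken from the book), every-visit return averaging, and an ε-greedy policy stored as one greedy action per state. Because the episode is generated before the inner loop runs, the policy is never consulted while that loop walks backward over it. Improving at each step inside the loop, or once per visited state after the loop, should therefore leave the same policy in place when the next episode is generated. The flag `improve_inside_loop` switches between the two.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (not from the book): a chain of states 0..4,
# state 4 is terminal; action 0 moves left, action 1 moves right.
N_STATES = 5
ACTIONS = (0, 1)
GAMMA = 0.9
EPSILON = 0.2
MAX_STEPS = 50  # cap on episode length

def step(s, a):
    """Return (next_state, reward); reward 1.0 only on entering the terminal state."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def sample_action(greedy, s, rng):
    """epsilon-greedy: greedy action w.p. 1 - eps*(|A|-1)/|A|, others w.p. eps/|A|."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return greedy[s]

def generate_episode(greedy, rng):
    """The only place where the policy is used: rolling out one episode."""
    s = rng.randrange(N_STATES - 1)  # random non-terminal start state
    episode = []
    for _ in range(MAX_STEPS):
        a = sample_action(greedy, s, rng)
        s_next, r = step(s, a)
        episode.append((s, a, r))
        s = s_next
        if s == N_STATES - 1:
            break
    return episode

def improve(greedy, q, s):
    """Policy improvement at s: make the action with the largest q(s, a) greedy."""
    greedy[s] = max(ACTIONS, key=lambda a: q.get((s, a), 0.0))  # first max wins ties

def mc_epsilon_greedy(num_episodes, improve_inside_loop, seed=0):
    rng = random.Random(seed)
    q = {}
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    greedy = {s: 0 for s in range(N_STATES)}
    for _ in range(num_episodes):
        episode = generate_episode(greedy, rng)
        g = 0.0
        visited = set()
        for s, a, r in reversed(episode):  # t = T-1, ..., 0
            g = GAMMA * g + r
            returns_sum[(s, a)] += g
            returns_cnt[(s, a)] += 1
            q[(s, a)] = returns_sum[(s, a)] / returns_cnt[(s, a)]  # every-visit average
            visited.add(s)
            if improve_inside_loop:
                improve(greedy, q, s)  # placement inside the second for loop
        if not improve_inside_loop:
            for s in visited:
                improve(greedy, q, s)  # once per episode, after the loop
    return greedy, q

if __name__ == "__main__":
    p_in, q_in = mc_epsilon_greedy(200, improve_inside_loop=True, seed=1)
    p_out, q_out = mc_epsilon_greedy(200, improve_inside_loop=False, seed=1)
    print("same policy:", p_in == p_out)
```

With the same seed, both variants should return identical policies and q-tables: each call to `improve` at state `s` depends only on `q(s, ·)`, and inside the loop the last such call always follows the last update of `q(s, ·)`. The two placements would stop coinciding only if the policy were used to pick actions while the loop is still running.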