
[Documentation] SetReward method (#1996)

Added a paragraph to the docs/Learning-Environment-Design-Agents.md document regarding the use of `SetReward` and how it differs from `AddReward`.
Branch: develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: 0d6a24c5
1 file changed, 7 insertions(+), 3 deletions(-)
docs/Learning-Environment-Design-Agents.md (10 changes)

Brain to control the Agent while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` method in the
-`AgentAction()` function. The reward assigned in any step should be in the range
-[-1,1]. Values outside this range can lead to unstable training. The `reward`
-value is reset to zero at every step.
+`AgentAction()` function. The reward assigned between each decision
+should be in the range [-1,1]. Values outside this range can lead to
+unstable training. The `reward` value is reset to zero when the agent receives a
+new decision. If there are multiple calls to `AddReward()` for a single agent
+decision, the rewards will be summed together to evaluate how good the previous
+decision was. There is a method called `SetReward()` that will override all
+previous rewards given to an agent since the previous decision.
### Examples
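To make the distinction in the added paragraph concrete, below is a minimal sketch (not part of this commit) of an `AgentAction()` override that uses both calls. The class name `RollerAgent`, the helper `ReachedGoal()`, and the reward values are hypothetical; the `AgentAction(float[], string)` signature matches ML-Agents releases from around the time of this change, and newer releases use a different method signature.

```csharp
using MLAgents;  // ML-Agents namespace in the releases contemporary with this commit

// Hypothetical agent illustrating AddReward vs. SetReward.
public class RollerAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Shaping rewards accumulate: these two calls sum to -0.06
        // for the current decision.
        AddReward(-0.01f);  // small time penalty
        AddReward(-0.05f);  // penalty for, e.g., drifting off course

        if (ReachedGoal())
        {
            // SetReward overrides everything accumulated since the last
            // decision, so the reward for this decision becomes exactly 1.0.
            SetReward(1.0f);
            Done();
        }
    }

    // Placeholder goal check; a real agent would inspect scene state here.
    private bool ReachedGoal()
    {
        return false;
    }
}
```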
