Additional best practices

tag-0.2.0 · GitHub · 7 years ago
Commit f7e33e56
1 file changed, 7 insertions(+), 4 deletions(-)

docs/best-practices.md

## General
* It is often helpful to begin with the simplest version of the problem, to ensure the agent can learn it. From there, increase complexity over time. This can either be done manually or via Curriculum Learning, where a set of lessons which progressively increase in difficulty are presented to the agent ([learn more here](../docs/curriculum.md)).
* For locomotion tasks, a small positive reward (+0.1) for forward velocity is typically used.
* If you want the agent to finish a task quickly, it is often helpful to provide a small penalty (-0.05) for every step in which the agent does not complete the task. In this case, completion of the task should also coincide with the end of the episode (see the reward sketch after this list).
* Overly large negative rewards can cause undesirable behavior, where the agent learns to avoid any action which might produce the negative reward, even if that action could eventually lead to a positive reward.
* The magnitude of each state variable should be normalized to around 1.0.
* Rotation information on GameObjects should be recorded as `state.Add(transform.rotation.eulerAngles.y/180.0f-1.0f);` rather than `state.Add(transform.rotation.y);`.
* Positional information of relevant GameObjects should be encoded in relative coordinates wherever possible, typically relative to the agent's position (see the state-collection sketch after this list).
* Be sure to set the action-space-size to the number of actions actually used, and no greater, as an oversized action space can interfere with the efficiency of the training process.
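
To make the reward advice above concrete, here is a minimal sketch of an agent step. The class name `ReachTargetAgent`, the `target` and `body` fields, and the `AgentStep`/`reward`/`done` members are assumptions about the Agent API of this era of ML-Agents rather than something defined in this document; adapt them to your installed version.

```csharp
using UnityEngine;

// Hypothetical agent illustrating the reward-shaping bullets above.
// The Agent base class, the AgentStep override, and the reward/done fields
// are assumed from the 0.2-era API; verify against your version.
public class ReachTargetAgent : Agent
{
    public Transform target;   // the object the agent should reach
    public Rigidbody body;     // used for the forward-velocity bonus

    public override void AgentStep(float[] act)
    {
        // Small per-step penalty encourages finishing the task quickly.
        reward = -0.05f;

        // For locomotion-style tasks, a small bonus for forward velocity,
        // clamped so its magnitude stays around +0.1.
        reward += 0.1f * Mathf.Clamp(Vector3.Dot(body.velocity, transform.forward), 0f, 1f);

        // Completing the task ends the episode, so the per-step penalty
        // stops accumulating and the positive reward is the last signal seen.
        if (Vector3.Distance(transform.position, target.position) < 1.0f)
        {
            reward = 1.0f;
            done = true;
        }
    }
}
```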
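
The state-collection bullets can be sketched in the same way, continuing the same hypothetical agent. Only the `state.Add(transform.rotation.eulerAngles.y / 180.0f - 1.0f)` line comes from the text above; the `CollectState()` signature, the `target` field, and the `maxDistance` scaling constant are assumptions used for illustration.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical state collection for the same ReachTargetAgent, keeping
// every entry normalized to roughly the [-1, 1] range.
public class ReachTargetAgent : Agent
{
    public Transform target;
    public float maxDistance = 10.0f;   // expected upper bound, used to scale positions

    public override List<float> CollectState()
    {
        List<float> state = new List<float>();

        // Rotation encoded in [-1, 1] rather than as a raw quaternion component.
        state.Add(transform.rotation.eulerAngles.y / 180.0f - 1.0f);

        // Target position relative to the agent, scaled toward magnitude ~1.0.
        Vector3 relative = target.position - transform.position;
        state.Add(relative.x / maxDistance);
        state.Add(relative.z / maxDistance);

        return state;
    }
}
```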