Environment Design Best Practices

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents.

General

  • It is often helpful to begin with the simplest version of the problem, to ensure the agent can learn it. From there, increase the complexity over time.
  • When possible, it is often helpful to ensure that you can complete the task yourself by using a Player Brain to control the agent.

Rewards

  • The magnitude of any given reward should typically not be greater than 1.0 in order to ensure a more stable learning process.
  • Positive rewards are often more helpful to shaping the desired behavior of an agent than negative rewards.
  • For locomotion tasks, a small positive reward (+0.1) for forward progress is typically used.
  • If you want the agent to finish a task quickly, it is often helpful to provide a small penalty every step (-0.1). A sketch combining these reward guidelines follows this list.
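
As a rough illustration of the points above, the sketch below combines a small per-step penalty, a small reward for forward progress, and a bounded completion bonus. It is plain Python rather than ML-Agents API code, and the function name and its inputs (forward_progress, reached_goal) are hypothetical, chosen only to make the shaping terms explicit.

```python
def step_reward(forward_progress, reached_goal):
    """Illustrative per-step reward for a locomotion-style task.

    forward_progress: distance moved toward the goal this step (hypothetical input).
    reached_goal: whether the task was completed this step (hypothetical input).
    """
    reward = -0.1          # small per-step penalty encourages finishing quickly
    if forward_progress > 0:
        reward += 0.1      # small positive reward for forward progress
    if reached_goal:
        reward += 1.0      # completion bonus kept at magnitude 1.0
    # Keep the magnitude of any single reward at or below 1.0 for stable learning.
    return max(min(reward, 1.0), -1.0)
```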

States

  • The magnitude of each state variable should be normalized to around 1.0.
  • States should include all variables relevant to allowing the agent to make an optimally informed decision.
  • Categorical state variables, such as the type of an object (Sword, Shield, Bow), should be encoded in one-hot fashion (i.e. the third of three categories -> 0, 0, 1). A sketch of normalization and one-hot encoding follows this list.
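
The sketch below illustrates the normalization and one-hot guidelines. It is generic Python using NumPy rather than ML-Agents API code; the maximum distance and the category list are assumptions made up for the example.

```python
import numpy as np

MAX_DISTANCE = 20.0                      # assumed largest distance in the scene
ITEM_TYPES = ["Sword", "Shield", "Bow"]  # assumed categorical variable

def build_state(distance_to_goal, item_type):
    # Normalize the continuous variable so its magnitude is around 1.0.
    normalized_distance = distance_to_goal / MAX_DISTANCE

    # Encode the categorical variable as a one-hot vector,
    # e.g. "Bow" (the third category) -> [0, 0, 1].
    one_hot = np.zeros(len(ITEM_TYPES))
    one_hot[ITEM_TYPES.index(item_type)] = 1.0

    return np.concatenate(([normalized_distance], one_hot))

# Example: build_state(5.0, "Bow") -> array([0.25, 0., 0., 1.])
```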

Actions

  • When using continuous control, action values should be clipped to an appropriate range.
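
Clipping itself is a one-line operation; the sketch below shows it in generic Python with NumPy. The range [-1, 1] is an assumption about what counts as an appropriate range and should be matched to the action space your agent actually uses.

```python
import numpy as np

def clip_actions(raw_actions, low=-1.0, high=1.0):
    # Clip continuous action values into the assumed valid range [-1, 1]
    # before applying them (e.g. as forces, torques, or steering inputs).
    return np.clip(np.asarray(raw_actions), low, high)

# e.g. clip_actions([2.3, -0.4, -1.7]) -> array([ 1. , -0.4, -1. ])
```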