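The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents.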

Environment Design Best Practices

General

  • It is often helpful to begin with the simplest version of the problem, to make sure the agent can learn it, and then increase the complexity over time. This can be done either manually or via Curriculum Learning, in which the agent is presented with a series of lessons of progressively increasing difficulty (see the Curriculum Learning documentation).
  • When possible, it is often helpful to confirm that the task is solvable by completing it yourself, using a Player Brain to control the agent.
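The curriculum idea above can be sketched as a small scheduler that advances to the next, harder lesson once the agent's recent mean episode reward clears a threshold. This is a minimal Python sketch under stated assumptions: `Lesson`, `Curriculum`, the averaging window and the threshold values are hypothetical and are not the ML-Agents curriculum API or its configuration format.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    """One stage of a curriculum (hypothetical structure, for illustration only)."""
    name: str
    difficulty: float        # an environment parameter, e.g. goal distance
    reward_threshold: float  # mean episode reward needed to advance


class Curriculum:
    """Hypothetical scheduler: promote to the next lesson when recent rewards are high enough."""

    def __init__(self, lessons, window=100):
        self.lessons = lessons
        self.window = window  # number of recent episodes to average over
        self.index = 0
        self.recent = []

    @property
    def current(self):
        return self.lessons[self.index]

    def report_episode(self, episode_reward):
        """Record one episode's reward and return the lesson to use next."""
        self.recent.append(episode_reward)
        if len(self.recent) > self.window:
            self.recent.pop(0)  # keep only the most recent `window` episodes
        is_last = self.index == len(self.lessons) - 1
        full = len(self.recent) == self.window
        if not is_last and full and sum(self.recent) / self.window >= self.current.reward_threshold:
            self.index += 1
            self.recent.clear()  # start measuring afresh on the harder lesson
        return self.current


curriculum = Curriculum(
    [
        Lesson("easy", difficulty=1.0, reward_threshold=0.8),
        Lesson("medium", difficulty=2.0, reward_threshold=0.8),
        Lesson("hard", difficulty=4.0, reward_threshold=0.8),
    ],
    window=3,
)
for reward in [0.9, 0.9, 0.9]:
    curriculum.report_episode(reward)
print(curriculum.current.name)  # medium
```

Clearing the reward history on each promotion is a design choice, so that scores earned on an easier lesson do not count toward the next one.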

Rewards

  • The magnitude of any given reward should typically not be greater than 1.0 in order to ensure a more stable learning process.
  • Positive rewards are often more helpful for shaping the desired behavior of an agent than negative rewards.
  • For locomotion tasks, a small positive reward (+0.1) for forward velocity is typically used.
  • If you want the agent to finish a task quickly, it is often helpful to apply a small penalty (-0.05) on every step in which the agent has not yet completed the task. In this case, completing the task should also end the episode.
  • Overly large negative rewards can cause undesirable behavior, where the agent learns to avoid anything that might produce the negative reward, even when that same behavior could eventually lead to a positive reward.
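The reward guidelines above can be illustrated with a short sketch, assuming a task whose completion earns +1.0 and ends the episode, and the per-step penalty of -0.05 suggested in the text. The helper names `clip_reward` and `timed_task_return` are hypothetical. The point is that, with the penalty in place, finishing sooner yields a higher episode return, while clipping keeps each individual reward within [-1, 1].

```python
def clip_reward(r, limit=1.0):
    """Keep a single reward within [-limit, limit] to help stabilise learning."""
    return max(-limit, min(limit, r))


def timed_task_return(steps_to_finish, step_penalty=-0.05, completion_reward=1.0):
    """Return of an episode that ends on the step where the task is completed.

    Every earlier step costs `step_penalty`, so finishing sooner scores higher.
    """
    total = 0.0
    for step in range(1, steps_to_finish + 1):
        completed = step == steps_to_finish  # completion also ends the episode
        total += clip_reward(completion_reward if completed else step_penalty)
    return total


print(timed_task_return(5))   # about 0.8
print(timed_task_return(10))  # about 0.55
```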

States

  • The state should include every variable the agent needs in order to make an optimally informed decision.
  • Categorical state variables, such as the type of an object (Sword, Shield, Bow), should be one-hot encoded (e.g. the third of three categories, Bow, becomes 0, 0, 1).
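  • Rotation information on GameObjects should be recorded as state.Add(transform.rotation.eulerAngles.y/180.0f-1.0f); rather than state.Add(transform.rotation.y);. The former scales the y Euler angle (0 to 360 degrees) into the range [-1, 1), whereas transform.rotation is a quaternion, and its y component is not an angle.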
  • Positional information of relevant GameObjects should be encoded in relative coordinates wherever possible, usually relative to the agent's position.
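The state-encoding guidelines above can be sketched in Python as plain helper functions, standing in for the C# observation code an agent would use in Unity; all names here are hypothetical. `yaw_quaternion_y` illustrates why the raw quaternion component is a poor observation: for a pure rotation of θ degrees about the y axis, one valid quaternion has y component sin(θ/2), so headings of 90° and 270° map to the same value, whereas the normalized Euler angle tells them apart.

```python
import math


def one_hot(value, categories):
    """One-hot encode a categorical value, e.g. "Bow" of (Sword, Shield, Bow) -> [0, 0, 1]."""
    return [1.0 if value == c else 0.0 for c in categories]


def normalized_yaw(euler_y_degrees):
    """Map an Euler angle in [0, 360) to [-1, 1),
    mirroring transform.rotation.eulerAngles.y / 180.0f - 1.0f."""
    return euler_y_degrees / 180.0 - 1.0


def yaw_quaternion_y(euler_y_degrees):
    """y component of a quaternion for a pure rotation about the y axis: sin(theta / 2)."""
    return math.sin(math.radians(euler_y_degrees) / 2.0)


def relative_position(target, agent):
    """Express a target position relative to the agent (component-wise difference)."""
    return [t - a for t, a in zip(target, agent)]


print(one_hot("Bow", ["Sword", "Shield", "Bow"]))            # [0.0, 0.0, 1.0]
print(normalized_yaw(90.0), normalized_yaw(270.0))           # -0.5 0.5
print(relative_position([3.0, 0.0, 5.0], [1.0, 0.0, 2.0]))  # [2.0, 0.0, 3.0]
```

One caveat of the linear Euler encoding is that headings just either side of 0° land at opposite ends of the range (359° maps to roughly 0.99, 0° to -1); feeding the sine and cosine of the angle is a common alternative when that discontinuity matters.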

Actions

  • When using continuous control, action values should be clipped to an appropriate range.
  • Be sure to set the action-space-size to the number of actions actually used, and no larger, because a larger value can reduce the efficiency of the training process.
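The action guidelines can be sketched as a guard that runs before an action is applied, assuming continuous actions are expected in [-1, 1] and a hypothetical agent with two action values (for example, thrust and turn). `ACTION_SPACE_SIZE` and `apply_action` are illustrative names, not part of the ML-Agents API.

```python
ACTION_SPACE_SIZE = 2  # hypothetical: e.g. forward thrust and turn


def apply_action(raw_action, action_space_size=ACTION_SPACE_SIZE, low=-1.0, high=1.0):
    """Validate and clip a continuous action vector before applying it.

    The [low, high] range of [-1, 1] is an assumption for this sketch.
    """
    if len(raw_action) != action_space_size:
        raise ValueError(
            f"expected {action_space_size} action values, got {len(raw_action)}"
        )
    # Clip each component into the allowed range.
    return [max(low, min(high, a)) for a in raw_action]


print(apply_action([0.3, -2.5]))  # [0.3, -1.0]
```

Checking the vector length against the number of actions the environment actually consumes makes a mis-sized action space visible early.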