
Updated the diagrams on the docs (#3779)

* Updated the diagrams

* Addressing comments
/develop/gym-wrapper
GitHub, 5 years ago
Current commit a9ad3dbc
4 files changed, with 243 insertions and 661 deletions
  1. docs/ML-Agents-Overview.md (39 changes)
  2. docs/images/learning_environment_basic.png (198 changes)
  3. docs/images/learning_environment_example.png (545 changes)
  4. docs/images/learning_environment_full.png (122 changes)

docs/ML-Agents-Overview.md


complex behaviors by hand is challenging and prone to errors.
With ML-Agents, it is possible to _train_ the behaviors of such NPCs (called
**Agents**) using a variety of methods. The basic idea is quite simple. We need
to define three entities at every moment of the game (called **environment**):
- **Observations** - what the medic perceives about the environment.

- **Agents** - each of which is attached to a Unity GameObject (any character within a
scene) and handles generating its observations, performing the actions it
receives and assigning a reward (positive / negative) when appropriate. Each
Agent is linked to a Behavior.

every character in the scene. While each Agent must be linked to a Behavior, it is
possible for Agents that have similar observations and actions to be linked to
the same Behavior. In our sample game, we have two teams each with their own medic,
but both of these medics can have the same Behavior. Sharing the same Behavior
does not mean that at each instance they will have identical observation and
action _values_. If we expanded our game to include tank driver NPCs, then the
Agent attached to those characters cannot share its Behavior with the Agent
linked to the medics (medics and drivers have different actions).
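
The distinction between a Behavior and the Agents linked to it can also be seen from
the Python side. The sketch below is not part of the original document; it assumes the
`mlagents_envs` package and an environment running in the Unity Editor, and the exact
attribute names can vary between releases. Each Behavior appears as a single spec (its
observation and action spaces), while `get_steps` returns the per-Agent observation
values batched under that Behavior's name.

```python
from mlagents_envs.environment import UnityEnvironment

# file_name=None connects to a Unity Editor in Play mode instead of a built binary.
env = UnityEnvironment(file_name=None)
env.reset()

# One spec per Behavior: both medics can share a single Behavior, while a tank
# driver would need a different Behavior because its action space differs.
for behavior_name, spec in env.behavior_specs.items():
    print(behavior_name, spec)

# Each Agent still produces its own observation *values* every step; get_steps
# returns them batched per Behavior.
behavior_name = list(env.behavior_specs)[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)
print(len(decision_steps), "agents requested a decision for", behavior_name)

env.close()
```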
<p align="center">

We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role
the Python API and External Communicator play. Before we dive into those
details, let's summarize the earlier components. Each character is attached to
an Agent, and each Agent has a Behavior. The Behavior can be thought of as a
function that receives observations and rewards from the Agent and returns
actions. The Learning Environment through the Academy (not represented in the
diagram) ensures that all the

Note that in a single environment, there can be multiple Agents and multiple Behaviors
at the same time. These Behaviors can communicate with Python through the communicator
but can also use a pre-trained _Neural Network_ or a _Heuristic_. Note that it is also
possible to communicate data with Python without using Agents through _Side Channels_.
One example of using _Side Channels_ is to exchange data with Python about
_Environment Parameters_. The following diagram illustrates the above.
<p align="center">
<img src="images/learning_environment_full.png"
alt="More Complete Example ML-Agents Scene Block Diagram"
border="10" />
</p>
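
As a hedged illustration of the Side Channel mechanism mentioned above (not part of
the original document), the sketch below registers an `EnvironmentParametersChannel`
from `mlagents_envs` and pushes an Environment Parameter to the simulation without
going through any Agent. The parameter key `"difficulty"` is hypothetical, and the
receiving Unity scene decides what, if anything, to do with it; API details may
differ between releases.

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

# Side channels exchange data with Python without going through any Agent.
params_channel = EnvironmentParametersChannel()
env = UnityEnvironment(file_name=None, side_channels=[params_channel])

# "difficulty" is a hypothetical Environment Parameter; the Unity scene reads it
# (e.g. via Academy.Instance.EnvironmentParameters in C#) and reacts as it sees fit.
params_channel.set_float_parameter("difficulty", 2.0)

env.reset()
env.step()  # Behaviors advance; trained models, heuristics, or Python can drive them.
env.close()
```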
## Training Modes

docs/images/learning_environment_basic.png
Width: 661 px  |  Height: 568 px  |  Size: 13 KiB

docs/images/learning_environment_example.png
File diff too large to display.

docs/images/learning_environment_full.png
Width: 1783 px  |  Height: 1019 px  |  Size: 71 KiB