
Documentation Update (#1339)

* Documentation Update

* addressed comments

* new images for the recorder

* Improvements to the docs

* Address the comments

* Core_ML typo

* Updated the links to inference repo

* Put back Inference-Engine.md

* fix typos : brain

* Readd deleted file

* fix typos

* Addressed comments
Branch: /develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: bd4a8db2
20 files changed, with 425 additions and 354 deletions
  1. README.md (2)
  2. docs/Basic-Guide.md (54)
  3. docs/FAQ.md (2)
  4. docs/Getting-Started-with-Balance-Ball.md (16)
  5. docs/Learning-Environment-Best-Practices.md (2)
  6. docs/Learning-Environment-Create-New.md (14)
  7. docs/Learning-Environment-Design-Academy.md (6)
  8. docs/Learning-Environment-Design-Agents.md (2)
  9. docs/Learning-Environment-Design-Brains.md (6)
  10. docs/Learning-Environment-Design.md (10)
  11. docs/Learning-Environment-Examples.md (6)
  12. docs/Learning-Environment-Executable.md (2)
  13. docs/ML-Agents-Overview.md (4)
  14. docs/Migrating.md (48)
  15. docs/Training-Imitation-Learning.md (10)
  16. docs/Training-ML-Agents.md (2)
  17. docs/Training-PPO.md (4)
  18. docs/Training-on-Microsoft-Azure.md (4)
  19. docs/images/demo_component.png (182)
  20. docs/images/demo_inspector.png (403)

README.md (2)


* For more information, in addition to installation and usage instructions, see
our [documentation home](docs/Readme.md).
* If you are a researcher interested in a discussion of Unity as an AI platform, see a pre-print of our [reference paper on Unity and the ML-Agents Toolkit](https://arxiv.org/abs/1809.02627). Also, see below for instructions on citing this paper.
* If you have used a version of the ML-Agents toolkit prior to v0.6, we strongly
* If you have used an earlier version of the ML-Agents toolkit, we strongly
recommend our [guide on migrating from earlier versions](docs/Migrating.md).
## Additional Resources

docs/Basic-Guide.md (54)


## Setting up the ML-Agents Toolkit within Unity
In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. You will also need to have appropriate inference backends
installed in order to run your models inside of Unity. See [here](Inference-Engine.md)
for more information.
In order to use the ML-Agents toolkit within Unity, you first need to change a few
Unity settings.
1. Launch Unity
2. On the Projects dialog, choose the **Open** option at the top of the window.

5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
1. Option the **Other Settings** section.
1. Expand the **Other Settings** section.
## Setting up the Inference Engine
We provide pre-trained models for all the agents in all our demo environments.
To be able to run those models, you'll first need to set-up the Inference
Engine. The Inference Engine is a general API to
run neural network models in Unity that leverages existing inference libraries such
as TensorFlowSharp and Apple's Core ML. Since the ML-Agents Toolkit uses TensorFlow
for training neural network models, the output model format is TensorFlow and
the model files include a `.tf` extension. Consequently, you need to install
the TensorFlowSharp backend to be able to run these models within the Unity
Editor. You can find instructions
on how to install the TensorFlowSharp backend [here](Inference-Engine.md).
Once the backend is installed, you will need to reimport the models : Right click
on the `.tf` model and select `Reimport`.
## Running a Pre-trained Model
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall/Scenes` folder

3. In the `Ball 3D Agent` Component: Drag the **3DBallLearning** located into
3. In the `Ball 3D Agent` Component: Drag the **3DBallLearning** Brain located in
__Note__ : You can modify multiple game objects in a scene by selecting them all at once using the search bar in the Scene Hierarchy.
__Note__ : You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
6. Drag the `3DBall` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder to the **Model** field of the **3DBallLearning**.
6. Drag the `3DBallLearning` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder to the **Model** field of the **3DBallLearning** Brain.
7. Click the **Play** button and you will see the platforms balance the balls
using the pretrained model.

### Adding a Brain to the training session
Since we are going to build this environment to conduct training, we need to add
the Brain to the training session. This allows the Agents linked to that Brain
to communicate with the external training process when making their decisions.
To set up the environment for training, you will need to specify which agents are contributing
to the training and which Brain is being trained. You can only perform training with
a `Learning Brain`.
1. Assign the **3DBallLearning** to the agents you would like to train and the **3DBallPlayer** Brain to the agents you want to control manually.
__Note:__ You can only perform training with an `Learning Brain`.
1. Assign the **3DBallLearning** Brain to the agents you would like to train.
__Note:__ You can assign the same Brain to multiple agents at once : To do so, you can
use the prefab system. When an agent is created from a prefab, modifying the prefab
will modify the agent as well. If the agent does not synchronize with the prefab, you
can hit the Revert button on top of the Inspector.
Alternatively, you can select multiple agents in the scene and modify their `Brain`
property all at once.
__Note:__ Assigning a Brain to an agent (dragging a Brain into the `Brain` property of
the agent) means that the Brain will be making decisions for that agent, whereas dragging
a Brain into the Broadcast Hub means that the Brain will be exposed to the Python process.
The `Control` checkbox means that in addition to being exposed to Python, the Brain will
be controlled by the Python process (required for training).
![Set Brain to External](images/mlagents-SetBrainToTrain.png)
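
For readers unsure what "exposed to the Python process" means in practice, here is a minimal sketch (not part of this PR), assuming the v0.6-era `mlagents.envs` Python API: Brains placed in the Broadcast Hub show up on the Python side, and only those with `Control` checked can be driven from Python.

```python
# Minimal sketch, assuming the v0.6-era mlagents.envs API; exact attribute
# names may differ slightly between releases.
from mlagents.envs import UnityEnvironment

# file_name=None attaches to the Unity Editor: run this script, then press Play.
env = UnityEnvironment(file_name=None)

# Every Brain dragged into the Academy's Broadcast Hub is visible from Python.
print("Broadcast Brains:", env.brain_names)

# Only Brains whose Control checkbox is ticked accept actions from Python
# (these are the ones that can actually be trained).
print("Controlled Brains:", env.external_brain_names)

env.close()
```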

docs/FAQ.md (2)


## Cannot drag Model into Learning Brain
You migh not have the appropriate backend required to import the model. Refer to the
You might not have the appropriate backend required to import the model. Refer to the
[Inference Engine](Inference-Engine.md) for more information on how to import backends
and reimport the asset.

docs/Getting-Started-with-Balance-Ball.md (16)


The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works.
The **Broadcast Hub** keeps track of which brains will send data during training,
If a brain is added to the hub, his data will be sent to the external training
The **Broadcast Hub** keeps track of which Brains will send data during training,
If a Brain is added to the hub, his data will be sent to the external training
control the agents linked to the brain to train them.
control the agents linked to the Brain to train them.
The **Training** and **Inference Configuration** properties
set the graphics and timescale properties for the Unity application.
The Academy uses the **Training Configuration** during training and the

### Brain
Brains are assets that exist in your project folder. The Ball3DAgents are connected
to a brain, for example : the **3DBallLearning**.
to a Brain, for example : the **3DBallLearning**.
A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same

You can create brain objects by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 kinds of brains :
The **Learning Brain** is a brain that uses a Neural Network to take decisions.
You can create Brain objects by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 kinds of Brains :
The **Learning Brain** is a Brain that uses a Neural Network to take decisions.
When the Brain is checked as `Control` in the Academy **Broadcast Hub**, the
external process will be taking decisions for the agents
and generate a neural network when the training is over. You can also use the

The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the intructions in
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
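
To make the Editor-versus-executable distinction concrete, here is a rough sketch, again assuming the v0.6-era `mlagents.envs` API; the `./3DBall` build path below is a placeholder, not a file from this PR.

```python
# Sketch only: the build path below is hypothetical.
from mlagents.envs import UnityEnvironment

# Editor-based run: file_name=None, then press Play in Unity.
# env = UnityEnvironment(file_name=None)

# Executable-based run: point file_name at your standalone build instead.
env = UnityEnvironment(file_name="./3DBall", worker_id=0)

# train_mode=True asks the Academy to use its Training Configuration
# (faster timescale, reduced graphics) rather than the Inference Configuration.
info = env.reset(train_mode=True)

env.close()
```
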
### Observing Training Progress

docs/Learning-Environment-Best-Practices.md (2)


## General
* It is often helpful to start with the simplest version of the problem, to
ensure the agent can learn it. From there increase complexity over time. This
ensure the agent can learn it. From there, increase complexity over time. This
can either be done manually, or via Curriculum Learning, where a set of
lessons which progressively increase in difficulty are presented to the agent
([learn more here](Training-Curriculum-Learning.md)).

docs/Learning-Environment-Create-New.md (14)


## Add Brains
The Brain object encapsulates the decision making process. An Agent sends its
observations to its Brain and expects a decision in return. The type of the brain
observations to its Brain and expects a decision in return. The type of the Brain
1. Go to `Assets -> Create -> ML-Agents` and select the type of brain you want to
1. Go to `Assets -> Create -> ML-Agents` and select the type of Brain you want to
create. In this tutorial, we will create a **Learning Brain** and
a **Player Brain**.
2. Name them `RollerBallBrain` and `RollerBallPlayer` respectively.

setting the Brain properties so that they are compatible with our Agent code.
1. In the Academy Inspector, add the `RollerBallBrain` and `RollerBallPlayer`
brains to the **Broadcast Hub**.
Brains to the **Broadcast Hub**.
2. Select the RollerAgent GameObject to show its properties in the Inspector
window.
3. Drag the Brain `RollerBallPlayer` from the Project window to the

Also, drag the Target GameObject from the Hierarchy window to the RollerAgent
Target field.
Finally, select the the `RollerBallBrain` and `RollerBallPlayer` brains assets
Finally, select the the `RollerBallBrain` and `RollerBallPlayer` Brain assets
so that you can edit their properties in the Inspector window. Set the following
properties on both of them:

## Testing the Environment
It is always a good idea to test your environment manually before embarking on
an extended training run. The reason we have created the `RollerBallPlayer` brain
an extended training run. The reason we have created the `RollerBallPlayer` Brain
is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard to action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify

1. Select the `RollerBallPlayer` brain to view its properties in the Inspector.
1. Select the `RollerBallPlayer` Brain to view its properties in the Inspector.
a player brain).
a **PlayerBrain**).
3. Set **Size** to 4.
4. Set the following mappings:

docs/Learning-Environment-Design-Academy.md (6)


## Academy Properties
![Academy Inspector](images/academy.png)
* `Broadcast Hub` - Gathers the brains that will communicate with the external
process. Any brain added to the Broadcast Hub will be visible from the external
process. In addition, if the checkbox `Control` is checked, the brain will be
* `Broadcast Hub` - Gathers the Brains that will communicate with the external
process. Any Brain added to the Broadcast Hub will be visible from the external
process. In addition, if the checkbox `Control` is checked, the Brain will be
controllable from the external process and will thus be trainable.
* `Max Steps` - Total number of steps per-episode. `0` corresponds to episodes
without a maximum number of steps. Once the step counter reaches maximum, the

docs/Learning-Environment-Design-Agents.md (2)


that you can use the same Brain in multiple Agents. How a Brain makes its
decisions depends on the kind of Brain it is. A Player Brain allows you
to directly control the agent. A Heuristic Brain allows you to create a
decision script to control the agent with a set of rules. These two brains
decision script to control the agent with a set of rules. These two Brains
do not involve neural networks but they can be useful for debugging. The
Learning Brain allows you to train and use neural network models for
your Agents. See [Brains](Learning-Environment-Design-Brains.md).

docs/Learning-Environment-Design-Brains.md (6)


can also create several Brains, attach each of the Brain to one or more than one
Agent.
There are 3 kinds of brains you can use:
There are 3 kinds of Brains you can use:
* [Learning](Learning-Environment-Learning-Brains.md) – Use a
* [Learning](Learning-Environment-Design-Learning-Brains.md) – Use a
**LearningBrain** to make use of a trained model or train a new model.
* [Heuristic](Learning-Environment-Design-Heuristic-Brains.md) – Use a
**HeuristicBrain** to hand-code the Agent's logic by extending the Decision class.

* `Action Descriptions` - A list of strings used to name the available
actions for the Brain.
The other properties of the brain depend on the type of Brain you are using.
The other properties of the Brain depend on the type of Brain you are using.
## Using the Broadcast Feature

docs/Learning-Environment-Design.md (10)


To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy subclass and as many Agent subclasses
as you need. The brain assets are present in the project and should be grouped
as you need. The Brain assets are present in the project and should be grouped
together and named according to the type of agents they are compatible with.
Agent instances should be attached to the GameObject representing that Agent.

The Brain encapsulates the decision making process. Every Agent must be
assigned a Brain, but you can use the same Brain with more than one Agent.
__Note__:You can assign the same brain to multiple agents by using prefabs
or by selecting all the agents you want to attach the brain to using the
__Note__:You can assign the same Brain to multiple agents by using prefabs
or by selecting all the agents you want to attach the Brain to using the
type of brain you want to use. During training, use a **Learning Brain**
type of Brain you want to use. During training, use a **Learning Brain**
different types of Brains. You can create new kinds of brains if the three
different types of Brains. You can create new kinds of Brains if the three
built-in don't do what you need.
The Brain class has several important properties that you can set using the

docs/Learning-Environment-Examples.md (6)


* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent linked to a single brain.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 70 variables corresponding to 14
ray-casts each detecting one of three possible objects (wall, goal, or
block).

![Reacher](images/reacher.png)
* Set-up: Double-jointed arm which can move to target locations.
* Goal: The agents must move it's hand to the goal location, and keep it there.
* Goal: The agents must move its hand to the goal location, and keep it there.
* Agents: The environment contains 10 agents linked to a single Brain.
* Agent Reward Function (independent):
* +0.1 Each step agent's hand is in goal location.

docs/Learning-Environment-Executable.md (2)


![3DBall Scene](images/mlagents-Open3DBall.png)
Make sure the Brains in the scene have the right type. For example, if you want
to be able to control your agents from Python, you will need to put the brain
to be able to control your agents from Python, you will need to put the Brain
controlling the Agents to be a **Learning Brain** and drag it into the
Academy's `Broadcast Hub` with the `Control` checkbox checked.

docs/ML-Agents-Overview.md (4)


As mentioned previously, the ML-Agents toolkit ships with several
implementations of state-of-the-art algorithms for training intelligent agents.
In this mode, the only brain used is a **Learning Brain**. More
In this mode, the only Brain used is a **Learning Brain**. More
specifically, during training, all the medics in the
scene send their observations to the Python API through the External
Communicator (this is the behavior with an External Brain). The Python API

observations for all its Agents to the Python API when dragged into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
for training and later inference. Broadcasting is a feature which can be
enabled all types of brains (Player, Learning, Heuristic) where the Agent
enabled all types of Brains (Player, Learning, Heuristic) where the Agent
observations and actions are also sent to the Python API (despite the fact
that the Agent is **not** controlled by the Python API). This feature is
leveraged by Imitation Learning, where the observations and actions for a
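
To illustrate the broadcast idea, here is a rough sketch of reading observations and rewards from a Brain that is broadcast but not controlled, which is essentially what the imitation-learning trainer does with the teacher. It assumes the v0.6-era `mlagents.envs` API (call signatures vary between releases), and the Brain name "Teacher" is only an example.

```python
# Sketch only: "Teacher" is a hypothetical Player/Heuristic Brain that sits in
# the Broadcast Hub with the Control checkbox left unchecked.
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name=None)

# reset()/step() return a dict of BrainInfo objects keyed by Brain name,
# including broadcast-only Brains.
info = env.reset(train_mode=False)
teacher = info["Teacher"]
print(teacher.vector_observations)   # what the teacher's Agents observed
print(teacher.rewards)               # the rewards they just received

# No actions are sent for a broadcast-only Brain; the Player/Heuristic logic
# inside Unity keeps driving it while Python simply watches.
info = env.step()
env.close()
```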

docs/Migrating.md (48)


# Migrating
## Migrating from ML-Agents toolkit v0.5 to v0.6
* Brains are now Scriptable Objects instead of MonoBehaviors. This will
allow you to set Brains into prefabs and use the same brains across
scenes.
* Brains are now Scriptable Objects instead of MonoBehaviors.
* You can no longer modify the type of a Brain. If you want to switch
between `PlayerBrain` and `LearningBrain` for multiple agents,
you will need to assign a new Brain to each agent separately.
__Note:__ You can pass the same Brain to multiple agents in a scene by
leveraging Unity's prefab system or look for all the agents in a scene
using the search bar of the `Hierarchy` window with the word `Agent`.
* Remove the `Brain` GameObjects in the scene
* Remove the `Brain` GameObjects in the scene. (Delete all of the
Brain GameObjects under Academy in the scene.)
ML-Agents`
ML-Agents` for each type of the Brain you plan to use, and put
the created files under a folder called Brains within your project.
in the `Brain` GameObjects
in the `Brain` GameObjects.
appropriate Brain asset in it.
appropriate Brain ScriptableObject in it.
__Note:__ You can pass the same brain to multiple agents in a scene by
leveraging Unity's prefab system or look for all the agents in a scene
using the search bar of the `Hierarchy` window with the word `Agent`.
__Note:__ You will need to delete the previous TensorFlowSharp package
and install the new one to do inference. To correctly delete the previous
TensorFlowSharp package, Delete all of the files under `ML-Agents/Plugins`
folder except the files under `ML-Agents/Plugins/ProtoBuffer`.
* We replaced the **Internal** and **External** Brain with **Learning Brain**.
When you need to train a model, you need to drag it into the `Training Hub`
inside the `Academy` and check the `Control` checkbox.
* We removed the `Broadcast` checkbox of the Brain, to use the broadcast
functionality, you need to drag the Brain into the `Broadcast Hub`.
* When training multiple Brains at the same time, each model is now stored
into a separate model file rather than in the same file under different
graph scopes.
* We have changed the way ML-Agents models perform inference. All previous `.bytes`
files can no longer be used (you will have to retrain them). The models
produced by the training process and the shipped models have now a `.tf`
extension and use TensorflowSharp as a backend for the
[Inference Engine](Inference-Engine.md).
* To use a `.tf` model, drag it inside the `Model` property of the `Learning Brain`
## Migrating from ML-Agents toolkit v0.4 to v0.5

[curriculum learning documentation](Training-Curriculum-Learning.md)
for detailed information. In summary:
* Curriculum files for the same environment must now be placed into a folder.
Each curriculum file should be named after the brain whose curriculum it
Each curriculum file should be named after the Brain whose curriculum it
specifies.
* `min_lesson_length` now specifies the minimum number of episodes in a lesson
and affects reward thresholding.

docs/Training-Imitation-Learning.md (10)


6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml` as the config parameter and your environment as the `--env` parameter.
7. (Optional) Observe training performance using Tensorboard.
This will use the demonstration file to train a nerual network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
This will use the demonstration file to train a neural network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
### Online Training

will be the "Student." We will assume that the names of the Brain
`Assets`s are "Teacher" and "Student" respectively.
Assets are "Teacher" and "Student" respectively.
4. The Brain Parameters of both the "Teacher" and "Student" brains must be
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
5. Drag both the "Teacher" and "Student" brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" brain.
5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" Brain.
4. Link the Brains to the desired Agents (one Agent as the teacher and at least
one Agent as a student).
5. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set

docs/Training-ML-Agents.md (2)


| brain\_to\_imitate | For online imitation learning, the name of the GameObject containing the Brain component to imitate. | (online)BC |
| demo_path | For offline imitation learning, the file path of the recorded demonstration file | (offline)BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curioity module. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
| curiosity_strength | Magnitude of intrinsic reward generated by Intrinsic Curiosity Module. | PPO |
| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |

docs/Training-PPO.md (4)


The below hyperparameters are only used when `use_curiosity` is set to true.
### Curioisty Encoding Size
### Curiosity Encoding Size
`curiosity_enc_size` corresponds to the size of the hidden layer used to encode
the observations within the intrinsic curiosity module. This value should be

`curiosity_strength` corresponds to the magnitude of the intrinsic reward
generated by the intrinsic curiosity module. This should be scaled in order to
ensure it is large enough to not be overwhelmed by extrnisic reward signals in
ensure it is large enough to not be overwhelmed by extrinsic reward signals in
the environment. Likewise it should not be too large to overwhelm the extrinsic
reward signal.
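
As a purely illustrative example of the balance described above (hypothetical numbers, not the trainer's internal code), the intrinsic reward can be thought of as being scaled by `curiosity_strength` before it is combined with the extrinsic reward:

```python
# Illustrative only; hypothetical values, not the trainer's actual implementation.
curiosity_strength = 0.01   # value set in the trainer configuration
extrinsic_reward = 1.0      # reward granted by the environment this step
intrinsic_reward = 5.0      # surprise signal from the curiosity module
# The scale factor keeps the intrinsic term from overwhelming the extrinsic
# reward, while still leaving it large enough to matter.
total_reward = extrinsic_reward + curiosity_strength * intrinsic_reward
print(total_reward)  # 1.05
```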

docs/Training-on-Microsoft-Azure.md (4)


## Pre-Configured Azure Virtual Machine
A pre-configured virtual machine image is available in the Azure Marketplace and
is nearly compltely ready for training. You can start by deploying the
is nearly completely ready for training. You can start by deploying the
[Data Science Virtual Machine for Linux (Ubuntu)](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu)
into your Azure subscription. Once your VM is deployed, SSH into it and run the
following command to complete dependency installation:

```
Where `<your_app>` is the path to your app (i.e.
`~/unity-volume/3DBallHeadless`) and `<run_id>` is an identifer you would like
`~/unity-volume/3DBallHeadless`) and `<run_id>` is an identifier you would like
to identify your training run with.
If you've selected to run on a N-Series VM with GPU support, you can verify that

docs/images/demo_component.png (182)

Before / After
Width: 756 | Height: 160 | Size: 28 KiB

docs/images/demo_inspector.png (403)

Before / After
Width: 786 | Height: 540 | Size: 63 KiB