
Documentation 0.5 Release Check List (Part 1) (#1154)

/develop-generalizationTraining-TrainerController
Arthur Juliani, 6 years ago
Current commit: 2cd8e250
48 files changed, with 880 insertions and 822 deletions
  1. CODE_OF_CONDUCT.md (5 lines changed)
  2. CONTRIBUTING.md (60 lines changed)
  3. README.md (94 lines changed)
  4. docs/API-Reference.md (2 lines changed)
  5. docs/Background-TensorFlow.md (2 lines changed)
  6. docs/Basic-Guide.md (32 lines changed)
  7. docs/FAQ.md (18 lines changed)
  8. docs/Feature-Memory.md (2 lines changed)
  9. docs/Feature-Monitor.md (2 lines changed)
  10. docs/Getting-Started-with-Balance-Ball.md (94 lines changed)
  11. docs/Glossary.md (4 lines changed)
  12. docs/Installation-Windows.md (4 lines changed)
  13. docs/Installation.md (29 lines changed)
  14. docs/Learning-Environment-Best-Practices.md (4 lines changed)
  15. docs/Learning-Environment-Create-New.md (77 lines changed)
  16. docs/Learning-Environment-Design-Academy.md (6 lines changed)
  17. docs/Learning-Environment-Design-Agents.md (147 lines changed)
  18. docs/Learning-Environment-Design-Brains.md (42 lines changed)
  19. docs/Learning-Environment-Design-External-Internal-Brains.md (26 lines changed)
  20. docs/Learning-Environment-Design-Heuristic-Brains.md (14 lines changed)
  21. docs/Learning-Environment-Design-Player-Brains.md (24 lines changed)
  22. docs/Learning-Environment-Design.md (82 lines changed)
  23. docs/Learning-Environment-Examples.md (72 lines changed)
  24. docs/Learning-Environment-Executable.md (19 lines changed)
  25. docs/Limitations.md (4 lines changed)
  26. docs/ML-Agents-Overview.md (24 lines changed)
  27. docs/Migrating.md (19 lines changed)
  28. docs/Readme.md (4 lines changed)
  29. docs/Training-Curriculum-Learning.md (8 lines changed)
  30. docs/Training-Imitation-Learning.md (24 lines changed)
  31. docs/Training-ML-Agents.md (13 lines changed)
  32. docs/Training-PPO.md (2 lines changed)
  33. docs/Training-on-Amazon-Web-Service.md (2 lines changed)
  34. docs/Training-on-Microsoft-Azure.md (15 lines changed)
  35. docs/Using-TensorFlow-Sharp-in-Unity.md (8 lines changed)
  36. docs/Using-Tensorboard.md (8 lines changed)
  37. docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (26 lines changed)
  38. docs/localized/zh-CN/docs/Installation.md (4 lines changed)
  39. docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (4 lines changed)
  40. docs/localized/zh-CN/docs/Learning-Environment-Design.md (14 lines changed)
  41. docs/localized/zh-CN/docs/Learning-Environment-Examples.md (42 lines changed)
  42. docs/localized/zh-CN/docs/ML-Agents-Overview.md (2 lines changed)
  43. ml-agents/README.md (183 lines changed)
  44. docs/Python-API.md (149 lines changed)
  45. gym-unity/README.md (158 lines changed)
  46. MLAgentsSDK/README.md (1 line changed)
  47. gym-unity/Readme.md (127 lines changed)

CODE_OF_CONDUCT.md (5 lines changed)


## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct/
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct/
[homepage]: https://www.contributor-covenant.org

CONTRIBUTING.md (60 lines changed)


# Contribution Guidelines
Thank you for your interest in contributing to the ML-Agents toolkit! We are incredibly
excited to see how members of our community will use and extend the ML-Agents toolkit.
To facilitate your contributions, we've outlined a brief set of guidelines
to ensure that your extensions can be easily integrated.
Thank you for your interest in contributing to the ML-Agents toolkit! We are
incredibly excited to see how members of our community will use and extend the
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.
### Communication
## Communication
First, please read through our [code of conduct](CODE_OF_CONDUCT.md),
as we expect all our contributors to follow it.
First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
expect all our contributors to follow it.
Second, before starting on a project that you intend to contribute
to the ML-Agents toolkit (whether environments or modifications to the codebase),
we **strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
briefly outlining the changes you plan to make. This will enable us to provide
some context that may be helpful for you. This could range from advice and
feedback on how to optimally perform your changes or reasons for not doing it.
Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
and briefly outlining the changes you plan to make. This will enable us to
provide some context that may be helpful for you. This could range from advice
and feedback on how to optimally perform your changes or reasons for not doing
it.
### Git Branches
## Git Branches
Starting with v0.3, we adopted the
Consequently, the `master` branch corresponds to the latest release of
* Corresponding changes to documentation, unit tests and sample environments
(if applicable)
* Corresponding changes to documentation, unit tests and sample environments (if
applicable)
### Environments
## Environments
We are also actively open to adding community contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
PR explaining the nature of the environment and task.
### Style Guide
## Style Guide
When performing changes to the codebase, ensure that you follow the style
guide of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/). For C#, we will soon be
adding a formal style guide for our repository.
When performing changes to the codebase, ensure that you follow the style guide
of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/).
For C#, we will soon be adding a formal style guide for our repository.

README.md (94 lines changed)


# Unity ML-Agents Toolkit (Beta)
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin
that enables games and simulations to serve as environments for training
intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through
a simple-to-use Python API. We also provide implementations (based on
TensorFlow) of state-of-the-art algorithms to enable game developers
and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
These trained agents can be used for multiple purposes, including
controlling NPC behavior (in a variety of settings such as multi-agent and
adversarial), automated testing of game builds and evaluating different game
design decisions pre-release. The ML-Agents toolkit is mutually beneficial for both game
developers and AI researchers as it provides a central platform where advances
in AI can be evaluated on Unity’s rich environments and then made accessible
to the wider research and game developer communities.
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source
Unity plugin that enables games and simulations to serve as environments for
training intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through a
simple-to-use Python API. We also provide implementations (based on TensorFlow)
of state-of-the-art algorithms to enable game developers and hobbyists to easily
train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
used for multiple purposes, including controlling NPC behavior (in a variety of
settings such as multi-agent and adversarial), automated testing of game builds
and evaluating different game design decisions pre-release. The ML-Agents
toolkit is mutually beneficial for both game developers and AI researchers as it
provides a central platform where advances in AI can be evaluated on Unity’s
rich environments and then made accessible to the wider research and game
developer communities.
* Train memory-enhanced Agents using deep reinforcement learning
* Train memory-enhanced agents using deep reinforcement learning
* Broadcasting of Agent behavior for supervised learning
* Broadcasting of agent behavior for supervised learning
* Flexible Agent control with On Demand Decision Making
* Flexible agent control with On Demand Decision Making
* For more information, in addition to installation and usage
instructions, see our [documentation home](docs/Readme.md).
* If you have
used a version of the ML-Agents toolkit prior to v0.4, we strongly recommend
our [guide on migrating from earlier versions](docs/Migrating.md).
* For more information, in addition to installation and usage instructions, see
our [documentation home](docs/Readme.md).
* If you have used a version of the ML-Agents toolkit prior to v0.4, we strongly
recommend our [guide on migrating from earlier versions](docs/Migrating.md).
- Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
- [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/) announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
- [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
* Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and
[Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
* [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
* [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/)
announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
* [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
- [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
- [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
- [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
* [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
* [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
* [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
The ML-Agents toolkit is an open-source project and we encourage and welcome contributions.
If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
The ML-Agents toolkit is an open-source project and we encourage and welcome
contributions. If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers enthusiastic
about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in games).
* If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers
enthusiastic about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in
games).
For any other questions or feedback, connect directly with the ML-Agents
team at ml-agents@unity3d.com.

translating more pages and to other languages. Consequently,
we welcome any enhancements and improvements from the community.
- [Chinese](docs/localized/zh-CN/)
* [Chinese](docs/localized/zh-CN/)
## License

docs/API-Reference.md (2 lines changed)


# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatabile with
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
documentation.

docs/Background-TensorFlow.md (2 lines changed)


performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
train the behavior of an Agent, the output is a TensorFlow model (.bytes) file
train the behavior of an agent, the output is a TensorFlow model (.bytes) file
that you can then embed within an Internal Brain. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.

docs/Basic-Guide.md (32 lines changed)


# Basic Guide
This guide will show you how to use a pretrained model in an example Unity
This guide will show you how to use a pre-trained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we

In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. Also [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
is needed for you to use pretrained model within Unity, which is based on the
is needed for you to use a pre-trained model within Unity, which is based on the
[TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).
1. Launch Unity

`None` if you want to interact with the current scene in the Unity Editor.
More information and documentation is provided in the
[Python API](../ml-agents/README.md) page.
[Python API](Python-API.md) page.
## Training the Brain with Reinforcement Learning

the brain used by the agents to **External**. This allows the agents to
the Brain used by the Agents to **External**. This allows the Agents to
communicate with the external training process when making their decisions.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy

### Training the environment
1. Open a command or terminal window.
2. Nagivate to the folder where you installed the ML-Agents toolkit.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
Where:
where:
trainer configuration. The defaults used by environments in the ML-Agents
SDK can be found in `config/trainer_config.yaml`.
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- And the `--train` tells `mlagents-learn` to run a training session (rather
- `--train` tells `mlagents-learn` to run a training session (rather
4. When the message _"Start training by pressing the Play button in the Unity
4. If you cloned the ML-Agents repo, then you can simply run
```sh
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
```
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.

'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents

like this:
```console
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy

`models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
`<academy_name>` is the name of the Academy GameObject in the current scene.
This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your internal brain by following the steps below, which is
trained model into your Internal Brain by following the steps below, which is
similar to the steps described
[above](#play-an-example-environment-using-pretrained-model).
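The model path pattern above can also be assembled from a script, for example to locate the checkpoint before copying it into the Unity project. This is a minimal sketch: `trained_model_path` is a hypothetical helper, and it assumes the relative `models/` layout exactly as written above.

```python
from pathlib import PurePosixPath


def trained_model_path(run_id: str, academy_name: str) -> str:
    # Pattern from the text above:
    # models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes
    filename = f"editor_{academy_name}_{run_id}.bytes"
    return str(PurePosixPath("models") / run_id / filename)


print(trained_model_path("firstRun", "Ball3DAcademy"))
```

With the `firstRun` run ID and the `Ball3DAcademy` Academy used in this guide, the helper yields `models/firstRun/editor_Ball3DAcademy_firstRun.bytes`.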

page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own learning environment,
- For a "Hello World" introduction to creating your own Learning Environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of YouTube video tutorials, check out the

docs/FAQ.md (18 lines changed)


## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin, but havn't set
If you have already imported the TensorFlowSharp plugin, but haven't set
You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
```
This error message occurs because the TensorFlowSharp plugin won't be usable

## Tensorflow epsilon placeholder error
## TensorFlow epsilon placeholder error
If you have a graph placeholder set in the internal Brain inspector that is not
If you have a graph placeholder set in the Internal Brain inspector that is not
UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
Similarly, if you have a graph scope set in the internal Brain inspector that is
Similarly, if you have a graph scope set in the Internal Brain inspector that is
not correctly set, you will see some error like this:
```console

Solution: Make sure your Graph Scope field matches the corresponding brain
object name in your Hierachy Inspector when there is multiple brain.
Solution: Make sure your Graph Scope field matches the corresponding Brain
object name in your Hierarchy Inspector when there are multiple Brains.
## Environment Permission Error

## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the learning environment not
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts

docs/Feature-Memory.md (2 lines changed)


# Memory-enhanced Agents using Recurrent Neural Networks
# Memory-enhanced agents using Recurrent Neural Networks
## What are memories for

docs/Feature-Monitor.md (2 lines changed)


You can track many different things both related and unrelated to the agents
themselves. By default, the Monitor is only active in the *inference* phase, so
not during training. To change this behaviour, you can activate or deactivate it
not during training. To change this behavior, you can activate or deactivate it
by calling `SetActive(boolean)`. For example, to also show the monitor during
training, you can call it in the `InitializeAcademy()` method of your `Academy`:

docs/Getting-Started-with-Balance-Ball.md (94 lines changed)


This tutorial walks through the end-to-end process of opening a ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
agent in it, and finally embedding the trained model into the Unity environment.
Agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help

This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent** that
horizontally or vertically. In this environment, a platform is an **Agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
independent agent, but they all share the same Brain. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy

and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the agent is using a trained model or
training. (*Inference* means that the Agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
configuration** and a high graphics quality and the timescale to `1.0` for the

* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the agents collect their observations).
agent.AgentAction() (and after the Agents collect their observations).
The 3D Balance Ball environment does not use these functions — each agent resets
The 3D Balance Ball environment does not use these functions — each Agent resets
environment around the agents.
environment around the Agents.
the Academy.) All the agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an agent, it just
routes the agent's collected observations to the decision making process and
returns the chosen action to the agent. Thus, all agents can share the same
brain, but act independently. The Brain settings tell you quite a bit about how
an agent works.
the Academy.) All the Agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same
Brain, but act independently. The Brain settings tell you quite a bit about how
an Agent works.
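To make the routing idea concrete, here is a minimal Python sketch of one stateless Brain shared by several agents. `SharedBrain`, `decide`, and the toy policy are hypothetical names for illustration only; they are not the ML-Agents C# `Brain` API.

```python
from typing import Callable, Dict, List

# A policy maps one agent's observation vector to one action vector.
Policy = Callable[[List[float]], List[float]]


class SharedBrain:
    """Hypothetical stand-in for a Brain: it stores no per-agent information."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    def decide(self, observations: Dict[str, List[float]]) -> Dict[str, List[float]]:
        # Route each agent's observations through the same policy and hand
        # each agent back its own action; nothing is remembered between calls.
        return {agent_id: self.policy(obs) for agent_id, obs in observations.items()}


def push_toward_center(obs: List[float]) -> List[float]:
    # Toy policy: respond to each observation component with the opposite sign.
    return [-0.5 * x for x in obs]


brain = SharedBrain(push_toward_center)
actions = brain.decide({"platform_0": [0.2, -0.4], "platform_1": [-1.0, 0.0]})
print(actions)
```

Because `decide` keeps no per-agent state, adding another platform only adds a dictionary entry; each agent still receives an action computed from its own observations.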
The **Brain Type** determines how an agent makes its decisions. The **External**
The **Brain Type** determines how an Agent makes its decisions. The **External**
agents; use **Internal** when using the trained model. The **Heuristic** brain
allows you to hand-code the agent's logic by extending the Decision class.
Finally, the **Player** brain lets you map keyboard commands to actions, which
Agents; use **Internal** when using the trained model. The **Heuristic** Brain
allows you to hand-code the Agent's logic by extending the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
of brains do what you need, you can implement your own CoreBrain to create your
of Brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;

The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the agent's observations contains eight elements: the `x` and
vector containing the Agent's observations contains eight elements: the `x` and
defined in the agent's `CollectObservations()` function.)
defined in the Agent's `CollectObservations()` function.)
An agent is given instructions from the brain in the form of *actions*.
An Agent is given instructions from the Brain in the form of *actions*.
element of the vector means is defined by the agent logic (the PPO training
element of the vector means is defined by the Agent logic (the PPO training
element might represent a force or torque applied to a `RigidBody` in the agent.
element might represent a force or torque applied to a `Rigidbody` in the Agent.
given to the agent is an array of indeces into tables.
given to the Agent is an array of indices into tables.
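The difference between the two vector action space types can be sketched as follows. The command table, the clamping limit, and both function names are hypothetical; in a real environment the meaning of each element is defined by the Agent's `AgentAction()` logic.

```python
from typing import List, Sequence

# Hypothetical command table for a discrete action; a real Agent defines
# what each index means inside its own AgentAction() logic.
DISCRETE_COMMANDS = ["no-op", "tilt +x", "tilt -x", "tilt +z", "tilt -z"]


def interpret_continuous(action: Sequence[float], limit: float = 1.0) -> List[float]:
    # Continuous: every element is a float, e.g. a force or torque to apply.
    # Clamping to [-limit, limit] is an illustrative choice of this sketch.
    return [max(-limit, min(limit, a)) for a in action]


def interpret_discrete(action: Sequence[int]) -> List[str]:
    # Discrete: every element is an index into a table of commands.
    return [DISCRETE_COMMANDS[i] for i in action]


print(interpret_continuous([0.3, -2.5]))
print(interpret_discrete([1, 4]))
```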
The 3D Balance Ball example is programmed to use both types of vector action
space. You can try training with both settings to observe whether there is a

Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
* **Visual Observations** — Defines any Camera objects used by the agent to
* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
makes decisions. All the Agents in the 3D Balance Ball scene share the same
Brain.
* **Visual Observations** — Defines any Camera objects used by the Agent to
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an agent starts over when it is finished.
3D Balance Ball sets this true so that the agent restarts after reaching the
* **Max Step** — Defines how many simulation steps can occur before the Agent
decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an Agent starts over when it is finished.
3D Balance Ball sets this true so that the Agent restarts after reaching the
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
Perhaps the more interesting aspect of an Agent is the Agent subclass
implementation. When you create an Agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
* agent.AgentReset() — Called when the Agent resets, including at the beginning
* Agent.CollectObservations() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous vector observation
* agent.CollectObservations() — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Brain
instance assigned to the Agent is set to the continuous vector observation
* Agent.AgentAction() — Called every simulation step. Receives the action chosen
by the brain. The Ball3DAgent example handles both the continuous and the
* agent.AgentAction() — Called every simulation step. Receives the action chosen
by the Brain. The Ball3DAgent example handles both the continuous and the
assigns a reward to the agent; in this example, an agent receives a small
assigns a reward to the Agent; in this example, an Agent receives a small
negative reward for dropping the ball. An agent is also marked as done when it
negative reward for dropping the ball. An Agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
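The callback cycle described above can be mirrored in a small Python simulation. `ToyBalanceAgent` and `run` are hypothetical: the one-dimensional "ball offset" physics and the reward values (+0.1 per balanced step, -1.0 for a drop) are illustrative assumptions, not the Ball3DAgent implementation. The sketch shows when each callback runs, how reward and done are set, and how Max Step and Reset On Done interact.

```python
import random
from typing import Callable, List


class ToyBalanceAgent:
    """Hypothetical Python mirror of the Agent callbacks described above.

    The 1-D 'ball offset' physics and the reward values are invented for
    illustration; this is not the Ball3DAgent source.
    """

    def __init__(self, max_step: int = 5000, reset_on_done: bool = True, seed: int = 0) -> None:
        self.max_step = max_step            # like the Max Step property
        self.reset_on_done = reset_on_done  # like the Reset On Done property
        self.rng = random.Random(seed)
        self.agent_reset()

    def agent_reset(self) -> None:
        # Like AgentReset(): runs at the start and whenever the agent resets.
        self.ball_offset = self.rng.uniform(-0.5, 0.5)
        self.steps = 0
        self.reward = 0.0
        self.done = False

    def collect_observations(self) -> List[float]:
        # Like CollectObservations(): runs every simulation step.
        return [self.ball_offset]

    def agent_action(self, action: List[float]) -> None:
        # Like AgentAction(): receives the action chosen by the Brain,
        # updates the world, then assigns a reward and maybe marks done.
        self.ball_offset += 0.1 * action[0] + self.rng.uniform(-0.05, 0.05)
        self.steps += 1
        if abs(self.ball_offset) > 1.0:
            self.reward = -1.0  # penalty for dropping the ball
            self.done = True
        else:
            self.reward = 0.1   # small reward for each balanced step
            self.done = self.steps >= self.max_step


def run(agent: ToyBalanceAgent, policy: Callable[[List[float]], List[float]], num_steps: int) -> float:
    total = 0.0
    for _ in range(num_steps):
        obs = agent.collect_observations()
        # Per the ordering given for Academy.AcademyStep(), the Academy's
        # step would run here: after observations, before the action.
        agent.agent_action(policy(obs))
        total += agent.reward
        if agent.done and agent.reset_on_done:
            agent.agent_reset()
    return total


# A policy that shoves the ball off every step, so every step is a drop.
print(run(ToyBalanceAgent(seed=1), lambda obs: [100.0], 3))
```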

explaining it.
To train the agents within the 3D Balance Ball environment, we will be using the
python package. We have provided a convenient script called `mlagents-learn`
Python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
We can use `run_id` to identify the experiment and create a folder where the

The `--train` flag tells the ML-Agents toolkit to run in training mode.
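Launching `mlagents-learn` from another Python script, for example to queue several runs with different run IDs, could look like the sketch below. `build_learn_command` and `launch` are hypothetical helpers; the only flags used are `--run-id` and `--train`, as in the `mlagents-learn config/trainer_config.yaml --run-id=firstRun --train` invocation.

```python
import shlex
import subprocess
from typing import List


def build_learn_command(trainer_config: str, run_id: str, train: bool = True) -> List[str]:
    # Same shape as: mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
    cmd = ["mlagents-learn", trainer_config, f"--run-id={run_id}"]
    if train:
        cmd.append("--train")  # requests a training session
    return cmd


def launch(cmd: List[str]) -> int:
    # Requires the mlagents package to be installed so the mlagents-learn
    # entry point is on PATH; check=True raises if the process fails.
    return subprocess.run(cmd, check=True).returncode


cmd = build_learn_command("config/trainer_config.yaml", "firstRun")
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the command as a list rather than a single string avoids shell quoting issues when config paths contain spaces.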
**Note**: You can train using an executable rather than the Editor. To do so,
follow the intructions in [Using an
Executable](Learning-Environment-Executable.md).
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
### Observing Training Progress

Once the training process completes, and the training process saves the model
(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with agents having an **Internal** brain type. **Note:** Do not just
use it with Agents having an **Internal** Brain type. **Note:** Do not just
close the Unity Window once the `Saved Model` message appears. Either wait for
the training process to close the window or press Ctrl+C at the command-line
prompt. If you simply close the window manually, the .bytes file containing the

To embed the trained model into Unity, follow the later part of the [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
of the Basic Buides page.
of the Basic Guide page.

docs/Glossary.md (4 lines changed)


logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given external
brain. Contains TensorFlow graph which makes decisions for external brain.
* **Trainer** - Python class which is responsible for training a given External
Brain. Contains TensorFlow graph which makes decisions for External Brain.

docs/Installation-Windows.md (4 lines changed)


Next, install `tensorflow`. Install this package using `pip` - which is a
package management system used to install Python packages. Latest versions of
Tensorflow won't work, so you will need to make sure that you install version
TensorFlow won't work, so you will need to make sure that you install version
1.7.1. In the same Anaconda Prompt, type in the following command _(make sure
you are connected to the internet)_:

Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
following command to uninstall TensorFlow for CPU and install TensorFlow for
GPU _(make sure you are connected to the internet)_:

docs/Installation.md


width="500" border="10" />
</p>
## Clone the ML-Agents Toolkit Repository
The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
It also contains many [example environments](Learning-Environment-Examples.md)
that can help you become familiar with Unity.
The `ml-agents` subdirectory contains Python packages which provide
trainers and a Python API to interface with Unity.
The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.
## Install Python and mlagents Package
In order to use the ML-Agents toolkit, you need Python 3.6 along with the
dependencies listed in the [requirements file](../ml-agents/requirements.txt).

### Mac and Unix Users
[Download](https://www.python.org/downloads/) and install Python 3.6 if you do not
If your Python environment doesn't include `pip3`, see these
To install the dependencies and `mlagents` Python package, enter the
`ml-agents/` subdirectory and run from the command line:
```sh
pip3 install .
```
If you installed this correctly, you should be able to run
`mlagents-learn --help`.
## Docker-based Installation

docs/Learning-Environment-Best-Practices.md


([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by
using a Player Brain to control the Agent.
* It is often helpful to make many copies of the agent, and attach the Brain to
be trained to all of these agents. In this way the Brain can get more feedback
information from all of these agents, which helps it train faster.
## Rewards

docs/Learning-Environment-Create-New.md


This tutorial walks through the process of creating a Unity Environment. A Unity
Environment is an application built using the Unity Engine which can be used to
train Reinforcement Learning Agents.
![A simple ML-Agents environment](images/mlagents-NewTutSplash.png)

methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
3. Add one or more Brain objects to the scene as children of the Academy.
4. Implement your Agent subclasses. An Agent subclass defines the code an Agent
optional methods to reset the Agent when it has finished or failed its task.
in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.
6. If training, set the Brain type to External and
[run the training process](Training-ML-Agents.md).

Next, we will create a very simple scene to act as our ML-Agents environment.
The "physical" components of the environment include a Plane to act as the floor
for the Agent to move around on, a Cube to act as the goal or target for the
Agent to seek, and a Sphere to represent the Agent itself.
### Create the floor plane

leave it alone for now.
So far, these are the basic steps that you would use to add ML-Agents to any
Unity project. Next, we will add the logic that will let our Agent learn to roll
to the cube using reinforcement learning.
In this simple scenario, we don't use the Academy object to control the

### Initialization and Resetting the Agent
When the Agent reaches its target, it marks itself done and its Agent reset
function moves the target to a random location. In addition, if the Agent rolls
off the platform, the reset function puts it back onto the floor.
To move the target GameObject, we need a reference to its Transform (which

allowing you to choose which GameObject to use as the target in the Unity
Editor. To reset the Agent's velocity (and later to apply force to move the
Agent), we need a reference to the Rigidbody component. A
[Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's
primary element for physics simulation. (See

```csharp
{
    if (this.transform.position.y < -1.0)
    {
        // The Agent fell
        this.transform.position = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
    }
    // ...
}
```

### Observing the Environment
The Agent sends the information we collect to the Brain, which uses it to make a
decision. When you train the Agent (or use a trained model), the data is fed
into a neural network as a feature vector. For an Agent to successfully learn a
In our case, the information our Agent collects includes:
training. Note that the Agent only collects the x and z coordinates since the
floor is aligned with the x-z plane and the y component of the target's
position never changes.

```csharp
AddVectorObs(relativePosition.z / 5);
```
* Position of the Agent itself within the confines of the floor. This data is
collected as the Agent's distance from each edge of the floor.
```csharp
// Distance to edges of platform

AddVectorObs((this.transform.position.z - 5) / 5);
```
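The normalized position observations above can be reproduced outside Unity to check their ranges. The sketch below is a minimal, hypothetical example (`RollerObservationSketch` and `BuildPositionObservations` are illustrative names, not ML-Agents APIs). It assumes, as the divisors of 5 suggest, a floor spanning -5 to 5 on the x and z axes, and uses plain floats in place of Unity's `Vector3`.

```csharp
using System.Collections.Generic;

// Hypothetical, Unity-free sketch of the RollerAgent position observations.
// Assumes the floor spans -5..5 on x and z, so dividing by 5 keeps values near [-1, 1].
public static class RollerObservationSketch
{
    private const float HalfFloorSize = 5f; // assumed from the "/ 5" divisors above

    public static List<float> BuildPositionObservations(
        float agentX, float agentZ, float targetX, float targetZ)
    {
        var obs = new List<float>();

        // Relative position of the target (x and z only; y never changes)
        obs.Add((targetX - agentX) / HalfFloorSize);
        obs.Add((targetZ - agentZ) / HalfFloorSize);

        // Distance to each edge of the platform, normalized
        obs.Add((agentX + HalfFloorSize) / HalfFloorSize);
        obs.Add((agentX - HalfFloorSize) / HalfFloorSize);
        obs.Add((agentZ + HalfFloorSize) / HalfFloorSize);
        obs.Add((agentZ - HalfFloorSize) / HalfFloorSize);

        return obs;
    }
}
```

With the Agent at the origin and the target at x = 5, z = 0, the list is `[1, 0, 1, -1, 1, -1]`; the velocity observations described next would be appended in the same way.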
* The velocity of the Agent. This helps the Agent learn to control its speed so
it doesn't overshoot the target and roll off the platform.

`AgentAction()` function. The number of elements in this array is determined by
the `Vector Action Space Type` and `Vector Action Space Size` settings of the
Agent's Brain. The RollerAgent uses the continuous vector action space and needs
two continuous control signals from the Brain. Thus, we will set the Brain
axis. (If we allowed the Agent to move in three dimensions, then we would need
to set `Vector Action Size` to 3.) Each of these values returned by the network
is between `-1` and `1`. Note that the Brain really has no idea what the values in
the action array mean. The training process just adjusts the action values in

### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
function. The learning algorithm uses the rewards assigned to the Agent at each
the Agent the optimal actions. You want to reward an Agent for completing the
assigned task (reaching the Target cube, in this case) and punish the Agent if
training with sub-rewards that encourage behavior that helps the Agent complete
the Agent moves closer to the target in a step and a small negative reward at
each step which encourages the Agent to complete its task quickly.
Agent as finished by setting the Agent to done.
```csharp
float distanceToTarget = Vector3.Distance(this.transform.position,

}
```
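To make the reward scheme in this section concrete, here is a Unity-free sketch of one step's reward and done flag. The class name and every constant (the 1.42 reach threshold, 0.1 for moving closer, -0.05 per step, -1 for falling off) are illustrative assumptions; only the structure (reward progress, penalize time, end the episode on success or on falling) comes from the text. For simplicity it returns early, whereas real Agent code may add several rewards within the same step.

```csharp
// Hypothetical, Unity-free sketch of the RollerAgent reward scheme.
// All constants are illustrative assumptions, not values taken from this page.
public static class RollerRewardSketch
{
    public const float ReachThreshold = 1.42f; // assumed "close enough" distance
    public const float ReachReward = 1.0f;     // reaching the target
    public const float CloserReward = 0.1f;    // moved closer this step
    public const float TimePenalty = -0.05f;   // small cost per step
    public const float FallPenalty = -1.0f;    // fell off the platform

    // Returns the reward for one step and whether the Agent should be marked done.
    public static (float Reward, bool Done) Step(
        float previousDistance, float currentDistance, float agentY)
    {
        if (currentDistance < ReachThreshold)
            return (ReachReward, true);          // task complete: end the episode

        if (agentY < -1.0f)
            return (FallPenalty, true);          // fell off: large penalty, then reset

        float reward = TimePenalty;              // encourages finishing quickly
        if (currentDistance < previousDistance)
            reward += CloserReward;              // sub-reward for making progress
        return (reward, false);
    }
}
```

A step that moves the Agent from 3 to 2 units away while it is still on the floor yields about 0.05 (0.1 plus the -0.05 time penalty), and every per-step value stays within the recommended [-1, 1] range.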
**Note:** When you mark an Agent as done, it stops its activity until it is
reset. You can have the Agent reset immediately, by setting the
Assigning a negative reward at each step can also encourage an Agent to finish
a task more quickly:

Finally, to punish the Agent for falling off the platform, assign a large
negative reward and, of course, set the Agent to done so that it resets itself
in the next step:

Now that all the GameObjects and ML-Agents components are in place, it is time
to connect everything together in the Unity Editor. This involves assigning the
Brain object to the Agent, changing some of the Agent Component's properties, and
setting the Brain properties so that they are compatible with our Agent code.
1. Expand the Academy GameObject in the Hierarchy window, so that the Brain
object is visible.

It is always a good idea to test your environment manually before embarking on
an extended training run. The reason we have left the Brain set to the
**Player** type is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard-to-action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify
positive values and one to specify negative values for each action, for a total

`AgentAction()` function. **Value** is assigned to `action[Index]` when **Key** is
pressed.
Press **Play** to run the scene and use the WASD keys to move the Agent around
Console window and that the Agent resets when it reaches its target or falls
includes a convenient Monitor class that you can use to easily display Agent
status information in the Game window.
One additional test you can perform is to first ensure that your environment and

Keep in mind:
* There can only be one Academy game object in a scene.
* You can have multiple Brain game objects, but they must be children of the
Academy game object.
Here is an example of what your scene hierarchy should look like:

docs/Learning-Environment-Design-Academy.md


# Creating an Academy
An Academy orchestrates all the Agent and Brain objects in a Unity scene. Every
scene containing Agents must contain a single Academy. To use an Academy, you
must create your own subclass. However, all the methods you can override are
optional.

## Resetting an Environment
Implement an `AcademyReset()` function to alter the environment at the start of
each episode. For example, you might want to reset an Agent to its starting
position or move a goal to a random position. An environment resets when the
Academy `Max Steps` count is reached.

## Controlling an Environment
The `AcademyStep()` function is called at every step in the simulation before
any Agents are updated. Use this function to update objects in the environment
at every step or during the episode between environment resets. For example, if
you want to add elements to the environment at random intervals, you can put the
logic for creating them in the `AcademyStep()` function.

docs/Learning-Environment-Design-Agents.md


# Agents
An agent is an actor that can observe its environment and decide on the best
course of action using those observations. Create Agents in Unity by extending
successfully learn are the observations the Agent collects and, for
reinforcement learning, the reward you assign to estimate the value of the
An Agent passes its observations to its Brain. The Brain, then, makes a decision
and passes the chosen action back to the Agent. Your Agent code must execute the
action, for example, move the Agent in one direction or another. In order to
[train an Agent using reinforcement learning](Learning-Environment-Design.md),

The Brain class abstracts out the decision making logic from the Agent itself so
that you can use the same Brain in multiple Agents. How a Brain makes its
decisions depends on the type of Brain it is. An **External** Brain simply
passes the observations from its Agents to an external process and then passes
the decisions made externally back to the Agents. An **Internal** Brain uses the
parameters in search of a better decision). The other types of Brains do not
directly involve training, but you might find them useful as part of a training
project. See [Brains](Learning-Environment-Design-Brains.md).

of simulation steps (the frequency defaults to once-per-step). You can also set
up an Agent to request decisions on demand. Making decisions at regular step
decisions on demand is generally appropriate for situations where Agents only
respond to specific events or take actions of variable duration. For example, an
Agent in a robotic simulator that must provide fine control of joint torques
should make its decisions every step of the simulation. On the other hand, an

To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents
using the same Brain instance can use a different frequency. During simulation
steps in which no decision is requested, the Agent receives the same action
On-demand decision making allows Agents to request decisions from their Brains
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the Agents commit to an action for a variable number of steps or
when the Agents cannot make decisions at the same time. This is typically the case

When you turn on **On Demand Decisions** for an Agent, your Agent code must call
of the observation-decision-action-reward cycle. The Brain invokes the Agent's
`AgentAction()` method. The Brain waits for the Agent to request the next
decision before starting another iteration.
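The fixed-frequency behavior described above can be modeled in a few lines. This is a hypothetical sketch, not the ML-Agents implementation: `DecisionScheduleSketch` is an illustrative name, `decide` stands in for the Brain, and decisions are assumed to be requested on steps 0, N, 2N, and so on.

```csharp
using System;

// Hypothetical sketch of fixed-frequency decision making (not the ML-Agents source).
public static class DecisionScheduleSketch
{
    // Simulates totalSteps steps. A new decision is requested on every
    // decisionFrequency-th step (0, N, 2N, ...); on the other steps the Agent
    // receives the same action as the previous decision.
    public static int[] ActionsPerStep(int totalSteps, int decisionFrequency, Func<int, int> decide)
    {
        if (decisionFrequency < 1)
            throw new ArgumentOutOfRangeException(nameof(decisionFrequency));

        var actions = new int[totalSteps];
        int lastAction = 0;
        for (int step = 0; step < totalSteps; step++)
        {
            if (step % decisionFrequency == 0)
                lastAction = decide(step); // the Brain makes a new decision
            actions[step] = lastAction;    // repeated until the next decision
        }
        return actions;
    }
}
```

With On Demand Decisions enabled, the `step % decisionFrequency == 0` test would instead be replaced by whatever game event causes your code to call `RequestDecision()`.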
## Observations

point numbers.
* **Visual Observations** — one or more camera images.
When you use vector observations for an Agent, implement the
to implement the `CollectObservations()` method when your Agent uses visual
observations (unless it also uses vector observations).
### Vector Observation Space: Feature Vectors

class calls the `CollectObservations()` method of each of its Agents. Your
The observation must include all the information an Agent needs to accomplish
its task. Without sufficient and relevant information, an agent may learn poorly
or may not learn at all. A reasonable approach for determining what information
should be included is to consider what you would need to calculate an analytical

an agent's observations to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
When you set up an Agent's Brain in the Unity Editor, set the following
properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.

### Multiple Visual Observations
Camera observations use rendered textures from one or more cameras in a scene.
The Brain vectorizes the textures into a 3D Tensor which can be fed into a
convolutional neural network (CNN). For more information on CNNs, see [this
guide](http://cs231n.github.io/convolutional-networks/). You can use camera
observations alongside vector observations.

also typically less efficient and slower to train, and sometimes don't succeed
at all.
To add a visual observation to an Agent, click on the `Add Camera` button in the
can have more than one camera attached to an Agent.
specify the number of Cameras the Agent is using for its visual observations.
For each visual observation, set the width and height of the image (in pixels)
and whether or not the observation is color or grayscale (when `Black And White`
is checked).

An action is an instruction from the Brain that the Agent carries out. The
action is passed to the Agent as a parameter when the Academy invokes the
is **Continuous**, the action parameter passed to the Agent is an array of
control signals with length equal to the `Vector Action Space Size` property.
When you specify a **Discrete** vector action space type, the action parameter
is an array containing integers. Each integer is an index into a list or table

corresponds to an action table, you can specify the size of each table by
modifying the `Branches` property. Set the `Vector Action Space Size` and
`Vector Action Space Type` properties on the Brain object assigned to the Agent
many training episodes. Thus, the only place actions are defined for an Agent is
in the `AgentAction()` function. You simply specify the type of vector action
space, and, for the continuous vector action space, the number of values, and
then apply the received values appropriately (and consistently) in

either continuous or the discrete vector actions. In the continuous case, you
would set the vector action size to two (one for each dimension), and the
Agent's Brain would create an action with two floating point values. In the
direction), and the Brain would create an action array containing a single
movement), and the Brain would create an action array containing two elements
test your action logic using a **Player** Brain, which lets you map keyboard
commands to actions. See [Brains](Learning-Environment-Design-Brains.md).
The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and

### Continuous Action Space
When an Agent uses a Brain set to the **Continuous** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array with
them. If you assign an element in the array as the speed of an Agent, for
example, the training process learns to control the speed of the Agent through
this parameter.
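A continuous action simply maps array elements onto quantities your code controls. The following Unity-free sketch (hypothetical names; plain floats in place of `Vector3` and `Rigidbody.AddForce()`) turns a two-element action into a force on the x-z plane, the pattern the RollerAgent tutorial uses for its two control signals. Clamping to `[-1, 1]` is an added defensive assumption.

```csharp
using System;

// Hypothetical, Unity-free sketch of applying a two-element continuous action.
// Plain floats stand in for Unity's Vector3 and Rigidbody.AddForce().
public static class ContinuousActionSketch
{
    // Converts action[0] and action[1] into a force on the x-z plane.
    // Values are clamped to [-1, 1] defensively (an assumption; the trained
    // network is expected to produce values in that range already).
    public static (float X, float Z) ToForce(float[] action, float speed)
    {
        if (action == null || action.Length != 2)
            throw new ArgumentException("Expected Vector Action Space Size of 2.", nameof(action));

        float x = Math.Max(-1f, Math.Min(1f, action[0]));
        float z = Math.Max(-1f, Math.Min(1f, action[1]));
        return (x * speed, z * speed);
    }
}
```

Because the same array element always drives the same axis, the training process can learn what each value means purely from the rewards that follow.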
The [Reacher example](Learning-Environment-Examples.md#reacher) defines a

### Discrete Action Space
When an Agent uses a Brain set to the **Discrete** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array
For example, if we wanted an Agent that can move in a plane and jump, we could
Agent be able to move __and__ jump concurrently. We define the first branch to
have 5 possible actions (don't move, go left, go right, go backward, go forward)
and the second one to have 2 possible actions (don't jump, jump). The
`AgentAction()` method would look something like:

```csharp
// Look up the index in the jump action list:
if (jump == 1 && IsGrounded()) { directionY = 1; }
// Apply the action results to move the Agent
gameObject.GetComponent<Rigidbody>().AddForce(
    new Vector3(
        directionX * 40f, directionY * 300f, directionZ * 40f));
```

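The excerpt above omits how the first branch's index becomes `directionX` and `directionZ`. Assuming the order listed in the text (0 don't move, 1 left, 2 right, 3 backward, 4 forward), a Unity-free sketch of the decoding might look like this; the class name and the exact index-to-direction mapping are assumptions.

```csharp
using System;

// Hypothetical sketch: decode a two-branch discrete action into a direction.
// Branch 0 (size 5): 0 = don't move, 1 = left, 2 = right, 3 = backward, 4 = forward (assumed order).
// Branch 1 (size 2): 0 = don't jump, 1 = jump.
public static class DiscreteActionSketch
{
    public static (int X, int Y, int Z) Decode(int[] action, bool isGrounded)
    {
        if (action == null || action.Length != 2)
            throw new ArgumentException("Expected one index per branch (2 branches).", nameof(action));

        int movement = action[0];
        int jump = action[1];

        int x = 0, y = 0, z = 0;
        switch (movement)
        {
            case 1: x = -1; break; // left
            case 2: x = 1; break;  // right
            case 3: z = -1; break; // backward
            case 4: z = 1; break;  // forward
            default: break;        // 0 (or anything unexpected): don't move
        }

        // As in the excerpt, only jump when the Agent is on the ground
        if (jump == 1 && isGrounded) { y = 1; }

        return (x, y, z);
    }
}
```

The returned components would then be scaled and applied, as in the `AddForce()` call above.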
#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are
impossible for the next decision. When the Agent is controlled by an External or
Internal Brain, the Agent will be unable to perform the specified action. Note
that when the Agent is controlled by a Player or Heuristic Brain, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, call the method `SetActionMask` within the `CollectObservations()` method:

* `branch` is the index (starting at 0) of the branch on which you want to mask
the action
* `actionIndices` is a list of `int` or a single `int` corresponding to the
index of the action that the Agent cannot perform.
For example, if you have an Agent with 2 branches and on the first branch
and _"change weapon"_. Then with the code below, the Agent will either _"do
nothing"_ or _"change weapon"_ for its next decision (since action indices 1
and 2 are masked).

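Independently of the `SetActionMask` API, the effect of a mask on one branch can be modeled as choosing only among unmasked indices. This hypothetical sketch assumes a branch of four actions with _"do nothing"_ at index 0 and _"change weapon"_ at index 3, and picks the highest-scoring unmasked action; it does not reproduce how the trainer applies masks internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of what a mask does to one branch; it does not reproduce
// how the ML-Agents trainer applies masks internally.
public static class ActionMaskSketch
{
    // Returns the highest-scoring action index that is not masked.
    public static int PickUnmasked(float[] scores, ISet<int> masked)
    {
        int best = -1;
        for (int i = 0; i < scores.Length; i++)
        {
            if (masked.Contains(i)) continue;                  // masked actions are never chosen
            if (best < 0 || scores[i] > scores[best]) best = i;
        }
        if (best < 0)
            throw new InvalidOperationException("Every action in the branch is masked.");
        return best;
    }
}
```

Even when a masked action (index 1 here) has the highest score, only index 0 or index 3 can be returned.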
reward over time. The better your reward mechanism, the better your agent will
learn.
**Note:** Rewards are not used during inference by a Brain using an already
to display the cumulative reward received by an Agent. You can even use a Player
Brain to control the Agent while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` method in the
`AgentAction()` function. The reward assigned in any step should be in the range
[-1,1]. Values outside this range can lead to unstable training. The `reward`
value is reset to zero at every step.

```csharp
// ...
    SetReward(0.1f);
}
// When ball falls mark Agent as done and give a negative penalty
if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
    Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
    Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
// ...
```

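The difference between adding to and setting the reward, together with the per-step reset described above, can be modeled as follows. This is a hypothetical sketch of the documented behavior (`Add` and `Set` stand in for `AddReward()` and `SetReward()`), not the `Agent` class itself; how `Set` keeps the cumulative total consistent is an assumption, and the range warning reflects the [-1,1] guideline.

```csharp
using System;

// Hypothetical model of per-step vs. cumulative reward (not the Agent class).
public sealed class StepRewardSketch
{
    public float StepReward { get; private set; }       // reset to zero every step
    public float CumulativeReward { get; private set; } // running total for the episode

    public void Add(float increment)
    {
        StepReward += increment;
        CumulativeReward += increment;
    }

    public void Set(float value)
    {
        // Replace this step's reward; assumed to keep the cumulative total consistent
        CumulativeReward += value - StepReward;
        StepReward = value;
    }

    // Called once per step after the reward has been consumed by training
    public void EndStep()
    {
        if (StepReward < -1f || StepReward > 1f)
            Console.WriteLine($"Warning: step reward {StepReward} is outside [-1, 1]");
        StepReward = 0f;
    }
}
```

Calling `Add(0.5f)` and `Add(0.25f)` in one step gives 0.75; a later `Set(0.1f)` in the same step replaces that with 0.1 rather than adding to it.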
![Agent Inspector](images/agent.png)
* `Brain` - The Brain to register this Agent to. Can be dragged into the
reached, the Agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
when the Agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
interval or explicitly requests decisions by calling `RequestDecision()`.
* If not checked, the Agent will request a new decision every `Decision
Frequency` steps and perform an action every step. In the example above,

* `RequestAction()` - Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.
* `Decision Frequency` - The number of steps between decision requests. Not used
if `On Demand Decision` is true.
Unity environment. While this was built for monitoring an Agent's value function
throughout the training process, we imagine it can be more broadly useful. You
can learn more [here](Feature-Monitor.md).

`GameObject.Instantiate()` function. It is typically easiest to instantiate an
agent from a [Prefab](https://docs.unity3d.com/Manual/Prefabs.html) (otherwise,
you have to instantiate every GameObject and Component that make up your Agent
following function creates a new Agent given a Prefab, Brain instance, location,
```csharp
private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
{
    GameObject agentObj = Instantiate(agentPrefab, position, orientation);
    Agent agent = agentObj.GetComponent<Agent>();
    agent.GiveBrain(brain);
    agent.AgentReset();
}
```

the next step in the simulation) so that the Brain knows that this Agent is no
longer active. Thus, the best place to destroy an Agent is in the
`Agent.AgentOnDone()` function:
```csharp

}
```
Note that in order for `AgentOnDone()` to be called, the Agent's `ResetOnDone`
property must be false. You can set `ResetOnDone` on the Agent's Inspector or in
code.

docs/Learning-Environment-Design-Brains.md


* [External](Learning-Environment-Design-External-Internal-Brains.md) — The
**External** and **Internal** types typically work together; set **External**
when training your Agents. You can also use the **External** Brain to
**Heuristic** to hand-code the Agent's logic by extending the Decision class.
keyboard keys to Agent actions, which can be useful to test your Agent code.
During training, set your Agent's Brain type to **External**. To use the trained
model, import the model file into the Unity project and change the Brain type to
Inspector window. These properties must be appropriate for the Agents using the
Brain. For example, the `Vector Observation Space Size` property must match the
length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.

* `Brain Parameters` - Define vector observations, visual observations, and
vector actions for the Brain.
* `Vector Observation`
* `Space Size` - Length of vector observation for Brain.
effective size of the vector observation being passed to the Brain being:
_Space Size_ x _Stacked Vectors_.
* `Visual Observations` - Describes height, width, and whether to grayscale
visual observations for the Brain.

* `Space Size` (Continuous) - Length of action vector for Brain.
* `Branches` (Discrete) - An array of integers, defines multiple concurrent
discrete actions. The values in the `Branches` array correspond to the
number of possible discrete values for each action branch.
* `Action Descriptions` - A list of strings used to name the available
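
As a rough illustration of how `Space Size`, `Stacked Vectors`, and `Branches` relate, the Python sketch below computes the effective observation length, maintains a stack of recent observations, and checks a discrete action against the branch sizes. It is not the ML-Agents implementation: `StackedVectorObservation` and `validate_discrete_action` are hypothetical names, and zero-filling the stack before the first observations arrive is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, List, Sequence


class StackedVectorObservation:
    """Hypothetical helper that keeps the most recent `stacked_vectors` observations."""

    def __init__(self, space_size: int, stacked_vectors: int) -> None:
        self.space_size = space_size
        self.stacked_vectors = stacked_vectors
        # Assumption: the stack starts zero-filled so its length is always constant.
        self._frames: Deque[List[float]] = deque(
            ([0.0] * space_size for _ in range(stacked_vectors)),
            maxlen=stacked_vectors,
        )

    @property
    def effective_size(self) -> int:
        # Effective size passed to the Brain: Space Size x Stacked Vectors.
        return self.space_size * self.stacked_vectors

    def add(self, observation: Sequence[float]) -> List[float]:
        """Push one observation and return the flattened stack (oldest first)."""
        if len(observation) != self.space_size:
            raise ValueError(
                f"expected {self.space_size} values, got {len(observation)}"
            )
        self._frames.append(list(observation))  # maxlen drops the oldest frame
        return [value for frame in self._frames for value in frame]


def validate_discrete_action(branches: Sequence[int], action: Sequence[int]) -> None:
    """Raise ValueError unless `action` holds one valid index per branch."""
    if len(action) != len(branches):
        raise ValueError(f"expected {len(branches)} branch values, got {len(action)}")
    for i, (value, size) in enumerate(zip(action, branches)):
        # Each branch value indexes into that branch's possible discrete values.
        if not 0 <= value < size:
            raise ValueError(f"branch {i}: value {value} not in [0, {size})")


stack = StackedVectorObservation(space_size=2, stacked_vectors=3)
print(stack.effective_size)  # 6
print(stack.add([1.0, 2.0]))  # [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
validate_discrete_action(branches=[3, 2], action=[2, 1])  # passes silently
```

A `Branches` value of `[3, 2]` therefore describes two concurrent discrete actions, the first with three possible values and the second with two.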

## Using the Broadcast Feature
The Player, Heuristic and Internal Brains have been updated to support
broadcast. The broadcast feature allows you to collect data from your Agents
using a Python program without controlling them.
### How to use: Unity

### How to use: Python
When you launch your Unity Environment from a Python program, you can see what
the Agents connected to non-External Brains are doing. When calling `step` or
`reset` on your environment, you retrieve a dictionary mapping Brain names to
non-External Brain set to broadcast as well as for any External Brains.
Just like with an External Brain, the `BrainInfo` object contains the fields for
were taken by the Agents at the previous step, not the current one.
for non-External Brains. If there are no External Brains in the scene, simply
Heuristic or Internal Brain game sessions. You can then use this data to train
an agent in a supervised context.
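
Because the previous-action field reports the action taken at the previous step, an observation recorded at one step lines up with the action reported at the next step. The sketch below pairs them up for a recorded session. `BrainInfoStub` is a hypothetical stand-in that models only two fields, whose names are assumed to mirror the `BrainInfo` attributes `vector_observations` and `previous_vector_actions`; in a real session the dictionaries would come from `step()`/`reset()`, and the Brain name `"PlayerBrain"` is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BrainInfoStub:
    """Hypothetical stand-in for BrainInfo; only two fields are modeled."""

    vector_observations: List[List[float]]      # one row per Agent
    previous_vector_actions: List[List[float]]  # action each Agent took at the previous step


def pair_observations_with_actions(
    session: List[Dict[str, BrainInfoStub]], brain_name: str
) -> List[Tuple[List[float], List[float]]]:
    """Return (observation, action) pairs for supervised training.

    The action reported at step t + 1 is the one chosen after observing step t,
    so each observation is paired with the *next* step's previous action.
    """
    pairs: List[Tuple[List[float], List[float]]] = []
    for before, after in zip(session, session[1:]):
        observations = before[brain_name].vector_observations
        actions = after[brain_name].previous_vector_actions
        pairs.extend(zip(observations, actions))
    return pairs


session = [
    {"PlayerBrain": BrainInfoStub([[0.0]], [[0.0]])},
    {"PlayerBrain": BrainInfoStub([[1.0]], [[1.0]])},
    {"PlayerBrain": BrainInfoStub([[2.0]], [[-1.0]])},
]
print(pair_observations_with_actions(session, "PlayerBrain"))
# [([0.0], [1.0]), ([1.0], [-1.0])]
```

The sketch assumes the Agent order is stable across steps and ignores episode boundaries; a real recorder would need to handle both.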

docs/Learning-Environment-Design-External-Internal-Brains.md


# External and Internal Brains
The **External** and **Internal** types of Brains work in different phases of
training. When training your Agents, set their Brain types to **External**; when
using the trained models, set their Brain types to **Internal**.
training process to collect the observations of Agents using that Brain and give
the Agents their actions.
In addition to using an External Brain for training using the ML-Agents learning
algorithms, you can use an External Brain to control Agents in a Unity
environment using an external Python program. See [Python API](Python-API.md)
for more information.
Unlike the other types, the External Brain has no properties to set in the Unity

A __model__ is a mathematical relationship mapping an agent's observations to
its actions. TensorFlow is a software library for performing numerical
computation through data flow graphs. A TensorFlow model, then, defines the
mathematical relationship between your Agent's observations and its actions
using a TensorFlow data flow graph.
### Creating a graph model

* `Graph Scope` : If you set a scope while training your TensorFlow model, all
your placeholder names will have a prefix. You must specify that prefix here.
Note that if more than one Brain was set to External during training, you
must give a `Graph Scope` to the Internal Brain corresponding to the name of
graph, you must specify the name of the placeholder here. The Brain will make
the batch size equal to the number of Agents connected to the Brain
automatically.
* `State Node Name` : If your graph uses the state as an input, you must specify
the name of the placeholder here.

of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input, you
must specify it here. Note that the number of observations is equal to the
length of `Camera Resolutions` in the Brain parameters.
actions of the Brain in your graph. If the action space type is continuous,
the output must be a one-dimensional tensor of floats of length `Action Space
Size`; if the action space type is discrete, the output must be a
one-dimensional tensor of ints of the same length as the `Branches` array.
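
The batch-size rule and the output requirements above can be sketched as follows: every Agent attached to the Brain contributes one row to the batch, the graph runs once, and each output row must have the length and element type the action space requires. `run_shared_policy`, `check_action_row`, and `toy_policy` are hypothetical Python stand-ins for what the Internal Brain does with the TensorFlow graph, not part of ML-Agents.

```python
from typing import Callable, Dict, List, Sequence

Row = List[float]


def check_action_row(
    row: Sequence,
    action_space_type: str,
    action_space_size: int = 0,
    branches: Sequence[int] = (),
) -> bool:
    """True if one Agent's output row has the required length and element type."""
    if action_space_type == "continuous":
        return len(row) == action_space_size and all(isinstance(v, float) for v in row)
    if action_space_type == "discrete":
        return len(row) == len(branches) and all(isinstance(v, int) for v in row)
    raise ValueError(f"unknown action space type: {action_space_type}")


def run_shared_policy(
    policy: Callable[[List[Row]], List[list]],
    observations: Dict[str, Row],
) -> Dict[str, list]:
    """Batch every Agent's observation, run the policy once, split the outputs."""
    agent_ids = list(observations)
    # Batch size == number of Agents connected to the Brain.
    batch = [observations[agent_id] for agent_id in agent_ids]
    outputs = policy(batch)
    if len(outputs) != len(batch):
        raise ValueError("policy must return exactly one row per Agent")
    return dict(zip(agent_ids, outputs))


def toy_policy(batch: List[Row]) -> List[list]:
    # Continuous action space of size 1: push against the first observation value.
    return [[-0.5 * obs[0]] for obs in batch]


actions = run_shared_policy(toy_policy, {"agent_0": [1.0, 0.0], "agent_1": [-2.0, 0.0]})
print(actions)  # {'agent_0': [-0.5], 'agent_1': [1.0]}
```

Running the policy once per batch rather than once per Agent is the reason the Brain sets the batch size itself: the graph sees all connected Agents together, while each Agent still receives its own row of the output.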

docs/Learning-Environment-Design-Heuristic-Brains.md


# Heuristic Brain
The **Heuristic** Brain type allows you to hand-code an Agent's decision-making
process. A Heuristic Brain requires an implementation of the Decision interface
to which it delegates the decision-making process.
When you set the **Brain Type** property of a Brain to **Heuristic**, you must

The Decision interface defines two methods, `Decide()` and `MakeMemory()`.
The `Decide()` method receives an Agent's current state, consisting of the
Agent's observations, reward, memory and other aspects of the Agent's state, and
must return an array containing the action that the Agent should take. The
format of the returned action array depends on the **Vector Action Space Type**.
When using a **Continuous** action space, the action array is just a float array
with a length equal to the **Vector Action Space Size** setting. When using a

integers.
The `MakeMemory()` function allows you to pass data forward to the next
iteration of an Agent's decision making process. The array you return from
can use the memory to allow the Agent's decision process to take past actions
and observations into account when making the current decision. If your
heuristic logic does not require memory, just return an empty array.
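
The real Decision interface is a C# interface in the Unity project; the Python sketch below only mirrors its shape to show how `MakeMemory()` carries state into the next `Decide()` call. The `Decision` base class here is a hypothetical analogue, and `DampedCenteringDecision`, its gains, and the one-value observation and action layout are also hypothetical: the Agent is assumed to observe a single position and to use a continuous action space of size 1.

```python
from abc import ABC, abstractmethod
from typing import List


class Decision(ABC):
    """Python analogue of the Decision interface (not the ML-Agents C# API)."""

    @abstractmethod
    def decide(self, vector_obs: List[float], reward: float, done: bool,
               memory: List[float]) -> List[float]:
        """Return the action array for the Agent."""

    @abstractmethod
    def make_memory(self, vector_obs: List[float], reward: float, done: bool,
                    memory: List[float]) -> List[float]:
        """Return the memory passed to the next decision."""


class DampedCenteringDecision(Decision):
    """Push a 1-D position toward 0, damped by the change since the last step."""

    def __init__(self, k_p: float = 1.0, k_d: float = 0.5) -> None:
        self.k_p = k_p  # hypothetical proportional gain
        self.k_d = k_d  # hypothetical damping gain

    def decide(self, vector_obs, reward, done, memory):
        position = vector_obs[0]
        # On the first step there is no memory yet, so assume no motion.
        previous = memory[0] if memory else position
        force = -(self.k_p * position + self.k_d * (position - previous))
        # Continuous action array of length 1, clamped to [-1, 1].
        return [max(-1.0, min(1.0, force))]

    def make_memory(self, vector_obs, reward, done, memory):
        # Remember the current position so the next decide() can estimate motion.
        return [vector_obs[0]]


decision = DampedCenteringDecision()
memory: List[float] = []
for position in [0.5, 0.3]:
    action = decision.decide([position], 0.0, False, memory)
    memory = decision.make_memory([position], 0.0, False, memory)
    print(action)
```

Here the memory holds a single value, the last observed position, which is enough for the heuristic to react to motion rather than only to the current position.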

docs/Learning-Environment-Design-Player-Brains.md


# Player Brain
The **Player** Brain type allows you to control an Agent using keyboard
commands. You can use Player Brains to control a "teacher" Agent that trains
other Agents during [imitation learning](Training-Imitation-Learning.md). You
can also use Player Brains to test your Agents and environment before changing
their Brain types to **External** and running the training process.
The **Player** Brain properties allow you to assign one or more keyboard keys to
Brain uses the discrete action space, you can send one integer value as the
action per step. In contrast, when a Brain uses the continuous action space you
can send any number of floating point values (up to the **Vector Action Space
Size** setting).

action. (If you press both keys at the same time, deterministic results are not guaranteed.)|
||**Element 0–N**| The mapping of keys to action values. |
|| **Key** | The key on the keyboard. |
|| **Index** | The element of the Agent's action vector to set when this key is
|| **Value** | The value to send to the Agent as its action for the specified
index when the mapped key is pressed. All other members of the action vector
are set to 0. |
|**Discrete Player Actions**|| The mapping for the discrete vector action space.

|| **Key** | The key on the keyboard. |
|| **Branch Index** |The element of the Agent's action vector to set when this
|| **Value** | The value to send to the Agent as its action when the mapped key
is pressed. Cannot exceed the max value for the associated branch (minus 1,
since it is an array index).|
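
The two mapping tables can be read as the following logic, sketched in Python. Key names, the dictionary layout, and the function names are hypothetical; the real Player Brain reads keys through Unity's input system. Unmapped elements stay 0, as the continuous table describes; leaving unpressed discrete branches at 0 is an assumption of this sketch. When two mapped keys target the same element, the sketch lets the later key win, which is just one arbitrary resolution of the non-deterministic case noted in the table.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def continuous_action_from_keys(
    pressed: Iterable[str], key_map: Dict[str, Tuple[int, float]], action_size: int
) -> List[float]:
    """key_map: key -> (Index, Value). Unmapped elements of the vector stay 0."""
    action = [0.0] * action_size
    for key in pressed:
        if key in key_map:
            index, value = key_map[key]
            action[index] = value  # later keys overwrite earlier ones
    return action


def discrete_action_from_keys(
    pressed: Iterable[str], key_map: Dict[str, Tuple[int, int]], branches: Sequence[int]
) -> List[int]:
    """key_map: key -> (Branch Index, Value). Value must be < that branch's size."""
    action = [0] * len(branches)
    for key in pressed:
        if key in key_map:
            branch, value = key_map[key]
            if not 0 <= value < branches[branch]:
                raise ValueError(f"value {value} out of range for branch {branch}")
            action[branch] = value
    return action


continuous_keys = {"a": (0, -1.0), "d": (0, 1.0), "w": (1, 1.0)}
print(continuous_action_from_keys(["d"], continuous_keys, action_size=2))  # [1.0, 0.0]

discrete_keys = {"a": (0, 1), "d": (0, 2), "space": (1, 1)}
print(discrete_action_from_keys(["d", "space"], discrete_keys, branches=[3, 2]))  # [2, 1]
```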

docs/Learning-Environment-Design.md


Training and simulation proceed in steps orchestrated by the ML-Agents Academy
class. The Academy works with Agent and Brain objects in the scene to step
through the simulation. When either the Academy has reached its maximum number
of steps or all Agents in the scene are _done_, one training episode is
neural network model. The type of Brain assigned to an Agent determines whether
it participates in training or not. The **External** Brain communicates with the
with an **Internal** Brain.
2. Calls the `AgentReset()` function for each Agent in the scene.
3. Calls the `CollectObservations()` function for each Agent in the scene.
4. Uses each Agent's Brain class to decide on the Agent's next action.
6. Calls the `AgentAction()` function for each Agent in the scene, passing in
the action chosen by the Agent's Brain. (This function is not called if the
Agent is done.)
7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
an Agent to restart if it finishes before the end of an episode. In this
case, the Academy calls the `AgentReset()` function.
8. When the Academy reaches its own `Max Step` count, it starts the next episode
again by calling your Academy subclass's `AcademyReset()` function.
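
The ordering in the list above can be traced with a small Python simulation. `SketchAgent`, `SketchAcademy`, and the lambda Brain are hypothetical stand-ins that only log which callback runs when; the per-step Academy hook is modeled as `academy_step` and runs after the Agents have observed and chosen actions but before they act, as the Academy section describes. Skipping done Agents that do not reset, and calling `agent_on_done` only once for them, are simplifications of this sketch.

```python
from typing import Callable, List


class SketchAgent:
    """Hypothetical stand-in for an Agent; records each callback in a shared log."""

    def __init__(self, name: str, log: List[str], max_step: int = 0,
                 reset_on_done: bool = True) -> None:
        self.name, self.log = name, log
        self.max_step, self.reset_on_done = max_step, reset_on_done
        self.done, self.steps, self.notified = False, 0, False

    def agent_reset(self) -> None:
        self.log.append(f"{self.name}.AgentReset")
        self.done, self.steps, self.notified = False, 0, False

    def collect_observations(self) -> List[float]:
        self.log.append(f"{self.name}.CollectObservations")
        return [float(self.steps)]

    def agent_action(self, action: List[float]) -> None:
        self.log.append(f"{self.name}.AgentAction")
        self.steps += 1
        if self.max_step > 0 and self.steps >= self.max_step:
            self.done = True  # reached its own Max Step

    def agent_on_done(self) -> None:
        self.log.append(f"{self.name}.AgentOnDone")


class SketchAcademy:
    """Hypothetical stand-in for the Academy's step loop over one episode."""

    def __init__(self, agents: List[SketchAgent],
                 brain: Callable[[List[float]], List[float]],
                 max_step: int, log: List[str]) -> None:
        self.agents, self.brain, self.max_step, self.log = agents, brain, max_step, log

    def academy_reset(self) -> None:
        self.log.append("Academy.AcademyReset")

    def academy_step(self) -> None:
        self.log.append("Academy.AcademyStep")

    def run_episode(self) -> None:
        self.academy_reset()
        for agent in self.agents:
            agent.agent_reset()
        for _ in range(self.max_step):
            active = [a for a in self.agents if not a.done]
            observations = [a.collect_observations() for a in active]
            actions = [self.brain(obs) for obs in observations]
            self.academy_step()  # Agents have already observed and chosen actions
            for agent, action in zip(active, actions):
                agent.agent_action(action)
            for agent in self.agents:
                if agent.done and not agent.notified:
                    agent.agent_on_done()
                    agent.notified = True
                    if agent.reset_on_done:
                        agent.agent_reset()  # try again within the same episode


log: List[str] = []
agent = SketchAgent("agent", log, max_step=2, reset_on_done=True)
SketchAcademy([agent], brain=lambda obs: [0.0], max_step=3, log=log).run_episode()
print("\n".join(log))
```

Calling `run_episode()` again corresponds to the Academy starting the next episode, which begins with another `AcademyReset` entry in the log.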

**Note:** The API used by the Python PPO training process to communicate with
and control the Academy during training can be used for other purposes as well.
For example, you could use the API to use Unity as the simulation engine for
your own machine learning algorithms. See [Python API](Python-API.md) for more
information.
## Organizing the Unity Scene

as you need. Any Brain instances in the scene must be attached to GameObjects
that are children of the Academy in the Unity Scene Hierarchy. Agent instances
should be attached to the GameObject representing that Agent.
You must assign a Brain to every Agent, but you can share Brains between
multiple Agents. Each Agent will make its own observations and act
Brains, the same trained TensorFlow model.
The Academy object orchestrates Agents and their decision making processes. Only
place a single Academy object in a scene.
You must create a subclass of the Academy class (since the base class is

* `InitializeAcademy()` — Prepare the environment the first time it launches.
* `AcademyReset()` — Prepare the environment and Agents for the next training
objects in the scene before the Agents take their actions. Note that the
Agents have already collected their observations and chosen an action before
the Academy invokes this method.
The base Academy class also defines several important properties that you can

assigned a Brain, but you can use the same Brain with more than one Agent.
Use the Brain class directly, rather than a subclass. Brain behavior is
determined by the Brain type. During training, set your Agent's Brain type to
project and change the Brain type to **Internal**. See
different types of Brains. You can extend the CoreBrain class to create
different Brain types if the four built-in types don't do what you need.
Inspector window. These properties must be appropriate for the Agents using the
Brain. For example, the `Vector Observation Space Size` property must match the
length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.

in a football game or a car object in a vehicle simulation. Every Agent must be
assigned a Brain.
To create an Agent, extend the Agent class and implement the essential
* `CollectObservations()` — Collects the Agent's observation of its environment.
* `AgentAction()` — Carries out the action chosen by the Agent's Brain and
Brain assigned to this Agent must be set.
manually set an Agent to done in your `AgentAction()` function when the Agent
has finished (or irrevocably failed) its task. You can also set the Agent's `Max
Steps` property to a positive value and the Agent will consider itself done
count, it starts the next episode. If you set an Agent's `ResetOnDone` property
to true, then the Agent can attempt its task several times in one episode. (Use
the `Agent.AgentReset()` function to prepare the Agent to start again.)
about programming your own Agents.
## Environments

* The training scene must start automatically when your Unity application is
launched by the training process.
* The scene must include at least one **External** Brain.
each Agent setting itself to `done`.

docs/Learning-Environment-Examples.md


* Set-up: A linear movement task where the agent must move left or right to
rewarding states.
* Goal: Move to the most rewarding state.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).

* Goal: The agent must balance the platform in order to keep the ball on it for
as long as possible.
* Agents: The environment contains 12 agents of the same kind, all linked to a
single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 8 variables corresponding to rotation of platform,
and position, rotation, and velocity of ball.
* Vector Observation space (Hard Version): 5 variables corresponding to