
Documentation 0.5 Release Check List (Part 1) (#1154)

/develop-generalizationTraining-TrainerController
Arthur Juliani, 6 years ago
Current commit: 2cd8e250
48 files changed, with 880 insertions and 822 deletions
1. CODE_OF_CONDUCT.md (5 lines changed)
2. CONTRIBUTING.md (60 lines changed)
3. README.md (94 lines changed)
4. docs/API-Reference.md (2 lines changed)
5. docs/Background-TensorFlow.md (2 lines changed)
6. docs/Basic-Guide.md (32 lines changed)
7. docs/FAQ.md (18 lines changed)
8. docs/Feature-Memory.md (2 lines changed)
9. docs/Feature-Monitor.md (2 lines changed)
10. docs/Getting-Started-with-Balance-Ball.md (94 lines changed)
11. docs/Glossary.md (4 lines changed)
12. docs/Installation-Windows.md (4 lines changed)
13. docs/Installation.md (29 lines changed)
14. docs/Learning-Environment-Best-Practices.md (4 lines changed)
15. docs/Learning-Environment-Create-New.md (77 lines changed)
16. docs/Learning-Environment-Design-Academy.md (6 lines changed)
17. docs/Learning-Environment-Design-Agents.md (147 lines changed)
18. docs/Learning-Environment-Design-Brains.md (42 lines changed)
19. docs/Learning-Environment-Design-External-Internal-Brains.md (26 lines changed)
20. docs/Learning-Environment-Design-Heuristic-Brains.md (14 lines changed)
21. docs/Learning-Environment-Design-Player-Brains.md (24 lines changed)
22. docs/Learning-Environment-Design.md (82 lines changed)
23. docs/Learning-Environment-Examples.md (72 lines changed)
24. docs/Learning-Environment-Executable.md (19 lines changed)
25. docs/Limitations.md (4 lines changed)
26. docs/ML-Agents-Overview.md (24 lines changed)
27. docs/Migrating.md (19 lines changed)
28. docs/Readme.md (4 lines changed)
29. docs/Training-Curriculum-Learning.md (8 lines changed)
30. docs/Training-Imitation-Learning.md (24 lines changed)
31. docs/Training-ML-Agents.md (13 lines changed)
32. docs/Training-PPO.md (2 lines changed)
33. docs/Training-on-Amazon-Web-Service.md (2 lines changed)
34. docs/Training-on-Microsoft-Azure.md (15 lines changed)
35. docs/Using-TensorFlow-Sharp-in-Unity.md (8 lines changed)
36. docs/Using-Tensorboard.md (8 lines changed)
37. docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (26 lines changed)
38. docs/localized/zh-CN/docs/Installation.md (4 lines changed)
39. docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (4 lines changed)
40. docs/localized/zh-CN/docs/Learning-Environment-Design.md (14 lines changed)
41. docs/localized/zh-CN/docs/Learning-Environment-Examples.md (42 lines changed)
42. docs/localized/zh-CN/docs/ML-Agents-Overview.md (2 lines changed)
43. ml-agents/README.md (183 lines changed)
44. docs/Python-API.md (149 lines changed)
45. gym-unity/README.md (158 lines changed)
46. MLAgentsSDK/README.md (1 line changed)
47. gym-unity/Readme.md (127 lines changed)

CODE_OF_CONDUCT.md (5 lines changed)


## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct/
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct/
[homepage]: https://www.contributor-covenant.org

CONTRIBUTING.md (60 lines changed)


# Contribution Guidelines
Thank you for your interest in contributing to the ML-Agents toolkit! We are incredibly
excited to see how members of our community will use and extend the ML-Agents toolkit.
To facilitate your contributions, we've outlined a brief set of guidelines
to ensure that your extensions can be easily integrated.
Thank you for your interest in contributing to the ML-Agents toolkit! We are
incredibly excited to see how members of our community will use and extend the
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.
### Communication
## Communication
First, please read through our [code of conduct](CODE_OF_CONDUCT.md),
as we expect all our contributors to follow it.
First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
expect all our contributors to follow it.
Second, before starting on a project that you intend to contribute
to the ML-Agents toolkit (whether environments or modifications to the codebase),
we **strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
briefly outlining the changes you plan to make. This will enable us to provide
some context that may be helpful for you. This could range from advice and
feedback on how to optimally perform your changes or reasons for not doing it.
Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
and briefly outlining the changes you plan to make. This will enable us to
provide some context that may be helpful for you. This could range from advice
and feedback on how to optimally perform your changes or reasons for not doing
it.
### Git Branches
## Git Branches
Starting with v0.3, we adopted the
Consequently, the `master` branch corresponds to the latest release of
* Corresponding changes to documentation, unit tests and sample environments
(if applicable)
* Corresponding changes to documentation, unit tests and sample environments (if
applicable)
### Environments
## Environments
We are also actively open to adding community contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
PR explaining the nature of the environment and task.
### Style Guide
## Style Guide
When performing changes to the codebase, ensure that you follow the style
guide of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/). For C#, we will soon be
adding a formal style guide for our repository.
When performing changes to the codebase, ensure that you follow the style guide
of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/).
For C#, we will soon be adding a formal style guide for our repository.

README.md (94 lines changed)


# Unity ML-Agents Toolkit (Beta)
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin
that enables games and simulations to serve as environments for training
intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through
a simple-to-use Python API. We also provide implementations (based on
TensorFlow) of state-of-the-art algorithms to enable game developers
and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
These trained agents can be used for multiple purposes, including
controlling NPC behavior (in a variety of settings such as multi-agent and
adversarial), automated testing of game builds and evaluating different game
design decisions pre-release. The ML-Agents toolkit is mutually beneficial for both game
developers and AI researchers as it provides a central platform where advances
in AI can be evaluated on Unity’s rich environments and then made accessible
to the wider research and game developer communities.
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source
Unity plugin that enables games and simulations to serve as environments for
training intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through a
simple-to-use Python API. We also provide implementations (based on TensorFlow)
of state-of-the-art algorithms to enable game developers and hobbyists to easily
train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
used for multiple purposes, including controlling NPC behavior (in a variety of
settings such as multi-agent and adversarial), automated testing of game builds
and evaluating different game design decisions pre-release. The ML-Agents
toolkit is mutually beneficial for both game developers and AI researchers as it
provides a central platform where advances in AI can be evaluated on Unity’s
rich environments and then made accessible to the wider research and game
developer communities.
* Train memory-enhanced Agents using deep reinforcement learning
* Train memory-enhanced agents using deep reinforcement learning
* Broadcasting of Agent behavior for supervised learning
* Broadcasting of agent behavior for supervised learning
* Flexible Agent control with On Demand Decision Making
* Flexible agent control with On Demand Decision Making
* For more information, in addition to installation and usage
instructions, see our [documentation home](docs/Readme.md).
* If you have
used a version of the ML-Agents toolkit prior to v0.4, we strongly recommend
our [guide on migrating from earlier versions](docs/Migrating.md).
* For more information, in addition to installation and usage instructions, see
our [documentation home](docs/Readme.md).
* If you have used a version of the ML-Agents toolkit prior to v0.4, we strongly
recommend our [guide on migrating from earlier versions](docs/Migrating.md).
- Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
- [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/) announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
- [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
* Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and
[Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
* [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
* [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/)
announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
* [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
- [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
- [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
- [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
* [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
* [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
* [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
The ML-Agents toolkit is an open-source project and we encourage and welcome contributions.
If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
The ML-Agents toolkit is an open-source project and we encourage and welcome
contributions. If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers enthusiastic
about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in games).
* If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers
enthusiastic about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in
games).
* If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
For any other questions or feedback, connect directly with the ML-Agents
team at ml-agents@unity3d.com.

translating more pages and to other languages. Consequently,
we welcome any enhancements and improvements from the community.
- [Chinese](docs/localized/zh-CN/)
* [Chinese](docs/localized/zh-CN/)
## License

docs/API-Reference.md (2 lines changed)


# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatabile with
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
documentation.

docs/Background-TensorFlow.md (2 lines changed)


performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
train the behavior of an Agent, the output is a TensorFlow model (.bytes) file
train the behavior of an agent, the output is a TensorFlow model (.bytes) file
that you can then embed within an Internal Brain. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.

docs/Basic-Guide.md (32 lines changed)


# Basic Guide
This guide will show you how to use a pretrained model in an example Unity
This guide will show you how to use a pre-trained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we

In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. Also [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
is needed for you to use pretrained model within Unity, which is based on the
is needed for you to use a pre-trained model within Unity, which is based on the
[TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).
1. Launch Unity

`None` if you want to interact with the current scene in the Unity Editor.
More information and documentation is provided in the
[Python API](../ml-agents/README.md) page.
[Python API](Python-API.md) page.
## Training the Brain with Reinforcement Learning

the brain used by the agents to **External**. This allows the agents to
the Brain used by the Agents to **External**. This allows the Agents to
communicate with the external training process when making their decisions.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy

### Training the environment
1. Open a command or terminal window.
2. Nagivate to the folder where you installed the ML-Agents toolkit.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
Where:
where:
trainer configuration. The defaults used by environments in the ML-Agents
SDK can be found in `config/trainer_config.yaml`.
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- And the `--train` tells `mlagents-learn` to run a training session (rather
- `--train` tells `mlagents-learn` to run a training session (rather
4. When the message _"Start training by pressing the Play button in the Unity
4. If you cloned the ML-Agents repo, then you can simply run
```sh
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
```
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.

'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents

like this:
```console
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy

`models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
`<academy_name>` is the name of the Academy GameObject in the current scene.
This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your internal brain by following the steps below, which is
trained model into your Internal Brain by following the steps below, which is
similar to the steps described
[above](#play-an-example-environment-using-pretrained-model).
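The checkpoint location described above follows a fixed naming pattern. As a minimal sketch, the path for an Editor training run can be assembled from the run identifier and the Academy name; the helper name `editor_model_path` is hypothetical and not part of the ML-Agents API, it only reproduces the documented pattern.

```python
from pathlib import PurePosixPath


def editor_model_path(run_id: str, academy_name: str) -> str:
    """Hypothetical helper reproducing the documented checkpoint pattern:

    models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes
    """
    file_name = f"editor_{academy_name}_{run_id}.bytes"
    return str(PurePosixPath("models") / run_id / file_name)


# The run started with --run-id=firstRun in the Ball3DAcademy scene.
print(editor_model_path("firstRun", "Ball3DAcademy"))
# prints models/firstRun/editor_Ball3DAcademy_firstRun.bytes
```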

page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own learning environment,
- For a "Hello World" introduction to creating your own Learning Environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of YouTube video tutorials, check out the

docs/FAQ.md (18 lines changed)


## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin, but havn't set
If you have already imported the TensorFlowSharp plugin, but haven't set
You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
```
This error message occurs because the TensorFlowSharp plugin won't be usable

## Tensorflow epsilon placeholder error
## TensorFlow epsilon placeholder error
If you have a graph placeholder set in the internal Brain inspector that is not
If you have a graph placeholder set in the Internal Brain inspector that is not
UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
Similarly, if you have a graph scope set in the internal Brain inspector that is
Similarly, if you have a graph scope set in the Internal Brain inspector that is
not correctly set, you will see some error like this:
```console

Solution: Make sure your Graph Scope field matches the corresponding brain
object name in your Hierachy Inspector when there is multiple brain.
Solution: Make sure your Graph Scope field matches the corresponding Brain
object name in your Hierarchy Inspector when there are multiple Brains.
## Environment Permission Error

## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the learning environment not
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts

docs/Feature-Memory.md (2 lines changed)


# Memory-enhanced Agents using Recurrent Neural Networks
# Memory-enhanced agents using Recurrent Neural Networks
## What are memories for

docs/Feature-Monitor.md (2 lines changed)


You can track many different things both related and unrelated to the agents
themselves. By default, the Monitor is only active in the *inference* phase, so
not during training. To change this behaviour, you can activate or deactivate it
not during training. To change this behavior, you can activate or deactivate it
by calling `SetActive(boolean)`. For example, to also show the monitor during
training, you can call it in the `InitializeAcademy()` method of your `Academy`:

docs/Getting-Started-with-Balance-Ball.md (94 lines changed)


This tutorial walks through the end-to-end process of opening an ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
agent in it, and finally embedding the trained model into the Unity environment.
Agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help

This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent** that
horizontally or vertically. In this environment, a platform is an **Agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
independent agent, but they all share the same Brain. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy

and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the agent is using a trained model or
training. (*Inference* means that the Agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
configuration** and a high graphics quality and the timescale to `1.0` for the

* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the agents collect their observations).
agent.AgentAction() (and after the Agents collect their observations).
The 3D Balance Ball environment does not use these functions — each agent resets
The 3D Balance Ball environment does not use these functions — each Agent resets
environment around the agents.
environment around the Agents.
the Academy.) All the agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an agent, it just
routes the agent's collected observations to the decision making process and
returns the chosen action to the agent. Thus, all agents can share the same
brain, but act independently. The Brain settings tell you quite a bit about how
an agent works.
the Academy.) All the Agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same
Brain, but act independently. The Brain settings tell you quite a bit about how
an Agent works.
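The routing role described above can be illustrated with a small Python sketch. The `SharedBrain` class, its `decide` method, and the toy policy are hypothetical stand-ins (the real Brain is a Unity C# component); the point is that one object maps each Agent's observations to an action without keeping any per-Agent data, so many Agents can share it while still acting independently.

```python
from typing import Callable, Dict, List

Observation = List[float]
Action = List[float]


class SharedBrain:
    """Hypothetical stand-in for a Brain shared by many Agents.

    It stores no per-Agent state: it only forwards observations to a
    decision function and hands the chosen actions back.
    """

    def __init__(self, policy: Callable[[Observation], Action]) -> None:
        self.policy = policy

    def decide(self, observations: Dict[str, Observation]) -> Dict[str, Action]:
        # Each Agent's action depends only on that Agent's own observation.
        return {agent_id: self.policy(obs) for agent_id, obs in observations.items()}


def toy_policy(obs: Observation) -> Action:
    # Illustrative only: negate the first two observation values.
    return [-obs[0], -obs[1]]


brain = SharedBrain(toy_policy)
actions = brain.decide({"platform_0": [0.5, -0.2], "platform_1": [-1.0, 0.3]})
print(actions)
```

Because `decide` holds nothing between calls, adding a thirteenth platform would only add one more entry to the dictionary passed in.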
The **Brain Type** determines how an agent makes its decisions. The **External**
The **Brain Type** determines how an Agent makes its decisions. The **External**
agents; use **Internal** when using the trained model. The **Heuristic** brain
allows you to hand-code the agent's logic by extending the Decision class.
Finally, the **Player** brain lets you map keyboard commands to actions, which
Agents; use **Internal** when using the trained model. The **Heuristic** Brain
allows you to hand-code the Agent's logic by extending the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
of brains do what you need, you can implement your own CoreBrain to create your
of Brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;

The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the agent's observations contains eight elements: the `x` and
vector containing the Agent's observations contains eight elements: the `x` and
defined in the agent's `CollectObservations()` function.)
defined in the Agent's `CollectObservations()` function.)
An agent is given instructions from the brain in the form of *actions*.
An Agent is given instructions from the Brain in the form of *actions*.
element of the vector means is defined by the agent logic (the PPO training
element of the vector means is defined by the Agent logic (the PPO training
element might represent a force or torque applied to a `RigidBody` in the agent.
element might represent a force or torque applied to a `Rigidbody` in the Agent.
given to the agent is an array of indeces into tables.
given to the Agent is an array of indices into tables.
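To make the discrete case concrete: each element the Agent receives is an index, and the Agent logic decides what that index means. A minimal Python sketch, assuming a hypothetical table of rotation amounts (the `ROTATION_TABLE` name and its values are invented for illustration; the real Ball3DAgent logic is written in C#):

```python
from typing import List, Sequence

# Hypothetical lookup table: each index stands for a rotation to apply this step.
ROTATION_TABLE: Sequence[float] = (-2.0, 0.0, 2.0)


def decode_discrete(action: Sequence[int],
                    table: Sequence[float] = ROTATION_TABLE) -> List[float]:
    """Translate each discrete action index into the value it stands for."""
    decoded = []
    for index in action:
        if not 0 <= index < len(table):
            raise ValueError(f"action index {index} is outside the table")
        decoded.append(table[index])
    return decoded


# An action with two elements, each an index into the table.
print(decode_discrete([0, 2]))  # prints [-2.0, 2.0]
```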
The 3D Balance Ball example is programmed to use both types of vector action
space. You can try training with both settings to observe whether there is a

Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
* **Visual Observations** — Defines any Camera objects used by the agent to
* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
makes decisions. All the Agents in the 3D Balance Ball scene share the same
Brain.
* **Visual Observations** — Defines any Camera objects used by the Agent to
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an agent starts over when it is finished.
3D Balance Ball sets this true so that the agent restarts after reaching the
* **Max Step** — Defines how many simulation steps can occur before the Agent
decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an Agent starts over when it is finished.
3D Balance Ball sets this true so that the Agent restarts after reaching the
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
Perhaps the more interesting aspect of an Agent is the Agent subclass
implementation. When you create an Agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
* agent.AgentReset() — Called when the Agent resets, including at the beginning
* Agent.CollectObservations() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous vector observation
* agent.CollectObservations() — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Brain
instance assigned to the Agent is set to the continuous vector observation
* Agent.AgentAction() — Called every simulation step. Receives the action chosen
by the brain. The Ball3DAgent example handles both the continuous and the
* agent.AgentAction() — Called every simulation step. Receives the action chosen
by the Brain. The Ball3DAgent example handles both the continuous and the
assigns a reward to the agent; in this example, an agent receives a small
assigns a reward to the Agent; in this example, an Agent receives a small
negative reward for dropping the ball. An agent is also marked as done when it
negative reward for dropping the ball. An Agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
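The callback order above can be sketched as a plain Python loop. Everything here is a hypothetical stand-in: `ToyAgent` uses snake_case methods that mirror `AgentReset()`, `CollectObservations()` and `AgentAction()`, but the real methods belong to the C# Agent class and are driven by the Academy, and the reward values and ball physics are invented. The sketch also shows how a **Max Step** limit and **Reset On Done** interact.

```python
import random
from typing import List


class ToyAgent:
    """Hypothetical Python mirror of an Agent's per-step callbacks."""

    def __init__(self, max_step: int, seed: int = 0) -> None:
        self.max_step = max_step  # like the Max Step property; 0 means no limit
        self.rng = random.Random(seed)
        self.agent_reset()

    def agent_reset(self) -> None:
        # Mirrors AgentReset(): place a new ball near the platform center.
        self.ball_offset = self.rng.uniform(-0.5, 0.5)
        self.steps = 0
        self.done = False
        self.reward = 0.0

    def collect_observations(self) -> List[float]:
        # Mirrors CollectObservations(): report what the Agent can observe.
        return [self.ball_offset]

    def agent_action(self, action: List[float]) -> None:
        # Mirrors AgentAction(): apply the chosen action, then assign a reward.
        self.ball_offset += action[0] + self.rng.uniform(-0.3, 0.3)
        self.steps += 1
        if abs(self.ball_offset) > 1.0:  # the ball fell off the platform
            self.reward = -1.0
            self.done = True
        else:
            self.reward = 0.1  # small reward for keeping the ball up
        if self.max_step > 0 and self.steps >= self.max_step:
            self.done = True


def run(agent: ToyAgent, total_steps: int) -> int:
    """Step the agent and return how many episodes finished."""
    episodes = 0
    for _ in range(total_steps):
        obs = agent.collect_observations()
        action = [-0.5 * obs[0]]  # stand-in for the Brain's decision
        agent.agent_action(action)
        if agent.done:
            episodes += 1
            agent.agent_reset()  # like Reset On Done
    return episodes


print(run(ToyAgent(max_step=50), total_steps=500))  # prints 10
```

With this toy controller the ball never falls, so with `max_step=0` the loop finishes no episodes at all; that is the non-terminating situation described in the FAQ entry on `Mean reward : nan`.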

explaining it.
To train the agents within the 3D Balance Ball environment, we will be using the
python package. We have provided a convenient script called `mlagents-learn`
Python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
We can use `run_id` to identify the experiment and create a folder where the

The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the intructions in [Using an
Executable](Learning-Environment-Executable.md).
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
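If you prefer to launch training from a Python script, one option is to build the same argument list used in the Basic Guide (`mlagents-learn config/trainer_config.yaml --run-id=firstRun --train`) and pass it to `subprocess`. This is a minimal sketch: the helper name is hypothetical, only flags already shown in this documentation are used, and the launch call is left commented out so the sketch runs without ML-Agents installed.

```python
from typing import List


def mlagents_learn_command(config_path: str, run_id: str,
                           train: bool = True) -> List[str]:
    """Hypothetical helper that assembles an mlagents-learn invocation."""
    command = ["mlagents-learn", config_path, "--run-id=" + run_id]
    if train:
        command.append("--train")  # run in training mode
    return command


cmd = mlagents_learn_command("config/trainer_config.yaml", "firstRun")
print(" ".join(cmd))
# prints mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
# To launch it (requires the mlagents package on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting issues when the config path contains spaces.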
### Observing Training Progress

Once the training process completes, and the training process saves the model
(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with agents having an **Internal** brain type. **Note:** Do not just
use it with Agents having an **Internal** Brain type. **Note:** Do not just
close the Unity Window once the `Saved Model` message appears. Either wait for
the training process to close the window or press Ctrl+C at the command-line
prompt. If you simply close the window manually, the .bytes file containing the

To embed the trained model into Unity, follow the later part of [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
of the Basic Buides page.
of the Basic Guide page.

docs/Glossary.md (4 lines changed)


logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given external
brain. Contains TensorFlow graph which makes decisions for external brain.
* **Trainer** - Python class which is responsible for training a given External
Brain. Contains TensorFlow graph which makes decisions for External Brain.

docs/Installation-Windows.md (4 lines changed)


Next, install `tensorflow`. Install this package using `pip` - which is a
package management system used to install Python packages. Latest versions of
Tensorflow won't work, so you will need to make sure that you install version
TensorFlow won't work, so you will need to make sure that you install version
1.7.1. In the same Anaconda Prompt, type in the following command _(make sure
you are connected to the internet)_:

Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
following command to uninstall TensorFlow for CPU and install TensorFlow for GPU
_(make sure you are connected to the internet)_:

docs/Installation.md


width="500" border="10" />
</p>
## Clone the ML-Agents Toolkit Repository
The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
It also contains many [example environments](Learning-Environment-Examples.md)
that can help you get familiar with Unity.

The `ml-agents` subdirectory contains Python packages which provide trainers and
a Python API to interface with Unity.

The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.
## Install Python and mlagents Package
In order to use the ML-Agents toolkit, you need Python 3.6 along with the
dependencies listed in the [requirements file](../ml-agents/requirements.txt).

### Mac and Unix Users
[Download](https://www.python.org/downloads/) and install Python 3.6 if you do not
If your Python environment doesn't include `pip3`, see these
To install the dependencies and `mlagents` Python package, enter the
`ml-agents/` subdirectory and run from the command line:
```sh
pip3 install .
```

If you installed this correctly, you should be able to run
`mlagents-learn --help`.
## Docker-based Installation

docs/Learning-Environment-Best-Practices.md


([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by
using a Player Brain to control the agent.
* It is often helpful to make many copies of the Agent, and attach the Brain to
be trained to all of these Agents. In this way the Brain can get more feedback
information from all of these Agents, which helps it train faster.
## Rewards

docs/Learning-Environment-Create-New.md


This tutorial walks through the process of creating a Unity Environment. A Unity
Environment is an application built using the Unity Engine which can be used to
train Reinforcement Learning Agents.
![A simple ML-Agents environment](images/mlagents-NewTutSplash.png)

methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
3. Add one or more Brain objects to the scene as children of the Academy.
4. Implement your Agent subclasses. An Agent subclass defines the code an Agent
optional methods to reset the Agent when it has finished or failed its task.
in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.
6. If training, set the Brain type to External and
[run the training process](Training-ML-Agents.md).

Next, we will create a very simple scene to act as our ML-Agents environment.
The "physical" components of the environment include a Plane to act as the floor
for the Agent to move around on, a Cube to act as the goal or target for the
Agent to seek, and a Sphere to represent the Agent itself.
### Create the floor plane

leave it alone for now.
So far, these are the basic steps that you would use to add ML-Agents to any
Unity project. Next, we will add the logic that will let our Agent learn to roll
to the cube using reinforcement learning.
In this simple scenario, we don't use the Academy object to control the

### Initialization and Resetting the Agent
When the Agent reaches its target, it marks itself done and its Agent reset
function moves the target to a random location. In addition, if the Agent rolls
off the platform, the reset function puts it back onto the floor.
To move the target GameObject, we need a reference to its Transform (which

allowing you to choose which GameObject to use as the target in the Unity
Editor. To reset the Agent's velocity (and later to apply force to move the
agent) we need a reference to the Rigidbody component. A
[Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's
primary element for physics simulation. (See

```csharp
{
    if (this.transform.position.y < -1.0)
    {
        // The Agent fell
        this.transform.position = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
```

### Observing the Environment
The Agent sends the information we collect to the Brain, which uses it to make a
decision. When you train the Agent (or use a trained model), the data is fed
into a neural network as a feature vector. For an Agent to successfully learn a
In our case, the information our Agent collects includes:
training. Note that the Agent only collects the x and z coordinates since the
floor is aligned with the x-z plane and the y component of the target's
position never changes.

```csharp
AddVectorObs(relativePosition.z / 5);
```
* Position of the Agent itself within the confines of the floor. This data is
collected as the Agent's distance from each edge of the floor.
```csharp
// Distance to edges of platform

AddVectorObs((this.transform.position.z - 5) / 5);
```
* The velocity of the Agent. This helps the Agent learn to control its speed so
it doesn't overshoot the target and roll off the platform.
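
As a rough, engine-independent illustration, the observation list above can be assembled in plain Python. This is a sketch only: the `Vec3` tuple type and the `roller_observations` helper are hypothetical stand-ins for Unity's `Vector3` and the Agent's `CollectObservations()` method, and the floor half-width of 5 units (used for the edge distances and as the normalization divisor, including for velocity) is an assumption based on the `/ 5` scaling in the snippets above.

```python
from typing import List, Tuple

# Hypothetical stand-in for Unity's Vector3: (x, y, z).
Vec3 = Tuple[float, float, float]

# Assumed: the floor spans -5..5 on the x and z axes, matching the "/ 5"
# scaling used in the AddVectorObs() snippets.
HALF_WIDTH = 5.0


def roller_observations(agent_pos: Vec3, target_pos: Vec3,
                        agent_velocity: Vec3) -> List[float]:
    """Build an 8-element feature vector like the one described above.

    Only x and z are used because the floor lies in the x-z plane.
    """
    x, z = agent_pos[0], agent_pos[2]
    obs: List[float] = []
    # Target position relative to the Agent, normalized.
    obs += [(target_pos[0] - x) / HALF_WIDTH,
            (target_pos[2] - z) / HALF_WIDTH]
    # Signed distance to each edge of the floor, normalized.
    obs += [(x + HALF_WIDTH) / HALF_WIDTH, (x - HALF_WIDTH) / HALF_WIDTH,
            (z + HALF_WIDTH) / HALF_WIDTH, (z - HALF_WIDTH) / HALF_WIDTH]
    # Planar velocity, using the same divisor (an assumption).
    obs += [agent_velocity[0] / HALF_WIDTH, agent_velocity[2] / HALF_WIDTH]
    return obs
```

With this layout the vector has 8 elements, so the Brain's vector observation `Space Size` would have to be 8 for this sketch.
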

`AgentAction()` function. The number of elements in this array is determined by
the `Vector Action Space Type` and `Vector Action Space Size` settings of the
Agent's Brain. The RollerAgent uses the continuous vector action space and needs
two continuous control signals from the Brain. Thus, we will set the Brain
axis. (If we allowed the Agent to move in three dimensions, then we would need
to set `Vector Action Size` to 3.) Each of these values returned by the network
is between `-1` and `1`. Note the Brain really has no idea what the values in
the action array mean. The training process just adjusts the action values in

### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
function. The learning algorithm uses the rewards assigned to the Agent at each
the Agent the optimal actions. You want to reward an Agent for completing the
assigned task (reaching the Target cube, in this case) and punish the Agent if
training with sub-rewards that encourage behavior that helps the Agent complete
the Agent moves closer to the target in a step and a small negative reward at
each step which encourages the Agent to complete its task quickly.
agent as finished by setting the Agent to done.
```csharp
float distanceToTarget = Vector3.Distance(this.transform.position,

}
```
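
The reward scheme described in this section (a reward for reaching the target, a small reward for getting closer, a small per-step penalty, and a large penalty for falling off the platform) can be sketched as a pure function. The reach radius and every reward magnitude below are illustrative assumptions rather than values prescribed by this tutorial, and `step_reward` is a hypothetical helper, not part of the ML-Agents API.

```python
from typing import Tuple

# Illustrative values only; tune them for your own environment.
REACH_RADIUS = 1.42      # assumed distance at which the target counts as reached
REACH_REWARD = 1.0
CLOSER_REWARD = 0.1
TIME_PENALTY = -0.05
FALL_PENALTY = -1.0


def step_reward(distance: float, previous_distance: float,
                agent_y: float) -> Tuple[float, bool]:
    """Return (reward for this step, whether the Agent is done)."""
    if distance < REACH_RADIUS:
        return REACH_REWARD, True        # task completed
    if agent_y < -1.0:
        return FALL_PENALTY, True        # fell off the platform
    reward = TIME_PENALTY                # encourage finishing quickly
    if distance < previous_distance:
        reward += CLOSER_REWARD          # moved closer during this step
    return reward, False
```

Keeping each step's total within [-1, 1], as recommended for ML-Agents rewards, is why the fall penalty here is -1 rather than something larger.
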
**Note:** When you mark an Agent as done, it stops its activity until it is
reset. You can have the Agent reset immediately, by setting the
Assigning a negative reward at each step can also encourage an Agent to finish
a task more quickly:

Finally, to punish the Agent for falling off the platform, assign a large
negative reward and, of course, set the Agent to done so that it resets itself
in the next step:

Now that all the GameObjects and ML-Agent components are in place, it is time
to connect everything together in the Unity Editor. This involves assigning the
Brain object to the Agent, changing some of the Agent component's properties, and
setting the Brain properties so that they are compatible with our Agent code.
1. Expand the Academy GameObject in the Hierarchy window, so that the Brain
object is visible.

It is always a good idea to test your environment manually before embarking on
an extended training run. The reason we have left the Brain set to the
**Player** type is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard to action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify
positive values and one to specify negative values for each action, for a total

`AgentAction()` function. **Value** is assigned to action[Index] when **Key** is
pressed.
Press **Play** to run the scene and use the WASD keys to move the Agent around
Console window and that the Agent resets when it reaches its target or falls
includes a convenient Monitor class that you can use to easily display Agent
status information in the Game window.
One additional test you can perform is to first ensure that your environment and

Keep in mind:
* There can only be one Academy game object in a scene.
* You can have multiple Brain game objects, but they must be children of the
Academy game object.
Here is an example of what your scene hierarchy should look like:

docs/Learning-Environment-Design-Academy.md


# Creating an Academy
An Academy orchestrates all the Agent and Brain objects in a Unity scene. Every
scene containing Agents must contain a single Academy. To use an Academy, you
must create your own subclass. However, all the methods you can override are
optional.

## Resetting an Environment
Implement an `AcademyReset()` function to alter the environment at the start of
each episode. For example, you might want to reset an Agent to its starting
position or move a goal to a random position. An environment resets when the
Academy `Max Steps` count is reached.

## Controlling an Environment
The `AcademyStep()` function is called at every step in the simulation before
any Agents are updated. Use this function to update objects in the environment
at every step or during the episode between environment resets. For example, if
you want to add elements to the environment at random intervals, you can put the
logic for creating them in the `AcademyStep()` function.
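
The ordering described above can be modeled with a small loop: `AcademyReset()` runs at the start of an episode and again whenever `Max Steps` is reached, and `AcademyStep()` runs on every step before any Agent is updated. This is a hypothetical Python illustration only; `SimpleAcademy` and its method names are not part of ML-Agents (which implements this loop in C# inside Unity), and treating a `max_steps` of 0 as "no limit" is a convention of this sketch.

```python
from typing import Callable, List


class SimpleAcademy:
    """Toy model of the Academy's per-step ordering (not the ML-Agents API)."""

    def __init__(self, max_steps: int, agents: List[Callable[[int], None]]):
        self.max_steps = max_steps   # in this sketch, 0 disables the step limit
        self.agents = agents         # each agent callback runs once per step
        self.step_count = 0
        self.log: List[str] = []

    def academy_reset(self) -> None:
        self.log.append("reset")     # e.g. move the goal to a random position

    def academy_step(self) -> None:
        self.log.append(f"academy_step {self.step_count}")

    def run(self, total_steps: int) -> None:
        self.academy_reset()
        for _ in range(total_steps):
            if self.max_steps > 0 and self.step_count >= self.max_steps:
                self.academy_reset()     # Max Steps reached: new episode
                self.step_count = 0
            self.academy_step()          # environment logic runs first...
            for agent in self.agents:    # ...then every Agent is updated
                agent(self.step_count)
            self.step_count += 1
```
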

docs/Learning-Environment-Design-Agents.md


# Agents
An agent is an actor that can observe its environment and decide on the best
course of action using those observations. Create Agents in Unity by extending
successfully learn are the observations the Agent collects and, for
reinforcement learning, the reward you assign to estimate the value of the
An Agent passes its observations to its Brain. The Brain, then, makes a decision
and passes the chosen action back to the agent. Your agent code must execute the
action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),

The Brain class abstracts out the decision making logic from the Agent itself so
that you can use the same Brain in multiple Agents. How a Brain makes its
decisions depends on the type of Brain it is. An **External** Brain simply
passes the observations from its Agents to an external process and then passes
the decisions made externally back to the Agents. An **Internal** Brain uses the
parameters in search of a better decision). The other types of Brains do not
directly involve training, but you might find them useful as part of a training
project. See [Brains](Learning-Environment-Design-Brains.md).

of simulation steps (the frequency defaults to once-per-step). You can also set
up an Agent to request decisions on demand. Making decisions at regular step
decisions on demand is generally appropriate for situations where Agents only
respond to specific events or take actions of variable duration. For example, an
agent in a robotic simulator that must provide fine-control of joint torques
should make its decisions every step of the simulation. On the other hand, an

To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents
using the same Brain instance can use a different frequency. During simulation
steps in which no decision is requested, the Agent receives the same action
On demand decision making allows Agents to request decisions from their Brains
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the Agents commit to an action for a variable number of steps or
when the Agents cannot make decisions at the same time. This is typically the case

When you turn on **On Demand Decisions** for an Agent, your agent code must call
of the observation-decision-action-reward cycle. The Brain invokes the Agent's
`AgentAction()` method. The Brain waits for the Agent to request the next
decision before starting another iteration.
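
The difference between step-based and on-demand decisions can be summarized in a few lines of Python. The `DecisionScheduler` class below is hypothetical and not part of ML-Agents; `decide` stands in for the Brain. With a fixed `Decision Frequency`, a fresh decision is made every N steps and the previous action is repeated in between. In on-demand mode, a decision is made only when the Agent code asks for one (as with `RequestDecision()`), a repeated action is returned when it asks for an action only (as with `RequestAction()`), and otherwise the Agent does not act that step.

```python
from typing import Callable, Optional


class DecisionScheduler:
    """Hypothetical model of when an Agent receives a fresh decision."""

    def __init__(self, decide: Callable[[int], int],
                 decision_frequency: int = 1, on_demand: bool = False):
        self.decide = decide                  # stands in for the Brain
        self.decision_frequency = decision_frequency
        self.on_demand = on_demand
        self.last_action: Optional[int] = None

    def action_for_step(self, step: int, requested_decision: bool = False,
                        requested_action: bool = False) -> Optional[int]:
        if self.on_demand:
            if requested_decision:
                self.last_action = self.decide(step)
                return self.last_action
            if requested_action:
                return self.last_action       # repeat the previous decision
            return None                       # no action this step
        # Step-based: a fresh decision every `decision_frequency` steps,
        # with the previous action repeated on the steps in between.
        if step % self.decision_frequency == 0:
            self.last_action = self.decide(step)
        return self.last_action
```
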
## Observations

point numbers.
* **Visual Observations** — one or more camera images.
When you use vector observations for an Agent, implement the
to implement the `CollectObservations()` method when your Agent uses visual
observations (unless it also uses vector observations).
### Vector Observation Space: Feature Vectors

class calls the `CollectObservations()` method of each of its Agents. Your
The observation must include all the information an Agent needs to accomplish
its task. Without sufficient and relevant information, an agent may learn poorly
or may not learn at all. A reasonable approach for determining what information
should be included is to consider what you would need to calculate an analytical

an agent's observations to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
When you set up an Agent's Brain in the Unity Editor, set the following
properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.
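
Because the Brain's `Space Size` must match the length of the feature vector exactly, a length check is a cheap way to catch mismatches during development. The sketch below is hypothetical Python, not the ML-Agents API. It also models the Brain's `Stacked Vectors` setting, which makes the effective observation size _Space Size_ x _Stacked Vectors_, by concatenating the most recent observations; padding the history with zeros before enough steps have elapsed is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, List


class StackedObservations:
    """Keep the last `stacked_vectors` observations and concatenate them."""

    def __init__(self, space_size: int, stacked_vectors: int):
        self.space_size = space_size
        # Assumed: start with zero vectors so the output size is constant.
        self.history: Deque[List[float]] = deque(
            [[0.0] * space_size for _ in range(stacked_vectors)],
            maxlen=stacked_vectors)

    def add(self, observation: List[float]) -> List[float]:
        if len(observation) != self.space_size:
            raise ValueError(
                f"expected {self.space_size} values, got {len(observation)}")
        self.history.append(list(observation))   # oldest entry is dropped
        # Oldest first; total length is space_size * stacked_vectors.
        return [value for obs in self.history for value in obs]
```
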

### Multiple Visual Observations
Camera observations use rendered textures from one or more cameras in a scene.
The Brain vectorizes the textures into a 3D Tensor which can be fed into a
convolutional neural network (CNN). For more information on CNNs, see [this
guide](http://cs231n.github.io/convolutional-networks/). You can use camera
observations alongside vector observations.

also typically less efficient and slower to train, and sometimes don't succeed
at all.
To add a visual observation to an Agent, click on the `Add Camera` button in the
can have more than one camera attached to an Agent.
specify the number of Cameras the Agent is using for its visual observations.
For each visual observation, set the width and height of the image (in pixels)
and whether or not the observation is color or grayscale (when `Black And White`
is checked).

An action is an instruction from the Brain that the Agent carries out. The
action is passed to the Agent as a parameter when the Academy invokes the
is **Continuous**, the action parameter passed to the Agent is an array of
control signals with length equal to the `Vector Action Space Size` property.
When you specify a **Discrete** vector action space type, the action parameter
is an array containing integers. Each integer is an index into a list or table

corresponds to an action table, you can specify the size of each table by
modifying the `Branches` property. Set the `Vector Action Space Size` and
`Vector Action Space Type` properties on the Brain object assigned to the Agent
many training episodes. Thus, the only place actions are defined for an Agent is
in the `AgentAction()` function. You simply specify the type of vector action
space, and, for the continuous vector action space, the number of values, and
then apply the received values appropriately (and consistently) in

either continuous or the discrete vector actions. In the continuous case, you
would set the vector action size to two (one for each dimension), and the
agent's Brain would create an action with two floating point values. In the
direction), and the Brain would create an action array containing a single
movement), and the Brain would create an action array containing two elements
test your action logic using a **Player** Brain, which lets you map keyboard
commands to actions. See [Brains](Learning-Environment-Design-Brains.md).
The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and

### Continuous Action Space
When an Agent uses a Brain set to the **Continuous** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array with
them. If you assign an element in the array as the speed of an Agent, for
example, the training process learns to control the speed of the Agent through
this parameter.
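
As a sketch of the continuous case, the function below interprets a two-element action array as forces along the x and z axes, in the spirit of the RollerAgent tutorial. The defensive clamping to [-1, 1] and the `speed` multiplier are assumptions of this sketch, and `apply_continuous_action` is a hypothetical helper, not an ML-Agents function.

```python
from typing import List, Tuple


def apply_continuous_action(action: List[float],
                            speed: float = 10.0) -> Tuple[float, float, float]:
    """Map a 2-element continuous action to a force vector (x, y, z)."""
    if len(action) != 2:
        raise ValueError("expected a Vector Action Space Size of 2")
    # Clamp defensively; the network is expected to produce values in [-1, 1].
    fx = max(-1.0, min(1.0, action[0]))
    fz = max(-1.0, min(1.0, action[1]))
    return (fx * speed, 0.0, fz * speed)
```

The function only interprets the numbers; as the text notes, what each element means is defined entirely by how your `AgentAction()` code applies it.
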
The [Reacher example](Learning-Environment-Examples.md#reacher) defines a

### Discrete Action Space
When an Agent uses a Brain set to the **Discrete** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array
For example, if we wanted an Agent that can move in a plane and jump, we could
agent be able to move __and__ jump concurrently. We define the first branch to
have 5 possible actions (don't move, go left, go right, go backward, go forward)
and the second one to have 2 possible actions (don't jump, jump). The
AgentAction method would look something like:

```csharp
// Look up the index in the jump action list:
if (jump == 1 && IsGrounded()) { directionY = 1; }
// Apply the action results to move the Agent
gameObject.GetComponent<Rigidbody>().AddForce(
    new Vector3(
        directionX * 40f, directionY * 300f, directionZ * 40f));
```
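
The two-branch layout described above (five movement choices and two jump choices) can also be decoded with a lookup table, shown here as a hypothetical, engine-independent Python helper. The index order follows the list in the text (don't move, go left, go right, go backward, go forward); the axis signs assigned to each direction are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

# Branch 0: movement, in the order listed in the text.
MOVEMENT: Dict[int, Tuple[int, int]] = {
    0: (0, 0),    # don't move
    1: (-1, 0),   # go left      (assumed: -x)
    2: (1, 0),    # go right     (assumed: +x)
    3: (0, -1),   # go backward  (assumed: -z)
    4: (0, 1),    # go forward   (assumed: +z)
}


def decode_action(action: List[int], grounded: bool) -> Tuple[int, int, int]:
    """Return (directionX, directionY, directionZ) for a 2-branch action."""
    movement, jump = action[0], action[1]
    direction_x, direction_z = MOVEMENT[movement]
    # Branch 1: 0 = don't jump, 1 = jump (only applied when grounded).
    direction_y = 1 if jump == 1 and grounded else 0
    return (direction_x, direction_y, direction_z)
```

Because each branch is chosen independently, the Agent can move and jump in the same step; a single flat list of actions would need 5 x 2 = 10 entries to express every combination.
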

#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are
impossible for the next decision. When the Agent is controlled by an External or
Internal Brain, the Agent will be unable to perform the specified action. Note
that when the Agent is controlled by a Player or Heuristic Brain, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, call the method `SetActionMask` within the `CollectObservations()` method:

* `branch` is the index (starting at 0) of the branch on which you want to mask
the action.
* `actionIndices` is a list of `int` or a single `int` corresponding to the
index of the action that the Agent cannot perform.
For example, if you have an Agent with 2 branches and on the first branch
and _"change weapon"_. Then with the code below, the Agent will either _"do
nothing"_ or _"change weapon"_ for its next decision (since action indices 1 and 2
are masked).
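
Conceptually, masking removes the masked indices from the set of choices on that branch before a decision is made. The sketch below is a hypothetical illustration of that idea, renormalizing a probability vector over the unmasked actions; it is not a description of how ML-Agents implements masking internally. The branch size of 4 with indices 1 and 2 masked mirrors the example above.

```python
from typing import List, Sequence


def masked_probabilities(probs: Sequence[float],
                         masked: Sequence[int]) -> List[float]:
    """Zero out masked action indices and renormalize the rest."""
    masked_set = set(masked)
    kept = [0.0 if i in masked_set else p for i, p in enumerate(probs)]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("all actions on this branch are masked")
    return [p / total for p in kept]


def best_action(probs: Sequence[float], masked: Sequence[int]) -> int:
    """Pick the most likely action that is not masked."""
    adjusted = masked_probabilities(probs, masked)
    return max(range(len(adjusted)), key=lambda i: adjusted[i])
```
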

reward over time. The better your reward mechanism, the better your agent will
learn.
**Note:** Rewards are not used during inference by a Brain using an already
to display the cumulative reward received by an Agent. You can even use a Player
Brain to control the Agent while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` method in the
`AgentAction()` function. The reward assigned in any step should be in the range
[-1,1]. Values outside this range can lead to unstable training. The `reward`
value is reset to zero at every step.
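
The per-step reward rules above can be modeled as follows. This is a hypothetical Python sketch, not the ML-Agents `Agent` class: `add_reward` accumulates into the current step's reward, `set_reward` overwrites it, the value is reset to zero when the step ends, and `end_step` reports whether the step's total stayed within the recommended [-1, 1] range. How ML-Agents tracks the cumulative episode reward is not modeled here.

```python
from typing import List


class RewardTracker:
    """Toy model of per-step reward bookkeeping (not the ML-Agents API)."""

    def __init__(self) -> None:
        self.reward = 0.0
        self.history: List[float] = []

    def add_reward(self, increment: float) -> None:
        self.reward += increment          # accumulate within this step

    def set_reward(self, value: float) -> None:
        self.reward = value               # overwrite this step's reward

    def end_step(self) -> bool:
        """Record the step's reward, reset it, and report if it was in range."""
        in_range = -1.0 <= self.reward <= 1.0
        self.history.append(self.reward)
        self.reward = 0.0                 # reset to zero at every step
        return in_range
```
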

```csharp
SetReward(0.1f);
}
// When ball falls mark Agent as done and give a negative penalty
if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
    Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
    Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
```

![Agent Inspector](images/agent.png)
* `Brain` - The Brain to register this Agent to. Can be dragged into the
reached, the Agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
when the Agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
interval or explicitly requests decisions by calling `RequestDecision()`.
* If not checked, the Agent will request a new decision every `Decision
Frequency` steps and perform an action every step. In the example above,

* `RequestAction()` Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.
* `Decision Frequency` - The number of steps between decision requests. Not used
if `On Demand Decision` is true.
Unity environment. While this was built for monitoring an agent's value function
throughout the training process, we imagine it can be more broadly useful. You
can learn more [here](Feature-Monitor.md).

`GameObject.Instantiate()` function. It is typically easiest to instantiate an
agent from a [Prefab](https://docs.unity3d.com/Manual/Prefabs.html) (otherwise,
you have to instantiate every GameObject and Component that make up your Agent
following function creates a new Agent given a Prefab, Brain instance, location,

```csharp
private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
{
    GameObject agentObj = Instantiate(agentPrefab, position, orientation);
    Agent agent = agentObj.GetComponent<Agent>();
    agent.GiveBrain(brain);
    agent.AgentReset();
}
```

the next step in the simulation) so that the Brain knows that this Agent is no
longer active. Thus, the best place to destroy an Agent is in the
`Agent.AgentOnDone()` function:
```csharp

}
```
Note that in order for `AgentOnDone()` to be called, the Agent's `ResetOnDone`
property must be false. You can set `ResetOnDone` on the Agent's Inspector or in
code.
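
The interaction between `Max Step`, `Reset On Done`, `AgentReset()`, and `AgentOnDone()` described in this section can be summarized in a hypothetical Python model. The `AgentLifecycle` class is illustrative only: it records which callback would run and does not reproduce ML-Agents' exact timing (for example, the delay of one step before `AgentOnDone()` mentioned above). Treating a `max_step` of 0 as "no limit" is a convention of this sketch.

```python
from typing import List


class AgentLifecycle:
    """Hypothetical model of what happens when an Agent is done."""

    def __init__(self, max_step: int, reset_on_done: bool):
        self.max_step = max_step          # in this sketch, 0 disables the limit
        self.reset_on_done = reset_on_done
        self.step_count = 0
        self.done = False
        self.finished = False             # True once AgentOnDone() has run
        self.events: List[str] = []

    def mark_done(self) -> None:
        """Equivalent of the Agent code marking itself done."""
        self.done = True

    def step(self) -> None:
        if self.finished:
            return                        # inactive: waiting to be destroyed
        self.step_count += 1
        if self.max_step > 0 and self.step_count >= self.max_step:
            self.done = True
        if not self.done:
            return
        if self.reset_on_done:
            self.events.append("AgentReset")
            self.step_count = 0
            self.done = False
        else:
            self.events.append("AgentOnDone")  # e.g. destroy the Agent here
            self.finished = True
```
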

docs/Learning-Environment-Design-Brains.md


* [External](Learning-Environment-Design-External-Internal-Brains.md) — The
**External** and **Internal** types typically work together; set **External**
when training your Agents. You can also use the **External** Brain to
**Heuristic** to hand-code the Agent's logic by extending the Decision class.
keyboard keys to Agent actions, which can be useful to test your Agent code.
During training, set your Agent's Brain type to **External**. To use the trained
model, import the model file into the Unity project and change the Brain type to
Inspector window. These properties must be appropriate for the Agents using the
Brain. For example, the `Vector Observation Space Size` property must match the
length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.

* `Brain Parameters` - Define vector observations, visual observation, and
vector actions for the Brain.
* `Vector Observation`
* `Space Size` - Length of vector observation for Brain.
effective size of the vector observation being passed to the Brain being:
_Space Size_ x _Stacked Vectors_.
* `Visual Observations` - Describes height, width, and whether to grayscale
visual observations for the Brain.

* `Space Size` (Continuous) - Length of action vector for Brain.
* `Branches` (Discrete) - An array of integers that defines multiple concurrent
discrete actions. The values in the `Branches` array correspond to the
number of possible discrete values for each action branch.
* `Action Descriptions` - A list of strings used to name the available

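As a rough illustration of how these parameters relate, the Python sketch below models them with a hypothetical `BrainParametersSketch` class. It is not part of the ML-Agents API, and its field names are assumptions. It computes the effective vector observation size as _Space Size_ x _Stacked Vectors_ and checks that a discrete action supplies one in-range integer per branch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrainParametersSketch:
    """Hypothetical stand-in for the Brain Parameters shown in the Inspector."""
    vector_observation_space_size: int
    num_stacked_vectors: int = 1
    # Discrete action space: one entry per branch, each giving the number
    # of possible values for that branch.
    vector_action_branches: List[int] = field(default_factory=list)

    def effective_observation_size(self) -> int:
        # Space Size x Stacked Vectors
        return self.vector_observation_space_size * self.num_stacked_vectors

    def is_valid_discrete_action(self, action: List[int]) -> bool:
        # One integer per branch, each in the range [0, branch size).
        if len(action) != len(self.vector_action_branches):
            return False
        return all(0 <= a < size
                   for a, size in zip(action, self.vector_action_branches))


params = BrainParametersSketch(vector_observation_space_size=8,
                               num_stacked_vectors=3,
                               vector_action_branches=[3, 3, 2])
print(params.effective_observation_size())         # 24
print(params.is_valid_discrete_action([2, 0, 1]))  # True
print(params.is_valid_discrete_action([3, 0, 1]))  # False: 3 is outside branch 0
```

Because each branch value is an index, the largest legal value in a branch is its size minus 1.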
## Using the Broadcast Feature
The Player, Heuristic and Internal Brains have been updated to support
broadcast. The broadcast feature allows you to collect data from your Agents
using a Python program without controlling them.
### How to use: Unity

### How to use: Python
When you launch your Unity Environment from a Python program, you can see what
the Agents connected to non-External Brains are doing. When calling `step` or
`reset` on your environment, you retrieve a dictionary mapping Brain names to
non-External Brain set to broadcast as well as for any External Brains.
Just like with an External Brain, the `BrainInfo` object contains the fields for
were taken by the Agents at the previous step, not the current one.
for non-External Brains. If there are no External Brains in the scene, simply
Heuristic or Internal Brain game sessions. You can then use this data to train
an agent in a supervised context.
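Because the actions reported in a broadcast `BrainInfo` were taken at the previous step, building a supervised dataset means pairing each observation with the action reported one step later. The plain-Python sketch below shows that alignment. `BroadcastInfoSketch` and `collect_pairs` are hypothetical stand-ins that model only the two fields needed here, not the real `BrainInfo` class, and the sketch assumes a single Agent per Brain.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BroadcastInfoSketch:
    """Hypothetical stand-in for one Brain's info at a single step."""
    observations: List[float]
    previous_actions: List[float]  # actions taken at the *previous* step


def collect_pairs(steps: List[Dict[str, BroadcastInfoSketch]],
                  brain_name: str) -> List[Tuple[List[float], List[float]]]:
    """Pair each step's observation with the action the Agent took on it.

    The action chosen for the observation at step t only becomes visible
    in the info returned at step t + 1, so the last step has no pair yet.
    """
    pairs = []
    for current, following in zip(steps, steps[1:]):
        pairs.append((current[brain_name].observations,
                      following[brain_name].previous_actions))
    return pairs


# Each dict maps a Brain name (hypothetical here) to that Brain's info.
steps = [
    {"PlayerBrain": BroadcastInfoSketch([0.0], [0.0])},
    {"PlayerBrain": BroadcastInfoSketch([0.5], [1.0])},
    {"PlayerBrain": BroadcastInfoSketch([1.0], [-1.0])},
]
print(collect_pairs(steps, "PlayerBrain"))  # [([0.0], [1.0]), ([0.5], [-1.0])]
```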

docs/Learning-Environment-Design-External-Internal-Brains.md


# External and Internal Brains
The **External** and **Internal** types of Brains work in different phases of
training. When training your Agents, set their Brain types to **External**; when
using the trained models, set their Brain types to **Internal**.
training process to collect the observations of Agents using that Brain and give
the Agents their actions.
In addition to using an External Brain for training with the ML-Agents learning
algorithms, you can use an External Brain to control Agents in a Unity
environment from an external Python program. See [Python API](Python-API.md)
for more information.
Unlike the other types, the External Brain has no properties to set in the Unity

A __model__ is a mathematical relationship mapping an agent's observations to
its actions. TensorFlow is a software library for performing numerical
computation through data flow graphs. A TensorFlow model, then, defines the
mathematical relationship between your Agent's observations and its actions
using a TensorFlow data flow graph.
### Creating a graph model

* `Graph Scope` : If you set a scope while training your TensorFlow model, all
your placeholder names will have a prefix. You must specify that prefix here.
Note that if more than one Brain were set to External during training, you
must give a `Graph Scope` to the Internal Brain corresponding to the name of
graph, you must specify the name of the placeholder here. The Brain will make
the batch size equal to the number of Agents connected to the Brain
automatically.
* `State Node Name` : If your graph uses the state as an input, you must specify
the name of the placeholder here.

of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input, you
must specify it here. Note that the number of observations is equal to the
length of `Camera Resolutions` in the Brain parameters.
actions of the Brain in your graph. If the action space type is continuous,
the output must be a one-dimensional tensor of floats of length `Action Space Size`;
if the action space type is discrete, the output must be a one-dimensional tensor
of ints of the same length as the `Branches` array.
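The naming and output requirements above can be expressed as a small check. Both functions below are hypothetical helpers, not something the Internal Brain exposes. The sketch assumes the graph output has already been converted to a Python list, and `scoped_name` assumes the TensorFlow convention of joining a scope and a node name with `/`.

```python
from typing import Sequence, Union


def scoped_name(graph_scope: str, node_name: str) -> str:
    # Assumption: scope and node name are joined with "/", as TensorFlow
    # name scopes do. With no scope, the node name is used as-is.
    return f"{graph_scope.rstrip('/')}/{node_name}" if graph_scope else node_name


def check_action_output(output: Sequence[Union[int, float]],
                        space_type: str,
                        action_space_size: int = 0,
                        branches: Sequence[int] = ()) -> bool:
    """Return True if a one-dimensional action output matches the action space.

    continuous: floats, length equal to the Action Space Size
    discrete:   ints, one per entry of the Branches array
    """
    if space_type == "continuous":
        return (len(output) == action_space_size
                and all(isinstance(v, float) for v in output))
    if space_type == "discrete":
        return (len(output) == len(branches)
                and all(isinstance(v, int) for v in output))
    raise ValueError(f"unknown action space type: {space_type!r}")


print(scoped_name("MyBrain", "action"))                                    # MyBrain/action
print(check_action_output([0.1, -0.3], "continuous", action_space_size=2))  # True
print(check_action_output([2, 0], "discrete", branches=[3, 2]))             # True
print(check_action_output([2.0, 0.0], "discrete", branches=[3, 2]))         # False
```

`MyBrain` and `action` are placeholder names used only for illustration.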

docs/Learning-Environment-Design-Heuristic-Brains.md


# Heuristic Brain
The **Heuristic** Brain type allows you to hand-code an Agent's decision-making
process. A Heuristic Brain requires an implementation of the Decision interface
to which it delegates the decision making process.
When you set the **Brain Type** property of a Brain to **Heuristic**, you must

The Decision interface defines two methods, `Decide()` and `MakeMemory()`.
The `Decide()` method receives an Agent's current state, consisting of the
Agent's observations, reward, memory and other aspects of the Agent's state, and
must return an array containing the action that the Agent should take. The
format of the returned action array depends on the **Vector Action Space Type**.
When using a **Continuous** action space, the action array is just a float array
with a length equal to the **Vector Action Space Size** setting. When using a

integers.
The `MakeMemory()` function allows you to pass data forward to the next
iteration of an Agent's decision making process. The array you return from
can use the memory to allow the Agent's decision process to take past actions
and observations into account when making the current decision. If your
heuristic logic does not require memory, just return an empty array.
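The Decision interface itself is C#; the Python sketch below only mirrors its shape to show how `Decide()` and `MakeMemory()` cooperate. `DecisionSketch`, `DampedController`, and the simplified signatures (no visual observations) are hypothetical. The controller returns a single continuous action and uses memory to remember the previous observation, so it can react to how fast the observed value is changing.

```python
from abc import ABC, abstractmethod
from typing import List


class DecisionSketch(ABC):
    """Python analogy of the C# Decision interface (simplified, hypothetical)."""

    @abstractmethod
    def decide(self, vector_obs: List[float], reward: float, done: bool,
               memory: List[float]) -> List[float]:
        """Return the action: floats (continuous) or one int per branch (discrete)."""

    @abstractmethod
    def make_memory(self, vector_obs: List[float], reward: float, done: bool,
                    memory: List[float]) -> List[float]:
        """Return the memory to pass to the next decision ([] if unused)."""


class DampedController(DecisionSketch):
    """Drives a single observed value toward 0 with one continuous action."""

    def decide(self, vector_obs, reward, done, memory):
        position = vector_obs[0]
        previous = memory[0] if memory else position  # no memory yet: assume at rest
        velocity = position - previous
        return [-(position + 0.5 * velocity)]

    def make_memory(self, vector_obs, reward, done, memory):
        return [vector_obs[0]]  # remember this observation for the next step


controller = DampedController()
memory: List[float] = []
actions = []
for obs in ([1.0], [0.5]):
    actions.append(controller.decide(obs, 0.0, False, memory))
    memory = controller.make_memory(obs, 0.0, False, memory)
print(actions)  # [[-1.0], [-0.25]]
```

A heuristic that needs no history would simply return `[]` from `make_memory`.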

docs/Learning-Environment-Design-Player-Brains.md


# Player Brain
The **Player** Brain type allows you to control an Agent using keyboard
commands. You can use Player Brains to control a "teacher" Agent that trains
other Agents during [imitation learning](Training-Imitation-Learning.md). You
can also use Player Brains to test your Agents and environment before changing
their Brain types to **External** and running the training process.
The **Player** Brain properties allow you to assign one or more keyboard keys to
Brain uses the discrete action space, you can send one integer value as the
action per step. In contrast, when a Brain uses the continuous action space you
can send any number of floating point values (up to the **Vector Action Space
Size** setting).

action. (If you press both keys at the same time, deterministic results are not guaranteed.)|
||**Element 0–N**| The mapping of keys to action values. |
|| **Key** | The key on the keyboard. |
|| **Index** | The element of the Agent's action vector to set when this key is
|| **Value** | The value to send to the Agent as its action for the specified
index when the mapped key is pressed. All other members of the action vector
are set to 0. |
|**Discrete Player Actions**|| The mapping for the discrete vector action space.

|| **Key** | The key on the keyboard. |
|| **Branch Index** |The element of the Agent's action vector to set when this
|| **Value** | The value to send to the Agent as its action when the mapped key
is pressed. Cannot exceed the max value for the associated branch (minus 1,
since it is an array index).|
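The key mappings in the table above can be sketched as follows. `KeyMapping`, `continuous_action`, and `discrete_action` are hypothetical names, not the Player Brain's implementation, and the key labels are plain strings chosen for illustration. For the discrete case, the sketch assumes branches without a pressed key send 0.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class KeyMapping:
    key: str      # keyboard key label (illustrative strings)
    index: int    # action-vector element (continuous) or branch index (discrete)
    value: float  # value sent while the key is pressed


def continuous_action(mappings: Sequence[KeyMapping], pressed: Set[str],
                      size: int) -> List[float]:
    action = [0.0] * size  # members without a pressed key stay at 0
    for m in mappings:
        if m.key in pressed:
            # If two pressed keys target the same index, the later mapping
            # wins here; the Player Brain makes no such guarantee.
            action[m.index] = m.value
    return action


def discrete_action(mappings: Sequence[KeyMapping], pressed: Set[str],
                    branches: Sequence[int]) -> List[int]:
    action = [0] * len(branches)  # assumption: unpressed branches send 0
    for m in mappings:
        if m.key in pressed:
            value = int(m.value)
            # The value is an index into the branch, so it must be < branch size.
            if not 0 <= value < branches[m.index]:
                raise ValueError(f"value {value} out of range for branch {m.index}")
            action[m.index] = value
    return action


move = [KeyMapping("W", 0, 1.0), KeyMapping("S", 0, -1.0), KeyMapping("D", 1, 1.0)]
print(continuous_action(move, {"W", "D"}, size=2))        # [1.0, 1.0]

jump = [KeyMapping("A", 0, 2), KeyMapping("Space", 1, 1)]
print(discrete_action(jump, {"Space"}, branches=[3, 2]))  # [0, 1]
```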

docs/Learning-Environment-Design.md


Training and simulation proceed in steps orchestrated by the ML-Agents Academy
class. The Academy works with Agent and Brain objects in the scene to step
through the simulation. When either the Academy has reached its maximum number
of steps or all Agents in the scene are _done_, one training episode is
neural network model. The type of Brain assigned to an Agent determines whether
it participates in training or not. The **External** Brain communicates with the
with an **Internal** Brain.
2. Calls the `AgentReset()` function for each Agent in the scene.
3. Calls the `CollectObservations()` function for each Agent in the scene.
4. Uses each Agent's Brain class to decide on the Agent's next action.
6. Calls the `AgentAction()` function for each Agent in the scene, passing in
the action chosen by the Agent's Brain. (This function is not called if the
Agent is done.)
7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
an Agent to restart if it finishes before the end of an episode. In this
case, the Academy calls the `AgentReset()` function.
8. When the Academy reaches its own `Max Step` count, it starts the next episode
again by calling your Academy subclass's `AcademyReset()` function.
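To make the ordering above concrete, here is a simplified Python simulation of one episode. It is not the Academy implementation: the class and function names are hypothetical, the Brain is reduced to a plain `decide` function, steps not listed above are omitted, and `ResetOnDone` is ignored. It records the callbacks in the order they would fire.

```python
from typing import Callable, List


class SketchAgent:
    """Hypothetical Agent stand-in that logs each callback it receives."""

    def __init__(self, name: str, log: List[str], done_after: int):
        self.name = name
        self.log = log
        self.done_after = done_after  # finishes its task after this many actions
        self.steps = 0
        self.done = False

    def agent_reset(self) -> None:
        self.steps, self.done = 0, False
        self.log.append(f"{self.name}.AgentReset")

    def collect_observations(self) -> List[float]:
        self.log.append(f"{self.name}.CollectObservations")
        return [float(self.steps)]

    def agent_action(self, action: List[float]) -> None:
        self.log.append(f"{self.name}.AgentAction")
        self.steps += 1
        self.done = self.steps >= self.done_after

    def agent_on_done(self) -> None:
        self.log.append(f"{self.name}.AgentOnDone")


def run_episode(agents: List[SketchAgent],
                decide: Callable[[List[float]], List[float]],
                academy_max_steps: int, log: List[str]) -> None:
    log.append("AcademyReset")
    for agent in agents:
        agent.agent_reset()
    for _ in range(academy_max_steps):
        active = [a for a in agents if not a.done]
        observations = [a.collect_observations() for a in active]  # observe
        actions = [decide(obs) for obs in observations]            # decide
        for agent, action in zip(active, actions):
            agent.agent_action(action)                             # act
            if agent.done:
                agent.agent_on_done()
        if all(a.done for a in agents):  # every Agent done: episode ends early
            break


log: List[str] = []
run_episode([SketchAgent("A", log, done_after=2)], lambda obs: [0.0],
            academy_max_steps=5, log=log)
print(log)
```

With one Agent that finishes after two actions, the episode ends before the Academy's `Max Step` of 5 is reached.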

**Note:** The API used by the Python PPO training process to communicate with
and control the Academy during training can be used for other purposes as well.
For example, you could use the API to use Unity as the simulation engine for
your own machine learning algorithms. See [Python API](Python-API.md) for more
information.
## Organizing the Unity Scene

as you need. Any Brain instances in the scene must be attached to GameObjects
that are children of the Academy in the Unity Scene Hierarchy. Agent instances
should be attached to the GameObject representing that Agent.
You must assign a Brain to every Agent, but you can share Brains between
multiple Agents. Each Agent will make its own observations and act
Brains, the same trained TensorFlow model.
The Academy object orchestrates Agents and their decision making processes. Only
place a single Academy object in a scene.
You must create a subclass of the Academy class (since the base class is

* `InitializeAcademy()` — Prepare the environment the first time it launches.
* `AcademyReset()` — Prepare the environment and Agents for the next training
objects in the scene before the Agents take their actions. Note that the
Agents have already collected their observations and chosen an action before
the Academy invokes this method.
The base Academy class also defines several important properties that you can

assigned a Brain, but you can use the same Brain with more than one Agent.
Use the Brain class directly, rather than a subclass. Brain behavior is
determined by the Brain type. During training, set your Agent's Brain type to
project and change the Brain type to **Internal**. See
different types of Brains. You can extend the CoreBrain class to create
different Brain types if the four built-in types don't do what you need.
Inspector window. These properties must be appropriate for the Agents using the
Brain. For example, the `Vector Observation Space Size` property must match the
length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.

in a football game or a car object in a vehicle simulation. Every Agent must be
assigned a Brain.
To create an Agent, extend the Agent class and implement the essential
* `CollectObservations()` — Collects the Agent's observation of its environment.
* `AgentAction()` — Carries out the action chosen by the Agent's Brain and
Brain assigned to this Agent must be set.
manually set an Agent to done in your `AgentAction()` function when the Agent
has finished (or irrevocably failed) its task. You can also set the Agent's `Max
Steps` property to a positive value and the Agent will consider itself done
count, it starts the next episode. If you set an Agent's `ResetOnDone` property
to true, then the Agent can attempt its task several times in one episode. (Use
the `Agent.AgentReset()` function to prepare the Agent to start again.)
about programming your own Agents.
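The done and reset bookkeeping described above can be sketched in Python. `AgentDoneSketch` and its fields are hypothetical names chosen for illustration, not the Agent class's API. The sketch assumes that a `Max Step` of 0 means no limit and that a done Agent without `ResetOnDone` simply stops acting until the Academy starts the next episode.

```python
from dataclasses import dataclass


@dataclass
class AgentDoneSketch:
    """Hypothetical model of an Agent's done/reset bookkeeping."""
    max_step: int = 0            # 0 means no per-Agent step limit (assumption)
    reset_on_done: bool = False
    step_count: int = 0
    done: bool = False
    attempts: int = 1

    def step(self, task_over: bool) -> None:
        if self.done:
            return  # a done Agent without ResetOnDone waits for the next episode
        self.step_count += 1
        # Done when the task is finished or irrevocably failed, or when a
        # positive Max Step is reached.
        if task_over or (self.max_step > 0 and self.step_count >= self.max_step):
            self.done = True
            if self.reset_on_done:
                self.agent_reset()

    def agent_reset(self) -> None:
        # Stand-in for AgentReset(): prepare to attempt the task again.
        self.step_count = 0
        self.done = False
        self.attempts += 1


retrying = AgentDoneSketch(max_step=3, reset_on_done=True)
for _ in range(7):
    retrying.step(task_over=False)
print(retrying.attempts, retrying.step_count)  # 3 1

single_try = AgentDoneSketch(max_step=3)
for _ in range(5):
    single_try.step(task_over=False)
print(single_try.done, single_try.step_count)  # True 3
```

With `ResetOnDone` enabled, the Agent gets several attempts within one episode; without it, the Agent stays done once it reaches its limit.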
## Environments

* The training scene must start automatically when your Unity application is
launched by the training process.
* The scene must include at least one **External** Brain.
each Agent setting itself to `done`.

docs/Learning-Environment-Examples.md


* Set-up: A linear movement task where the agent must move left or right to
rewarding states.
* Goal: Move to the most rewarding state.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).

* Goal: The agent must balance the platform in order to keep the ball on it for
as long as possible.
* Agents: The environment contains 12 agents of the same kind, all linked to a
single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 8 variables corresponding to rotation of platform,
and position, rotation, and velocity of ball.
* Vector Observation space (Hard Version): 5 variables corresponding to

and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the
obstacles.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: None
* Vector Action space: (Discrete) Size of 4, corresponding to movement in
cardinal directions. Note that for this environment,

net.
* Goal: The agents must bounce the ball between one another without dropping it
or sending it out of bounds.
* Agents: The environment contains two agents linked to a single Brain named
TennisBrain. After training you can attach another Brain named MyBrain to one
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 8 variables corresponding to position and velocity
of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement

* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 70 variables corresponding to 14
ray-casts each detecting one of three possible objects (wall, goal, or
block).

* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.
* Agents: The environment contains one agent linked to two different Brains. The
Brain the agent is linked to changes depending on the height of the wall.
* Brains: Two Brains, each with the following observation/action space.
* Vector Observation space: Size of 74, corresponding to 14 ray casts each
* Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Side Motion (3 possible actions: Left, Right, No Action)
* Jump (2 possible actions: Jump, No Action)
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.

* Set-up: Double-jointed arm which can move to target locations.
* Goal: Each agent must move its hand to the goal location, and keep it there.
* Agents: The environment contains 10 agents linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 26 variables corresponding to position, rotation,
velocity, and angular velocities of the two arm Rigidbodies.
* Vector Action space: (Continuous) Size of 4, corresponding to torque

* Goal: Each agent must move its body toward the goal direction without falling.
* `CrawlerStaticTarget` - Goal direction is always forward.
* `CrawlerDynamicTarget`- Goal direction is randomized.
* Agents: The environment contains 3 agents linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 117 variables corresponding to position, rotation,
velocity, and angular velocities of each limb plus the acceleration and
angular acceleration of the body.

* Set-up: A multi-agent environment where agents compete to collect bananas.
* Goal: The agents must learn to move to as many yellow bananas as possible
while avoiding blue bananas.
* Agents: The environment contains 5 agents linked to a single Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 53 corresponding to velocity of agent (2), whether
agent is frozen and/or shot its laser (2), plus ray-based perception of
objects around agent's forward direction (49; 7 raycast angles with 7

* Side Motion (3 possible actions: Left, Right, No Action)
* Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Laser (2 possible actions: Laser, No Action)
* Visual Observations (Optional): First-person camera per-agent. Use
`VisualBanana` scene.

remember it, and use it to move to the correct goal.
* Goal: Move to the goal which corresponds to the color of the block in the
room.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space:
* Vector Observation space: 30 corresponding to local ray-casts detecting
objects, goals, and walls.
* Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent

* Set-up: Environment where the agent needs on-demand decision making. The agent
must decide how to perform its next bounce only when it touches the ground.
* Goal: Catch the floating banana. The agent has only a limited number of jumps.
* Agents: The environment contains one agent linked to a single Brain.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space:
* Vector Observation space: 6 corresponding to local position of agent and
banana.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for

* Goal:
* Striker: Get the ball into the opponent's goal.
* Goalie: Prevent the ball from entering its own goal.
* Agents: The environment contains four agents, with two linked to one Brain
(strikers) and two linked to another (goalies).
* Agent Reward Function (dependent):
* Striker:

* -1 When ball enters team's goal.
* +0.1 When ball enters opponent's goal.
* +0.001 Existential bonus.
* Brains: Two Brains with the following observation/action space:
* Vector Observation space: 112 corresponding to local 14 ray casts, each
detecting 7 possible object types, along with the object's distance.
Perception covers a 180-degree view in front of the agent.
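Several of the observation sizes listed above can be checked with simple arithmetic. The helper names below are hypothetical. The Banana breakdown comes directly from its description; the per-ray figures for PushBlock and Soccer only divide the stated totals by the 14 ray casts, and the list does not say how PushBlock's values per ray break down.

```python
def observation_size(*components: int) -> int:
    """Total vector observation size from its component sizes."""
    return sum(components)


def values_per_ray(total_size: int, num_rays: int) -> int:
    """Values each ray contributes, assuming the total consists only of rays."""
    if total_size % num_rays:
        raise ValueError(f"{total_size} is not divisible by {num_rays} rays")
    return total_size // num_rays


# Banana: agent velocity (2) + frozen/laser flags (2) + 7 ray angles x 7 values
print(observation_size(2, 2, 7 * 7))  # 53
# PushBlock: 70 values from 14 ray casts
print(values_per_ray(70, 14))         # 5
# Soccer: 112 values from 14 ray casts (7 object types + distance)
print(values_per_ray(112, 14))        # 8
```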

* Set-up: Physics-based humanoid agents with 26 degrees of freedom. These DOFs
correspond to articulation of the following body-parts: hips, chest, spine,
head, thighs, shins, feet, arms, forearms and hands.
Brain.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: 215 variables corresponding to position, rotation,
velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target

pyramid, then navigate to the pyramid, knock it over, and move to the gold
brick at the top.
* Goal: Move to the golden brick on top of the spawned pyramid.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One Brain with the following observation/action space:
* Vector Observation space: 148 corresponding to local ray-casts detecting
switch, bricks, golden brick, and walls, plus variable indicating switch
state.

docs/Learning-Environment-Executable.md


Make sure the Brains in the scene have the right type. For example, if you want
to be able to control your agents from Python, you will need to set the
corresponding Brain to **External**.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.

## Interacting with the Environment
If you want to use the [Python API](Python-API.md) to interact with your
executable, you can pass the name of the executable with the argument
`file_name` of `UnityEnvironment`. For instance:

## Training the Environment
1. Open a command or terminal window.
2. Navigate to the folder where you installed the ML-Agents Toolkit. If you
followed the default [installation](Installation.md), then navigate to the
`ml-agents/` folder.
3. Run
* `<trainer-config-file>` is the file path of the trainer configuration yaml.
* `<env_name>` is the name and path to the executable you exported from Unity
(without extension)
* `<run-identifier>` is a string used to separate the results of different

For example, if you are training with a 3DBall executable you exported to the
directory where you installed the ML-Agents Toolkit, run:

```sh
mlagents-learn ../config/trainer_config.yaml --env=3DBall --run-id=firstRun --train
```
And you should see something like

You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds
to your model's latest checkpoint. You can now embed this trained model into
your Internal Brain by following the steps below:
1. Move your model file into
`UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.

docs/Limitations.md


Currently the speed of the game physics can only be increased to 100x real-time.
The Academy also moves in time with FixedUpdate() rather than Update(), so game
behavior implemented in Update() may be out of sync with the agent decision
making. See
[Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
for more information.

As of version 0.3, we no longer support Python 2.
### TensorFlow support
Currently the ML-Agents toolkit uses TensorFlow 1.7.1 due to the version of the
TensorFlowSharp plugin we are using.

docs/ML-Agents-Overview.md


border="10" />
</p>
_An example of how a scene containing multiple Agents and Brains might be
configured._
## Training Modes

the scene will be controlled within Python.
We do not currently have a tutorial highlighting this mode, but you can
learn more about the Python API [here](Python-API.md).
### Curriculum Learning

training intelligent agents, below are a few examples that can serve as
inspiration:
- Single-Agent. A single agent linked to a single Brain, with its own reward
- Simultaneous Single-Agent. Multiple independent agents with independent reward
signals linked to a single Brain. A parallelized version of the traditional
training scenario, which can speed up and stabilize the training process.
Helpful when you have multiple versions of the same character in an

- Adversarial Self-Play. Two interacting agents with inverse reward signals
- Cooperative Multi-Agent. Multiple interacting agents with a shared reward
- Competitive Multi-Agent. Multiple interacting agents with inverse reward
scenario, agents must compete with one another to either win a competition, or
- Ecosystem. Multiple interacting agents with independent reward signals linked
to either a single or multiple different Brains. This scenario can be thought
of as creating a small world in which animals with different goals all
interact, such as a savanna in which there might be zebras, elephants and

learn more about enabling LSTM during training [here](Feature-Memory.md).
- **Monitoring Agent’s Decision Making** - Since communication in ML-Agents is a
two-way street, we provide an Agent Monitor class in Unity which can display
aspects of the trained Agent, such as the Agent's perception on how well it is
real-time, researchers and developers can more easily debug an Agent’s
behavior. You can learn more about using the Monitor class
[here](Feature-Monitor.md).

19
docs/Migrating.md


### Unity API
* Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can now specify concurrent discrete
actions. You will need to update the Brain Parameters in the Brain Inspector
in all your environments that use discrete actions. Refer to the [discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space) for more information.
* Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can
now specify concurrent discrete actions. You will need to update the Brain
Parameters in the Brain Inspector in all your environments that use discrete
actions. Refer to the
[discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space)
for more information.
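With branches, an agent picks one option from every branch at each decision, so its discrete action becomes a short vector of indices (one per branch) rather than a single index. The Python sketch below checks such an action array for a hypothetical Brain with two branches of sizes 3 and 2; the helper `validate_branched_actions` and the branch sizes are illustrative only and are not part of the ML-Agents API.

```python
from typing import Sequence


def validate_branched_actions(
    actions: Sequence[Sequence[int]], branch_sizes: Sequence[int]
) -> None:
    """Raise ValueError unless each agent row holds one valid index per branch.

    `actions` has one row per agent and one column per branch;
    `branch_sizes` lists how many options each branch offers.
    """
    for agent_idx, row in enumerate(actions):
        if len(row) != len(branch_sizes):
            raise ValueError(
                f"agent {agent_idx}: expected {len(branch_sizes)} branches, got {len(row)}"
            )
        for branch_idx, (choice, size) in enumerate(zip(row, branch_sizes)):
            if not 0 <= choice < size:
                raise ValueError(
                    f"agent {agent_idx}, branch {branch_idx}: "
                    f"index {choice} not in [0, {size})"
                )


# Hypothetical Brain: branch 0 = {stay, forward, backward}, branch 1 = {no jump, jump}.
BRANCH_SIZES = [3, 2]
# Two agents, each choosing one option from every branch concurrently.
validate_branched_actions([[1, 1], [2, 0]], BRANCH_SIZES)
```

Because every branch is validated independently, adding a third branch only means appending its size to the list.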

### Python API
* We've changed some of the python package dependencies in the requirements.txt
file. Make sure to run `pip install .` within your `ml-agents/python` folder
to update your python packages.
* We've changed some of the Python package dependencies in the requirements.txt
file. Make sure to run `pip3 install .` within your `ml-agents/python` folder
to update your Python packages.
## Migrating from ML-Agents toolkit v0.2 to v0.3

replaced with a single `learn.py` script as the launching point for training
with ML-Agents. For more information on using `learn.py`, see
[here](Training-ML-Agents.md#training-with-mlagents-learn).
* Hyperparameters for training brains are now stored in the
* Hyperparameters for training Brains are now stored in the
`trainer_config.yaml` file. For more information on using this file, see
[here](Training-ML-Agents.md#training-config-file).

* `AgentStep()` has been replaced by `AgentAction()`.
* `WaitTime()` has been removed.
* The `Frame Skip` field of the Academy is replaced by the Agent's `Decision
Frequency` field, enabling the agent to make decisions at different frequencies.
Frequency` field, enabling the Agent to make decisions at different frequencies.
* The names of the inputs in the Internal Brain have been changed. You must
replace `state` with `vector_observation` and `observation` with
`visual_observation`. In addition, you must remove the `epsilon` placeholder.

4
docs/Readme.md


## API Docs
* [API Reference](API-Reference.md)
* [How to use the Python API](../ml-agents/README.md)
* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
* [How to use the Python API](Python-API.md)

8
docs/Training-Curriculum-Learning.md


Each Brain in an environment can have a corresponding curriculum. These
curriculums are held in what we call a metacurriculum. A metacurriculum allows
different brains to follow different curriculums within the same environment.
different Brains to follow different curriculums within the same environment.
### Specifying a Metacurriculum

measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).
* `parameters` (dictionary of key:string, value:float array) - Corresponds to
academy reset parameters to control. Length of each array should be one
Academy reset parameters to control. Length of each array should be one
and modify the environment from the agent's `AgentReset()` function. See
and modify the environment from the Agent's `AgentReset()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/UnitySDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example. Note that if the Academy's __Max Steps__ is not set to some
positive number the environment will never be reset. The Academy must reset

corresponding Brain. For example, in the Wall Jump environment, there are two
brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
Brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
the BigWallBrain, we will save `BigWallBrain.json` into
`curricula/wall-jump/`.
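A minimal Python sketch of two details above: the 0.75 (new) / 0.25 (old) signal smoothing, and saving a per-Brain curriculum file such as `curricula/wall-jump/BigWallBrain.json`. Assumptions: the keys other than `parameters` (`measure`, `thresholds`, `min_lesson_length`, `signal_smoothing`) follow my reading of the curriculum file format; each parameter array is assumed to hold one value per lesson, i.e. one more value than there are thresholds; all numbers and parameter names are illustrative, and the helper functions are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def smooth_measure(new_value: float, old_value: float) -> float:
    """Signal smoothing: weight 0.75 on the new measure, 0.25 on the old one."""
    return 0.75 * new_value + 0.25 * old_value


def check_lessons(thresholds: List[float], parameters: Dict[str, List[float]]) -> None:
    """Assumption: one value per lesson, i.e. len(thresholds) + 1 per parameter."""
    expected = len(thresholds) + 1
    for name, values in parameters.items():
        if len(values) != expected:
            raise ValueError(f"{name}: expected {expected} values, got {len(values)}")


def write_curriculum(root: str, env_folder: str, brain_name: str, curriculum: dict) -> str:
    """Save `curriculum` as <root>/<env_folder>/<brain_name>.json and return the path."""
    check_lessons(curriculum["thresholds"], curriculum["parameters"])
    folder = os.path.join(root, env_folder)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, brain_name + ".json")
    with open(path, "w") as f:
        json.dump(curriculum, f, indent=4)
    return path


# Illustrative values; the parameter names stand in for your own reset parameters.
BIG_WALL_CURRICULUM = {
    "measure": "progress",
    "thresholds": [0.1, 0.3, 0.5],
    "min_lesson_length": 100,
    "signal_smoothing": True,
    "parameters": {
        "big_wall_min_height": [0.0, 4.0, 6.0, 8.0],
        "big_wall_max_height": [4.0, 7.0, 8.0, 8.0],
    },
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = os.path.join("curricula", "wall-jump")
        print(write_curriculum(tmp, folder, "BigWallBrain", BIG_WALL_CURRICULUM))
```

Naming the file after the Brain keeps the mapping between Brains and curricula explicit, which is what lets different Brains follow different curricula in the same environment.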

24
docs/Training-Imitation-Learning.md


1. In order to use imitation learning in a scene, the first thing you will need
is to create two Brains, one which will be the "Teacher," and the other which
will be the "Student." We will assume that the names of the brain
will be the "Student." We will assume that the names of the Brain
2. Set the "Teacher" brain to Player mode, and properly configure the inputs to
2. Set the "Teacher" Brain to Player mode, and properly configure the inputs to
3. Set the "Student" brain to External mode.
4. Link the brains to the desired agents (one agent as the teacher and at least
one agent as a student).
5. In `config/trainer_config.yaml`, add an entry for the "Student" brain. Set
3. Set the "Student" Brain to External mode.
4. Link the Brains to the desired Agents (one Agent as the teacher and at least
one Agent as a student).
5. In `config/trainer_config.yaml`, add an entry for the "Student" Brain. Set
`brain_to_imitate` parameter to the name of the teacher brain: "Teacher".
`brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
the agents for a longer period of time.
the Agents for a longer period of time.
7. From the Unity window, control the agent with the Teacher brain by providing
7. From the Unity window, control the Agent with the Teacher Brain by providing
8. Watch as the agent(s) with the student brain attached begin to behave
8. Watch as the Agent(s) with the student Brain attached begin to behave
9. Once the Student agents are exhibiting the desired behavior, end the training
9. Once the Student Agents are exhibiting the desired behavior, end the training
with `Internal` brain.
with `Internal` Brain.
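Step 5 can be sketched in Python as a dictionary rendered to YAML text, which avoids depending on a YAML library. Assumptions: the `trainer: imitation` entry (my understanding of how the behavioral-cloning trainer is selected) and all numeric values are illustrative; `brain_to_imitate`, `batch_size` and `batches_per_epoch` are settings named in these docs; `student_entry` and `to_yaml_block` are hypothetical helpers, not part of ML-Agents.

```python
from typing import Dict, Union

ConfigValue = Union[str, int, float]


def student_entry(teacher_brain: str, batch_size: int = 16,
                  batches_per_epoch: int = 5) -> Dict[str, ConfigValue]:
    """Build a hypothetical trainer_config.yaml entry for the "Student" Brain."""
    return {
        "trainer": "imitation",             # assumption: selects behavioral cloning
        "brain_to_imitate": teacher_brain,  # name of the Player-mode "Teacher" Brain
        "batch_size": batch_size,           # illustrative value
        "batches_per_epoch": batches_per_epoch,  # illustrative value
    }


def to_yaml_block(brain_name: str, entry: Dict[str, ConfigValue]) -> str:
    """Render a flat entry (scalars only) as an indented YAML mapping."""
    lines = [f"{brain_name}:"]
    lines += [f"    {key}: {value}" for key, value in entry.items()]
    return "\n".join(lines) + "\n"


print(to_yaml_block("Student", student_entry("Teacher")))
```

The printed text can be pasted under the existing entries in `config/trainer_config.yaml`; the key must match the Student Brain's name exactly.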
### BC Teacher Helper

13
docs/Training-ML-Agents.md


`ml-agents/mlagents/trainers/learn.py`. The [configuration file](#training-config-file),
`config/trainer_config.yaml` specifies the hyperparameters used during training.
You can edit this file with a text editor to add a specific configuration for
each brain.
each Brain.
For a broader overview of reinforcement learning, imitation learning and the
ML-Agents training process, see [ML-Agents Toolkit

where
* `<trainer-config-file>` is the filepath of the trainer configuration yaml.
* `<trainer-config-file>` is the file path of the trainer configuration yaml.
* `<env_name>`__(Optional)__ is the name (including path) of your Unity
executable containing the agents to be trained. If `<env_name>` is not passed,
the training will happen in the Editor. Press the :arrow_forward: button in

1. [Build the project](Learning-Environment-Executable.md), making sure that you
only include the training scene.
2. Open a terminal or console window.
3. Navigate to the ml-agents `python` folder.
3. Navigate to the directory where you installed the ML-Agents Toolkit.
4. Run the following to launch the training process using the path to the Unity
environment you built in step 1:

regular intervals (specified by the `summary_freq` option). The saved statistics
are grouped by the `run-id` value so you should assign a unique id to each
training run if you plan to view the statistics. You can view these statistics
using TensorBoard during or after training by running the following command
(from the ML-Agents python directory):
using TensorBoard during or after training by running the following command:
```sh
tensorboard --logdir=summaries

settings. (This GameObject will be a child of the Academy in your scene.)
Sections for the example environments are included in the provided config file.
| **Setting** | **Description** | **Applies To Trainer** |
| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent.| PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model.| BC |
| beta | The strength of entropy regularization.| PPO, BC |

2
docs/Training-PPO.md


### Entropy
This corresponds to how random the decisions of a brain are. This should
This corresponds to how random the decisions of a Brain are. This should
consistently decrease during training. If it decreases too soon or not at all,
`beta` should be adjusted (when using discrete action space).
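As a rough illustration of what "how random the decisions are" means for a discrete action space, the Python sketch below computes the Shannon entropy of a single action distribution. It is only an illustration: the value TensorBoard reports is derived from the trainer's policy outputs and may be aggregated over agents and steps, so do not expect it to match this function exactly.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy, in nats, of one discrete action distribution."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped, i.e. 0 * log(0) is treated as 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 actions: the maximum, log(4)
print(entropy([0.97, 0.01, 0.01, 0.01]))  # nearly deterministic: close to 0
```

A policy that starts near uniform and becomes confident moves from the first case toward the second, which is the steady decrease described above.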

2
docs/Training-on-Amazon-Web-Service.md


source activate python3
```
2. Clone the ML-Agents repo and install the required python packages
2. Clone the ML-Agents repo and install the required Python packages
```sh
git clone https://github.com/Unity-Technologies/ml-agents.git

15
docs/Training-on-Microsoft-Azure.md


following command to complete dependency installation:
```sh
pip install docopt
pip3 install docopt
```
Note that, if you choose to deploy the image to an

1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp)
your built Unity application to your Virtual Machine.
2. Set the `ml-agents` sub-folder of the ml-agents repo to your working
directory.
2. Set the directory where the ML-Agents Toolkit was installed as your
working directory.
3. Run the following command:
```sh

2. Unless you started the training as a background process, connect to your VM
from another terminal instance.
3. Set the `python` folder in ml-agents to your current working directory.
4. Run the following command from your `tensorboard --logdir=summaries --host
0.0.0.0`
5. You should now be able to open a browser and navigate to
3. Run the following command from your terminal:
`tensorboard --logdir=summaries --host 0.0.0.0`
4. You should now be able to open a browser and navigate to
`<Your_VM_IP_Address>:6006` to view the TensorBoard report.
## Running on Azure Container Instances

it isn't needed. You can read more about
[The ML-Agents toolkit support for Docker containers here](Using-Docker.md).
Using ACI enables you to offload training of your models without needing to
install Python and Tensorflow on your own computer. You can find instructions,
install Python and TensorFlow on your own computer. You can find instructions,
including a pre-deployed image in DockerHub for you to use, available
[here](https://github.com/druttka/unity-ml-on-azure).

8
docs/Using-TensorFlow-Sharp-in-Unity.md


placed in placeholders of dimension 1 and size 1. (Be sure to name them.)
It is important that the inputs and outputs of the graph are exactly the ones
you receive and return when training your model with an `External` brain. This
you receive and return when training your model with an `External` Brain. This
means you cannot have any operations such as reshaping outside of the graph. The
object you get by calling `step` or `reset` has fields `vector_observations`,
`visual_observations` and `memories` which must correspond to the placeholders

.bytes file so Unity can load it.
In the Unity Editor, you must specify the names of the nodes used by your graph
in the **Internal** brain Inspector window. If you used a scope when defining
in the **Internal** Brain Inspector window. If you used a scope when defining
your graph, specify it in the `Graph Scope` field.
![Internal Brain Inspector](images/internal_brain.png)

for more information about using Internal Brains.
If you followed these instructions well, the agents in your environment that use
this brain will use your fully trained network to make decisions.
If you followed these instructions well, the Agents in your environment that use
this Brain will use your fully trained network to make decisions.
## iOS additional instructions for building

8
docs/Using-Tensorboard.md


start TensorBoard:
1. Open a terminal or console window:
2. Navigate to the ml-agents/python folder.
2. Navigate to the directory where the ML-Agents Toolkit is installed.
```sh
tensorboard --logdir=summaries
```
4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).

## The ML-Agents toolkit training statistics
The ML-agents training program saves the following statistics:
The ML-Agents training program saves the following statistics:
![Example TensorBoard Run](images/mlagents-TensorBoard.png)

26
docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md


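After opening the 3D Balance Ball scene, the first thing you may notice is that it contains
not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. In this way, 3D Balance Ball
independent agent, but they all share the same Brain. In this way, 3D Balance Ball
speeds up training, because all 12 agents can take part in training in parallel.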
### Academy

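The Brain does not store any information about an agent;
it simply sends the observations collected by the agent to the decision-making process
and returns the chosen action to the agent. Thus, all agents can share
the same brain but act independently. The Brain settings provide many
the same Brain but act independently. The Brain settings provide many
The **Heuristic** brain lets you hand-code the agent's logic by extending
the Decision class. Finally, the **Player** brain lets you map keyboard commands
The **Heuristic** Brain lets you hand-code the agent's logic by extending
the Decision class. Finally, the **Player** Brain lets you map keyboard commands
can be very useful. If none of these brain types meets your needs, you can
can be very useful. If none of these Brain types meets your needs, you can
implement your own CoreBrain to create your own type.
In this tutorial, when training, you need to set the **Brain Type** to **External**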

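**Vector Action Space**
A brain gives the agent instructions in the form of *actions*. Like states,
A Brain gives the agent instructions in the form of *actions*. Like states,
force or torque on a `RigidBody`. A **Discrete** vector action space defines its actions
force or torque on a `Rigidbody`. A **Discrete** vector action space defines its actions
as a table. The specific action given to the agent is an
index into this table.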

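platform GameObject. The base Agent object has a few properties that affect
its behavior:
* **Brain** — Every Agent must have a Brain. The brain determines how an agent
* **Brain** — Every Agent must have a Brain. The Brain determines how an agent
brain.
Brain.
* **Visual Observations** — Defines any Camera objects used by the agent
to observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many steps can occur before the agent decides it is done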

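the agent's Brain instance is set to a continuous vector observation space with a state size of 8,
so `CollectObservations()` must call
`AddVectorObs` 8 times.
* Agent.AgentAction() — Called every simulation step. Receives the action chosen by the brain.
* Agent.AgentAction() — Called every simulation step. Receives the action chosen by the Brain.
The Ball3DAgent example handles both the continuous and the discrete
action space types. In this environment, there isn't actually much
difference between the two types: at every step, both vector action spaces will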

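![3DBall scene](images/mlagents-Open3DBall.png)
Since we are setting up this environment for training, we need to
set the brain used by the agents to **External**. This allows the agents
set the Brain used by the agents to **External**. This allows the agents
to communicate with the external training process when making decisions.
1. In the **Scene** window, click the triangle next to the Ball3DAcademy object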

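Once the training process completes and has saved a model
(as indicated by the `Saved Model` message), you can add the model to your Unity project
and use it with agents whose brain type is **Internal**.
and use it with agents whose Brain type is **Internal**.
### Setting up TensorFlowSharp Support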

1. Make sure the TensorFlowSharp plugin is in your `Assets` folder.
A Plugins folder that includes TF# can be downloaded
[here](https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage).
[here](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage).
Once downloaded, double-click the package and import it. You can check that TensorFlow
was installed successfully by looking for its files in the Project tab
(under `Assets` > `ML-Agents` > `Plugins` > `Computer`).

4
docs/localized/zh-CN/docs/Installation.md


### Mac and Unix Users
If your Python environment doesn't include `pip`, see these
If your Python environment doesn't include `pip3`, see these
[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
on how to install it.

## Unity Package
You can download the TensorFlowSharp plugin as a Unity package ([AWS S3 link](https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage), [Baidu Netdisk link](https://pan.baidu.com/s/1s0mJN8lvuxTcYbs2kL2FqA)).
You can download the TensorFlowSharp plugin as a Unity package ([AWS S3 link](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage), [Baidu Netdisk link](https://pan.baidu.com/s/1s0mJN8lvuxTcYbs2kL2FqA)).
## Help

4
docs/localized/zh-CN/docs/Learning-Environment-Create-New.md


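**Actions**
The decision of the Brain comes in the form of an action array passed to the `AgentAction()` function. The number of elements in this array is determined by the `Vector Action Space Type` and `Vector Action Space Size` settings of the agent's Brain. The RollerAgent uses the continuous vector action space and needs two continuous control signals from the brain. Thus, we will set the Brain `Vector Action Size` to 2. The first element, `action[0]`, determines the force applied along the x axis; `action[1]` determines the force applied along the z axis. (If we allowed the agent to move in three dimensions, then we would need to set `Vector Action Size` to 3.) Note that the Brain has no idea what the values in the action array mean. The training process simply adjusts the action values in response to the observation input and then sees what kind of rewards it gets as a result.
The decision of the Brain comes in the form of an action array passed to the `AgentAction()` function. The number of elements in this array is determined by the `Vector Action Space Type` and `Vector Action Space Size` settings of the agent's Brain. The RollerAgent uses the continuous vector action space and needs two continuous control signals from the Brain. Thus, we will set the Brain `Vector Action Size` to 2. The first element, `action[0]`, determines the force applied along the x axis; `action[1]` determines the force applied along the z axis. (If we allowed the agent to move in three dimensions, then we would need to set `Vector Action Size` to 3.) Note that the Brain has no idea what the values in the action array mean. The training process simply adjusts the action values in response to the observation input and then sees what kind of rewards it gets as a result.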
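In Unity the RollerAgent applies these values in C# through `Rigidbody.AddForce`. The Python sketch below mirrors only the index-to-axis mapping described above; the helper `action_to_force` and its `speed` multiplier are hypothetical.

```python
from typing import Sequence, Tuple


def action_to_force(action: Sequence[float], speed: float = 10.0) -> Tuple[float, float, float]:
    """Map a 2-element continuous action to an (x, y, z) force vector.

    action[0] drives the x axis and action[1] the z axis; y stays 0 because
    the agent only moves on the floor plane. `speed` is a hypothetical scale.
    """
    if len(action) != 2:
        raise ValueError(f"expected Vector Action Size 2, got {len(action)}")
    return (action[0] * speed, 0.0, action[1] * speed)


print(action_to_force([0.5, -1.0]))  # (5.0, 0.0, -10.0)
```

Allowing vertical movement would mean a third action element mapped to y, which is why `Vector Action Size` would then become 3.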
The RollerAgent applies the values from the action[] array to its Rigidbody component, `rBody`, using the `Rigidbody.AddForce` function:

1. Select the Brain GameObject to view its properties in the Inspector.
2. Set **Brain Type** to **Player**.
3. Expand the **Continuous Player Actions** (only visible when using the **Player** brain).
3. Expand the **Continuous Player Actions** (only visible when using the **Player** Brain).
4. Set **Size** to 4.
5. Add the following mappings:

14
docs/localized/zh-CN/docs/Learning-Environment-Design.md


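Training and simulation proceed in steps orchestrated by the ML-Agents Academy class. The Academy works together with the Agent and Brain objects in the scene to step through the simulation. When either the Academy has reached its maximum number of steps or all agents in the scene are _done_, one training episode is finished.
During training, the external Python training process communicates with the Academy to run a series of episodes while it collects data and optimizes its neural network model. The type of Brain assigned to an agent determines whether it participates in training or not. The **External** brain communicates with the external process to train the TensorFlow model. When training is completed successfully, you can add the trained model file to your Unity project for use with an **Internal** brain to control the agent's behavior.
During training, the external Python training process communicates with the Academy to run a series of episodes while it collects data and optimizes its neural network model. The type of Brain assigned to an agent determines whether it participates in training or not. The **External** Brain communicates with the external process to train the TensorFlow model. When training is completed successfully, you can add the trained model file to your Unity project for use with an **Internal** Brain to control the agent's behavior.
The ML-Agents Academy class orchestrates the agent simulation loop as follows: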

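4. Uses each agent's Brain class to decide on the agent's next action.
5. Calls the `AcademyAct()` function of your subclass.
6. Calls the `AgentAction()` function for each agent in the scene, passing in the action chosen by the agent's brain. (This function is not called if the agent is done.)
6. Calls the `AgentAction()` function for each agent in the scene, passing in the action chosen by the agent's Brain. (This function is not called if the agent is done.)
7. Calls the agent's `AgentOnDone()` function if the agent has reached its `Max Step` count or has otherwise marked itself as `done`. Optionally, you can set an agent to restart if it finishes before the end of an episode. In this case, the Academy calls the `AgentReset()` function.
8. When the Academy reaches its own `Max Step` count, it starts the next episode again by calling the `AcademyReset()` function of your Academy subclass.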
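To make the ordering of steps 4 to 8 concrete, here is a hypothetical Python mirror of one Academy step. The real loop runs in C# inside Unity; `SketchAgent`, `SketchAcademy` and the snake_case method names are illustrative stand-ins for the callbacks named above, and steps 1 to 3 of the loop are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchAgent:
    """Hypothetical stand-in for a Unity Agent; methods mirror the C# callbacks."""
    name: str
    max_step: int
    reset_on_done: bool = False
    steps: int = 0
    done: bool = False
    calls: List[str] = field(default_factory=list)

    def decide(self) -> float:  # stands in for the Brain choosing an action
        return 0.0

    def agent_action(self, action: float) -> None:
        self.calls.append("AgentAction")
        self.steps += 1

    def agent_on_done(self) -> None:
        self.calls.append("AgentOnDone")

    def agent_reset(self) -> None:
        self.calls.append("AgentReset")
        self.steps, self.done = 0, False


class SketchAcademy:
    def __init__(self, agents: List[SketchAgent], max_step: int) -> None:
        self.agents, self.max_step, self.steps = agents, max_step, 0
        self.calls: List[str] = []

    def step(self) -> None:
        # Step 4: each active agent's Brain decides the next action.
        actions: Dict[str, float] = {a.name: a.decide() for a in self.agents if not a.done}
        # Step 5: the Academy subclass hook.
        self.calls.append("AcademyAct")
        for agent in self.agents:
            if agent.name not in actions:
                continue  # Step 6 skips agents that are already done.
            agent.agent_action(actions[agent.name])
            # Step 7: Max Step reached (or agent marked itself done) -> AgentOnDone.
            if agent.steps >= agent.max_step or agent.done:
                agent.done = True
                agent.agent_on_done()
                if agent.reset_on_done:
                    agent.agent_reset()
        # Step 8: the Academy's own Max Step starts the next episode.
        self.steps += 1
        if self.steps >= self.max_step:
            self.calls.append("AcademyReset")
            self.steps = 0


agent = SketchAgent("ball", max_step=2)
academy = SketchAcademy([agent], max_step=3)
for _ in range(3):
    academy.step()
print(agent.calls)    # ['AgentAction', 'AgentAction', 'AgentOnDone']
print(academy.calls)  # ['AcademyAct', 'AcademyAct', 'AcademyAct', 'AcademyReset']
```

Setting `reset_on_done=True` models the optional restart in step 7: the agent is reset immediately and keeps acting within the same episode.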

[Screenshot of scene hierarchy]
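You must assign a brain to every agent, but you can share brains between multiple agents. Each agent makes its own observations and acts independently, but uses the same decision-making logic and, for **Internal** brains, the same trained TensorFlow model.
You must assign a Brain to every agent, but you can share Brains between multiple agents. Each agent makes its own observations and acts independently, but uses the same decision-making logic and, for **Internal** Brains, the same trained TensorFlow model.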
### Academy

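The Brain encapsulates the decision-making process. Brain objects must be children of the Academy in the Hierarchy view. Every Agent must be assigned a Brain, but you can use the same Brain with more than one Agent.
Use the Brain class directly, rather than a subclass. Brain behavior is determined by the brain type. During training, set the Brain Type of the Brain attached to your agents to **External**. To use the trained model, import the model file into your Unity project and change the Brain Type of the corresponding Brain to **Internal**. See [Brain](/docs/Learning-Environment-Design-Brains.md) for details on using the different types of Brains. If none of the four built-in types meets your needs, you can extend the CoreBrain class to create other Brain types.
Use the Brain class directly, rather than a subclass. Brain behavior is determined by the Brain type. During training, set the Brain Type of the Brain attached to your agents to **External**. To use the trained model, import the model file into your Unity project and change the Brain Type of the corresponding Brain to **Internal**. See [Brain](/docs/Learning-Environment-Design-Brains.md) for details on using the different types of Brains. If none of the four built-in types meets your needs, you can extend the CoreBrain class to create other Brain types.
The Brain class has several important properties that you can set using the Inspector window. These properties must be appropriate for the agents using the brain. For example, the `Vector Observation Space Size` property must exactly match the length of the feature vector created by an agent. See [Agent](/docs/Learning-Environment-Design-Agents.md) for information about creating agents and setting up a Brain instance correctly.
The Brain class has several important properties that you can set using the Inspector window. These properties must be appropriate for the agents using the Brain. For example, the `Vector Observation Space Size` property must exactly match the length of the feature vector created by an agent. See [Agent](/docs/Learning-Environment-Design-Agents.md) for information about creating agents and setting up a Brain instance correctly.
See [Brain](/docs/Learning-Environment-Design-Brains.md) for a complete list of Brain properties.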
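A hypothetical Python sketch of the size rule described above: the feature vector an agent builds must have exactly `Vector Observation Space Size` entries. In Unity this concerns the C# `CollectObservations()` and `AddVectorObs` calls; the class `ObservationCollector` and the example values are illustrative only.

```python
from typing import List


class ObservationCollector:
    """Hypothetical mirror of an agent calling AddVectorObs() in CollectObservations()."""

    def __init__(self, vector_observation_space_size: int) -> None:
        self.expected = vector_observation_space_size
        self._values: List[float] = []

    def add_vector_obs(self, *values: float) -> None:
        self._values.extend(float(v) for v in values)

    def finish(self) -> List[float]:
        """Return the feature vector, failing loudly if its length is wrong."""
        if len(self._values) != self.expected:
            raise ValueError(
                f"collected {len(self._values)} values, "
                f"but Vector Observation Space Size is {self.expected}"
            )
        vector, self._values = self._values, []
        return vector


collector = ObservationCollector(8)        # a Brain configured for 8 continuous observations
collector.add_vector_obs(0.1, -0.2)        # hypothetical platform rotation
collector.add_vector_obs(0.0, 1.5, 0.3)    # hypothetical ball position offset
collector.add_vector_obs(0.0, -0.4, 0.2)   # hypothetical ball velocity
print(len(collector.finish()))  # 8
```

Checking the length at collection time surfaces a mismatch immediately instead of as a confusing error later in training.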

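To create an agent, extend the Agent class and implement the essential `CollectObservations()` and `AgentAction()` methods:
* `CollectObservations()` — Collects the agent's observation of its environment.
* `AgentAction()` — Carries out the action chosen by the agent's brain and assigns a reward to the current state.
* `AgentAction()` — Carries out the action chosen by the agent's Brain and assigns a reward to the current state.
Your implementations of these functions determine how the properties of the Brain assigned to this agent must be set.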

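When you create a training environment in Unity, you must set up the scene so that it can be controlled by the external training process. Keep the following in mind:
* The Unity executable is opened automatically once the training program launches, and the training scene then starts training automatically.
* The scene must include at least one **External** brain.
* The scene must include at least one **External** Brain.
* The Academy must reset the scene to a valid starting point after each training episode.
* A training episode must have a definite end, either by using `Max Steps` or by each agent setting itself to `done`.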

42
docs/localized/zh-CN/docs/Learning-Environment-Examples.md


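* Set-up: A linear movement task where the agent must move left or right to rewarding states.
* Goal: Move to the most rewarding state.
* Agents: The environment contains one agent, with a single brain attached.
* Agents: The environment contains one agent, with a single Brain attached.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Discrete) One variable corresponding to the current state.
* Vector Action space: (Discrete) Two possible actions (move left, move right).
* Visual Observations: 0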

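* Set-up: A balance-ball task, where the agent controls the platform.
* Goal: The agent must balance the platform in order to keep the ball on it for as long as possible.
* Agents: The environment contains 12 agents of the same kind, all linked to a single brain.
* Agents: The environment contains 12 agents of the same kind, all linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 8 variables corresponding to the rotation of the platform, and the position, rotation, and velocity of the ball.
* Vector Observation space (Hard version, since less information is observed): (Continuous) 5 variables corresponding to the rotation of the platform and the position and rotation of the ball.
* Vector Action space: (Continuous) Size of 2, with one value corresponding to X-rotation, and the other to Z-rotation.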

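* Set-up: A version of the classic grid-world task. The scene contains the agent, the goal, and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the obstacles.
* Agents: The environment contains one agent linked to a single brain.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: None
* Vector Action space: (Discrete) Size of 4, corresponding to movement in cardinal directions.
* Visual Observations: One, corresponding to a top-down view of GridWorld.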

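* Set-up: Two-player game where agents control rackets to bounce the ball over a net.
* Goal: The agents must bounce the ball between one another while not dropping it or sending it out of bounds.
* Agents: The environment contains two agents linked to a single brain named TennisBrain. After training, you can attach another brain named MyBrain to one of the agents to play against your trained model.
* Agents: The environment contains two agents linked to a single Brain (TennisBrain). After training, you can attach another Brain named MyBrain to one of the agents to play against your trained model.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 8 variables corresponding to the position and velocity of the ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement toward or away from the net, and up-and-down movement.
* Visual Observations: None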

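* Set-up: A platform where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent linked to a single brain.
* Agents: The environment contains one agent linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 15 variables corresponding to the position and velocity of the agent, the block, and the goal.
* Vector Action space: (Continuous) Size of 2, corresponding to movement in the X and Z directions.
* Visual Observations: None.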

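* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.
* Agents: The environment contains one agent linked to two different brains. The brain the agent is linked to changes depending on the height of the wall.
* Agents: The environment contains one agent linked to two different Brains. The Brain the agent is linked to changes depending on the height of the wall.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 16 variables corresponding to the position and velocity of the agent, the block, and the goal, plus the height of the wall.
* Vector Action space: (Discrete) Size of 74, corresponding to 14 ray casts each detecting 4 possible objects, plus the global position of the agent and whether or not the agent is grounded.
* Visual Observations: None.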

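* Set-up: Double-jointed arm which can move to target locations.
* Goal: The agent must move its hand to the goal location, and keep it there.
* Agents: The environment contains 32 agents linked to a single brain.
* Agents: The environment contains 32 agents linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 26 variables corresponding to the position, rotation, velocity, and angular velocity of the two arm Rigidbodies.
* Vector Action space: (Continuous) Size of 4, corresponding to rotation of the two joints in two directions.
* Visual Observations: None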

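* Set-up: A creature with 4 arms, each with two segments.
* Goal: The agent must move its body along the x axis without falling.
* Agents: The environment contains 3 agents linked to a single brain.
* Agents: The environment contains 3 agents linked to a single Brain.
* Agent Reward Function (independent):
* +1 times velocity in the x direction
* -1 for falling.

* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 117 variables corresponding to the position, rotation, velocity, and angular velocity of each limb, plus the acceleration and angular velocity of the body.
* Vector Action space: (Continuous) Size of 12, corresponding to torque applicable to 12 joints.
* Visual Observations: None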

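* Set-up: A multi-agent environment where agents compete to collect bananas.
* Goal: The agents must learn to move to as many yellow bananas as possible while avoiding red bananas.
* Agents: The environment contains 10 agents linked to a single brain.
* Agents: The environment contains 10 agents linked to a single Brain.
* Brains: One brain with the following observation/action space.
* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 51 variables corresponding to the velocity of the agent, the agent's forward direction, and ray-based perception of objects around the agent.
* Vector Action space: (Continuous) Size of 3, corresponding to forward movement, rotation around the y axis, and whether to use a laser to disable other agents.
* Visual Observations (optional): First-person view for each agent.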

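* Set-up: An environment where the agent needs to find information in a room, remember it, and use it to move to the correct goal.
* Goal: Move to the goal that corresponds to the color of the block in the room.
* Agents: The environment contains one agent linked to a single brain.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
* +1 for moving to the correct goal.
* -0.1 for moving to the incorrect goal.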

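* Set-up: An environment where the agent needs on-demand decision making. The agent must decide how to perform its next bounce when it touches the ground.
* Goal: Catch the floating banana, with a limited number of jumps.
* Agents: The environment contains one agent linked to a single brain.
* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
* +1 for catching the banana.
* -1 for bouncing out of bounds.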

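* Goal:
* Striker: Get the ball into the opponent's goal.
* Goalie: Prevent the ball from entering its own goal.
* Agents: The environment contains four agents, with two linked to one brain (strikers) and two linked to another brain (goalies).
* Agents: The environment contains four agents, with two linked to one Brain (strikers) and two linked to another Brain (goalies).
* Agent Reward Function (dependent):
* Striker:
* +1 when the ball enters the opponent's goal.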

2
docs/localized/zh-CN/docs/ML-Agents-Overview.md


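we switch the Brain type to Internal and include the TensorFlow model
generated from the training phase. Now, during the inference phase, the medics
still continue to generate their observations, but instead of being sent to
the Python API, they are fed into their embedded Tensorflow model
the Python API, they are fed into their embedded TensorFlow model
to generate the _optimal_ action for each medic to take at every point in time.
To summarize: our implementation is based on TensorFlow, and thus,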

183
ml-agents/README.md


# Unity ml-agents interface and trainers
# Unity ML-Agents Python Interface and Trainers
The `mlagents` package contains two components: the low level API which allows
you to interact directly with a Unity Environment and a training component which
allows you to train agents in Unity Environments using our implementations of
reinforcement learning or imitation learning.
The `mlagents` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents` provides a Python API that allows direct interaction with the Unity
game engine as well as a collection of trainers and algorithms to train agents
in Unity environments.
The `mlagents` Python package contains two components: a low level API which
allows you to interact directly with a Unity Environment (`mlagents.envs`) and
an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning.
The `ml-agents` package can be installed using:
Install `mlagents` with:
or by running the following from the `ml-agents` directory of the repository:
```sh
pip install .
```
## `mlagents.envs`
The ML-Agents toolkit provides a Python API for controlling the agent simulation
loop of an environment or game built with Unity. This API is used by the ML-Agents
training algorithms (run with `mlagents-learn`), but you can also write your own
Python programs using this API.
The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `ml-agents/mlagents/envs` folder of
the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the
agent must either use an **External** brain or use a brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for agents with external brains, but can only observe broadcasting
brains (the information you receive for an agent is the same in both cases).
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
### Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in
`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
file, put the file in the same directory as `envs`. For example, if the filename
of your Unity environment is 3DBall.app, in Python, run:
```python
from mlagents.envs import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```
- `file_name` is the name of the environment binary (located in the root
directory of the Python project).
- `worker_id` indicates which port to use for communication with the
environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
## Usage & More Information
If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
### Interacting with a Unity Environment
A BrainInfo object contains the following fields:
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the agents' text
observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of agents using the brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of agents using the brain
containing `done` flags (whether or not the agent is done).
- **`max_reached`** : A list as long as the number of agents using the brain
containing true if the agents reached their max steps.
- **`agents`** : A list of the unique ids of the agents using the brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
size, vector action size)` if the vector action space is continuous and
`(batch size, number of branches)` if the vector action space is discrete.
Once loaded, the UnityEnvironment object (referenced by a variable named `env`
in this example) can be used in the following way:
- **Print : `print(str(env))`**
Prints all parameters relevant to the loaded environment and the external
brains.
- **Reset : `env.reset(train_mode=True, config=None)`**
Sends a reset signal to the environment and returns a dictionary mapping
brain names to BrainInfo objects.
- `train_mode` indicates whether to run the environment in train (`True`) or
test (`False`) mode.
- `config` is an optional dictionary of configuration flags specific to the
environment. For generic environments, `config` can be ignored. `config` is
a dictionary of strings to floats where the keys are the names of the
`resetParameters` and the values are their corresponding float values.
Define the reset parameters on the Academy Inspector window in the Unity
Editor.
- **Step : `env.step(action, memory=None, text_action=None)`**
Sends a step signal to the environment using the actions. For each brain:
- `action` can be a one-dimensional array, or a two-dimensional array if you
have multiple agents per brain.
- `memory` is an optional input that can be used to send a list of floats per
agent to be retrieved at the next step.
- `text_action` is an optional input that can be used to send a single string
per agent.
Returns a dictionary mapping brain names to BrainInfo objects.
For example, to access the BrainInfo belonging to a brain called
'brain_name', and the BrainInfo field 'vector_observations':
```python
info = env.step()
brainInfo = info['brain_name']
observations = brainInfo.vector_observations
```
Note that if you have more than one external brain in the environment, you
must provide dictionaries from brain names to arrays for `action`, `memory`
and `value`. For example: If you have two external brains named `brain1` and
`brain2` each with one agent taking two continuous actions, then you can
have:
```python
action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
```
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication
socket.
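A minimal sketch of the reset/step/close protocol described above, written against any object with the same `reset()`/`step()` shape. `FakeEnv` is a hypothetical stand-in (not part of `mlagents`) so the loop runs without a Unity build; it fills in only the BrainInfo fields the loop reads (`agents`, `rewards`, `local_done`). With a real `UnityEnvironment`, the per-brain action sizes would have to match the vector action size configured on each Brain; the helpers `zero_actions` and `run_episode` are illustrative.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def zero_actions(brain_infos: Dict[str, Any],
                 action_sizes: Dict[str, int]) -> Dict[str, List[List[float]]]:
    """One all-zero continuous action row per agent, keyed by brain name."""
    return {name: [[0.0] * action_sizes[name] for _ in info.agents]
            for name, info in brain_infos.items()}


def run_episode(env: Any, action_sizes: Dict[str, int], max_steps: int = 1000) -> Dict[str, float]:
    """Step `env` with zero actions until every agent of every brain is done."""
    infos = env.reset()
    totals = {name: 0.0 for name in infos}
    for _ in range(max_steps):
        infos = env.step(zero_actions(infos, action_sizes))
        for name, info in infos.items():
            totals[name] += sum(info.rewards)
        if all(all(info.local_done) for info in infos.values()):
            break
    return totals


class FakeEnv:
    """Hypothetical stand-in returning a dict of BrainInfo-like objects."""

    def __init__(self, episode_length: int) -> None:
        self.episode_length = episode_length
        self.t = 0

    def _infos(self) -> Dict[str, SimpleNamespace]:
        done = self.t >= self.episode_length
        reward = 1.0 if self.t > 0 else 0.0
        return {"brain1": SimpleNamespace(agents=[0, 1], rewards=[reward, reward],
                                          local_done=[done, done])}

    def reset(self) -> Dict[str, SimpleNamespace]:
        self.t = 0
        return self._infos()

    def step(self, action: Dict[str, List[List[float]]]) -> Dict[str, SimpleNamespace]:
        self.t += 1
        return self._infos()

    def close(self) -> None:
        pass


env = FakeEnv(episode_length=3)
print(run_episode(env, {"brain1": 2}))  # {'brain1': 6.0}
env.close()
```

Keying actions by brain name, as in the two-brain example above, keeps the loop unchanged when more external brains are added.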
## `mlagents.trainers`
1. Open a command or terminal window.
2. Run
```sh
mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train <environment-name>
```
Where:
- `<trainer-config-path>` is the relative or absolute file path of the trainer
configuration. The defaults used by environments in the ML-Agents SDK can be
found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs.
- The `--train` flag tells `mlagents-learn` to run a training session (rather
than inference).
- `<environment-name>` __(Optional)__ is the path to the Unity executable you
want to train. __Note:__ If this argument is not passed, training
will take place in the Editor.
For more detailled documentation, check out the
[ML-Agents toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)
For more detailed documentation, check out the
[ML-Agents Toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)



# Unity ML-Agents Python Interface and Trainers
The `mlagents` Python package is part of the [ML-Agents
Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents` provides a
Python API that allows direct interaction with the Unity game engine as well as
a collection of trainers and algorithms to train agents in Unity environments.
The `mlagents` Python package contains two components: a low-level API which
allows you to interact directly with a Unity Environment (`mlagents.envs`) and
an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning.
## mlagents.envs
The ML-Agents Toolkit provides a Python API for controlling the Agent simulation
loop of an environment or game built with Unity. This API is used by the
training algorithms inside the ML-Agents Toolkit, but you can also write your own
Python programs using this API.
The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from Agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `ml-agents/mlagents/envs` folder of
the ML-Agents SDK.
To communicate with an Agent in a Unity environment from a Python program, the
Agent must either use an **External** Brain or use a Brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for Agents with external Brains, but can only observe broadcasting
Brains (the information you receive for an Agent is the same in both cases).
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
### Loading a Unity Environment
Python-side communication happens through `UnityEnvironment`, which is located in
`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
file, put the file in the same directory as `envs`. For example, if the filename
of your Unity environment is 3DBall.app, in Python, run:
```python
from mlagents.envs import UnityEnvironment

# Load the 3DBall.app build (extension omitted) as worker 0 with random seed 1.
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```
- `file_name` is the name of the environment binary (located in the root
directory of the Python project).
- `worker_id` indicates which port to use for communication with the
environment. Give each environment its own `worker_id` when running several
in parallel, as in training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
If you want to interact directly with the Editor, pass
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
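Since every `UnityEnvironment` talks to its build over its own port, parallel workers need distinct `worker_id` values. The sketch below assumes the port is computed as a base port plus `worker_id`, with a default base port of 5005; both the formula and the default are assumptions to check against the `UnityEnvironment` constructor in your installed version, and `port_for_worker`/`ports_for_workers` are hypothetical helpers, not part of `mlagents`.

```python
# Minimal sketch: why parallel environments need distinct worker_id values.
# ASSUMPTION: port = base_port + worker_id, with base_port defaulting to 5005.
DEFAULT_BASE_PORT = 5005  # assumed default; verify against your mlagents version


def port_for_worker(worker_id: int, base_port: int = DEFAULT_BASE_PORT) -> int:
    """Return the (assumed) port used by an environment with this worker_id."""
    if worker_id < 0:
        raise ValueError("worker_id must be non-negative")
    return base_port + worker_id


def ports_for_workers(num_workers: int, base_port: int = DEFAULT_BASE_PORT) -> list:
    """One port per parallel worker, e.g. for an A3C-style setup."""
    return [port_for_worker(i, base_port) for i in range(num_workers)]


if __name__ == "__main__":
    print(ports_for_workers(4))  # [5005, 5006, 5007, 5008]: no two workers share a port
```

Reusing a `worker_id` across two live environments would, under this assumption, make both try to use the same port.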
### Interacting with a Unity Environment
A BrainInfo object contains the following fields:
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the Brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the Agents' text
observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of Agents using the Brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of Agents using the Brain
containing `done` flags (whether or not the Agent is done).
- **`max_reached`** : A list as long as the number of Agents using the Brain
containing `True` for each Agent that has reached its maximum number of steps.
- **`agents`** : A list of the unique ids of the Agents using the Brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
size, vector action size)` if the vector action space is continuous and
`(batch size, number of branches)` if the vector action space is discrete.
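All per-Agent fields of a BrainInfo are aligned by index: row `i` of `vector_observations` and element `i` of `rewards`, `local_done`, and `max_reached` belong to the Agent whose id is `agents[i]`. The sketch below illustrates that alignment using `FakeBrainInfo`, a hypothetical namedtuple standing in for the real class, with plain lists in place of NumPy arrays so it runs without `mlagents` installed.

```python
from collections import namedtuple

# Hypothetical stand-in for BrainInfo holding only the fields used below.
# In mlagents these are NumPy arrays or lists; plain lists keep the sketch standalone.
FakeBrainInfo = namedtuple(
    "FakeBrainInfo",
    ["agents", "vector_observations", "rewards", "local_done", "max_reached"],
)


def summarize_agents(brain_info):
    """Return one dict per Agent, pairing its id with its entry in each field."""
    summary = []
    for i, agent_id in enumerate(brain_info.agents):
        summary.append({
            "agent": agent_id,
            "observation": brain_info.vector_observations[i],  # row i of (batch, obs size)
            "reward": brain_info.rewards[i],
            "done": brain_info.local_done[i],
            # Flags Agents that stopped because they hit their max step count.
            "max_reached": brain_info.max_reached[i],
        })
    return summary


info = FakeBrainInfo(
    agents=[11, 12],
    vector_observations=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # batch size 2, obs size 3
    rewards=[1.0, -0.5],
    local_done=[False, True],
    max_reached=[False, True],
)
for row in summarize_agents(info):
    print(row)
```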
Once loaded, your UnityEnvironment object, referenced by a variable named `env`
in this example, can be used in the following ways:
- **Print : `print(str(env))`**
Prints all parameters relevant to the loaded environment and the external
Brains.
- **Reset : `env.reset(train_model=True, config=None)`**
Sends a reset signal to the environment and returns a dictionary mapping
Brain names to BrainInfo objects.
- `train_model` indicates whether to run the environment in train (`True`) or
test (`False`) mode.
- `config` is an optional dictionary of configuration flags specific to the
environment. For generic environments, `config` can be ignored. `config` is
a dictionary of strings to floats where the keys are the names of the
`resetParameters` and the values are their corresponding float values.
Define the reset parameters on the Academy Inspector window in the Unity
Editor.
- **Step : `env.step(action, memory=None, text_action=None)`**
Sends a step signal to the environment using the actions. For each Brain:
- `action` can be a one-dimensional array, or a two-dimensional array if you
have multiple Agents per Brain.
- `memory` is an optional input that can be used to send a list of floats per
Agent to be retrieved at the next step.
- `text_action` is an optional input that can be used to send a single string
per Agent.
Returns a dictionary mapping Brain names to BrainInfo objects.
For example, to access the BrainInfo belonging to a Brain called
'brain_name', and the BrainInfo field 'vector_observations':
```python
# `action` holds the actions for the external Brain(s), as described above.
info = env.step(action)
brain_info = info['brain_name']
observations = brain_info.vector_observations
```
Note that if you have more than one external Brain in the environment, you
must provide dictionaries from Brain names to arrays for `action`, `memory`
and `value`. For example, if you have two external Brains named `brain1` and
`brain2`, each with one Agent taking two continuous actions, you can
pass:
```python
# One Agent per Brain, each taking two continuous actions.
action = {'brain1': [1.0, 2.0], 'brain2': [3.0, 4.0]}
info = env.step(action)
```
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication
socket.
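To show how `reset`, `step`, and the per-Brain action dictionary fit together, here is a minimal control-loop sketch. It runs against `FakeEnv`, a hypothetical stand-in written only for this example that imitates the calls documented above (it is not `UnityEnvironment`); `gravity` is likewise a made-up reset parameter. With a real build you would create the environment as shown in "Loading a Unity Environment" instead.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for BrainInfo (only the fields this loop reads).
FakeBrainInfo = namedtuple("FakeBrainInfo", ["vector_observations", "rewards", "local_done"])


class FakeEnv:
    """Hypothetical stand-in imitating reset/step/close; one Agent per Brain.

    Every Agent receives reward 1.0 per step and is done after episode_length steps.
    """

    def __init__(self, brain_names, episode_length=3):
        self.brain_names = list(brain_names)
        self.episode_length = episode_length
        self.t = 0

    def _infos(self, done):
        return {
            name: FakeBrainInfo(
                vector_observations=[[float(self.t)]],  # (1 Agent, obs size 1)
                rewards=[0.0 if self.t == 0 else 1.0],
                local_done=[done],
            )
            for name in self.brain_names
        }

    def reset(self, train_model=True, config=None):
        self.t = 0  # config is accepted but ignored by this stand-in
        return self._infos(done=False)

    def step(self, action):
        # With several external Brains, `action` must map every Brain name to its actions.
        missing = set(self.brain_names) - set(action)
        if missing:
            raise KeyError("no action for Brain(s): %s" % sorted(missing))
        self.t += 1
        return self._infos(done=self.t >= self.episode_length)

    def close(self):
        pass


def run_episode(env, rng):
    """Reset, then step with random 2-D continuous actions until every Agent is done."""
    infos = env.reset(train_model=True, config={"gravity": 9.81})  # hypothetical parameter
    total_reward = {name: 0.0 for name in infos}
    while not all(all(info.local_done) for info in infos.values()):
        action = {name: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for name in infos}
        infos = env.step(action)
        for name, info in infos.items():
            total_reward[name] += sum(info.rewards)
    return total_reward


env = FakeEnv(["brain1", "brain2"], episode_length=3)
print(run_episode(env, random.Random(0)))  # {'brain1': 3.0, 'brain2': 3.0}
env.close()
```

Keying actions by Brain name means a missing entry surfaces as an error at the step call rather than as a silently misrouted action.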
## mlagents-learn
For more detailed documentation on using `mlagents-learn`, check out
[Training ML-Agents](Training-ML-Agents.md).



# Unity ML-Agents Gym Wrapper
A common way in which machine learning researchers interact with simulation
environments is via a wrapper provided by OpenAI called `gym`. For more
information on the gym interface, see [here](https://github.com/openai/gym).
We provide a gym wrapper, and instructions for using it with existing machine
learning algorithms which use gym. The wrapper provides an interface on top
of our `UnityEnvironment` class, which is the default way of interfacing with a
Unity environment via Python.
## Installation
The gym wrapper can be installed using:
```sh
pip install gym_unity
```
or by running the following from the `/gym-unity` directory of the repository:
```sh
pip install .
```
## Using the Gym Wrapper
The gym interface is available from `gym_unity.envs`. To launch an environment
from the root of the project repository use:
```python
from gym_unity.envs import UnityEnv
env = UnityEnv(environment_filename, worker_id, use_visual, multiagent)
```
* `environment_filename` refers to the path to the Unity environment.
* `worker_id` refers to the port to use for communication with the environment.
Defaults to `0`.
* `use_visual` refers to whether to use visual observations (True) or vector
observations (False) as the default observation provided by the `reset` and
`step` functions. Defaults to `False`.
* `multiagent` refers to whether you intend to launch an environment which
contains more than one agent. Defaults to `False`.
The returned environment `env` will function as a gym environment.
For more on using the gym interface, see our
[Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
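Because `env` follows the gym interface, the usual gym episode loop applies: `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The sketch below runs that loop against `StubGymEnv`, a hypothetical stand-in so the example works without a Unity build or `gym_unity`; substitute `UnityEnv(environment_filename, ...)` in practice.

```python
import random


class StubGymEnv:
    """Hypothetical stand-in with the classic gym reset/step signatures.

    Action 1 earns reward 1.0, action 0 earns nothing; episodes last episode_length steps.
    """

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        obs = [float(self.t)]
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_length
        return obs, reward, done, {}

    def close(self):
        pass


def run_random_episode(env, actions, rng):
    """Play one episode with uniformly random actions and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(rng.choice(actions))
        total += reward
    return total


env = StubGymEnv()
print(run_random_episode(env, [0, 1], random.Random(0)))
env.close()
```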
## Limitations
* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if
present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the
`info` provided by `env.step(action)`.
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.
## Running OpenAI Baselines Algorithms
OpenAI provides a set of open-source, maintained, and tested Reinforcement
Learning algorithms called the [Baselines](https://github.com/openai/baselines).
Using the provided Gym wrapper, it is possible to train ML-Agents environments
using these algorithms. This requires the creation of custom training scripts to
launch each algorithm. In most cases these scripts can be created by making
slight modifications to the ones provided for Atari and Mujoco environments.
### Example - DQN Baseline
In order to train an agent to play the `GridWorld` environment using the
Baselines DQN algorithm, create a file called `train_unity.py` within the
`baselines/deepq/experiments` subfolder of the baselines repository. This file
will be a modification of the `run_atari.py` file within the same folder. Then
create an `/envs/` directory within the repository, and build the GridWorld
environment to that directory. For more information on building Unity
environments, see [here](../docs/Learning-Environment-Executable.md). Add the
following code to the `train_unity.py` file:
```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv


def main():
    # Load the GridWorld build from ./envs/ on worker 0, using visual observations.
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    # Q-network: conv layers given as (num_outputs, kernel_size, stride),
    # followed by a 256-unit hidden layer with a dueling head.
    model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[256],
        dueling=True,
    )
    act = deepq.learn(
        env,
        q_func=model,
        lr=1e-3,
        max_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()
```
To start the training process, run the following from the root of the baselines
repository:
```sh
python -m baselines.deepq.experiments.train_unity
```
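With the arguments used in `train_unity.py`, exploration is annealed over the first `0.1 * 100000 = 10000` steps. The sketch below computes epsilon under a linear schedule, assuming (as in the Baselines DQN code we are aware of) that epsilon starts at 1.0 and decays linearly to `exploration_final_eps`, then stays there. `linear_epsilon` is a hypothetical helper for illustration; check your Baselines version for the exact schedule.

```python
def linear_epsilon(t, max_timesteps=100000, exploration_fraction=0.1,
                   initial_eps=1.0, final_eps=0.02):
    """Epsilon at timestep t for a linear schedule (assumed to match deepq.learn)."""
    schedule_timesteps = int(exploration_fraction * max_timesteps)  # 10000 here
    fraction = min(float(t) / schedule_timesteps, 1.0)
    return initial_eps + fraction * (final_eps - initial_eps)


for t in (0, 5000, 10000, 50000):
    print(t, round(linear_epsilon(t), 4))  # epsilon: 1.0, 0.51, 0.02, 0.02
```

Raising `exploration_fraction` stretches the random-exploration phase, which can help on sparse-reward environments at the cost of slower early progress.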
### Other Algorithms
Other algorithms in the Baselines repository can be run using scripts similar to
the example provided above. In most cases, the primary changes needed to use a
Unity environment are to import `UnityEnv`, and to replace the environment
creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)`
passing the environment binary path.
As a rule of thumb, adapt the Atari training scripts for vision-based
environments and the Mujoco scripts for vector observation environments.
Some algorithms make use of `make_atari_env()` or `make_mujoco_env()`
functions. These are defined in `baselines/common/cmd_util.py`. In order to use
Unity environments for these algorithms, add the following import statement and
function to `cmd_util.py`:
```python
from gym_unity.envs import UnityEnv


# Relies on os, logger, Monitor, SubprocVecEnv and MPI being imported in
# cmd_util.py; add any that are missing.
def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank):  # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        # One subprocess per environment, each with its own worker_id.
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        # Vector observations: one environment per MPI process, keyed by its rank.
        rank = MPI.COMM_WORLD.Get_rank()
        env = UnityEnv(env_directory, rank, use_visual=False)
        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
        return env
```
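`make_env(rank)` returns a zero-argument `_thunk` rather than an environment because `SubprocVecEnv` takes a list of functions, each of which builds one environment inside its own worker process. Going through the `make_env` factory also binds `rank` when the thunk is created; a bare `lambda` in the list comprehension would look up the loop variable only when called, so every environment would get the last `worker_id`. The self-contained sketch below shows the difference, with a hypothetical `fake_env` in place of `UnityEnv`.

```python
def fake_env(worker_id):
    # Hypothetical stand-in for UnityEnv(env_directory, worker_id, ...):
    # it only records which worker_id it was built with.
    return {"worker_id": worker_id}


def make_env(rank):
    """Factory: returns a thunk with `rank` bound now, like make_env above."""
    def _thunk():
        return fake_env(rank)
    return _thunk


def thunks_with_factory(num_env, start_index=0):
    return [make_env(i + start_index) for i in range(num_env)]


def thunks_with_bare_lambda(num_env, start_index=0):
    # Buggy variant for comparison: each lambda reads `i` when called, after the loop ended.
    return [lambda: fake_env(i + start_index) for i in range(num_env)]


good = [thunk()["worker_id"] for thunk in thunks_with_factory(3)]
bad = [thunk()["worker_id"] for thunk in thunks_with_bare_lambda(3)]
print(good)  # [0, 1, 2]: each environment gets its own worker_id
print(bad)   # [2, 2, 2]: all environments would share one worker_id
```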



# ML-Agents SDK
