
Several, small documentation improvements (#3903)

* Several, small documentation improvements

- Re-organize main repo README
- Minor clean-ups to Python package-specific readme files
- Clean-up to Unity Inference Engine page
- Update to the docs README
- Added a specific cross-platform section in ML-Agents Overview to amplify Barracuda
- Updated the links in Limitations.md to point to the specific subsections
- Cleaned up the Designing a Learning Environment page. Added an intro paragraph.
- Updated the installation guide to specifically call out local installation
- A few minor formatting, spelling errors fixed.
/release_1_branch
GitHub · 5 years ago
Current commit: 759e222e
18 files changed, with 610 insertions and 465 deletions
  1. README.md (26 lines changed)
  2. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (130 lines changed)
  3. docs/Background-TensorFlow.md (6 lines changed)
  4. docs/Custom-SideChannels.md (92 lines changed)
  5. docs/Getting-Started.md (14 lines changed)
  6. docs/Installation.md (165 lines changed)
  7. docs/Learning-Environment-Design-Agents.md (8 lines changed)
  8. docs/Learning-Environment-Design.md (58 lines changed)
  9. docs/Learning-Environment-Examples.md (94 lines changed)
  10. docs/Learning-Environment-Executable.md (2 lines changed)
  11. docs/Limitations.md (8 lines changed)
  12. docs/ML-Agents-Overview.md (100 lines changed)
  13. docs/Profiling-Python.md (68 lines changed)
  14. docs/Readme.md (14 lines changed)
  15. docs/Unity-Inference-Engine.md (60 lines changed)
  16. gym-unity/README.md (158 lines changed)
  17. ml-agents-envs/README.md (37 lines changed)
  18. ml-agents/README.md (35 lines changed)

README.md (26 lines changed)


<img src="docs/images/image-banner.png" align="middle" width="3000"/>
# Unity ML-Agents Toolkit (Beta)
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_1_docs/docs/)
[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)

## Features
- Flexible Unity SDK that can be integrated into your game or custom Unity scene
- 15+ [example Unity environments](docs/Learning-Environment-Examples.md)
- Training using two deep reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC)
- Self-play mechanism for training agents in adversarial scenarios
- Train memory-enhanced agents using deep reinforcement learning
- Easily definable Curriculum Learning scenarios for complex tasks
- Train robust agents using environment randomization
- Train using multiple concurrent Unity environment instances
- Utilizes the [Unity Inference Engine](docs/Unity-Inference-Engine.md) to provide native cross-platform support
- Unity environment [control from Python](docs/Python-API.md)
- Wrap Unity learning environments as a [gym](gym-unity/README.md)

See our [ML-Agents Overview](docs/ML-Agents-Overview.md) page for detailed descriptions of all these features.
## Releases & Documentation

com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (130 lines changed)


# About ML-Agents package (`com.unity.ml-agents`)
The Unity ML-Agents package contains the C# SDK for the [Unity ML-Agents Toolkit].
The package allows you to convert any Unity scene into a learning environment and train character behaviors using a variety of machine learning algorithms. Additionally, it allows you to embed these trained behaviors back into Unity scenes to control your characters. More specifically, the package provides the following core functionalities:
- Define Agents: entities, or characters, whose behavior will be learned. Agents are entities that generate observations (through sensors), take actions, and receive rewards from the environment.
- Define Behaviors: entities that specify how an agent should act. Multiple agents can share the same Behavior and a scene may have multiple Behaviors.
- Record demonstrations of an agent within the Editor. You can use demonstrations to help train a behavior for that agent.
- Embed a trained behavior into the scene via the [Unity Inference Engine]. Embedded behaviors allow you to switch an Agent between learning and inference.
Note that the _ML-Agents_ package does not contain the machine learning algorithms for training behaviors. The _ML-Agents_ package only supports instrumenting a Unity scene, setting it up for training, and then embedding the trained model back into your Unity scene. The machine learning algorithms that orchestrate training are part of the companion [Python package].
| **Location**      | **Description**                                                         |
| ----------------- | ----------------------------------------------------------------------- |
| _Documentation~_  | Contains the documentation for the Unity package.                       |
| _Editor_          | Contains utilities for Editor windows and drawers.                      |
| _Plugins_         | Contains third-party DLLs.                                              |
| _Runtime_         | Contains core C# APIs for integrating ML-Agents into your Unity scene.  |
| _Tests_           | Contains the unit tests for the package.                                |
To install this _ML-Agents_ package, follow the instructions in the [Package Manager documentation].
This version of the Unity ML-Agents package is compatible with the following versions of the Unity Editor:

- 2018.4 and later
## Known Limitations
Training is limited to the Unity Editor and Standalone builds on Windows, MacOS, and Linux with the Mono scripting backend. Currently, training does not work with the IL2CPP scripting backend. Your environment will default to inference mode if training is not supported or is not currently running.
Inference is executed via the [Unity Inference Engine](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html).
All platforms supported.

All platforms supported except:

- WebGL and GLES 3/2 on Android / iPhone

**NOTE:** Mobile platform support includes:

- Vulkan for Android
- Metal for iOS.
### Headless Mode

`Academy.Instance.EnvironmentStep()`
### Unity Inference Engine Models
the documentation, you can check out our [GitHub Repository], which also includes a number of ways to [connect with us] including our [ML-Agents Forum].
[Unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[Unity Inference Engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[Package Manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_1_docs/docs/Installation.md
[GitHub Repository]: https://github.com/Unity-Technologies/ml-agents
[Python package]: https://github.com/Unity-Technologies/ml-agents
[Execution Order of Event Functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html
[connect with us]: https://github.com/Unity-Technologies/ml-agents#community-and-feedback
[ML-Agents Forum]: https://forum.unity.com/forums/ml-agents.453/

docs/Background-TensorFlow.md (6 lines changed)


performing computations using data flow graphs, the underlying representation of deep learning models. It facilitates training and inference on CPUs and GPUs in a desktop, server, or mobile device. Within the ML-Agents Toolkit, when you train the behavior of an agent, the output is a model (.nn) file that you can then associate with an Agent. Unless you implement a new algorithm, the use of TensorFlow is mostly abstracted away and behind the scenes.
## TensorBoard

docs/Custom-SideChannels.md (92 lines changed)


## Overview
In order to use a side channel, it must be implemented as both Unity and Python classes.

The side channel will have to implement the `SideChannel` abstract class and the following method:

- `OnMessageReceived(IncomingMessage msg)`: You must implement this method and read the data from the IncomingMessage. The data must be read in the order that it was written.

The side channel must also assign a `ChannelId` property in the constructor. The `ChannelId` is a Guid (or UUID in Python) used to uniquely identify a side channel. This Guid must be the same on C# and Python. There can only be one side channel of a certain id during communication.

To send data from C# to Python, create an `OutgoingMessage` instance, add data to it, call the `base.QueueMessageToSend(msg)` method inside the side channel, and call the `OutgoingMessage.Dispose()` method.

To register a side channel on the Unity side, call `SideChannelManager.RegisterSideChannel` with the side channel as the only argument.
The side channel will have to implement the `SideChannel` abstract class. You must implement:

- `on_message_received(self, msg: "IncomingMessage") -> None`: You must implement this method and read the data from the IncomingMessage. The data must be read in the order that it was written.

The side channel must also assign a `channel_id` property in the constructor. The `channel_id` is a UUID (referred to in C# as a Guid) used to uniquely identify a side channel. This number must be the same on C# and Python. There can only be one side channel of a certain id during communication.
To assign the `channel_id`, call the abstract class constructor with the appropriate `channel_id` as follows:
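For instance, a minimal sketch of such a constructor might look like the following. The import path and the UUID value here are illustrative assumptions (the UUID must match whatever Guid the corresponding C# channel uses):

```python
import uuid

from mlagents_envs.side_channel.side_channel import SideChannel, IncomingMessage


class MyStringChannel(SideChannel):  # hypothetical channel for illustration
    def __init__(self) -> None:
        # Pass the channel's UUID to the abstract class constructor.
        super().__init__(uuid.UUID("621f0a70-4f87-11ea-a6bf-784f4387d1f7"))

    def on_message_received(self, msg: IncomingMessage) -> None:
        # Read data in the same order it was written on the C# side.
        print(msg.read_string())
```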
To send a byte array from Python to C#, create an `OutgoingMessage` instance, add data to it, and call the `super().queue_message_to_send(msg)` method inside the side channel.

To register a side channel on the Python side, pass the side channel as an argument when creating the `UnityEnvironment` object. One of the arguments of the constructor (`side_channels`) is a list of side channels.
Below is a simple implementation of a side channel that will exchange ASCII encoded strings between a Unity environment and Python.

The first step is to create the `StringLogSideChannel` class within the Unity project. Here is an implementation of a `StringLogSideChannel` that will listen for messages from Python and print them to the Unity debug log, as well as send error messages from Unity to Python.
```csharp
using UnityEngine;
// ...
```
instantiated and registered. This can typically be done wherever the logic of the side channel makes sense to be associated, for example on a MonoBehaviour object that might need to access data from the side channel. Here we show a simple MonoBehaviour object which instantiates and registers the new side channel. If you have not done it already, make sure that the MonoBehaviour which registers the side channel is attached to a GameObject which will be live in your Unity scene.
```csharp
using UnityEngine;
// ...
```
### Example Python code
Now that we have created the necessary Unity C# classes, we can create their Python counterparts.
```python
from mlagents_envs.environment import UnityEnvironment

super().queue_message_to_send(msg)
```
We can then instantiate the new side channel, launch a `UnityEnvironment` with that side channel active, and send a series of messages to the Unity environment from Python using it.
```python
# Create the channel

env.close()
```
Now, if you run this script and press `Play` in the Unity Editor when prompted, the console in the Unity Editor will display a message at every Python step.
Additionally, if you press the Space Bar in the Unity Engine, a message will
appear in the terminal.
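For reference, a condensed sketch of the Python-side script described above could look like this. The `send_string` helper is illustrative and stands in for whatever write method your `StringLogChannel` implementation exposes:

```python
from mlagents_envs.environment import UnityEnvironment

# Create the channel and pass it to the environment; with no file_name,
# the environment connects to the Unity Editor when you press Play.
string_log = StringLogChannel()  # the Python class sketched earlier
env = UnityEnvironment(side_channels=[string_log])
env.reset()

for i in range(10):
    string_log.send_string(f"Message {i} from Python")  # hypothetical helper
    env.step()  # Unity prints the message to its console during this step

env.close()
```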

docs/Getting-Started.md (14 lines changed)


Note how the `Mean Reward` value printed to the screen increases as training
progresses. This is a positive sign that training is succeeding.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).
### Observing Training Progress
Once you start training using `mlagents-learn` in the way described in the

(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with compatible Agents (the Agents that generated the model). **Note:**
Do not just close the Unity Window once the `Saved Model` message appears.
Either wait for the training process to close the window or press `Ctrl+C` at the command-line prompt. If you close the window manually, the `.nn` file containing the trained model is not exported into the ml-agents folder.
If you've quit the training early using `Ctrl+C` and want to resume training, run the same command again, appending the `--resume` flag:
```sh
mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
```

docs/Installation.md (165 lines changed)


# Installation
The ML-Agents Toolkit contains several components:
- Unity package ([`com.unity.ml-agents`](../com.unity.ml-agents/)) contains the Unity C# SDK that will be integrated into your Unity scene.
- Three Python packages:
  - [`mlagents`](../ml-agents/) contains the machine learning algorithms that enable you to train behaviors in your Unity scene. Most users of ML-Agents will only need to directly install `mlagents`.
  - [`mlagents_envs`](../ml-agents-envs/) contains a Python API to interact with a Unity scene. It is a foundational layer that facilitates data messaging between a Unity scene and the Python machine learning algorithms. Consequently, `mlagents` depends on `mlagents_envs`.
  - [`gym_unity`](../gym-unity/) provides a Python wrapper for your Unity scene that supports the OpenAI Gym interface.
- Unity [Project](../Project/) that contains several [example environments](Learning-Environment-Examples.md) that highlight the various features of the toolkit to help you get started.

Consequently, to install and use the ML-Agents Toolkit you will need to:

- Install Unity (2018.4 or later)
- Install Python (3.6.1 or higher)
- Clone this repository (Optional)
- Install the `com.unity.ml-agents` Unity package
- Install the `mlagents` Python package
[Download](https://unity3d.com/get-unity/download) and install Unity. We strongly recommend that you install Unity through the Unity Hub as it will enable you to manage multiple Unity versions.
We recommend [installing](https://www.python.org/downloads/) Python 3.6 or 3.7. If your Python environment doesn't include `pip3`, see these
Although we do not provide support for Anaconda installation on Windows, the previous
### Clone the ML-Agents Toolkit Repository (Optional)
Now that you have installed Unity and Python, you can install the Unity and Python packages. You do not need to clone the repository to install those packages, but you may choose to clone the repository if you'd like to download our example environments and training configurations to experiment with them (some of our tutorials / guides assume you have access to our example environments).
The `--branch release_1` option will switch to the tag of the latest stable release. Omitting that will get the `master` branch, which is potentially unstable.
#### Advanced: Local Installation for Development
You will need to clone the repository if you plan to modify or extend the
ML-Agents Toolkit for your purposes. If you plan to contribute those changes
back, make sure to clone the `master` branch (by omitting `--branch release_1`
from the command above). See our
[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more
information on contributing to the ML-Agents Toolkit.
The Unity ML-Agents C# SDK is a Unity Package. You can install the `com.unity.ml-agents` package [directly from the Package Manager registry](https://docs.unity3d.com/Manual/upm-ui-install.html). Please make sure you enable 'Preview Packages' in the 'Advanced' dropdown in order to find it.
#### Advanced: Local Installation for Development
You can [add the local](https://docs.unity3d.com/Manual/upm-ui-local.html) `com.unity.ml-agents` package (from the repository that you just cloned) to your project by:
1. navigating to the menu `Window` -> `Package Manager`.
**NOTE:** In Unity 2018.4 the `+` button is on the bottom right of the packages list, and in Unity 2019.3 it's on the top left of the packages list.
height="340" border="10" />
height="300"
border="10" />
height="340" border="10" />
height="300"
border="10" />
If you are going to follow the examples from our documentation, you can open the `Project` folder in Unity and start tinkering immediately.
Installing the `mlagents` Python package involves installing other Python packages that `mlagents` depends on. So you may run into installation issues if your machine has older versions of any of those dependencies already installed. Consequently, our supported path for installing `mlagents` is to leverage Python Virtual Environments. Virtual Environments provide a mechanism for isolating the dependencies for each project and are supported on Mac / Windows / Linux. We offer a dedicated [guide on Virtual Environments](Using-Virtual-Environment.md).
To install the `mlagents` Python package, activate your virtual environment and run from the command line:
Note that this will install `mlagents` from PyPi, _not_ from the cloned repository. If you installed this correctly, you should be able to run `mlagents-learn --help`, after which you will see the Unity logo and the command line parameters you can use with `mlagents-learn`.
#### Advanced: Local Installation for Development
If you intend to make modifications to `mlagents` or `mlagents_envs`, you should install the packages from the cloned repository rather than from PyPi. To do this, you will need to install `mlagents` and `mlagents_envs` separately. From the repository's root directory, run:
```sh
pip3 install -e ./ml-agents-envs
pip3 install -e ./ml-agents
```

Running pip with the `-e` flag will let you make changes to the Python files directly and have those reflected when you run `mlagents-learn`. It is important to install these packages in this order as the `mlagents` package depends on `mlagents_envs`, and installing it in the other order will download `mlagents_envs` from PyPi.
The [Getting Started](Getting-Started.md) guide contains several short tutorials on setting up the ML-Agents Toolkit within Unity, running a pre-trained model, in addition to building and training environments.
## Help

docs/Learning-Environment-Design-Agents.md (8 lines changed)


distinguish opposing agents, set the team ID to different integer values in the
behavior parameters script on the agent prefab.
<p align="center">
  <img src="images/team_id.png"
      alt="Team ID"
      width="375" border="10" />
</p>
**_Team ID must be 0 or an integer greater than 0._**

<p align="center">
<img src="images/demo_component.png"
alt="Demonstration Recorder"
width="375" border="10" />
width="450" border="10" />
</p>
When `Record` is checked, a demonstration will be created whenever the scene is

docs/Learning-Environment-Design.md (58 lines changed)


# Designing a Learning Environment
This page contains general advice on how to design your learning environment, in addition to an overview of the aspects of the ML-Agents Unity SDK that pertain to setting up your scene and simulation, as opposed to designing your agents within the scene. We have a dedicated page on [Designing Agents](Learning-Environment-Design-Agents.md) which includes how to instrument observations, actions and rewards, define teams for multi-agent scenarios and record agent demonstrations for imitation learning.
To help on-board to the entire set of functionality provided by the ML-Agents
Toolkit, we recommend exploring our [API documentation](API-Reference.md).
Additionally, our [example environments](Learning-Environment-Examples.md) are a
great resource as they provide sample usage of almost all of our features.
## The Simulation and Training Process

for each training episode. Otherwise, the agent would probably only learn to solve one particular maze, not mazes in general.
### Multiple Areas
In many of the example environments, many copies of the training area are
instantiated in the scene. This generally speeds up training, allowing the
environment to gather many experiences in parallel. This can be achieved simply
by instantiating many Agents with the same Behavior Name. If possible, consider
designing your scene to support multiple areas.
Check out our example environments to see examples of multiple areas.
Additionally, the
[Making a New Learning Environment](Learning-Environment-Create-New.md#optional-multiple-training-areas-within-the-same-scene)
guide demonstrates this option.
## Environments
When you create a training environment in Unity, you must set up the scene so
that it can be controlled by the external training process. Considerations
include:
- The training scene must start automatically when your Unity application is
launched by the training process.
- The Academy must reset the scene to a valid starting point for each episode of
training.
- A training episode must have a definite end — either using `Max Steps` or by
each Agent ending its episode manually with `EndEpisode()`.
## Environment Parameters
Curriculum learning and environment parameter randomization are two training
methods that control specific parameters in your environment. As such, it is

[WallJumpAgent.cs](../Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
).
## Agent
The Agent class represents an actor in the scene that collects observations and
carries out actions. The Agent class is typically attached to the GameObject in

See [Agents](Learning-Environment-Design-Agents.md) for detailed information
about programming your own Agents.
## Recording Statistics

docs/Learning-Environment-Examples.md (94 lines changed)


- +0.1 Each step agent's hand is in goal location.
- Behavior Parameters:
- Vector Observation space: 26 variables corresponding to position, rotation,
  velocity, and angular velocities of the two arm rigid bodies.
- Vector Action space: (Continuous) Size of 4, corresponding to torque
applicable to two joints.
- Visual Observations: None.

![Worm](images/worm.png)
- Set-up: A worm with a head and 3 body segments.
- Goal: The agents must move their bodies toward the goal direction.
  - `WormStaticTarget` - Goal direction is always forward.
  - `WormDynamicTarget` - Goal direction is randomized.
- Agents: The environment contains 10 agents with the same Behavior Parameters.
- Agent Reward Function (independent):
  - +0.01 times body velocity in the goal direction.
  - +0.01 times body direction alignment with goal direction.
- Behavior Parameters:
  - Vector Observation space: 57 variables corresponding to position, rotation,
  - Vector Action space: (Continuous) Size of 9, corresponding to target
- Visual Observations: None
- Float Properties: None
- Benchmark Mean Reward for `WormStaticTarget`: 200
- Benchmark Mean Reward for `WormDynamicTarget`: 150
## Food Collector

- Set-up: Environment where four agents compete in a 2 vs 2 toy soccer game.
- Goal:
  - Get the ball into the opponent's goal while preventing the ball from entering own goal.
- Agents: The environment contains four agents, with the same Behavior Parameters: SoccerTwos.
- (1 - `accumulated time penalty`) When ball enters opponent's goal. `accumulated time penalty` is incremented by (1 / `MaxStep`) every fixed update and is reset to 0 at the beginning of an episode.
- -1 When ball enters team's goal.
- Vector Observation space: 336 corresponding to 11 ray-casts forward distributed over 120 degrees and 3 ray-casts backward distributed over 90 degrees, each detecting 6 possible object types, along with the object's distance. The forward ray-casts contribute 264 state dimensions and backward 72 state dimensions over three observation stacks.
- Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement, as well as rotation.
- ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
- Default: 7.5
- Recommended minimum: 4
- Recommended maximum: 10

- Agents: The environment contains three agents. Two Strikers and one Goalie.
Behavior Parameters : Striker, Goalie.
- Striker Agent Reward Function (dependent):
  - +1 When ball enters opponent's goal.
  - -0.001 Existential penalty.
- Goalie Agent Reward Function (dependent):
  - -1 When ball enters goal.
  - 0.001 Existential bonus.
- Striker Vector Observation space: 294 corresponding to 11 ray-casts forward distributed over 120 degrees and 3 ray-casts backward distributed over 90 degrees, each detecting 5 possible object types, along with the object's distance. The forward ray-casts contribute 231 state dimensions and backward 63 state dimensions over three observation stacks.
- Striker Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement, as well as rotation.
- Goalie Vector Observation space: 738 corresponding to 41 ray-casts distributed over 360 degrees, each detecting 4 possible object types, along with the object's distance and 3 observation stacks.
- Goalie Vector Action space: (Discrete) Three branched actions corresponding to forward, backward, sideways movement, as well as rotation.
- ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
- Default: 7.5
- Recommended minimum: 4
- Recommended maximum: 10

- Recommended maximum: 20
## Walker

docs/Learning-Environment-Executable.md (2 lines changed)


INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
```
You can press `Ctrl+C` to stop the training, and your trained model will be at
`models/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you early terminate the training; it's

docs/Limitations.md (8 lines changed)


See the package-specific Limitations pages:
- [`com.unity.mlagents` Unity package](../com.unity.ml-agents/Documentation~/com.unity.ml-agents.md#known-limitations)
- [`mlagents` Python package](../ml-agents/README.md#limitations)
- [`mlagents_envs` Python package](../ml-agents-envs/README.md#limitations)
- [`gym_unity` Python package](../gym-unity/README.md#limitations)

docs/ML-Agents-Overview.md (100 lines changed)


- [Key Components](#key-components)
- [Training Modes](#training-modes)
- [Built-in Training and Inference](#built-in-training-and-inference)
- [Cross-Platform Inference](#cross-platform-inference)
- [Custom Training and Inference](#custom-training-and-inference)
- [Flexible Training Scenarios](#flexible-training-scenarios)
- [Training Methods: Environment-agnostic](#training-methods-environment-agnostic)

processes these observations and sends back actions for each medic to take. During training these actions are mostly exploratory to help the Python API learn the best policy for each medic. Once training concludes, the learned policy for each medic can be exported as a model file. Then during the inference phase, the medics still continue to generate their observations, but instead of being sent to the Python API, they will be fed into their (internal, embedded) model to generate the _optimal_ action for each medic to take at every point in time.
#### Cross-Platform Inference
It is important to note that the ML-Agents Toolkit leverages the
[Unity Inference Engine](Unity-Inference-Engine.md) to run the models within a
Unity scene such that an agent can take the _optimal_ action at each step. Given
that the Unity Inference Engine supports most platforms that Unity does, this
means that any model you train with the ML-Agents Toolkit can be embedded into
your Unity application that runs on any platform. See our
[dedicated blog post](https://blogs.unity3d.com/2019/03/01/unity-ml-agents-toolkit-v0-7-a-leap-towards-cross-platform-inference/)
for additional information.
### Custom Training and Inference
In the previous mode, the Agents were used for training to generate a TensorFlow

and saved as assets. These demonstrations contain information on the
observations, actions, and rewards for a given agent during the recording
session. They can be managed in the Editor, as well as used for training with BC
and GAIL. See the
[Designing Agents](Learning-Environment-Design-Agents.md#recording-demonstrations)
page for more information on how to record demonstrations for your agent.
### Summary

experience replay mechanism used by SAC. Thus, we recommend that users use PPO.
For further reading on this issue in particular, see the paper
[Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1702.08887.pdf).
See our
[Designing Agents](Learning-Environment-Design-Agents.md#defining-teams-for-multi-agent-scenarios)
page for more information on setting up teams in your Unity scene. Also, read
our
[blog post on self-play](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/)
for additional information.
### Solving Complex Tasks using Curriculum Learning

is not possible to directly replicate the results here using that environment.]_
The ML-Agents Toolkit supports modifying custom environment parameters during the training process to aid in learning. This allows elements of the environment related to difficulty or complexity to be dynamically adjusted based on training progress. The [Training ML-Agents](Training-ML-Agents.md#curriculum-learning) page has more information on defining training curriculums.
### Training Robust Agents using Environment Parameter Randomization

learns, the ML-Agents Toolkit provides a way to randomly sample parameters of
the environment during training. We refer to this approach as **Environment
Parameter Randomization**. For those familiar with Reinforcement Learning
research, this approach is based on the concept of [Domain Randomization](https://arxiv.org/abs/1703.06907). By using [parameter randomization during training](Training-ML-Agents.md#environment-parameter-randomization), the agent can be better suited to adapt (with higher performance) to future unseen variations of the environment.

_Example of variations of the 3D Ball environment. The environment parameters are `gravity`, `ball_mass` and `ball_scale`._
## Model Types

- **Concurrent Unity Instances** - We enable developers to run concurrent, parallel instances of the Unity executable during training. For certain scenarios, this should speed up training. Check out our dedicated page on [creating a Unity executable](Learning-Environment-Executable.md) and the [Training ML-Agents](Training-ML-Agents.md#training-using-concurrent-unity-instances) page for instructions on how to set the number of concurrent instances.
- **Recording Statistics from Unity** - We enable developers to [record statistics](Learning-Environment-Design.md#recording-statistics) from within their Unity environments. These statistics are aggregated and generated during the training process.
- **Custom Side Channels** - We enable developers to [create custom side channels](Custom-SideChannels.md) to manage data transfer between Unity and Python that is unique to their training workflow and/or environment.
- **Custom Samplers** - We enable developers to [create custom sampling methods](Training-ML-Agents.md#defining-a-new-sampler-type) for Environment Parameter Randomization. This enables users to customize this training method for their particular environment.

packed with several features to enable researchers and developers to leverage
(and enhance) machine learning within Unity.
In terms of next steps:
- For a walkthrough of running ML-Agents with a simple scene, check out the
[Getting Started](Getting-Started.md) guide.
- For a "Hello World" introduction to creating your own Learning Environment,
check out the
[Making a New Learning Environment](Learning-Environment-Create-New.md) page.
- For an overview on the more complex example environments that are provided in
this toolkit, check out the
[Example Environments](Learning-Environment-Examples.md) page.
- For more information on the various training options available, check out the
[Training ML-Agents](Training-ML-Agents.md) page.

docs/Profiling-Python.md (68 lines changed)


# Profiling in Python

As part of the ML-Agents Toolkit, we provide a lightweight profiling system, in order to identify hotspots in the training process and help spot regressions from changes.
Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if desired. This also means that a function that is called from multiple places in the code will appear in multiple places in the timing output.
All timers operate using a "global" instance by default, but this can be overridden if necessary (mainly for testing).
All timers operate using a "global" instance by default, but this can be
overridden if necessary (mainly for testing).
There are two ways to indicate code should be included in profiling. The simplest way is to add the `@timed` decorator to a function or method of interest.
```python
class TrainerController:
    # ...
```
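As an illustration, a minimal sketch of the decorator form might look like the following. It assumes the timer utilities are importable from `mlagents_envs.timers`, and the decorated function itself is hypothetical:

```python
from mlagents_envs.timers import timed


@timed
def update_policy(batch_size: int) -> int:
    # Everything inside this function is attributed to an "update_policy"
    # block in the timer tree.
    return sum(range(batch_size))
```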

You can also use the `hierarchical_timer` context manager.
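A short illustrative sketch of the context-manager form, again assuming the utilities live in `mlagents_envs.timers` (the function and block name below are hypothetical):

```python
from mlagents_envs.timers import hierarchical_timer


def load_config(path: str) -> str:
    # Time just the file read, as a child block of whatever timer is active.
    with hierarchical_timer("load_config.read"):
        with open(path) as f:
            return f.read()
```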
The context manager may be easier than the `@timed` decorator for profiling different parts of a large function, or profiling calls to abstract methods that might not use the decorator.
By default, at the end of training, timers are collected and written in JSON format to `{summaries_dir}/{run_id}_timers.json`. The output consists of node objects with the following keys:

- total (float): The total time in seconds spent in the block, including child calls.
- count (int): The number of times the block was called.
- self (float): The total time in seconds spent in the block, excluding child calls.
- children (dictionary): A dictionary of child nodes, keyed by the node name.
- is_parallel (bool): Indicates that the block of code was executed in multiple threads or processes (see below). This is optional and defaults to false.
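As a hypothetical, abbreviated illustration of this structure, the file might look like the following once loaded with `json.load()` in Python (block names and numbers are made up):

```python
example_timer_tree = {
    "total": 120.5,   # seconds in the root block, including children
    "count": 1,
    "self": 2.3,      # time not attributed to any child block
    "children": {
        "TrainerController.advance": {
            "total": 118.2,
            "count": 30000,
            "self": 10.1,
            "children": {},
        }
    },
}
```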
For code that executes in multiple processes (for example, SubprocessEnvManager), we periodically send the timer information back to the "main" process, aggregate the timers there, and flush them in the subprocess. Note that (depending on the number of processes) this can result in timers where the total time may exceed the parent's total time. This is analogous to the difference between "real" and "user" values reported from the unix `time` command. In the timer output, blocks that were run in parallel are indicated by the `is_parallel` flag.
Timers currently use `time.perf_counter()` to track time spent, which may not give accurate results for multiple threads. If this is problematic, set `threaded: false` in your trainer configuration.

docs/Readme.md (14 lines changed)


- [Making a New Learning Environment](Learning-Environment-Create-New.md)
- [Designing a Learning Environment](Learning-Environment-Design.md)
- [Designing Agents](Learning-Environment-Design-Agents.md)
### Advanced Usage
## Training & Inference
- [Training Configuration File](Training-Configuration-File.md)
- [Using TensorBoard to Observe Training](Using-Tensorboard.md)
- [Creating Custom Samplers for Environment Parameter Randomization](Training-ML-Agents.md#defining-a-new-sampler-type)
## Help

docs/Unity-Inference-Engine.md (60 lines changed)


# Unity Inference Engine
The ML-Agents Toolkit allows you to use pre-trained neural network models inside your Unity games. This support is possible thanks to the [Unity Inference Engine](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html) (codenamed Barracuda). The Unity Inference Engine uses [compute shaders](https://docs.unity3d.com/Manual/class-ComputeShader.html) to run the neural network within Unity.
**Note**: The ML-Agents Toolkit only supports the models created with our
See the Unity Inference Engine documentation for a list of the
[supported platforms](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html#supported-platforms).
Scripting Backends: The Unity Inference Engine is generally faster with
**IL2CPP** than with **Mono** for Standalone builds. In the Editor, it is not
possible to use the Unity Inference Engine with the GPU device selected when the
Editor Graphics Emulation is set to **OpenGL(ES) 3.0 or 2.0 emulation**. Also,
there might be non-fatal build-time errors when the target platform includes a
Graphics API that does not support **Unity Compute Shaders**.
- Barracuda (`.nn`) files use a proprietary format produced by the
[`tensorflow_to_barracuda.py`]() script.
- ONNX (`.onnx`) files use an
[industry-standard open format](https://onnx.ai/about.html) produced by the
[tf2onnx package](https://github.com/onnx/tensorflow-onnx).
Export to ONNX is currently considered beta. To enable it, make sure
`tf2onnx>=1.5.5` is installed in pip. tf2onnx does not currently support
tensorflow 2.0.0 or later, or earlier than 1.12.0.
When using a model, drag the model file into the **Model** field in the
Inspector of the Agent. Select the **Inference Device** (CPU or GPU) you want to
use for Inference.
**Note:** For most of the models generated with the ML-Agents Toolkit, CPU will
be faster than GPU. You should use the GPU only if you use the ResNet visual
encoder or have a large number of agents with visual observations.

158
gym-unity/README.md


For more information on the gym interface, see [here](https://github.com/openai/gym).
We provide a gym wrapper and instructions for using it with existing machine
learning algorithms which utilize gym. Our wrapper provides interfaces on top of
our `UnityEnvironment` class, which is the default way of interfacing with a
Unity environment via Python.
## Installation

```sh
pip3 install gym_unity
pip3 install -e .
```
## Using the Gym Wrapper

env = UnityToGymWrapper(unity_environment, use_visual, uint8_visual)
```
- `unity_environment` refers to the Unity environment to be wrapped.
- `use_visual` refers to whether to use visual observations (True) or vector
observations (False) as the default observation provided by the `reset` and
`step` functions. Defaults to `False`.
- `uint8_visual` refers to whether to output visual observations as `uint8`
values (0-255). Many common Gym environments (e.g. Atari) do this. By default
they will be floats (0.0-1.0). Defaults to `False`.
- `flatten_branched` will flatten a branched discrete action space into a Gym
Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
`False`.
- `allow_multiple_visual_obs` will return a list of visual observations instead
  of only the first one when set to `True`. Defaults to `False`.
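
Putting these options together, a minimal usage sketch might look like the
following. It assumes a built GridWorld executable at `./envs/GridWorld` and the
constructor arguments documented above; adjust the path and options to your own
environment.

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Wrap a built Unity environment as a gym environment.
unity_env = UnityEnvironment("./envs/GridWorld")
env = UnityToGymWrapper(unity_env, use_visual=True, uint8_visual=True)

obs = env.reset()
for _ in range(100):
    # Sample random actions from the gym action space.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```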
- It is only possible to use an environment with a **single** Agent.
- By default, the first visual observation is provided as the `observation`, if
  present. Otherwise, vector observations are provided. You can receive all
  visual observations by using the `allow_multiple_visual_obs=True` option in
  the gym parameters. If set to `True`, you will receive a list of observations
  instead of only the first one.
- The `TerminalSteps` or `DecisionSteps` output from the environment can still
be accessed from the `info` provided by `env.step(action)`.
- Stacked vector observations are not supported.
- Environment registration for use with `gym.make()` is currently not supported.
## Running OpenAI Baselines Algorithms

### Other Algorithms
Other algorithms in the Baselines repository can be run using scripts similar to
the examples from the baselines package. In most cases, the primary changes
needed to use a Unity environment are to import `UnityToGymWrapper`, and to
replace the environment creation code, typically `gym.make()`, with a call to
`UnityToGymWrapper(unity_environment)` passing the environment as input.
A typical rule of thumb is that for vision-based environments, modification

Some algorithms will make use of `make_env()` or `make_mujoco_env()` functions.
You can define a similar function for Unity environments. An example of such a
method using the PPO2 baseline:
```python
from mlagents_envs.environment import UnityEnvironment

## Run Google Dopamine Algorithms
Google provides a framework [Dopamine](https://github.com/google/dopamine), and
implementations of algorithms, e.g. DQN, Rainbow, and the C51 variant of
Rainbow. Using the Gym wrapper, we can run Unity environments using Dopamine.
First, after installing the Gym wrapper, clone the Dopamine repository.

Then, follow the appropriate install instructions as specified on
[Dopamine's homepage](https://github.com/google/dopamine). Note that the
Dopamine guide specifies using a virtualenv. If you choose to do so, make sure
your unity_env package is also installed within the same virtualenv as Dopamine.
First, open `dopamine/atari/run_experiment.py`. Alternatively, copy the entire
`atari` folder, and name it something else (e.g. `unity`). If you choose the
copy approach, be sure to change the package names in the import statements in
`train.py` to your new directory.
Within `run_experiment.py`, we will need to make changes to which environment is
instantiated, just as in the Baselines example. At the top of the file, insert

from gym_unity.envs import UnityToGymWrapper
```
to import the Gym Wrapper. Navigate to the `create_atari_environment` method in
the same file, and switch to instantiating a Unity environment by replacing the
method with the following code.
```python
game_version = 'v0' if sticky_actions else 'v4'

return env
```
`./envs/GridWorld` is the path to your built Unity executable. For more
information on building Unity environments, see
[here](../docs/Learning-Environment-Executable.md), and note the Limitations
section below.
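
For illustration, the shape of such a replacement method might be the following
sketch; the executable path and wrapper options here are assumptions based on
the surrounding text, not a definitive implementation.

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

def create_atari_environment(game_name=None, sticky_actions=True):
    # The Atari-specific arguments are ignored; a built Unity executable is
    # wrapped as a gym environment instead.
    unity_env = UnityEnvironment("./envs/GridWorld")
    return UnityToGymWrapper(unity_env, use_visual=True, uint8_visual=True)
```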
Note that we are not using the preprocessor from Dopamine, as it uses many
Atari-specific calls. Furthermore, frame-skipping can be done from within Unity,
Since Dopamine is designed around variants of DQN, it is only compatible with
discrete action spaces, and specifically the Discrete Gym space. For
environments that use branched discrete action spaces (e.g.
`flatten_branched` parameter in `UnityToGymWrapper`, which treats each
combination of branched actions as separate actions.
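
As a rough illustration of what flattening means (this is the idea only, not the
wrapper's internal code): a branched space such as `MultiDiscrete([3, 2])`
becomes a single `Discrete(6)` whose actions index every combination of the
branches.

```python
from itertools import product
from gym import spaces

# A branched action space with two branches of sizes 3 and 2.
branched = spaces.MultiDiscrete([3, 2])

# Flattening: enumerate every combination of branch values and expose them as a
# single Discrete action space.
lookup = list(product(*(range(n) for n in branched.nvec)))
flat = spaces.Discrete(len(lookup))

print(flat)       # Discrete(6)
print(lookup[4])  # flat action 4 corresponds to branch choices (2, 0)
```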
Furthermore, when building your environments, ensure that your Agent is using
visual observations with greyscale enabled, and that the dimensions of the
visual observations are 84 by 84 (matching the parameter found in `dqn_agent.py`
and `rainbow_agent.py`). Dopamine's agents currently do not automatically adapt
to the observation dimensions or number of channels.
The hyperparameters provided by Dopamine are tailored to the Atari games, and
you will likely need to adjust them for ML-Agents environments. Here is a sample
`dopamine/agents/rainbow/configs/rainbow.gin` file that is known to work with
GridWorld.

This example assumed you copied `atari` to a separate folder named `unity`.
Replace `unity` in `import dopamine.unity.run_experiment` with the folder you
copied your `run_experiment.py` and `trainer.py` files to. If you directly
modified the existing files, then use `atari` here.
### Starting a Run

--gin_files='dopamine/agents/rainbow/configs/rainbow.gin'
```
Again, we assume that you've copied `atari` into a separate folder. Remember to
replace `unity` with the directory you copied your files into. If you edited the
Atari files directly, this should be `atari`.
Dopamine as run on the GridWorld example environment. All Dopamine (DQN,
Rainbow, C51) runs were done with the same epsilon, epsilon decay, replay
history, training steps, and buffer settings as specified above. Note that the
first 20000 steps are used to pre-fill the training buffer, and no learning
happens.
We provide results from our PPO implementation and the DQN from Baselines as
reference. Note that all runs used the same greyscale GridWorld as Dopamine. For
PPO, `num_layers` was set to 2, and all other hyperparameters are the default
for GridWorld in `trainer_config.yaml`. For Baselines DQN, the provided
hyperparameters in the previous section are used. Note that Baselines implements
certain features (e.g. dueling-Q) that are not enabled in Dopamine DQN.
![Dopamine on GridWorld](images/dopamine_gridworld_plot.png)

algorithm to train on the VisualBanana environment, and provide the results
below. The same hyperparameters were used as in the GridWorld case, except that
`replay_history` and `epsilon_decay` were increased to 100000.
![Dopamine on VisualBanana](images/dopamine_visualbanana_plot.png)

37
ml-agents-envs/README.md


The `mlagents_envs` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents_envs` provides a Python API that allows direct interaction with the
Unity game engine. It is used by the trainer implementation in `mlagents` as
well as the `gym-unity` package to perform reinforcement learning within Unity.
`mlagents_envs` can be used independently of `mlagents` for Python
communication.
## Installation

pip3 install mlagents_envs
See the [Python API Guide](../docs/Python-API.md) for more information on how to
use the API to interact with a Unity environment.
For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main
[ML-Agents Toolkit documentation](../docs/Readme.md).
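
As a quick illustration of the low-level API, a minimal connection sketch is
shown below. It assumes a built GridWorld executable, and the accessor names
follow the Python API Guide for this release; consult that guide if they differ
in your installed version.

```python
from mlagents_envs.environment import UnityEnvironment

# Use a distinct worker_id per concurrent environment so the localhost ports
# used for communication do not collide (see the limitations below).
env = UnityEnvironment(file_name="./envs/GridWorld", worker_id=0)
env.reset()

behavior_name = env.get_behavior_names()[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)
print(f"{behavior_name}: {len(decision_steps)} agent(s) requesting a decision")

env.close()
```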
- `mlagents_envs` uses localhost ports to exchange data between Unity and
Python. As such, multiple instances can have their ports collide, leading to
errors. Make sure to use a different port if you are using multiple instances
of `UnityEnvironment`.
- Communication between Unity and the Python `UnityEnvironment` is not secure.
- On Linux, ports are not released immediately after the communication closes.
As such, you cannot reuse ports right after closing a `UnityEnvironment`.

35
ml-agents/README.md


# Unity ML-Agents Trainers
The `mlagents` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents`
provides a set of reinforcement and imitation learning algorithms designed to be
used with Unity environments. The algorithms interface with the Python API
provided by the `mlagents_envs` package. See [here](../docs/Python-API.md) for
more information on `mlagents_envs`.
The algorithms can be accessed using the `mlagents-learn` access point. See
[here](../docs/Training-ML-Agents.md) for more information on using this
package.
## Installation

pip3 install mlagents
For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main
[ML-Agents Toolkit documentation](../docs/Readme.md).
- `mlagents` does not yet explicitly support multi-agent scenarios, so training
  cooperative behavior among different agents is not stable.
- Resuming self-play from a checkpoint resets the reported ELO to the default
  value.
- Resuming curriculum learning from a checkpoint requires the last lesson be
  specified using the `--lesson` CLI option.