GitHub
6 years ago
Current commit
0c417c55
316 files changed, with 5,942 insertions and 2,613 deletions
Changed files (per-file change count in parentheses):

- .gitignore (58)
- CODE_OF_CONDUCT.md (5)
- CONTRIBUTING.md (60)
- Dockerfile (11)
- LICENSE (201)
- README.md (106)
- docs/API-Reference.md (29)
- docs/Background-Jupyter.md (15)
- docs/Background-Machine-Learning.md (301)
- docs/Background-TensorFlow.md (74)
- docs/Background-Unity.md (12)
- docs/Basic-Guide.md (242)
- docs/FAQ.md (136)
- docs/Feature-Memory.md (57)
- docs/Feature-Monitor.md (50)
- docs/Getting-Started-with-Balance-Ball.md (393)
- docs/Glossary.md (68)
- docs/Installation-Windows.md (251)
- docs/Installation.md (90)
- docs/Learning-Environment-Best-Practices.md (64)
- docs/Learning-Environment-Create-New.md (363)
- docs/Learning-Environment-Design-Academy.md (55)
- docs/Learning-Environment-Design-Agents.md (469)
- docs/Learning-Environment-Design-Brains.md (106)
- docs/Learning-Environment-Design-External-Internal-Brains.md (118)
- docs/Learning-Environment-Design-Heuristic-Brains.md (34)
- docs/Learning-Environment-Design-Player-Brains.md (47)
- docs/Learning-Environment-Design.md (203)
- docs/Learning-Environment-Examples.md (381)
- docs/Learning-Environment-Executable.md (219)
- docs/Limitations.md (27)
- docs/ML-Agents-Overview.md (703)
- docs/Migrating.md (135)
- docs/Python-API.md (160)
- docs/Readme.md (80)
- docs/Training-Curriculum-Learning.md (143)
- docs/Training-Imitation-Learning.md (78)
- docs/Training-ML-Agents.md (228)
- docs/Training-PPO.md (218)
- docs/Training-on-Amazon-Web-Service.md (118)
- docs/Training-on-Microsoft-Azure-Custom-Instance.md (112)
- docs/Training-on-Microsoft-Azure.md (107)
- docs/Using-Docker.md (117)
- docs/Using-TensorFlow-Sharp-in-Unity.md (179)
- docs/Using-Tensorboard.md (79)
- docs/dox-ml-agents.conf (8)
- docs/images/banner.png (611)
- docs/images/player_brain.png (129)
- docs/images/scene-hierarchy.png (79)
- docs/images/unity-logo-rgb.png (309)
- docs/localized/zh-CN/README.md (5)
- docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (26)
- docs/localized/zh-CN/docs/Installation.md (4)
- docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (4)
- docs/localized/zh-CN/docs/Learning-Environment-Design.md (14)
- docs/localized/zh-CN/docs/Learning-Environment-Examples.md (42)
- docs/localized/zh-CN/docs/ML-Agents-Overview.md (2)
- config/curricula/push-block/PushBlockBrain.json (2)
- config/curricula/test/TestBrain.json (2)
- ml-agents/requirements.txt (2)
- ml-agents/mlagents/trainers/buffer.py (15)
- ml-agents/mlagents/envs/__init__.py (1)
- ml-agents/mlagents/envs/exception.py (4)
- ml-agents/mlagents/envs/communicator.py (7)
- ml-agents/mlagents/envs/socket_communicator.py (4)
- ml-agents/mlagents/envs/rpc_communicator.py (6)
- UnitySDK/ProjectSettings/EditorBuildSettings.asset (1)
- UnitySDK/ProjectSettings/ProjectSettings.asset (57)
- UnitySDK/ProjectSettings/UnityConnectSettings.asset (4)
- UnitySDK/ProjectSettings/ProjectVersion.txt (2)
- UnitySDK/Assets/ML-Agents/Scripts/CoreBrainInternal.cs.meta (12)
- UnitySDK/Assets/ML-Agents/Scripts/Academy.cs (28)
- UnitySDK/Assets/ML-Agents/Scripts/Batcher.cs (17)
- UnitySDK/Assets/ML-Agents/Scripts/Brain.cs (24)
- UnitySDK/Assets/ML-Agents/Scripts/CoreBrain.cs (2)
- UnitySDK/Assets/ML-Agents/Scripts/CoreBrainHeuristic.cs (9)
- UnitySDK/Assets/ML-Agents/Scripts/CoreBrainPlayer.cs (17)
- UnitySDK/Assets/ML-Agents/Scripts/Monitor.cs (43)
- UnitySDK/Assets/ML-Agents/Scripts/RpcCommunicator.cs (20)
- UnitySDK/Assets/ML-Agents/Scripts/SocketCommunicator.cs (22)
- UnitySDK/Assets/ML-Agents/Scripts/UnityAgentsException.cs (2)
- UnitySDK/Assets/ML-Agents/Scripts/Agent.cs (165)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentActionProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentInfoProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/BrainParametersProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/BrainTypeProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/CommandProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EngineConfigurationProto.cs (19)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EngineConfigurationProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EnvironmentParametersProto.cs (21)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/EnvironmentParametersProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/Header.cs (14)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/Header.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/ResolutionProto.cs (15)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/ResolutionProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/SpaceTypeProto.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityInput.cs (27)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityInput.cs.meta (2)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityMessage.cs (32)
- UnitySDK/Assets/ML-Agents/Scripts/CommunicatorObjects/UnityMessage.cs.meta (2)
# Contribution Guidelines

Thank you for your interest in contributing to the ML-Agents toolkit! We are
incredibly excited to see how members of our community will use and extend the
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.

## Communication

First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
expect all our contributors to follow it.

Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
briefly outlining the changes you plan to make. This will enable us to provide
some context that may be helpful for you, ranging from advice and feedback on
how to optimally perform your changes to reasons for not making them.

## Git Branches

Starting with v0.3, we adopted the
Consequently, the `master` branch corresponds to the latest release of

* Corresponding changes to documentation, unit tests and sample environments
  (if applicable)

## Environments

We are also actively open to adding community contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
PR explaining the nature of the environment and task.

## Style Guide

When performing changes to the codebase, ensure that you follow the style guide
of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/). For C#, we will soon be
adding a formal style guide for our repository.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

   "License" shall mean the terms and conditions for use, reproduction,
   and distribution as defined by Sections 1 through 9 of this document.

   "Licensor" shall mean the copyright owner or entity authorized by
   the copyright owner that is granting the License.

   "Legal Entity" shall mean the union of the acting entity and all
   other entities that control, are controlled by, or are under common
   control with that entity. For the purposes of this definition,
   "control" means (i) the power, direct or indirect, to cause the
   direction or management of such entity, whether by contract or
   otherwise, or (ii) ownership of fifty percent (50%) or more of the
   outstanding shares, or (iii) beneficial ownership of such entity.

   "You" (or "Your") shall mean an individual or Legal Entity
   exercising permissions granted by this License.

   "Source" form shall mean the preferred form for making modifications,
   including but not limited to software source code, documentation
   source, and configuration files.

   "Object" form shall mean any form resulting from mechanical
   transformation or translation of a Source form, including but
   not limited to compiled object code, generated documentation,
   and conversions to other media types.

   "Work" shall mean the work of authorship, whether in Source or
   Object form, made available under the License, as indicated by a
   copyright notice that is included in or attached to the work
   (an example is provided in the Appendix below).

   "Derivative Works" shall mean any work, whether in Source or Object
   form, that is based on (or derived from) the Work and for which the
   editorial revisions, annotations, elaborations, or other modifications
   represent, as a whole, an original work of authorship. For the purposes
   of this License, Derivative Works shall not include works that remain
   separable from, or merely link (or bind by name) to the interfaces of,
   the Work and Derivative Works thereof.

   "Contribution" shall mean any work of authorship, including
   the original version of the Work and any modifications or additions
   to that Work or Derivative Works thereof, that is intentionally
   submitted to Licensor for inclusion in the Work by the copyright owner
   or by an individual or Legal Entity authorized to submit on behalf of
   the copyright owner. For the purposes of this definition, "submitted"
   means any form of electronic, verbal, or written communication sent
   to the Licensor or its representatives, including but not limited to
   communication on electronic mailing lists, source code control systems,
   and issue tracking systems that are managed by, or on behalf of, the
   Licensor for the purpose of discussing and improving the Work, but
   excluding communication that is conspicuously marked or otherwise
   designated in writing by the copyright owner as "Not a Contribution."

   "Contributor" shall mean Licensor and any individual or Legal Entity
   on behalf of whom a Contribution has been received by Licensor and
   subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
   this License, each Contributor hereby grants to You a perpetual,
   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
   copyright license to reproduce, prepare Derivative Works of,
   publicly display, publicly perform, sublicense, and distribute the
   Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
   this License, each Contributor hereby grants to You a perpetual,
   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
   (except as stated in this section) patent license to make, have made,
   use, offer to sell, sell, import, and otherwise transfer the Work,
   where such license applies only to those patent claims licensable
   by such Contributor that are necessarily infringed by their
   Contribution(s) alone or by combination of their Contribution(s)
   with the Work to which such Contribution(s) was submitted. If You
   institute patent litigation against any entity (including a
   cross-claim or counterclaim in a lawsuit) alleging that the Work
   or a Contribution incorporated within the Work constitutes direct
   or contributory patent infringement, then any patent licenses
   granted to You under this License for that Work shall terminate
   as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
   Work or Derivative Works thereof in any medium, with or without
   modifications, and in Source or Object form, provided that You
   meet the following conditions:

   (a) You must give any other recipients of the Work or
       Derivative Works a copy of this License; and

   (b) You must cause any modified files to carry prominent notices
       stating that You changed the files; and

   (c) You must retain, in the Source form of any Derivative Works
       that You distribute, all copyright, patent, trademark, and
       attribution notices from the Source form of the Work,
       excluding those notices that do not pertain to any part of
       the Derivative Works; and

   (d) If the Work includes a "NOTICE" text file as part of its
       distribution, then any Derivative Works that You distribute must
       include a readable copy of the attribution notices contained
       within such NOTICE file, excluding those notices that do not
       pertain to any part of the Derivative Works, in at least one
       of the following places: within a NOTICE text file distributed
       as part of the Derivative Works; within the Source form or
       documentation, if provided along with the Derivative Works; or,
       within a display generated by the Derivative Works, if and
       wherever such third-party notices normally appear. The contents
       of the NOTICE file are for informational purposes only and
       do not modify the License. You may add Your own attribution
       notices within Derivative Works that You distribute, alongside
       or as an addendum to the NOTICE text from the Work, provided
       that such additional attribution notices cannot be construed
       as modifying the License.

   You may add Your own copyright statement to Your modifications and
   may provide additional or different license terms and conditions
   for use, reproduction, or distribution of Your modifications, or
   for any such Derivative Works as a whole, provided Your use,
   reproduction, and distribution of the Work otherwise complies with
   the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
   any Contribution intentionally submitted for inclusion in the Work
   by You to the Licensor shall be under the terms and conditions of
   this License, without any additional terms or conditions.
   Notwithstanding the above, nothing herein shall supersede or modify
   the terms of any separate license agreement you may have executed
   with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
   names, trademarks, service marks, or product names of the Licensor,
   except as required for reasonable and customary use in describing the
   origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
   agreed to in writing, Licensor provides the Work (and each
   Contributor provides its Contributions) on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
   implied, including, without limitation, any warranties or conditions
   of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
   PARTICULAR PURPOSE. You are solely responsible for determining the
   appropriateness of using or redistributing the Work and assume any
   risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
   whether in tort (including negligence), contract, or otherwise,
   unless required by applicable law (such as deliberate and grossly
   negligent acts) or agreed to in writing, shall any Contributor be
   liable to You for damages, including any direct, indirect, special,
   incidental, or consequential damages of any character arising as a
   result of this License or out of the use or inability to use the
   Work (including but not limited to damages for loss of goodwill,
   work stoppage, computer failure or malfunction, or any and all
   other commercial damages or losses), even if such Contributor
   has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
   the Work or Derivative Works thereof, You may choose to offer,
   and charge a fee for, acceptance of support, warranty, indemnity,
   or other liability obligations and/or rights consistent with this
   License. However, in accepting such obligations, You may act only
   on Your own behalf and on Your sole responsibility, not on behalf
   of any other Contributor, and only if You agree to indemnify,
   defend, and hold each Contributor harmless for any liability
   incurred by, or claims asserted against, such Contributor by reason
   of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2017 Unity Technologies

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# API Reference

Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
documentation.

To generate the API reference,
[download Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html)
and run the following command within the `docs/` directory:

```sh
doxygen dox-ml-agents.conf
```

`dox-ml-agents.conf` is a Doxygen configuration file that includes the classes
that have been properly formatted. The generated HTML files will be placed in
the `html/` subdirectory. Open `index.html` within that subdirectory to
navigate to the API reference home. Note that `html/` is already included in
the repository's `.gitignore` file.

In the near future, we aim to expand our documentation to include all the Unity
C# classes and Python API.
# Background: Jupyter

[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide one such notebook,
`notebooks/getting-started.ipynb`, for testing the Python control interface to
a Unity build. This notebook is introduced in the

in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the
command line:

```sh
jupyter notebook
```

Then navigate to `localhost:8888` to access your notebooks.
# Frequently Asked Questions

## Scripting Runtime Environment not setup correctly

If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
or .NET 4.x, you will see an error message. This is because .NET 3.5 doesn't
support the `Clear()` method of `StringBuilder`; refer to
[Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity)
for the solution.

## TensorFlowSharp flag not turned on

If you have already imported the TensorFlowSharp plugin, but haven't set the
ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the
following error message:

```console
You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
```

This error message occurs because the TensorFlowSharp plugin won't be usable
without the ENABLE_TENSORFLOW flag; refer to
[Setting Up The ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity)
for the solution.

## Instance of CoreBrainInternal couldn't be created

If you try to use ML-Agents in Unity versions 2017.1 - 2017.3, you might
encounter an error that looks like this:

```console
Instance of CoreBrainInternal couldn't be created. The the script
class needs to derive from ScriptableObject.
UnityEngine.ScriptableObject:CreateInstance(String)
```

You can fix the error by removing `CoreBrain` from CoreBrainInternal.cs:16,
clicking on your Brain GameObject to let the scene recompile all the changed
C# scripts, then adding the `CoreBrain` back. Make sure your Brain is in
Internal mode, your TensorFlowSharp plugin is imported and the
ENABLE_TENSORFLOW flag is set. This fix is only valid locally and unstable.

## Tensorflow epsilon placeholder error

If you have a graph placeholder set in the Internal Brain inspector that is not
present in the TensorFlow graph, you will see an error like this:

```console
UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
```

Solution: Go to all of your Brain objects, find `Graph placeholders` and change
its `size` to 0 to remove the `epsilon` placeholder.

Similarly, if you have a graph scope set in the Internal Brain inspector that is
not correctly set, you will see a similar error.

Solution: Make sure your Graph Scope field matches the corresponding Brain
object name in your Hierarchy Inspector when there are multiple Brains.

## Environment Permission Error

If you directly import your Unity environment without building it in the
editor, you might need to give it additional permissions to execute it. On
macOS, run:

```sh
chmod -R 755 *.app
```

On Linux, run:

```sh
chmod -R 755 *.x86_64
```

On Windows, you can find

## Environment Connection Timeout

If you are able to launch the environment from `UnityEnvironment` but then
receive a timeout error, there may be a number of possible causes.

* _Cause_: There may be no Brains in your environment which are set to
  `External`. In this case, the environment will not attempt to communicate
  with Python. _Solution_: Set the Brain(s) you wish to externally control
  through the Python API to `External` from the Unity Editor, and rebuild the
  environment.
* _Cause_: On OSX, the firewall may be preventing communication with the
  environment. _Solution_: Add the built environment binary to the list of
  exceptions on the firewall by following these
  [instructions](https://support.apple.com/en-us/HT201642).
* _Cause_: An error happened in the Unity Environment preventing communication.
  _Solution_: Look into the
  [log files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the
  Unity Environment to figure out what error happened.

## Communication port {} still in use

If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker number in
the Python script when calling:

```python
UnityEnvironment(file_name=filename, worker_id=X)
```

## Mean reward : nan

If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
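For example, here is a minimal sketch of that approach, assuming the C# Agent
API described in this documentation (`AgentAction`, `AddReward`, `Done`); the
class name, reward values, and fall threshold are hypothetical:

```csharp
using UnityEngine;

// Hypothetical agent that ends its episode from code when it falls below a
// kill plane, so episodes terminate even when Max Steps is left at 0.
public class FallingAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        AddReward(0.01f);  // small per-step reward while the episode runs

        if (transform.position.y < -1.0f)  // custom episode-terminating event
        {
            AddReward(-1.0f);  // penalize the failure
            Done();            // mark the episode finished so the agent resets
        }
    }
}
```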
# Getting Started with the 3D Balance Ball Environment

This tutorial walks through the end-to-end process of opening an ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
Agent in it, and finally embedding the trained model into the Unity environment.

The ML-Agents toolkit includes a number of
[example environments](Learning-Environment-Examples.md) which you can examine
to help understand the different ways in which the ML-Agents toolkit can be
used. These environments can also serve as templates for new environments or as
ways to test new ML algorithms. After reading this tutorial, you should be able
to explore and build the example environments.

This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **Agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.

In order to install and set up the ML-Agents toolkit, the Python dependencies
and Unity, see the [installation instructions](Installation.md).
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
entities that an agent interacts with.

**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make up
a GameObject, select the GameObject in the Scene window, and open the Inspector
window. The Inspector shows every component on a GameObject.

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same Brain. 3D Balance Ball does this

The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the **Training**
and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the Agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
Configuration** and a high graphics quality and the timescale to `1.0` for the
**Inference Configuration**.

**Note:** if you want to observe the environment during training, you can adjust
the **Inference Configuration** settings to use a larger window and a timescale
closer to 1:1. Be sure to set these parameters back when training in earnest;
otherwise, training can take a very long time.

Another aspect of an environment to look at is the Academy implementation. Since
the base Academy class is abstract, you must always define a subclass. There are
three functions you can implement, though they are all optional (a sketch
appears after this list):

* Academy.AcademyStep() — Called at every simulation step before
  Agent.AgentAction() (and after the Agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
  simulation (including the first time).

The 3D Balance Ball environment does not use these functions — each Agent resets
itself when needed — but many environments do use these functions to control the
environment around the Agents.
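For illustration only, a subclass overriding these two functions might look
like the sketch below; 3D Balance Ball itself leaves these overrides unused,
and the `goal` field and value ranges here are invented for the example.

```csharp
using UnityEngine;

// Illustrative Academy subclass; the goal object and ranges are assumptions.
public class ExampleAcademy : Academy
{
    public GameObject goal;  // hypothetical scene object shared by all agents

    public override void AcademyReset()
    {
        // Runs when the Academy starts or restarts the simulation.
        goal.transform.position =
            new Vector3(Random.Range(-3f, 3f), 0.5f, Random.Range(-3f, 3f));
    }

    public override void AcademyStep()
    {
        // Runs every simulation step, before the agents act.
    }
}
```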
The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
child of the Academy object. (All Brain objects in a scene must be children of
the Academy.) All the Agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same
Brain, but act independently. The Brain settings tell you quite a bit about how
an Agent works.

The **Brain Type** determines how an Agent makes its decisions. The **External**
and **Internal** types work together — use **External** when training your
Agents; use **Internal** when using the trained model. The **Heuristic** Brain
allows you to hand-code the Agent's logic by extending the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. If none of these types
of Brains do what you need, you can implement your own CoreBrain to create your
own type.
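As a rough sketch only: the `Decision` contract changed across early toolkit
versions, so the exact signature below is an assumption to verify against the
`Decision` script shipped with your ML-Agents version, and the observation
indices are invented. A hand-coded heuristic for the balance task might look
like:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Assumed shape of the era's Decision interface; check your version's
// Decision definition before relying on these signatures.
public class HeuristicBalanceDecision : MonoBehaviour, Decision
{
    public float[] Decide(List<float> vectorObs, List<Texture2D> visualObs,
                          float reward, bool done, List<float> memory)
    {
        // Tilt the platform against the ball's relative position; the
        // observation indices used here are illustrative assumptions.
        return new float[] { -vectorObs[2], -vectorObs[4] };
    }

    public List<float> MakeMemory(List<float> vectorObs, List<Texture2D> visualObs,
                                  float reward, bool done, List<float> memory)
    {
        return new List<float>();  // no recurrent memory needed
    }
}
```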
In this tutorial, you will set the **Brain Type** to **External** for training;
#### Vector Observation Space

Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.

The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the platform's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations()` function.)
#### Vector Action Space

An Agent is given instructions from the Brain in the form of *actions*. The
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.

space. You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
Platform GameObjects. The base Agent object has a few properties that affect its
behavior:

* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
  makes decisions. All the Agents in the 3D Balance Ball scene share the same
  Brain.
* **Visual Observations** — Defines any Camera objects used by the Agent to
  observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the Agent
  decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an Agent starts over when it is finished.
  3D Balance Ball sets this true so that the Agent restarts after reaching the
  **Max Step** count or after dropping the ball.
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an Agent, you must extend the base Agent class.

* Agent.AgentReset() — Called when the Agent resets, including at the beginning
  of a session. The Ball3DAgent class uses the reset function to reset the
  platform and ball. The function randomizes the reset values so that the
  training generalizes to more than a specific starting position and platform
  attitude.
* Agent.CollectObservations() — Called every simulation step. Responsible for
  collecting the Agent's observations of the environment. Since the Brain
  instance assigned to the Agent is set to the continuous vector observation
  space with a state size of 8, the `CollectObservations()` must call
  `AddVectorObs` 8 times.
* Agent.AgentAction() — Called every simulation step. Receives the action chosen
  by the Brain. The Ball3DAgent example handles both the continuous and the
  discrete action space types. There isn't actually much difference between the
  two state types in this environment — both vector action spaces result in a
  small change in platform rotation at each step. The `AgentAction()` function
  assigns a reward to the Agent; in this example, an Agent receives a small
  positive reward for each step it keeps the ball on the platform and a larger,
  negative reward for dropping the ball. An Agent is also marked as done when it
  drops the ball so that it will reset with a new ball for the next simulation
  step. (A condensed sketch of these functions appears after this list.)
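Below is a condensed, illustrative sketch of such an Agent subclass. It is not
the shipped Ball3DAgent: the `ball` field, reward values, rotation speed, and
reset ranges are assumptions chosen to match the description above, and only
the continuous action case is shown.

```csharp
using UnityEngine;

// Illustrative Ball3D-style agent; field names and numbers are invented.
public class BalanceAgent : Agent
{
    public GameObject ball;   // assigned in the Inspector (hypothetical)
    Rigidbody ballRb;

    public override void InitializeAgent()
    {
        ballRb = ball.GetComponent<Rigidbody>();
    }

    public override void CollectObservations()
    {
        // Eight floats, matching the Brain's Continuous state size of 8.
        AddVectorObs(gameObject.transform.rotation.z);
        AddVectorObs(gameObject.transform.rotation.x);
        Vector3 relPos = ball.transform.position - gameObject.transform.position;
        AddVectorObs(relPos.x);
        AddVectorObs(relPos.y);
        AddVectorObs(relPos.z);
        AddVectorObs(ballRb.velocity.x);
        AddVectorObs(ballRb.velocity.y);
        AddVectorObs(ballRb.velocity.z);
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Two continuous actions rotate the platform slightly around z and x.
        transform.Rotate(new Vector3(0f, 0f, 1f),
                         2f * Mathf.Clamp(vectorAction[0], -1f, 1f));
        transform.Rotate(new Vector3(1f, 0f, 0f),
                         2f * Mathf.Clamp(vectorAction[1], -1f, 1f));

        if (ball.transform.position.y < transform.position.y - 2f)
        {
            SetReward(-1f);  // dropped the ball
            Done();          // reset with a new ball next step
        }
        else
        {
            SetReward(0.1f); // small reward for keeping the ball balanced
        }
    }

    public override void AgentReset()
    {
        // Randomize platform attitude and ball state so training generalizes.
        transform.rotation = Quaternion.Euler(Random.Range(-10f, 10f), 0f,
                                              Random.Range(-10f, 10f));
        ball.transform.position = transform.position + Vector3.up * 4f;
        ballRb.velocity = Vector3.zero;
    }
}
```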
Now that we have an environment, we can perform the training.

In order to train an agent to correctly balance the ball, we will use a
Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This
is a method that has been shown to be safe, efficient, and more general purpose
than many other RL algorithms; as such, we have chosen it as the example
algorithm for use with the ML-Agents toolkit. For more information on PPO,
OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
To train the agents within the Ball Balance environment, we will be using the
Python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.

We can use `run_id` to identify the experiment and create a folder where the
model and summary statistics are stored. When using TensorBoard to observe the
training statistics, it helps to set this to a sequential value for each
training run. In other words, "BalanceBall1" for the first run, "BalanceBall2"
for the second, and so on. If you don't, the summaries for every training run
are saved to the same directory and will all be included on the same graph.

To summarize, go to your command line, enter the `ml-agents` directory and type:

```sh
mlagents-learn config/trainer_config.yaml --run-id=<run-identifier> --train
```

When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button in
Unity to start training in the Editor.

**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
The `--train` flag tells the ML-Agents toolkit to run in training mode.

**Note**: You can train using an executable rather than the Editor. To do so,
follow the instructions in
[Using an Executable](Learning-Environment-Executable.md).

Once you start training using `mlagents-learn` in the way described in the
previous section, the `ml-agents` directory will contain a `summaries`
directory. In order to observe the training process in more detail, you can use
TensorBoard. From the command line run:

```sh
tensorboard --logdir=summaries
```
* Lesson - only interesting when performing
  [curriculum training](Training-Curriculum-Learning.md). This is not used in
  the 3D Balance Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents.
  Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
  during a successful training process. If it decreases too quickly, the
  `beta` hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
  agents.
* Learning Rate - How large a step the training algorithm takes as it searches
  for the optimal policy. Should decrease over time.
* Policy Loss - Correlates to how much the policy (process for deciding
  actions) is changing. The magnitude of this should decrease during a
  successful training session.
* Value Estimate - The mean value estimate for all states visited by the
  agent. Should increase during a successful training session.
* Value Loss - Correlates to how well the model is able to predict the value
  of each state. This should decrease during a successful training session.

Once the training process completes and saves the model (denoted by the
`Saved Model` message), you can add it to the Unity project and use it with
Agents having an **Internal** Brain type.

**Note:** Do not just close the Unity window once the `Saved Model` message
appears. Either wait for the training process to close the window or press
Ctrl+C at the command-line prompt. If you simply close the window manually,
the .bytes file containing the trained model is not exported into the
ml-agents folder.

Because TensorFlowSharp support is still experimental, it is disabled by
default. To enable it and set up TensorFlowSharp support, follow the
[Setting up ML-Agents Toolkit within Unity](Basic-Guide.md#setting-up-ml-agents-within-unity)
section of the Basic Guide page.

To embed the trained model into Unity, follow the later part of the
[Training the Brain with Reinforcement Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning)
section of the Basic Guide page.

# ML-Agents Toolkit Glossary

* **Academy** - Unity Component which controls timing, reset, and
  training/inference settings of the environment.
* **Action** - The carrying-out of a decision on the part of an agent within
  the environment.
* **Agent** - Unity Component which produces observations and takes actions in
  the environment. An Agent's actions are determined by decisions produced by
  a linked Brain.
* **Brain** - Unity Component which makes decisions for the Agents linked to
  it.
* **Decision** - The specification produced by a Brain for an action to be
  carried out given an observation.
* **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy,
  Scene, Inspector).
* **Environment** - The Unity scene which contains Agents, Academy, and
  Brains.
* **FixedUpdate** - Unity method called each time the game engine is stepped.
  ML-Agents logic should be placed here.
* **Frame** - An instance of rendering the main camera for the display.
  Corresponds to each `Update` call of the game engine.
* **Observation** - Partial information describing the state of the
  environment available to a given agent (e.g. Vector, Visual, Text).
* **Policy** - Function for producing decisions from observations.
* **Reward** - Signal provided at every step used to indicate desirability of
  an agent's action within the current state of the environment.
* **State** - The underlying properties of the environment (including all
  agents within it) at a given time.
* **Step** - Corresponds to each `FixedUpdate` call of the game engine. The
  smallest atomic change to the state possible.
* **Update** - Unity function called each time a frame is rendered. ML-Agents
  logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication
  with outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given
  External Brain. Contains the TensorFlow graph which makes decisions for the
  External Brain.

# Installation

To install and use ML-Agents, you need to install Unity, clone this repository
and install Python with additional dependencies. Each of the subsections below
overviews each step, in addition to a Docker set-up.

## Install **Unity 2017.4** or Later

If you would like to use our Docker set-up (introduced later), make sure to
select the _Linux Build Support_ component when installing Unity.

<img src="images/unity_linux_build_support.png" |
|||
alt="Linux Build Support" |
|||
width="500" border="10" /> |
|||
<img src="images/unity_linux_build_support.png" |
|||
alt="Linux Build Support" |
|||
width="500" border="10" /> |
|||
## Clone the ML-Agents Toolkit Repository

Once installed, you will want to clone the ML-Agents Toolkit GitHub
repository.

```sh
git clone https://github.com/Unity-Technologies/ml-agents.git
```

The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
It also contains many [example environments](Learning-Environment-Examples.md)
that can be used to help get you familiar with Unity.

The `ml-agents` subdirectory contains Python packages which provide trainers
and a Python API to interface with Unity.

The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.

## Install Python and mlagents Package

In order to use the ML-Agents toolkit, you need Python 3.6 along with the
dependencies listed in the [requirements file](../ml-agents/requirements.txt).
Some of the primary dependencies include:

- [TensorFlow](Background-TensorFlow.md)
- [Jupyter](Background-Jupyter.md)

### NOTES

- We do not currently support Python 3.7 or Python 3.5.
- If you are using Anaconda and are having trouble with TensorFlow, please see
  the following
  [note](https://www.tensorflow.org/install/install_mac#installing_with_anaconda)
  on how to install TensorFlow in an Anaconda environment.

If you are a Windows user who is new to Python and TensorFlow, follow
[this guide](Installation-Windows.md) to set up your Python environment.

[Download](https://www.python.org/downloads/) and install Python 3.6 if you do
not already have it.

If your Python environment doesn't include `pip3`, see these instructions on
installing it.

To install the dependencies and `mlagents` Python package, enter the
`ml-agents/` subdirectory and run from the command line:

```sh
pip3 install .
```

If you installed this correctly, you should be able to run
`mlagents-learn --help`.

If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).

The [Basic Guide](Basic-Guide.md) page contains several short tutorials on
setting up the ML-Agents toolkit within Unity, running a pre-trained model, in
addition to building and training environments.

If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md)
and our [Limitations](Limitations.md) pages. If you can't find anything,
please submit an issue and make sure to cite relevant information on OS,
Python version, and exact error message (whenever possible).

# Environment Design Best Practices

## General

* It is often helpful to start with the simplest version of the problem, to
  ensure the agent can learn it. From there increase complexity over time.
  This can either be done manually, or via Curriculum Learning, where a set of
  lessons which progressively increase in difficulty are presented to the
  agent ([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task
  by using a Player Brain to control the agent.
* It is often helpful to make many copies of the agent, and attach the Brain
  to be trained to all of these agents. In this way the Brain can get more
  feedback information from all of these agents, which helps it train faster.

## Rewards

* The magnitude of any given reward should typically not be greater than 1.0
  in order to ensure a more stable learning process.
* Positive rewards are often more helpful to shaping the desired behavior of
  an agent than negative rewards.
* For locomotion tasks, a small positive reward (+0.1) for forward velocity is
  typically used.
* If you want the agent to finish a task quickly, it is often helpful to
  provide a small penalty every step (-0.05) that the agent does not complete
  the task. In this case completion of the task should also coincide with the
  end of the episode (see the sketch after this list).
* Overly-large negative rewards can cause undesirable behavior where an agent
  learns to avoid any behavior which might produce the negative reward, even
  if it is also behavior which can eventually lead to a positive reward.
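
As a rough illustration of these reward guidelines, here is a minimal sketch
(not from the original docs) of reward shaping inside an Agent subclass; the
`reachedGoal` flag, the penalty value, and the `MLAgents` namespace are
assumptions to check against your SDK version:

```csharp
using MLAgents;  // assumed namespace for the Agent base class in this SDK version

public class QuickTaskAgent : Agent  // hypothetical example agent
{
    public bool reachedGoal;  // hypothetical flag set by other scene logic

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // A small per-step penalty (-0.05) encourages finishing quickly.
        AddReward(-0.05f);

        // Keep reward magnitudes at or below 1.0 for stable learning.
        if (reachedGoal)
        {
            AddReward(1.0f);
            Done();  // task completion coincides with the end of the episode
        }
    }
}
```
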
## Vector Observations

* Vector Observations should include all variables relevant to allowing the
  agent to take the optimally informed decision.
* In cases where Vector Observations need to be remembered or compared over
  time, increase the `Stacked Vectors` value to allow the agent to keep track
  of multiple observations into the past.
* Categorical variables such as type of object (Sword, Shield, Bow) should be
  encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`).
* Besides encoding non-numeric values, all inputs should be normalized to be
  in the range 0 to +1 (or -1 to 1). For example, the `x` position information
  of an agent where the maximum possible value is `maxValue` should be
  recorded as `AddVectorObs(transform.position.x / maxValue);` rather than
  `AddVectorObs(transform.position.x);` (see the sketch after this list for
  one approach of normalization).
* Positional information of relevant GameObjects should be encoded in relative
  coordinates wherever possible. This is often relative to the agent position.
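
As a rough sketch of the normalization and relative-position advice above (the
`goal` field, the `maxValue` range, and the `MLAgents` namespace are
assumptions, not part of the original docs):

```csharp
using UnityEngine;
using MLAgents;  // assumed namespace for the Agent base class in this SDK version

public class NormalizedObsAgent : Agent  // hypothetical example agent
{
    public Transform goal;        // hypothetical relevant GameObject
    public float maxValue = 10f;  // hypothetical maximum position value in this scene

    public override void CollectObservations()
    {
        // Normalize a raw coordinate into roughly [-1, 1]:
        // normalizedValue = rawValue / maxValue.
        AddVectorObs(transform.position.x / maxValue);

        // Encode the goal position relative to the agent rather than in world
        // coordinates, then normalize the same way.
        Vector3 relative = goal.position - transform.position;
        AddVectorObs(relative.x / maxValue);
        AddVectorObs(relative.z / maxValue);
    }
}
```
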
## Vector Actions

* When using continuous control, action values should be clipped to an
  appropriate range. The provided PPO model automatically clips these values
  between -1 and 1, but third party training systems may not do so.
* Be sure to set the Vector Action's Space Size to the number of used Vector
  Actions, and not greater, as doing the latter can interfere with the
  efficiency of the training process.

# External and Internal Brains

The **External** and **Internal** types of Brains work in different phases of
training. When training your Agents, set their Brain types to **External**;
when using the trained models, set their Brain types to **Internal**.

When [running an ML-Agents training algorithm](Training-ML-Agents.md), at
least one Brain object in a scene must be set to **External**. This allows the
training process to collect the observations of Agents using that Brain and
give the Agents their actions.

In addition to using an External Brain for training using the ML-Agents
learning algorithms, you can use an External Brain to control Agents in a
Unity environment using an external Python program. See
[Python API](Python-API.md) for more information.

Unlike the other types, the External Brain has no properties to set in the
Unity Inspector window.

The Internal Brain type uses a
[TensorFlow model](https://www.tensorflow.org/get_started/get_started_for_beginners#models_and_training)
to make decisions. The Proximal Policy Optimization (PPO) and Behavioral
Cloning algorithms included with the ML-Agents SDK produce trained TensorFlow
models that you can use with the Internal Brain type.

A __model__ is a mathematical relationship mapping an agent's observations to
its actions. TensorFlow is a software library for performing numerical
computation through data flow graphs. A TensorFlow model, then, defines the
mathematical relationship between your Agent's observations and its actions
using a TensorFlow data flow graph.

The training algorithms included in the ML-Agents SDK produce TensorFlow graph
models as the end result of the training process. See
[Training ML-Agents](Training-ML-Agents.md) for instructions on how to train a
model.

1. Select the Brain GameObject in the **Hierarchy** window of the Unity
   Editor. (The Brain GameObject must be a child of the Academy GameObject and
   must have a Brain component.)
2. Set the **Brain Type** to **Internal**.

   **Note:** In order to see the **Internal** Brain Type option, you must
   [enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md).
3. Import the `environment_run-id.bytes` file produced by the PPO training
   program. (Where `environment_run-id` is the name of the model file, which
   is constructed from the name of your Unity environment executable and the
   run-id value you assigned when running the training process.)

   You can
   [import assets into Unity](https://docs.unity3d.com/Manual/ImportingAssets.html)
   in various ways. The easiest way is to simply drag the file into the
   **Project** window and drop it into an appropriate folder.
4. Once the `environment.bytes` file is imported, drag it from the **Project**
   window to the **Graph Model** field of the Brain component.

If you are using a model produced by the ML-Agents `mlagents-learn` command,
use the default values for the other Internal Brain parameters.

The default values of the TensorFlow graph parameters work with the model
produced by the PPO and BC training code in the ML-Agents SDK. To use a
default ML-Agents model, the only parameter that you need to set is the
`Graph Model`, which must be set to the .bytes file containing the trained
model itself.

* `Graph Model` : This must be the `bytes` file corresponding to the
  pre-trained TensorFlow graph. (You must first drag this file into your
  Resources folder and then from the Resources folder into the inspector.)

Only change the following Internal Brain properties if you have created your
own TensorFlow model and are not using an ML-Agents model:

* `Graph Scope` : If you set a scope while training your TensorFlow model, all
  your placeholder names will have a prefix. You must specify that prefix
  here. Note that if more than one Brain were set to External during training,
  you must give a `Graph Scope` to the Internal Brain corresponding to the
  name of the Brain GameObject.
* `Batch Size Node Name` : If the batch size is one of the inputs of your
  graph, you must specify the name of the placeholder here. The Brain will
  make the batch size equal to the number of Agents connected to the Brain
  automatically.
* `State Node Name` : If your graph uses the state as an input, you must
  specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph uses a recurrent input / memory
  as input and outputs new recurrent input / memory, you must specify the name
  of the input placeholder here.
* `Recurrent Output Node Name` : If your graph uses a recurrent input / memory
  as input and outputs new recurrent input / memory, you must specify the name
  of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input,
  you must specify it here. Note that the number of observations is equal to
  the length of `Camera Resolutions` in the Brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to
  the actions of the Brain in your graph. If the action space type is
  continuous, the output must be a one dimensional tensor of float of length
  `Action Space Size`; if the action space type is discrete, the output must
  be a one dimensional tensor of int of the same length as the `Branches`
  array.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed
  (example: noise level) you can specify them here. Note that in your graph,
  these must correspond to one dimensional tensors of int or float of size 1.
  * `Name` : Corresponds to the name of the placeholder.
  * `Value Type` : Either Integer or Floating Point.
  * `Min Value` and `Max Value` : Specify the range of the value here. The
    value will be sampled from the uniform distribution ranging from
    `Min Value` to `Max Value` inclusive.

# Heuristic Brain

The **Heuristic** Brain type allows you to hand code an Agent's decision
making process. A Heuristic Brain requires an implementation of the Decision
interface to which it delegates the decision making process.

When you set the **Brain Type** property of a Brain to **Heuristic**, you must
add a component implementing the Decision interface to the same GameObject as
the Brain.

When creating your Decision class, extend MonoBehaviour (so you can use the
class as a Unity component) and extend the Decision interface:

```csharp
public class HeuristicLogic : MonoBehaviour, Decision
```

The Decision interface defines two methods, `Decide()` and `MakeMemory()`.

The `Decide()` method receives an Agent's current state, consisting of the
Agent's observations, reward, memory and other aspects of the Agent's state,
and must return an array containing the action that the Agent should take. The
format of the returned action array depends on the **Vector Action Space
Type**. When using a **Continuous** action space, the action array is just a
float array with a length equal to the **Vector Action Space Size** setting.
When using a **Discrete** action space, the action array is an integer array
with the same size as the `Branches` array. In the discrete action space, the
values of the **Branches** array define the number of discrete values that
your `Decide()` function can return for each branch, which don't need to be
consecutive integers.

The `MakeMemory()` function allows you to pass data forward to the next
iteration of an Agent's decision making process. The array you return from
`MakeMemory()` is passed to the `Decide()` function in the next iteration. You
can use the memory to allow the Agent's decision process to take past actions
and observations into account when making the current decision. If your
heuristic logic does not require memory, just return an empty array.
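
As a minimal sketch, a Decision implementation for a continuous action space
of size 2 might look like the following; the balancing heuristic is
hypothetical, and the method signatures and `MLAgents` namespace are
assumptions to check against the Decision interface in your SDK version:

```csharp
using System.Collections.Generic;
using UnityEngine;
using MLAgents;  // assumed namespace for the Decision interface in this SDK version

public class HeuristicLogic : MonoBehaviour, Decision
{
    // Assumed signature: return one float per continuous action.
    public float[] Decide(List<float> vectorObs, List<Texture2D> visualObs,
                          float reward, bool done, List<float> memory)
    {
        // Hypothetical heuristic: push back against the first two observed
        // values (e.g. tilt angles in a balancing task).
        return new float[] { -vectorObs[0], -vectorObs[1] };
    }

    public List<float> MakeMemory(List<float> vectorObs, List<Texture2D> visualObs,
                                  float reward, bool done, List<float> memory)
    {
        // This heuristic is stateless, so return an empty memory array.
        return new List<float>();
    }
}
```
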

# Player Brain

The **Player** Brain type allows you to control an Agent using keyboard
commands. You can use Player Brains to control a "teacher" Agent that trains
other Agents during [imitation learning](Training-Imitation-Learning.md). You
can also use Player Brains to test your Agents and environment before changing
their Brain types to **External** and running the training process.

The **Player** Brain properties allow you to assign one or more keyboard keys
to each action and a unique value to send when a key is pressed.

Note the differences between the discrete and continuous action spaces. When a
Brain uses the discrete action space, you can send one integer value as the
action per step. In contrast, when a Brain uses the continuous action space
you can send any number of floating point values (up to the **Vector Action
Space Size** setting).

| **Property** | | **Description** |
| :-- | :-- | :-- |
| **Continuous Player Actions** | | The mapping for the continuous vector action space. Shown when the action space is **Continuous**. |
| | **Size** | The number of key commands defined. You can assign more than one command to the same action index in order to send different values for that action. (If you press both keys at the same time, deterministic results are not guaranteed.) |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Index** | The element of the Agent's action vector to set when this key is pressed. The index value cannot exceed the size of the Action Space (minus 1, since it is an array index). |
| | **Value** | The value to send to the Agent as its action for the specified index when the mapped key is pressed. All other members of the action vector are set to 0. |
| **Discrete Player Actions** | | The mapping for the discrete vector action space. Shown when the action space is **Discrete**. |
| | **Size** | The number of key commands defined. |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Branch Index** | The element of the Agent's action vector to set when this key is pressed. The index value cannot exceed the size of the Action Space (minus 1, since it is an array index). |
| | **Value** | The value to send to the Agent as its action when the mapped key is pressed. Cannot exceed the max value for the associated branch (minus 1, since it is an array index). |

For more information about the Unity input system, see
[Input](https://docs.unity3d.com/ScriptReference/Input.html).

# Reinforcement Learning in Unity

Reinforcement learning is an artificial intelligence technique that trains
_agents_ to perform tasks by rewarding desirable behavior. During
reinforcement learning, an agent explores its environment, observes the state
of things, and, based on those observations, takes an action. If the action
leads to a better state, the agent receives a positive reward. If it leads to
a less desirable state, then the agent receives no reward or a negative reward
(punishment). As the agent learns during training, it optimizes its decision
making so that it receives the maximum reward over time.

The ML-Agents toolkit uses a reinforcement learning technique called
[Proximal Policy Optimization (PPO)](https://blog.openai.com/openai-baselines-ppo/).
PPO uses a neural network to approximate the ideal function that maps an
agent's observations to the best action an agent can take in a given state.
The ML-Agents PPO algorithm is implemented in TensorFlow and runs in a
separate Python process (communicating with the running Unity application over
a socket).

**Note:** If you aren't studying machine and reinforcement learning as a
subject and just want to train agents to accomplish tasks, you can treat PPO
training as a _black box_. There are a few training-related parameters to
adjust inside Unity as well as on the Python training side, but you do not
need in-depth knowledge of the algorithm itself to successfully create and
train agents. Step-by-step procedures for running the training process are
provided in the [Training section](Training-ML-Agents.md).

Training and simulation proceed in steps orchestrated by the ML-Agents Academy
class. The Academy works with Agent and Brain objects in the scene to step
through the simulation. When either the Academy has reached its maximum number
of steps or all Agents in the scene are _done_, one training episode is
finished.

During training, the external Python training process communicates with the
Academy to run a series of episodes while it collects data and optimizes its
neural network model. The type of Brain assigned to an Agent determines
whether it participates in training or not. The **External** Brain
communicates with the external process to train the TensorFlow model. When
training is completed successfully, you can add the trained model file to your
Unity project for use with an **Internal** Brain.

1. Calls your Academy subclass's `AcademyReset()` function.
2. Calls the `AgentReset()` function for each Agent in the scene.
3. Calls the `CollectObservations()` function for each Agent in the scene.
4. Uses each Agent's Brain class to decide on the Agent's next action.
5. Calls your subclass's `AcademyStep()` function.
6. Calls the `AgentAction()` function for each Agent in the scene, passing in
   the action chosen by the Agent's Brain. (This function is not called if the
   Agent is done.)
7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its
   `Max Step` count or has otherwise marked itself as `done`. Optionally, you
   can set an Agent to restart if it finishes before the end of an episode. In
   this case, the Academy calls the `AgentReset()` function.
8. When the Academy reaches its own `Max Step` count, it starts the next
   episode again by calling your Academy subclass's `AcademyReset()` function.

To create a training environment, extend the Academy and Agent classes to
implement the above methods. The `Agent.CollectObservations()` and
`Agent.AgentAction()` functions are required; the other methods are optional —
whether you need to implement them or not depends on your specific scenario.

**Note:** The API used by the Python PPO training process to communicate with
and control the Academy during training can be used for other purposes as
well. For example, you could use the API to use Unity as the simulation engine
for your own machine learning algorithms. See [Python API](Python-API.md) for
more information.

To train and use the ML-Agents toolkit in a Unity scene, the scene must
contain a single Academy subclass along with as many Brain objects and Agent
subclasses as you need. Any Brain instances in the scene must be attached to
GameObjects that are children of the Academy in the Unity Scene Hierarchy.
Agent instances should be attached to the GameObject representing that Agent.

You must assign a Brain to every Agent, but you can share Brains between
multiple Agents. Each Agent will make its own observations and act
independently, but will use the same decision-making logic and, for
**Internal** Brains, the same trained TensorFlow model.

The Academy object orchestrates Agents and their decision making processes.
Only place a single Academy object in a scene.

You must create a subclass of the Academy class (since the base class is
abstract). When you create your Academy subclass, you can implement the
following methods (all are optional):

* `AcademyReset()` — Prepare the environment and Agents for the next training
  episode. Use this function to place and initialize entities in the scene as
  necessary.
* `AcademyStep()` — Prepare the environment for the next simulation step. The
  base Academy class calls this function before calling any `AgentAction()`
  methods for the current step. You can use this function to update other
  objects in the scene before the Agents take their actions. Note that the
  Agents have already collected their observations and chosen an action before
  the Academy invokes this method.

The base Academy class also defines several important properties that you can
set in the Unity Editor Inspector. For training, the most important of these
properties is `Max Steps`, which determines how long each training episode
lasts. Once the Academy's step counter reaches this value, it calls the
`AcademyReset()` function to start the next episode.

See [Academy](Learning-Environment-Design-Academy.md) for a complete list of
the Academy properties and their uses.
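
As an illustration, a minimal Academy subclass might look like the following
sketch; the class name and the reset logic are hypothetical, and the
`MLAgents` namespace is an assumption to check against your SDK version:

```csharp
using MLAgents;  // assumed namespace for the Academy base class in this SDK version

public class ExampleAcademy : Academy  // hypothetical example academy
{
    public override void AcademyReset()
    {
        // Place and initialize entities in the scene for the next episode,
        // e.g. reposition goals or obstacles (environment-specific logic).
    }

    public override void AcademyStep()
    {
        // Update non-agent objects before the Agents take their actions.
        // The Agents have already collected observations and chosen actions
        // by the time this method runs.
    }
}
```
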

The Brain encapsulates the decision making process. Brain objects must be
children of the Academy in the Unity scene hierarchy. Every Agent must be
assigned a Brain, but you can use the same Brain with more than one Agent.

Use the Brain class directly, rather than a subclass. Brain behavior is
determined by the Brain type. During training, set your Agent's Brain type to
**External**. To use the trained model, import the model file into the Unity
project and change the Brain type to **Internal**. See
[Brains](Learning-Environment-Design-Brains.md) for details on using the
different types of Brains. You can extend the CoreBrain class to create
different Brain types if the four built-in types don't do what you need.

The Brain class has several important properties that you can set using the
Inspector window. These properties must be appropriate for the Agents using
the Brain. For example, the `Vector Observation Space Size` property must
match the length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.

See [Brains](Learning-Environment-Design-Brains.md) for a complete list of the
Brain properties.

The Agent class represents an actor in the scene that collects observations
and carries out actions. The Agent class is typically attached to the
GameObject in the scene that otherwise represents the actor — for example, to
a player object in a football game or a car object in a vehicle simulation.
Every Agent must be assigned a Brain.

To create an Agent, extend the Agent class and implement the essential
`CollectObservations()` and `AgentAction()` methods:

* `CollectObservations()` — Collects the Agent's observation of its
  environment.
* `AgentAction()` — Carries out the action chosen by the Agent's Brain and
  assigns a reward to the current state.

Your implementations of these functions determine how the properties of the
Brain assigned to this Agent must be set.

You must also determine how an Agent finishes its task or times out. You can
manually set an Agent to done in your `AgentAction()` function when the Agent
has finished (or irrevocably failed) its task. You can also set the Agent's
`Max Steps` property to a positive value and the Agent will consider itself
done after it has taken that many steps. When the Academy reaches its own
`Max Steps` count, it starts the next episode. If you set an Agent's
`ResetOnDone` property to true, then the Agent can attempt its task several
times in one episode. (Use the `Agent.AgentReset()` function to prepare the
Agent to start again.)

See [Agents](Learning-Environment-Design-Agents.md) for detailed information
about programming your own Agents.
|||

An _environment_ in the ML-Agents toolkit can be any scene built in Unity. The Unity scene provides the environment in which agents observe, act, and learn. How you set up the Unity scene to serve as a learning environment really depends on your goal. You may be trying to solve a specific reinforcement learning problem of limited scope, in which case you can use the same scene for both training and for testing trained agents. Or, you may be training agents to operate in a complex game or simulation. In this case, it might be more efficient and practical to create a purpose-built training scene.

Both training and testing (or normal game) scenes must contain an Academy object to control the agent decision making process. The Academy defines several properties that can be set differently for a training scene versus a regular scene. The Academy's **Configuration** properties control rendering and time scale. You can set the **Training Configuration** to minimize the time Unity spends rendering graphics in order to speed up training. You may need to adjust the other functional Academy settings as well. For example, `Max Steps` should be as short as possible for training — just long enough for the agent to accomplish its task, with some extra time for "wandering" while it learns. In regular scenes, you often do not want the Academy to reset the scene at all; if so, `Max Steps` should be set to zero.

When you create a training environment in Unity, you must set up the scene so that it can be controlled by the external training process. Considerations include:

* The training scene must start automatically when your Unity application is launched by the training process.
* The scene must include at least one **External** Brain.
* The Academy must reset the scene to a valid starting point for each episode of training.
* A training episode must have a definite end — either using `Max Steps` or by each Agent setting itself to `done`.

# Limitations

If you enable Headless mode, you will not be able to collect visual observations from your agents.

Currently the speed of the game physics can only be increased to 100x real-time. The Academy also moves in time with FixedUpdate() rather than Update(), so game behavior implemented in Update() may be out of sync with the Agent decision making. See [Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html) for more information.

As of version 0.3, we no longer support Python 2.

### TensorFlow support

Currently the ML-Agents toolkit uses TensorFlow 1.7.1 due to the version of the TensorFlowSharp plugin we are using.

# Unity ML-Agents Python Interface and Trainers

The `mlagents` Python package is part of the [ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents` provides a Python API that allows direct interaction with the Unity game engine as well as a collection of trainers and algorithms to train agents in Unity environments.

The `mlagents` Python package contains two components: a low level API which allows you to interact directly with a Unity Environment (`mlagents.envs`) and an entry point to train (`mlagents-learn`) which allows you to train agents in Unity Environments using our implementations of reinforcement learning or imitation learning.

## mlagents.envs

The ML-Agents Toolkit provides a Python API for controlling the Agent simulation loop of an environment or game built with Unity. This API is used by the training algorithms inside the ML-Agents Toolkit, but you can also write your own Python programs using this API.

The key objects in the Python API include:

- **UnityEnvironment** — the main interface between the Unity application and your code. Use UnityEnvironment to start and control a simulation or training session.
- **BrainInfo** — contains all the data from Agents in the simulation, such as observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For example, provides the array length of an observation in BrainInfo.

These classes are all defined in the `ml-agents/mlagents/envs` folder of the ML-Agents SDK.

To communicate with an Agent in a Unity environment from a Python program, the Agent must either use an **External** Brain or use a Brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for Agents with external Brains, but can only observe broadcasting Brains (the information you receive for an Agent is the same in both cases). See [Using the Broadcast Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).

For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook (`python/Basics.ipynb`), which opens an environment, runs a few simulation steps taking random actions, and closes the environment.

_Notice: Currently communication between Unity and Python takes place over an open socket without authentication. As such, please make sure that the network where training takes place is secure. This will be addressed in a future release._

### Loading a Unity Environment

Python-side communication happens through `UnityEnvironment`, which is located in `ml-agents/mlagents/envs`. To load a Unity environment from a built binary file, put the file in the same directory as `envs`. For example, if the filename of your Unity environment is 3DBall.app, in python, run:

```python
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")
```

- `file_name` is the name of the environment binary (located in the root directory of the python project).
- `worker_id` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the training process. In environments which do not involve physics calculations, setting the seed enables reproducible experimentation by ensuring that the environment and trainers utilize the same random seed.
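
As an illustration of these parameters, the following sketch (with hypothetical `worker_id` and `seed` values) launches a second environment instance on its own communication port with a fixed random seed:

```python
from mlagents.envs import UnityEnvironment

# Second concurrent instance: worker_id=1 selects a different port;
# seed=42 makes non-physics randomness reproducible across runs.
env = UnityEnvironment(file_name="3DBall", worker_id=1, seed=42)
```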

If you want to directly interact with the Editor, you need to use `file_name=None`; then press the :arrow_forward: button in the Editor when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
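
For example, a minimal sketch of connecting to the Editor instead of a built binary:

```python
from mlagents.envs import UnityEnvironment

# With file_name=None, the API waits for you to press Play in the Editor.
env = UnityEnvironment(file_name=None)
```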

### Interacting with a Unity Environment

A BrainInfo object contains the following fields:

- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of the list corresponds to the n<sup>th</sup> observation of the Brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the Agents' text observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size, memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of Agents using the Brain containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of Agents using the Brain containing `done` flags (whether or not the Agent is done).
- **`max_reached`** : A list as long as the number of Agents using the Brain containing true if the Agents reached their max steps.
- **`agents`** : A list of the unique ids of the Agents using the Brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch size, vector action size)` if the vector action space is continuous and `(batch size, number of branches)` if the vector action space is discrete.

Once loaded, your UnityEnvironment object, referenced by the variable `env` in this example, can be used in the following way:

- **Print : `print(str(env))`**
  Prints all parameters relevant to the loaded environment and the external Brains.

- **Reset : `env.reset(train_mode=True, config=None)`**
  Sends a reset signal to the environment, and provides a dictionary mapping Brain names to BrainInfo objects.

  - `train_mode` indicates whether to run the environment in train (`True`) or test (`False`) mode.
  - `config` is an optional dictionary of configuration flags specific to the environment. For generic environments, `config` can be ignored. `config` is a dictionary of strings to floats where the keys are the names of the `resetParameters` and the values are their corresponding float values. Define the reset parameters on the [Academy Inspector](Learning-Environment-Design-Academy.md#academy-properties) window in the Unity Editor.
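
  For example, a sketch of a reset call; `wall_height` here is a hypothetical reset parameter that would be defined on the Academy:

  ```python
  # Returns {brain name: BrainInfo} for all external and broadcasting Brains.
  info = env.reset(train_mode=True, config={"wall_height": 4.0})
  ```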

- **Step : `env.step(action, memory=None, text_action=None)`**
  Sends a step signal to the environment using the actions. For each Brain:

  - `action` can be a one dimensional array, or a two dimensional array if you have multiple Agents per Brain.
  - `memory` is an optional input that can be used to send a list of floats per Agent to be retrieved at the next step.
  - `text_action` is an optional input that can be used to send a single string per Agent.

  Returns a dictionary mapping Brain names to BrainInfo objects.

  For example, to access the BrainInfo belonging to a Brain called 'brain_name', and the BrainInfo field 'vector_observations':

  ```python
  info = env.step(action)
  brain_info = info["brain_name"]
  observations = brain_info.vector_observations
  ```

  Note that if you have more than one external Brain in the environment, you must provide dictionaries from Brain names to arrays for `action`, `memory` and `value`. For example: if you have two external Brains named `brain1` and `brain2`, each with one Agent taking two continuous actions, then you can have:
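
  ```python
  # Arbitrary example values; one two-action array per Brain.
  action = {"brain1": [1.0, 2.0], "brain2": [3.0, 4.0]}
  env.step(action)
  ```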

- **Close : `env.close()`**
  Sends a shutdown signal to the environment and closes the communication socket.
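
Putting these calls together, here is a minimal sketch of a random-action loop over a single External Brain; the action size of 2 is a placeholder for your Brain's actual vector action size:

```python
import numpy as np
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")
brain_name = env.brain_names[0]
info = env.reset(train_mode=True)[brain_name]

for _ in range(100):
    # One random continuous action per Agent using this Brain.
    action = np.random.randn(len(info.agents), 2)
    info = env.step(action)[brain_name]
    if all(info.local_done):
        info = env.reset(train_mode=True)[brain_name]

env.close()
```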

## mlagents-learn

For more detailed documentation on using `mlagents-learn`, check out [Training ML-Agents](Training-ML-Agents.md).

# Unity ML-Agents Toolkit Documentation

## Installation & Set-up

* [Installation](Installation.md)
* [Background: Jupyter Notebooks](Background-Jupyter.md)
* [Docker Set-up](Using-Docker.md)
* [Basic Guide](Basic-Guide.md)

* [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
* [Background: Unity](Background-Unity.md)
* [Background: Machine Learning](Background-Machine-Learning.md)
* [Background: TensorFlow](Background-TensorFlow.md)
* [Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
* [Example Environments](Learning-Environment-Examples.md)

* [Making a New Learning Environment](Learning-Environment-Create-New.md)
* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Agents](Learning-Environment-Design-Agents.md)
* [Academy](Learning-Environment-Design-Academy.md)
* [Brains](Learning-Environment-Design-Brains.md): [Player](Learning-Environment-Design-Player-Brains.md), [Heuristic](Learning-Environment-Design-Heuristic-Brains.md), [Internal & External](Learning-Environment-Design-External-Internal-Brains.md)
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
* [Using the Monitor](Feature-Monitor.md)
* [Using an Executable Environment](Learning-Environment-Executable.md)
* [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)

* [Training ML-Agents](Training-ML-Agents.md)
* [Training with Proximal Policy Optimization](Training-PPO.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)

* [Migrating from earlier versions of ML-Agents](Migrating.md)
* [Frequently Asked Questions](FAQ.md)
* [ML-Agents Glossary](Glossary.md)
* [Limitations](Limitations.md)

* [API Reference](API-Reference.md)
* [How to use the Python API](Python-API.md)
* [Wrapping Learning Environment as a Gym](../gym-unity/README.md)

# Imitation Learning

It is often more intuitive to simply demonstrate the behavior we want an agent to perform, rather than attempting to have it learn via trial-and-error methods. Consider our [running example](ML-Agents-Overview.md#running-example-training-npc-behaviors) of training a medic NPC: instead of indirectly training a medic with the help of a reward function, we can give the medic real world examples of observations from the game and actions from a game controller to guide the medic's behavior. More specifically, in this mode, the Brain type during training is set to Player and all the actions performed with the controller (in addition to the agent observations) will be recorded and sent to the Python API. The imitation learning algorithm will then use these pairs of observations and actions from the human player to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).

There are a variety of possible imitation learning algorithms which can be used; the simplest of them is Behavioral Cloning. It works by collecting training data from a teacher, and then simply uses it to directly learn a policy, in the same way that supervised learning for image classification or other traditional Machine Learning tasks works.
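
To make the idea concrete, here is a minimal behavioral-cloning sketch in plain numpy (not the toolkit's trainer): it fits a linear policy to hypothetical (observation, action) pairs recorded from a teacher by minimizing mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 8))        # hypothetical recorded observations
acts = obs @ rng.normal(size=(8, 2))    # hypothetical teacher actions

# Gradient descent on the mean squared error of (obs @ W - acts).
W = np.zeros((8, 2))
for _ in range(500):
    grad = obs.T @ (obs @ W - acts) / len(obs)
    W -= 0.1 * grad

student_action = obs[:1] @ W  # the cloned policy now imitates the teacher
```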

1. In order to use imitation learning in a scene, the first thing you will need is to create two Brains, one which will be the "Teacher," and the other which will be the "Student." We will assume that the names of the Brain `GameObject`s are "Teacher" and "Student" respectively.
2. Set the "Teacher" Brain to Player mode, and properly configure the inputs to map to the corresponding actions. **Ensure that "Broadcast" is checked within the Brain inspector window.**
3. Set the "Student" Brain to External mode.
4. Link the Brains to the desired Agents (one Agent as the teacher and at least one Agent as a student).
5. In `config/trainer_config.yaml`, add an entry for the "Student" Brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training to do each moment. Increase the `max_steps` option if you'd like to keep training the Agents for a longer period of time.
6. Launch the training process with `mlagents-learn config/trainer_config.yaml --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
7. From the Unity window, control the Agent with the Teacher Brain by providing "teacher demonstrations" of the behavior you would like to see.
8. Watch as the Agent(s) with the student Brain attached begin to behave similarly to the demonstrations.
9. Once the Student Agents are exhibiting the desired behavior, end the training process with `CTRL+C` from the command line.
10. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the Assets folder (or a subdirectory within Assets of your choosing), and use it with the `Internal` Brain.

We provide a convenience utility, the `BC Teacher Helper` component, that you can add to the Teacher Agent.

<img src="images/bc_teacher_helper.png"
     alt="BC Teacher Helper"
     width="375" border="10" />

This utility enables you to use keyboard shortcuts to do the following:

1. Start and stop recording experiences. This is useful in case you'd like to interact with the game _but not have the agents learn from these interactions_. The default command to toggle this is to press `R` on the keyboard.
2. Reset the training buffer. This enables you to instruct the agents to forget their buffer of recent experiences. This is useful if you'd like to get them to quickly learn a new behavior. The default command to reset the buffer is to press `C` on the keyboard.

# Training with Proximal Policy Optimization

ML-Agents uses a reinforcement learning technique called [Proximal Policy Optimization (PPO)](https://blog.openai.com/openai-baselines-ppo/). PPO uses a neural network to approximate the ideal function that maps an agent's observations to the best action an agent can take in a given state. The ML-Agents PPO algorithm is implemented in TensorFlow and runs in a separate Python process (communicating with the running Unity application over a socket).

See [Training ML-Agents](Training-ML-Agents.md) for instructions on running the training program, `learn.py`.

If you are using the recurrent neural network (RNN) to utilize memory, see [Using Recurrent Neural Networks](Feature-Memory.md) for RNN-specific training details.

If you are using curriculum training to pace the difficulty of the learning task presented to an agent, see [Training with Curriculum Learning](Training-Curriculum-Learning.md).

For information about imitation learning, which uses a different training algorithm, see [Training with Imitation Learning](Training-Imitation-Learning.md).

Successfully training a Reinforcement Learning model often involves tuning the training hyperparameters. This guide contains some best practices for tuning the training process when the default parameters don't seem to be giving the level of performance you would like.

### Gamma

`gamma` corresponds to the discount factor for future rewards. This can be thought of as how far into the future the agent should care about possible rewards. In situations when the agent should be acting in the present in order to prepare for rewards in the distant future, this value should be large. In cases when rewards are more immediate, it can be smaller.
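
As a quick worked example (with an arbitrary reward sequence), the discounted return weights later rewards less:

```python
# A reward of 1.0 arriving two steps from now contributes gamma**2 = 0.81.
rewards = [0.0, 0.0, 1.0]
gamma = 0.9
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))  # 0.81
```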

### Lambda

`lambd` corresponds to the `lambda` parameter used when calculating the Generalized Advantage Estimate ([GAE](https://arxiv.org/abs/1506.02438)). This can be thought of as how much the agent relies on its current value estimate when calculating an updated value estimate. Low values correspond to relying more on the current value estimate (which can be high bias), and high values correspond to relying more on the actual rewards received in the environment (which can be high variance). The parameter provides a trade-off between the two, and the right value can lead to a more stable training process.
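
The following is a sketch of how `lambd` enters the standard GAE computation for one episode, assuming arrays of per-step rewards and value estimates (with one extra bootstrap value at the end):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lambd=0.95):
    """Generalized Advantage Estimation over a single episode."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lambd * running               # decays by gamma * lambd
        advantages[t] = running
    return advantages
```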

### Buffer Size

`buffer_size` corresponds to how many experiences (agent observations, actions and rewards obtained) should be collected before we do any learning or updating of the model. **This should be a multiple of `batch_size`**. Typically a larger `buffer_size` corresponds to more stable training updates.

### Batch Size

`batch_size` is the number of experiences used for one iteration of a gradient descent update. **This should always be a fraction of the `buffer_size`**. If you are using a continuous action space, this value should be large (on the order of 1000s). If you are using a discrete action space, this value should be smaller (on the order of 10s).

### Number of Epochs

`num_epoch` is the number of passes through the experience buffer during gradient descent. The larger the `batch_size`, the larger it is acceptable to make this. Decreasing this will ensure more stable updates, at the cost of slower learning.

### Learning Rate

`learning_rate` corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable, and the reward does not consistently increase.

### Time Horizon

`time_horizon` corresponds to how many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state. As such, this parameter trades off between a less biased, but higher variance estimate (long time horizon) and a more biased, but less varied estimate (short time horizon). In cases where there are frequent rewards within an episode, or episodes are prohibitively large, a smaller number can be more ideal. This number should be large enough to capture all the important behavior within a sequence of an agent's actions.

### Max Steps

`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run during the training process. This value should be increased for more complex problems.

### Beta

`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that agents properly explore the action space during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.

### Epsilon

`epsilon` corresponds to the acceptable threshold of divergence between the old and new policies during gradient descent updating. Setting this value small will result in more stable updates, but will also slow the training process.
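
A sketch of where `epsilon` appears in PPO's clipped surrogate objective (plain numpy, assuming per-sample log-probabilities and advantages):

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, epsilon=0.2):
    """Mean PPO objective: the probability ratio is clipped to [1-eps, 1+eps]."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```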

### Normalize

`normalize` corresponds to whether normalization is applied to the vector observation inputs. This normalization is based on the running average and variance of the vector observation. Normalization can be helpful in cases with complex continuous control problems, but may be harmful with simpler discrete control problems.
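
The following sketch shows the kind of running normalization described above (Welford's online mean/variance, not the toolkit's exact implementation):

```python
import numpy as np

class RunningNormalizer:
    def __init__(self, size):
        self.n = 0
        self.mean = np.zeros(size)
        self.m2 = np.zeros(size)  # running sum of squared deviations

    def update(self, obs):
        self.n += 1
        delta = obs - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        var = self.m2 / max(self.n, 1)
        return (obs - self.mean) / np.sqrt(var + 1e-8)
```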

### Number of Layers

`num_layers` corresponds to how many hidden layers are present after the observation input, or after the CNN encoding of the visual observation. For simple problems, fewer layers are likely to train faster and more efficiently. More layers may be necessary for more complex control problems.

### Hidden Units

`hidden_units` corresponds to how many units are in each fully connected layer of the neural network. For simple problems where the correct action is a straightforward combination of the observation inputs, this should be small. For problems where the action is a very complex interaction between the observation variables, this should be larger.

## (Optional) Recurrent Neural Network Hyperparameters

### Sequence Length

`sequence_length` corresponds to the length of the sequences of experience passed through the network during training. This should be long enough to capture whatever information your agent might need to remember over time. For example, if your agent needs to remember the velocity of objects, then this can be a small value. If your agent needs to remember a piece of information given only once at the beginning of an episode, then this should be a larger value.

### Memory Size

`memory_size` corresponds to the size of the array of floating point numbers used to store the hidden state of the recurrent neural network. This value must be a multiple of 4, and should scale with the amount of information you expect the agent will need to remember in order to successfully complete the task.

## (Optional) Intrinsic Curiosity Module Hyperparameters

### Curiosity Encoding Size

`curiosity_enc_size` corresponds to the size of the hidden layer used to encode the observations within the intrinsic curiosity module. This value should be small enough to encourage the curiosity module to compress the original observation, but also not too small to prevent it from learning the dynamics of the environment.

### Curiosity Strength

`curiosity_strength` corresponds to the magnitude of the intrinsic reward generated by the intrinsic curiosity module. This should be scaled in order to ensure it is large enough to not be overwhelmed by extrinsic reward signals in the environment. Likewise it should not be too large to overwhelm the extrinsic reward signal.

To view training statistics, use TensorBoard. For information on launching and using TensorBoard, see [here](./Getting-Started-with-Balance-Ball.md#observing-training-progress).

### Cumulative Reward

The general trend in reward should consistently increase over time. Small ups and downs are to be expected. Depending on the complexity of the task, a significant increase in reward may not present itself until millions of steps into the training process.

### Entropy

This corresponds to how random the decisions of a Brain are. This should consistently decrease during training. If it decreases too soon or not at all, `beta` should be adjusted (when using discrete action space).

### Learning Rate

This will decrease over time on a linear schedule.

### Policy Loss

These values will oscillate during training. Generally they should be less than 1.0.

### Value Estimate

These values should increase as the cumulative reward increases. They correspond to how much future reward the agent predicts itself receiving at any given point.

### Value Loss

These values will increase as the reward increases, and then should decrease once reward becomes stable.

# Using Docker For ML-Agents

We currently offer a solution for Windows and Mac users who would like to do training or inference using Docker. This option may be appealing to those who would like to avoid installing Python and TensorFlow themselves. The current setup forces both TensorFlow and Unity to _only_ rely on the CPU for computations. Consequently, our Docker simulation does not use a GPU and uses [`Xvfb`](https://en.wikipedia.org/wiki/Xvfb) to do visual rendering. `Xvfb` is a utility that enables `ML-Agents` (or any other application) to do rendering virtually, i.e. it does not assume that the machine running `ML-Agents` has a GPU or a display attached to it. This means that rich environments which involve agents using camera-based visual observations might be slower.

## Requirements

- [Download](https://unity3d.com/get-unity/download) the Unity Installer and add the _Linux Build Support_ Component.
- [Download](https://www.docker.com/community-edition#/download) and install Docker if you don't have it setup on your machine.
- Since Docker runs a container in an environment that is isolated from the host machine, a mounted directory in your host machine is used to share data, e.g. the trainer configuration file, Unity executable, curriculum files and TensorFlow graph. For convenience, we created an empty `unity-volume` directory at the root of the repository for this purpose, but feel free to use any other directory. The remainder of this guide assumes that the `unity-volume` directory is the one used.
Using Docker for ML-Agents involves three steps: building the Unity environment with specific flags, building a Docker container and, finally, running the container. If you are not familiar with building a Unity environment for ML-Agents, please read through our [Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md) guide first.
Since Docker typically runs a container sharing a (Linux) kernel with the host machine, the Unity environment **has** to be built for the **Linux platform**. When building a Unity environment, please select the following options from the Build Settings window:

- Set the _Target Platform_ to `Linux` and the _Architecture_ to `x86_64`.
- If the environment does not contain visual observations, you can select the `headless` option here.
Then click `Build`, pick an environment name (e.g. `3DBall`) and set the output directory to `unity-volume`. After building, ensure that the file `<environment-name>.x86_64` and subdirectory `<environment-name>_Data/` are created under `unity-volume`.
First, make sure the Docker engine is running on your machine. Then build the Docker container by calling the following command at the top-level of the repository:

```sh
docker build -t <image-name> .
```

Replace `<image-name>` with a name for the Docker image, e.g. `balance.ball.v0.1`.

**Note:** if you modify hyperparameters in `trainer_config.yaml`, you will have to build a new Docker container before running.
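To confirm that the image was created, you can list your local images; `docker images` is a standard Docker command rather than anything ML-Agents-specific:

```sh
# The <image-name> you chose (e.g. balance.ball.v0.1) should appear in this list
docker images
```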
Run the Docker container by calling the following command at the top-level of the repository:

```sh
docker run --name <container-name> \
           --mount type=bind,source="$(pwd)"/unity-volume,target=/unity-volume \
           <image-name>:latest \
           --docker-target-name=unity-volume \
           <trainer-config-file> \
           --env=<environment-name> \
           --train \
           --run-id=<run-id>
```

Notes on argument values:

- `<container-name>` is used to identify the container (in case you want to interrupt and terminate it). This is optional and Docker will generate a random name if this is not set. _Note that this must be unique for every run of a Docker image._
- `<image-name>` references the image name used when building the container.
- `<environment-name>` __(Optional)__: If you are training with a Linux executable, this is the name of the executable. If you are training in the Editor, do not pass an `<environment-name>` argument and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
- `source`: Reference to the path in your host OS where you will store the Unity executable.
- `target`: Tells Docker to mount the `source` path as a disk with this name.
- `docker-target-name`: Tells the ML-Agents Python package the name of the disk where it can read the Unity executable and store the graph. **This should therefore be identical to `target`.**
- `trainer-config-file`, `train`, `run-id`: ML-Agents arguments passed to `mlagents-learn`. `trainer-config-file` is the filename of the trainer config file, `train` trains the algorithm, and `run-id` is used to tag each experiment with a unique identifier. We recommend placing the trainer config file inside `unity-volume` so that the container has access to the file.
For example, to train with the `3DBall` executable built above (using the example image name `balance.ball.v0.1`, and hypothetical container and run names):

```sh
docker run --name 3DBallContainer.first.trial \
           --mount type=bind,source="$(pwd)"/unity-volume,target=/unity-volume \
           balance.ball.v0.1:latest \
           --docker-target-name=unity-volume \
           trainer_config.yaml \
           --env=3DBall \
           --train \
           --run-id=3dball_first_trial
```
For more detail on Docker mounts, check out [these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.
If you are satisfied with the training progress, you can stop the Docker container while saving state by either using `Ctrl+C` or `⌘+C` (Mac) or by using the following command:

```sh
docker kill --signal=SIGINT <container-name>
```

`<container-name>` is the name of the container specified in the earlier `docker run` command. If you didn't specify one, you can find the randomly generated identifier by running `docker container ls`.
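For example, using the hypothetical container name from the training run above:

```sh
docker kill --signal=SIGINT 3DBallContainer.first.trial
```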
# Using TensorBoard to Observe Training

The ML-Agents toolkit saves statistics during learning sessions that you can view with a TensorFlow utility named [TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
The `mlagents-learn` command saves training statistics to a folder named `summaries`, organized by the `run-id` value you assign to a training session.
In order to observe the training process, either during training or afterward, start TensorBoard:

1. Open a terminal or console window.
2. Navigate to the directory where the ML-Agents Toolkit is installed.
3. From the command line run:

   ```sh
   tensorboard --logdir=summaries
   ```

4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
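If port 6006 is already taken on your machine, you can serve TensorBoard on another port; `--port` is a standard TensorBoard flag, not an ML-Agents option:

```sh
tensorboard --logdir=summaries --port=6007
```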
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the default string, "ppo". All the statistics will be saved to the same sub-folder and displayed as one session in TensorBoard. After a few runs, the displays can become difficult to interpret in this situation. You can delete the folders under the `summaries` directory to clear out old statistics.
On the left side of the TensorBoard window, you can select which of the training runs you want to display. You can select multiple run-ids to compare statistics. The TensorBoard window also provides options for how to display and smooth graphs.

When you run the training program, `mlagents-learn`, you can use the `--save-freq` option to specify how frequently to save the statistics.
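As a sketch, a run that saves statistics every 20,000 steps might be launched as follows; the config file name and run id are examples only, while `--save-freq`, `--train`, and `--run-id` are the options described above:

```sh
mlagents-learn trainer_config.yaml --run-id=firstRun --train --save-freq=20000
```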
The ML-Agents training program saves the following statistics:

* Lesson - Plots the progress from lesson to lesson. Only interesting when performing [curriculum training](Training-Curriculum-Learning.md).
* Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the `beta` hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all agents.
* Learning Rate - How large a step the training algorithm takes as it searches for the optimal policy. Should decrease over time.
* Policy Loss - The mean magnitude of the policy loss function. Corresponds to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Corresponds to how well the model is able to predict the value of each state. This should increase while the agent is learning, and then decrease once the reward stabilizes.
* _(Curiosity-Specific)_ Intrinsic Reward - This corresponds to the mean cumulative intrinsic reward generated per-episode.
* _(Curiosity-Specific)_ Forward Loss - The mean magnitude of the forward model loss function. Corresponds to how well the model is able to predict the new observation encoding.
* _(Curiosity-Specific)_ Inverse Loss - The mean magnitude of the inverse model loss function. Corresponds to how well the model is able to predict the action taken between two observations.
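Beyond the TensorBoard UI, the files under `summaries/` are ordinary TensorFlow event records. As a minimal sketch, not part of the toolkit itself, and with a hypothetical event-file name (tag names also vary by version), the saved scalars can be dumped with TensorFlow's summary iterator:

```python
import tensorflow as tf

# Hypothetical file name; TensorFlow generates the actual name
event_file = "summaries/ppo/events.out.tfevents.1234567890"

for event in tf.train.summary_iterator(event_file):
    for value in event.summary.value:
        # Each value is one scalar statistic recorded at a training step,
        # e.g. the cumulative reward described above.
        print(event.step, value.tag, value.simple_value)
```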
```python
from .environment import *
from .brain import *
from .exception import *
from .curriculum import *
```
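These wildcard imports re-export the package's public classes, so the Python API can be imported directly from `mlagents.envs`. A minimal usage sketch, assuming a Linux executable at a hypothetical path:

```python
from mlagents.envs import UnityEnvironment

# Hypothetical executable path, built as described in the Docker guide above
env = UnityEnvironment(file_name="unity-volume/3DBall")
brain_infos = env.reset(train_mode=True)  # dict mapping brain names to BrainInfo
env.close()
```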
```
m_EditorVersion: 2017.4.10f1
```
```yaml
fileFormatVersion: 2
guid: 8b23992c8eb17439887f5e944bf04a40
timeCreated: 1504070347
licenseType: Free
MonoImporter:
  serializedVersion: 2
  defaultReferences: []
  executionOrder: 0
  icon: {instanceID: 0}
  userData:
  assetBundleName:
  assetBundleVariant:
```