
Merge pull request #1223 from Unity-Technologies/release-v0.5

Release v0.5
/hotfix-v0.9.2a
GitHub, 6 years ago
Current commit 25495874
313 files changed, with 6,778 insertions and 3,623 deletions
  1. .gitignore (58)
  2. CODE_OF_CONDUCT.md (5)
  3. CONTRIBUTING.md (60)
  4. Dockerfile (11)
  5. LICENSE (201)
  6. README.md (106)
  7. docs/API-Reference.md (29)
  8. docs/Background-Jupyter.md (15)
  9. docs/Background-Machine-Learning.md (301)
  10. docs/Background-TensorFlow.md (74)
  11. docs/Background-Unity.md (12)
  12. docs/Basic-Guide.md (242)
  13. docs/FAQ.md (136)
  14. docs/Feature-Memory.md (57)
  15. docs/Feature-Monitor.md (50)
  16. docs/Getting-Started-with-Balance-Ball.md (393)
  17. docs/Glossary.md (68)
  18. docs/Installation-Windows.md (251)
  19. docs/Installation.md (90)
  20. docs/Learning-Environment-Best-Practices.md (64)
  21. docs/Learning-Environment-Create-New.md (363)
  22. docs/Learning-Environment-Design-Academy.md (55)
  23. docs/Learning-Environment-Design-Agents.md (469)
  24. docs/Learning-Environment-Design-Brains.md (106)
  25. docs/Learning-Environment-Design-External-Internal-Brains.md (118)
  26. docs/Learning-Environment-Design-Heuristic-Brains.md (34)
  27. docs/Learning-Environment-Design-Player-Brains.md (47)
  28. docs/Learning-Environment-Design.md (203)
  29. docs/Learning-Environment-Examples.md (381)
  30. docs/Learning-Environment-Executable.md (219)
  31. docs/Limitations.md (27)
  32. docs/ML-Agents-Overview.md (703)
  33. docs/Migrating.md (135)
  34. docs/Python-API.md (160)
  35. docs/Readme.md (80)
  36. docs/Training-Curriculum-Learning.md (143)
  37. docs/Training-Imitation-Learning.md (78)
  38. docs/Training-ML-Agents.md (228)
  39. docs/Training-PPO.md (218)
  40. docs/Training-on-Amazon-Web-Service.md (118)
  41. docs/Training-on-Microsoft-Azure-Custom-Instance.md (112)
  42. docs/Training-on-Microsoft-Azure.md (107)
  43. docs/Using-Docker.md (117)
  44. docs/Using-TensorFlow-Sharp-in-Unity.md (179)
  45. docs/Using-Tensorboard.md (79)
  46. docs/dox-ml-agents.conf (8)
  47. docs/images/banner.png (611)
  48. docs/images/player_brain.png (129)
  49. docs/images/scene-hierarchy.png (79)
  50. docs/images/unity-logo-rgb.png (309)
  51. docs/localized/zh-CN/README.md (5)
  52. docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (26)
  53. docs/localized/zh-CN/docs/Installation.md (4)
  54. docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (4)
  55. docs/localized/zh-CN/docs/Learning-Environment-Design.md (14)
  56. docs/localized/zh-CN/docs/Learning-Environment-Examples.md (42)
  57. docs/localized/zh-CN/docs/ML-Agents-Overview.md (2)
  58. notebooks/getting-started.ipynb (42)
  59. ml-agents/mlagents/envs/communicator_objects/unity_to_external_pb2_grpc.py (10)
  60. ml-agents/mlagents/envs/communicator_objects/unity_to_external_pb2.py (18)
  61. ml-agents/mlagents/envs/communicator_objects/unity_rl_output_pb2.py (54)
  62. ml-agents/mlagents/envs/communicator_objects/unity_rl_input_pb2.py (66)
  63. ml-agents/mlagents/envs/communicator_objects/unity_rl_initialization_output_pb2.py (39)
  64. ml-agents/mlagents/envs/communicator_objects/unity_rl_initialization_input_pb2.py (21)
  65. ml-agents/mlagents/envs/communicator_objects/unity_output_pb2.py (40)
  66. ml-agents/mlagents/envs/communicator_objects/unity_message_pb2.py (39)
  67. ml-agents/mlagents/envs/communicator_objects/unity_input_pb2.py (40)
  68. ml-agents/mlagents/envs/communicator_objects/space_type_proto_pb2.py (25)
  69. ml-agents/mlagents/envs/communicator_objects/resolution_proto_pb2.py (25)
  70. ml-agents/mlagents/envs/communicator_objects/header_pb2.py (23)
  71. ml-agents/mlagents/envs/communicator_objects/environment_parameters_proto_pb2.py (36)
  72. ml-agents/mlagents/envs/communicator_objects/engine_configuration_proto_pb2.py (31)
  73. ml-agents/mlagents/envs/communicator_objects/command_proto_pb2.py (23)
  74. ml-agents/mlagents/envs/communicator_objects/brain_type_proto_pb2.py (29)
  75. ml-agents/mlagents/envs/communicator_objects/brain_parameters_proto_pb2.py (69)
  76. ml-agents/mlagents/envs/communicator_objects/agent_info_proto_pb2.py (46)
  77. ml-agents/mlagents/envs/communicator_objects/agent_action_proto_pb2.py (32)
  78. config/curricula/wall-jump/BigWallBrain.json (5)
  79. config/curricula/test/TestBrain.json (2)
  80. ml-agents/tests/mock_communicator.py (28)
  81. config/trainer_config.yaml (63)
  82. ml-agents/mlagents/envs/socket_communicator.py (4)
  83. ml-agents/mlagents/envs/rpc_communicator.py (6)
  84. ml-agents/mlagents/envs/exception.py (4)
  85. ml-agents/mlagents/envs/environment.py (98)
  86. ml-agents/mlagents/envs/communicator.py (7)
  87. ml-agents/mlagents/envs/brain.py (32)
  88. ml-agents/mlagents/envs/__init__.py (1)
  89. ml-agents/mlagents/trainers/curriculum.py (148)
  90. ml-agents/mlagents/trainers/trainer_controller.py (348)
  91. ml-agents/mlagents/trainers/trainer.py (54)
  92. ml-agents/mlagents/trainers/ppo/trainer.py (315)
  93. ml-agents/mlagents/trainers/ppo/models.py (100)
  94. ml-agents/mlagents/trainers/models.py (275)
  95. ml-agents/mlagents/trainers/buffer.py (15)
  96. ml-agents/mlagents/trainers/bc/trainer.py (161)
  97. ml-agents/mlagents/trainers/bc/models.py (83)
  98. ml-agents/mlagents/trainers/bc/__init__.py (1)
  99. ml-agents/mlagents/trainers/__init__.py (6)
  100. ml-agents/requirements.txt (2)

.gitignore (58)


/unity-environment/[Ll]ibrary/
/unity-environment/[Tt]emp/
/unity-environment/[Oo]bj/
/unity-environment/[Bb]uild/
/unity-environment/[Bb]uilds/
/unity-environment/[Pp]ackages/
/unity-environment/[Uu]nity[Pp]ackage[Mm]anager/
/unity-environment/Assets/AssetStoreTools*
/unity-environment/Assets/Plugins*
/unity-environment/Assets/Gizmos*
/UnitySDK/[Ll]ibrary/
/UnitySDK/[Tt]emp/
/UnitySDK/[Oo]bj/
/UnitySDK/[Bb]uild/
/UnitySDK/[Bb]uilds/
/UnitySDK/[Pp]ackages/
/UnitySDK/[Uu]nity[Pp]ackage[Mm]anager/
/UnitySDK/Assets/AssetStoreTools*
/UnitySDK/Assets/Plugins*
/UnitySDK/Assets/Gizmos*
python/models
python/summaries
# Training environments
/envs
*unity-environment.log
*UnitySDK.log
/unity-environment/.vs/
/UnitySDK/.vs/
/unity-environmentExportedObj/
/unity-environment.consulo/
/UnitySDKExportedObj/
/UnitySDK.consulo/
*.csproj
*.unityproj
*.sln

*.pidb.meta
# Unity3D Generated File On Crash Reports
/unity-environment/sysinfo.txt
/UnitySDK/sysinfo.txt
# Builds
*.apk

*.x86
# Tensorflow Sharp Files
/unity-environment/Assets/ML-Agents/Plugins/Android*
/unity-environment/Assets/ML-Agents/Plugins/iOS*
/unity-environment/Assets/ML-Agents/Plugins/Computer*
/unity-environment/Assets/ML-Agents/Plugins/System*
/UnitySDK/Assets/ML-Agents/Plugins/Android*
/UnitySDK/Assets/ML-Agents/Plugins/iOS*
/UnitySDK/Assets/ML-Agents/Plugins/Computer*
/UnitySDK/Assets/ML-Agents/Plugins/System*
# Generated doc folders
/docs/html

*.eggs*
*.gitignore.swp
# VSCode hidden files
*.vscode/
.ipynb_checkpoints
# pytest cache
*.pytest_cache/
# Ignore compiled protobuf files.
ml-agents-protobuf/cs
ml-agents-protobuf/python
ml-agents-protobuf/Grpc*
# Ignore PyPi build files.
dist/
build/

CODE_OF_CONDUCT.md (5)


## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct/
[homepage]: https://www.contributor-covenant.org

CONTRIBUTING.md (60)


# Contribution Guidelines
Thank you for your interest in contributing to the ML-Agents toolkit! We are
incredibly excited to see how members of our community will use and extend the
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.
### Communication
## Communication
First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
expect all our contributors to follow it.
Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
and briefly outlining the changes you plan to make. This will enable us to
provide some context that may be helpful for you. This could range from advice
and feedback on how to optimally perform your changes or reasons for not doing
it.
### Git Branches
## Git Branches
Starting with v0.3, we adopted the
Consequently, the `master` branch corresponds to the latest release of
* Corresponding changes to documentation, unit tests and sample environments (if
applicable)
### Environments
## Environments
We are also actively open to adding community contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
PR explaining the nature of the environment and task.
### Style Guide
## Style Guide
When performing changes to the codebase, ensure that you follow the style guide
of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/).
For C#, we will soon be adding a formal style guide for our repository.

Dockerfile (11)


# xvfb is used to do CPU based rendering of Unity
RUN apt-get install -y xvfb
ADD python/requirements.txt .
RUN pip install --trusted-host pypi.python.org -r requirements.txt
WORKDIR /execute
COPY python /execute/python
COPY ml-agents /ml-agents
WORKDIR /ml-agents
RUN pip install .
ENTRYPOINT ["python", "python/learn.py"]
ENTRYPOINT ["mlagents-learn"]

LICENSE (201)


Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2017 Unity Technologies
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md (106)


<img src="docs/images/unity-wide.png" align="middle" width="3000"/>
<img src="docs/images/image-banner.png" align="middle" width="3000"/>
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source
Unity plugin that enables games and simulations to serve as environments for
training intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through a
simple-to-use Python API. We also provide implementations (based on TensorFlow)
of state-of-the-art algorithms to enable game developers and hobbyists to easily
train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
used for multiple purposes, including controlling NPC behavior (in a variety of
settings such as multi-agent and adversarial), automated testing of game builds
and evaluating different game design decisions pre-release. The ML-Agents
toolkit is mutually beneficial for both game developers and AI researchers as it
provides a central platform where advances in AI can be evaluated on Unity’s
rich environments and then made accessible to the wider research and game
developer communities.
* Train memory-enhanced Agents using deep reinforcement learning
* Train memory-enhanced agents using deep reinforcement learning
* Broadcasting of Agent behavior for supervised learning
* Broadcasting of agent behavior for supervised learning
* Flexible Agent control with On Demand Decision Making
* Flexible agent control with On Demand Decision Making
* Wrap learning environments as a gym
* For more information, in addition to installation and usage
instructions, see our [documentation home](docs/Readme.md).
* If you have
used a version of the ML-Agents toolkit prior to v0.4, we strongly recommend
our [guide on migrating from earlier versions](docs/Migrating.md).
* For more information, in addition to installation and usage instructions, see
our [documentation home](docs/Readme.md).
* If you are a researcher interested in a discussion of Unity as an AI platform, see a pre-print of our [reference paper on Unity and the ML-Agents Toolkit](https://arxiv.org/abs/1809.02627). Also, see below for instructions on citing this paper.
* If you have used a version of the ML-Agents toolkit prior to v0.5, we strongly
recommend our [guide on migrating from earlier versions](docs/Migrating.md).
## References
## Additional Resources
* Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and
[Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
* [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
* [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/)
announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
* [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
* [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
* [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
* [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
The ML-Agents toolkit is an open-source project and we encourage and welcome
contributions. If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers
enthusiastic about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in
games).
* If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
For any other questions or feedback, connect directly with the ML-Agents
team at ml-agents@unity3d.com.

translating more pages and to other languages. Consequently,
we welcome any enhancements and improvements from the community.
- [Chinese](docs/localized/zh-CN/)
* [Chinese](docs/localized/zh-CN/)
## Citation
If you use Unity or the ML-Agents Toolkit to conduct research, we ask that you cite the following paper as a reference:
Juliani, A., Berges, V., Vckay, E., Gao, Y., Henry, H., Mattar, M., Lange, D. (2018). Unity: A General Platform for Intelligent Agents. *arXiv preprint arXiv:1809.02627.* https://github.com/Unity-Technologies/ml-agents.

docs/API-Reference.md (29)


# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
To generate the API reference,
[download Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html)
and run the following command within the `docs/` directory:
```sh
doxygen dox-ml-agents.conf
```
that includes the classes that have been properly formatted. The generated HTML
files will be placed in the `html/` subdirectory. Open `index.html` within that
subdirectory to navigate to the API reference home. Note that `html/` is already
included in the repository's `.gitignore` file.
In the near future, we aim to expand our documentation to include all the Unity
C# classes and Python API.

docs/Background-Jupyter.md (15)


# Background: Jupyter
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide one such notebook, `python/Basics.ipynb`,
for testing the Python control interface to a Unity build. This notebook is
introduced in the
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide one such notebook,
`notebooks/getting-started.ipynb`, for testing the Python control interface to a
Unity build. This notebook is introduced in the
in the _Jupyter/IPython Quick Start Guide_. To launch Jupyter, run in the
command line:
```sh
jupyter notebook
```
Then navigate to `localhost:8888` to access your notebooks.

docs/Background-Machine-Learning.md (301)


# Background: Machine Learning
Given that a number of users of the ML-Agents toolkit might not have a formal
machine learning background, this page provides an overview to facilitate the
understanding of the ML-Agents toolkit. However, we will not attempt to provide
a thorough treatment of machine learning as there are fantastic resources
online.
Machine learning, a branch of artificial intelligence, focuses on learning
include: unsupervised learning, supervised learning and reinforcement learning.
Each class of algorithm learns from a different type of data. The following
paragraphs provide an overview for each of these classes of machine learning, as
well as introductory examples.
The goal of [unsupervised
learning](https://en.wikipedia.org/wiki/Unsupervised_learning) is to group or
cluster similar items in a data set. For example, consider the players of a
game. We may want to group the players depending on how engaged they are with
the game. This would enable us to target different groups (e.g. for
highly-engaged players we might invite them to be beta testers for new features,
while for unengaged players we might email them helpful tutorials). Say that we
wish to split our players into two groups. We would first define basic
attributes of the players, such as the number of hours played, total money spent
on in-app purchases and number of levels completed. We can then feed this data
set (three attributes for every player) to an unsupervised learning algorithm
where we specify the number of groups to be two. The algorithm would then split
the data set of players into two groups where the players within each group
would be similar to each other. Given the attributes we used to describe each
player, in this case, the output would be a split of all the players into two
groups, where one group would semantically represent the engaged players and the
second group would semantically represent the unengaged players.
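
To make the clustering step concrete, here is a minimal Python sketch of the
idea using scikit-learn; the attribute values and the choice of a k-means
algorithm are purely illustrative and are not part of the toolkit:

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row describes one player: hours played, money spent on in-app purchases,
# and levels completed (made-up numbers for illustration).
players = np.array([
    [120.0, 45.0, 30],
    [2.5, 0.0, 1],
    [80.0, 10.0, 22],
    [1.0, 0.0, 2],
])

# Ask for two groups; the algorithm uncovers the grouping on its own.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(players)
print(labels)  # e.g. [0 1 0 1]: one cluster of engaged players, one of unengaged
```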
defined the appropriate attributes and relied on the algorithm to uncover the
two groups on its own. This type of data set is typically called an unlabeled
data set as it is lacking these direct labels. Consequently, unsupervised
learning can be helpful in situations where these labels can be expensive or
hard to produce. In the next paragraph, we overview supervised learning
algorithms which accept input labels in addition to attributes.
In [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning), we
do not want to just group similar items but directly learn a mapping from each
item to the group (or class) that it belongs to. Returning to our earlier
example of clustering players, let's say we now wish to predict which of our
players are about to churn (that is stop playing the game for the next 30 days).
We can look into our historical records and create a data set that contains
attributes of our players in addition to a label indicating whether they have
churned or not. Note that the player attributes we use for this churn prediction
task may be different from the ones we used for our earlier clustering task. We
can then feed this data set (attributes **and** label for each player) into a
supervised learning algorithm which would learn a mapping from the player
attributes to a label indicating whether that player will churn or not. The
intuition is that the supervised learning algorithm will learn which values of
these attributes typically correspond to players who have churned and not
churned (for example, it may learn that players who spend very little and play
for very short periods will most likely churn). Now given this learned model, we
can provide it the attributes of a new player (one that recently started playing
the game) and it would output a _predicted_ label for that player. This
prediction is the algorithm's expectation of whether the player will churn or
not. We can now use these predictions to target the players who are expected to
churn and entice them to continue playing the game.
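
As a hypothetical illustration of that train-then-predict flow (again with
scikit-learn; the attributes and labels below are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical players: (hours played, money spent), plus a churn label (1 = churned).
attributes = np.array([[2.0, 0.0], [150.0, 60.0], [5.0, 1.0], [90.0, 20.0]])
churned = np.array([1, 0, 1, 0])

# Training phase: learn the mapping from player attributes to the churn label.
model = LogisticRegression().fit(attributes, churned)

# Inference phase: predict the label for a brand-new player.
new_player = np.array([[3.0, 0.5]])
print(model.predict(new_player))  # e.g. [1] -> expected to churn
```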
player. Model selection, on the other hand, pertains to selecting the algorithm
(and its parameters) that perform the task well. Both of these tasks are active
areas of machine learning research and, in practice, require several iterations
to achieve good performance.
We now switch to reinforcement learning, the third class of machine learning
algorithms, and arguably the one most relevant for the ML-Agents toolkit.
can be viewed as a form of learning for sequential decision making that is
commonly associated with controlling robots (but is, in fact, much more
general). Consider an autonomous firefighting robot that is tasked with
navigating into an area, finding the fire and neutralizing it. At any given
moment, the robot perceives the environment through its sensors (e.g. camera,
heat, touch), processes this information and produces an action (e.g. move to
the left, rotate the water hose, turn on the water). In other words, it is
continuously making decisions about how to interact in this environment given
its view of the world (i.e. sensors input) and objective (i.e. neutralizing the
fire). Teaching a robot to be a successful firefighting machine is precisely
what reinforcement learning is designed to do.
More specifically, the goal of reinforcement learning is to learn a **policy**,
which is essentially a mapping from **observations** to **actions**. An
observation is what the robot can measure from its **environment** (in this
to the configuration of the robot (e.g. position of its base, position of its
water hose and whether the hose is on or off).
The last remaining piece of the reinforcement learning task is the **reward
signal**. When training a robot to be a mean firefighting machine, we provide it
with rewards (positive and negative) indicating how well it is doing on
completing the task. Note that the robot does not _know_ how to put out fires
before it is trained. It learns the objective because it receives a large
positive reward when it puts out the fire and a small negative reward for every
passing second. The fact that rewards are sparse (i.e. may not be provided at
every step, but only when a robot arrives at a success or failure situation), is
a defining characteristic of reinforcement learning and precisely why learning
good policies can be difficult (and/or time-consuming) for complex environments.
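
To make the observation/action/reward vocabulary concrete, below is a small,
self-contained Python sketch of the interaction loop. The toy environment and
the random placeholder policy are illustrative only; they are not the
toolkit's API:

```python
import random

class ToyFireEnv:
    """A tiny stand-in environment: the robot must reach the fire at position 5."""
    def reset(self):
        self.position, self.fire = 0, 5
        return self.fire - self.position              # observation: distance to the fire

    def step(self, action):                           # action: -1 (left) or +1 (right)
        self.position = max(0, self.position + action)
        done = self.position == self.fire
        reward = 10.0 if done else -0.1               # sparse success bonus, small time penalty
        return self.fire - self.position, reward, done

def random_policy(observation):
    """Placeholder policy: a trained policy would map the observation to a good action."""
    return random.choice([-1, 1])

env = ToyFireEnv()
for episode in range(3):
    observation, done, total_reward = env.reset(), False, 0.0
    for _ in range(200):                              # cap the episode length
        action = random_policy(observation)           # policy: observation -> action
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    print(f"episode {episode}: return {total_reward:.1f}")
```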
<p align="center">
<img src="images/rl_cycle.png" alt="The reinforcement learning cycle."/>

usually requires many trials and iterative policy updates. More specifically,
the robot is placed in several fire situations and over time learns an optimal
policy which allows it to put out fires more effectively. Obviously, we cannot
expect to train a robot repeatedly in the real world, particularly when fires
are involved. This is precisely why the use of
serves as the perfect training grounds for learning such behaviors. While our
discussion of reinforcement learning has centered around robots, there are
strong parallels between robots and characters in a game. In fact, in many ways,
one can view a non-playable character (NPC) as a virtual robot, with its own
observations about the environment, its own set of actions and a specific
objective. Thus it is natural to explore how we can train behaviors within Unity
using reinforcement learning. This is precisely what the ML-Agents toolkit
offers. The video linked below includes a reinforcement learning demo showcasing
training character behaviors using the ML-Agents toolkit.
<a href="http://www.youtube.com/watch?feature=player_embedded&v=fiQsmdwEGT8" target="_blank">
<img src="http://img.youtube.com/vi/fiQsmdwEGT8/0.jpg" alt="RL Demo" width="400" border="10" />
</a>
<a href="http://www.youtube.com/watch?feature=player_embedded&v=fiQsmdwEGT8" target="_blank">
<img src="http://img.youtube.com/vi/fiQsmdwEGT8/0.jpg" alt="RL Demo" width="400" border="10" />
</a>
also involves two tasks: attribute selection and model selection. Attribute
selection is defining the set of observations for the robot that best help it
complete its objective, while model selection is defining the form of the policy
(mapping from observations to actions) and its parameters. In practice, training
behaviors is an iterative process that may require changing the attribute and
model choices.
One common aspect of all three branches of machine learning is that they all
involve a **training phase** and an **inference phase**. While the details of
the training and inference phases are different for each of the three, at a
high-level, the training phase involves building a model using the provided
data, while the inference phase involves applying this model to new, previously
unseen, data. More specifically:
* For our unsupervised learning example, the training phase learns the optimal
two clusters based on the data describing existing players, while the
inference phase assigns a new player to one of these two clusters.
* For our supervised learning example, the training phase learns the mapping
from player attributes to player label (whether they churned or not), and the
inference phase predicts whether a new player will churn or not based on that
learned mapping.
* For our reinforcement learning example, the training phase learns the optimal
policy through guided trials, and in the inference phase, the agent observes
  and takes actions in the wild using its learned policy.
To briefly summarize: all three classes of algorithms involve training and
inference phases in addition to attribute and model selections. What ultimately
separates them is the type of data available to learn from. In unsupervised
learning our data set was a collection of attributes, in supervised learning our
data set was a collection of attribute-label pairs, and, lastly, in
reinforcement learning our data set was a collection of
[Deep learning](https://en.wikipedia.org/wiki/Deep_learning) is a family of
algorithms that can be used to address any of the problems introduced above.
More specifically, they can be used to solve both attribute and model selection
tasks. Deep learning has gained popularity in recent years due to its
outstanding performance on several challenging machine learning tasks. One
example is [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo), a [computer
Go](https://en.wikipedia.org/wiki/Computer_Go) program, that leverages deep
learning, that was able to beat Lee Sedol (a Go world champion).
complex functions from large amounts of training data. This makes them a natural
choice for reinforcement learning tasks when a large amount of data can be
generated, say through the use of a simulator or engine such as Unity. By
generating hundreds of thousands of simulations of the environment within Unity,
we can learn policies for very complex environments (a complex environment is
one where the number of observations an agent perceives and the number of
actions they can take are large). Many of the algorithms we provide in ML-Agents
use some form of deep learning, built on top of the open-source library,
[TensorFlow](Background-TensorFlow.md).
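
As a purely illustrative sketch of what "learning a complex function from data"
means, here is a standalone `tf.keras` snippet that fits a small neural network
to a noisy function; this is not the toolkit's training code:

```python
import numpy as np
import tensorflow as tf

# Generate noisy samples of a function the network has to approximate.
x = np.random.uniform(-1.0, 1.0, size=(1000, 1)).astype(np.float32)
y = np.sin(3.0 * x) + 0.1 * np.random.randn(1000, 1).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=5, batch_size=32, verbose=0)          # training phase
print(model.predict(np.array([[0.5]], dtype=np.float32)))    # inference phase
```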

docs/Background-TensorFlow.md (74)


# Background: TensorFlow
As discussed in our
[machine learning background page](Background-Machine-Learning.md),
many of the algorithms we provide in the
ML-Agents toolkit leverage some form of deep learning. More specifically, our
implementations are built on top of the open-source library
[TensorFlow](https://www.tensorflow.org/). This means that the models produced
by the ML-Agents toolkit are (currently) in a format only understood by
TensorFlow. In this page we provide a brief overview of TensorFlow, in addition
to TensorFlow-related tools that we leverage within the ML-Agents toolkit.

performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
train the behavior of an agent, the output is a TensorFlow model (.bytes) file
that you can then embed within an Internal Brain. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.
One component of training models with TensorFlow is setting the values of
certain model attributes (called _hyperparameters_). Finding the right values of
these hyperparameters can require a few iterations. Consequently, we leverage a
visualization tool within TensorFlow called
[TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
It allows the visualization of certain agent attributes (e.g. reward) throughout
training which can be helpful in both building intuitions for the different
hyperparameters and setting the optimal values for your Unity environment. We
provide more details on setting the hyperparameters in later parts of the
documentation, but, in the meantime, if you are unfamiliar with TensorBoard we
recommend working through an introductory TensorBoard tutorial first.
One of the drawbacks of TensorFlow is that it does not provide a native C# API.
This means that the Internal Brain is not natively supported since Unity scripts
are written in C#. Consequently, to enable the Internal Brain, we leverage a
third-party library
[TensorFlowSharp](https://github.com/migueldeicaza/TensorFlowSharp) which
provides .NET bindings to TensorFlow. Thus, when a Unity environment that
contains an Internal Brain is built, inference is performed via TensorFlowSharp.
We provide an additional in-depth overview of how to leverage
[TensorFlowSharp within Unity](Using-TensorFlow-Sharp-in-Unity.md)
which will become more
relevant once you install and start training behaviors within the ML-Agents
toolkit. Given the reliance on TensorFlowSharp, the Internal Brain is currently
marked as experimental.

12
docs/Background-Unity.md


# Background: Unity
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Unity Manual](https://docs.unity3d.com/Manual/index.html)
and [Tutorials page](https://unity3d.com/learn/tutorials). The following topics
are particularly relevant when working with the ML-Agents toolkit:
* [Editor](https://docs.unity3d.com/Manual/UsingTheEditor.html)
* [Interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)
* [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)

* [Scripting](https://docs.unity3d.com/Manual/ScriptingSection.html)
* [Physics](https://docs.unity3d.com/Manual/PhysicsSection.html)
* [Ordering of event functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
(e.g. FixedUpdate, Update)

242
docs/Basic-Guide.md


# Basic Guide
This guide will show you how to use a pre-trained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
highly recommend the [Roll-a-ball
tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial) to learn all
the basic concepts of Unity.
In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. You also need the [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.5/TFSharpPlugin.unitypackage),
which is based on the
[TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp),
in order to use a pre-trained model within Unity.
3. Using the file dialog that opens, locate the `UnitySDK` folder
within the ML-Agents toolkit project and click **Open**.
5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,
**iOS** or **Android**):
2. Set **Scripting Runtime Version** to **Experimental (.NET 4.6
Equivalent or .NET 4.x Equivalent)**
3. In **Scripting Defined Symbols**, add the flag `ENABLE_TENSORFLOW`. After
typing in the flag name, press Enter.
[Download](https://s3.amazonaws.com/unity-ml-agents/0.5/TFSharpPlugin.unitypackage)
the TensorFlowSharp plugin. Then import it into Unity by double clicking the
downloaded file. You can check if it was successfully imported by checking the
TensorFlow files in the Project window under **Assets** > **ML-Agents** >
**Plugins** > **Computer**.
**Note**: If you don't see anything under **Assets**, drag the
`UnitySDK/Assets/ML-Agents` folder under **Assets** within the Project window.
1. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall` folder
and open the `3DBall` scene file.
2. In the **Hierarchy** window, select the **Ball3DBrain** child under the
**Ball3DAcademy** GameObject to view its properties in the Inspector window.
3. On the **Ball3DBrain** object's **Brain** component, change the **Brain
Type** to **Internal**.
4. In the **Project** window, locate the
`Assets/ML-Agents/Examples/3DBall/TFModels` folder.
5. Drag the `3DBall` model file from the `TFModels` folder to the **Graph
Model** field of the **Ball3DBrain** object's **Brain** component.
6. Click the **Play** button and you will see the platforms balance the balls
using the pre-trained model.
The `notebooks/getting-started.ipynb` [Jupyter notebook](Background-Jupyter.md)
contains a simple walkthrough of the functionality of the Python API. It can
also serve as a simple test that your environment is configured correctly.
Within the notebook, be sure to set `env_name` to the name of the Unity
executable if you want to [use an executable](Learning-Environment-Executable.md)
or to `None` if you want to interact with the current scene in the Unity Editor.
More information and documentation is provided in the
Since we are going to build this environment to conduct training, we need to set
the Brain used by the Agents to **External**. This allows the Agents to
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
2. Select its child object **Ball3DBrain**.
3. In the Inspector window, set **Brain Type** to **External**.

1. Open a command or terminal window.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train`
where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs
- `--train` tells `mlagents-learn` to run a training session (rather
than inference)
4. If you cloned the ML-Agents repo, then you can simply run
```sh
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
```
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.
**Note**: Alternatively, you can use an executable rather than the Editor to
perform training. Please refer to [this
page](Learning-Environment-Executable.md) for instructions on how to build and
use an executable.
```console
ml-agents$ mlagents-learn config/trainer_config.yaml --run-id=first-run --train
▄▄▄▓▓▓▓
╓▓▓▓▓▓▓█▓▓▓▓▓
,▄▄▄m▀▀▀' ,▓▓▓▀▓▓▄ ▓▓▓ ▓▓▌
▄▓▓▓▀' ▄▓▓▀ ▓▓▓ ▄▄ ▄▄ ,▄▄ ▄▄▄▄ ,▄▄ ▄▓▓▌▄ ▄▄▄ ,▄▄
▄▓▓▓▀ ▄▓▓▀ ▐▓▓▌ ▓▓▌ ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌ ╒▓▓▌
▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓ ▓▀ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▄ ▓▓▌
▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄ ▓▓ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▐▓▓
^█▓▓▓ ▀▓▓▄ ▐▓▓▌ ▓▓▓▓▄▓▓▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▓▄ ▓▓▓▓`
'▀▓▓▓▄ ^▓▓▓ ▓▓▓ └▀▀▀▀ ▀▀ ^▀▀ `▀▀ `▀▀ '▀▀ ▐▓▓▌
▀▀▀▀▓▄▄▄ ▓▓▓▓▓▓, ▓▓▓▓▀
`▀█▓▓▓▓▓▓▓▓▓▌
¬`▀▀▀█▓
INFO:mlagents.learn:{'--curriculum': 'None',
'--docker-target-name': 'Empty',
'--env': 'None',
'--help': False,
'--keep-checkpoints': '5',
'--lesson': '0',
'--load': False,
'--no-graphics': False,
'--num-runs': '1',
'--run-id': 'first-run',
'--save-freq': '50000',
'--seed': '-1',
'--slow': False,
'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
If `mlagents-learn` runs correctly and starts training, you should see something
like this:
```console
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
Number of Brains: 1
Number of External Brains : 1
Reset Parameters :
Unity brain name: Ball3DBrain
Number of Visual Observations (per agent): 0
Vector Observation space size (per agent): 8
Number of stacked Vector Observation: 1
Vector Action space type: continuous
Vector Action space size (per agent): [2]
Vector Action descriptions: ,
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain:
batch_size: 64
beta: 0.001
buffer_size: 12000
epsilon: 0.2
gamma: 0.995
hidden_units: 128
lambd: 0.99
learning_rate: 0.0003
max_steps: 5.0e4
normalize: True
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 1000
use_recurrent: False
graph_scope:
summary_path: ./summaries/first-run-0
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
```
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
`<academy_name>` is the name of the Academy GameObject in the current scene.
This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your Internal Brain by following the steps below, which is
similar to the steps described
[above](#play-an-example-environment-using-pretrained-model).
1. Move your model file into
`UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
5. Drag the `<env_name>_<run-identifier>.bytes` file from the Project window of
the Editor to the **Graph Model** placeholder in the **Ball3DBrain**
inspector window.
- For more information on the ML-Agents toolkit, in addition to helpful
background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own Learning Environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of YouTube video tutorials, check out the
[Machine Learning Agents PlayList](https://www.youtube.com/playlist?list=PLX2vGYjWbI0R08eWQkO7nQkGiicHAX7IX)
page.

136
docs/FAQ.md


# Frequently Asked Questions
## Scripting Runtime Environment not setup correctly
If you haven't switched your scripting runtime version from .NET 3.5 to .NET 4.6
or .NET 4.x, you will see an error message. This is because .NET 3.5 doesn't
support the `Clear()` method for `StringBuilder`; refer to [Setting Up The
ML-Agents Toolkit Within Unity](Installation.md#setting-up-ml-agent-within-unity)
for the solution.
## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin, but haven't set the
ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the
following error message:
```console
You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
```
This error message occurs because the TensorFlowSharp plugin won't be usable
without the ENABLE_TENSORFLOW flag; refer to [Setting Up The ML-Agents Toolkit
Within Unity](Installation.md#setting-up-ml-agent-within-unity) for the solution.
## Instance of CoreBrainInternal couldn't be created
If you try to use ML-Agents in Unity versions 2017.1 - 2017.3, you might
encounter an error that looks like this:
```console
Instance of CoreBrainInternal couldn't be created. The script
class needs to derive from ScriptableObject.
UnityEngine.ScriptableObject:CreateInstance(String)
```
You can fix the error by removing `CoreBrain` from CoreBrainInternal.cs:16,
clicking on your Brain GameObject to let the scene recompile all the changed
C# scripts, then adding the `CoreBrain` back. Make sure your Brain is in
Internal mode, your TensorFlowSharp plugin is imported and the
ENABLE_TENSORFLOW flag is set. This fix is only valid locally and is unstable.
## TensorFlow epsilon placeholder error
If you have a graph placeholder set in the Internal Brain inspector that is not
present in the TensorFlow graph, you will see some error like this:
```console
UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
```
Solution: Go to all of your Brain objects, find `Graph placeholders` and change
their `size` to 0 to remove the `epsilon` placeholder.
Similarly, if you have a graph scope set in the Internal Brain inspector that is
not correctly set, you will see a similar error.
Solution: Make sure your Graph Scope field matches the corresponding Brain
object name in your Hierarchy Inspector when there are multiple Brains.
## Environment Permission Error
If you directly import your Unity environment without building it in the
editor, you might need to give it additional permissions to execute it.
```sh
chmod -R 755 *.app
```
```sh
chmod -R 755 *.x86_64
```
On Windows, you can find
## Environment Connection Timeout
If you are able to launch the environment from `UnityEnvironment` but then
receive a timeout error, there may be a number of possible causes.
* _Cause_: There may be no Brains in your environment which are set to
`External`. In this case, the environment will not attempt to communicate
with Python. _Solution_: Set the Brain(s) you wish to externally control
through the Python API to `External` from the Unity Editor, and rebuild the
environment.
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of
exceptions on the firewall by following
[instructions](https://support.apple.com/en-us/HT201642).
* _Cause_: An error happened in the Unity Environment preventing communication.
_Solution_: Look into the [log
files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the Unity
Environment to figure out what error happened.
## Communication port {} still in use
If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker number in
the Python script when calling:
```python
UnityEnvironment(file_name=filename, worker_id=X)
```
## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
for custom episode-terminating events.
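If you prefer the scripted route, the following is a minimal, hypothetical sketch (the class name, height threshold, and reward value are illustrative and not taken from the toolkit) of ending an episode from within an Agent script:

```csharp
using UnityEngine;
using MLAgents;

// Hypothetical agent that ends its episode when it falls below a height threshold.
public class FallingAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // ... apply the received actions and assign step rewards here ...

        // Custom episode-terminating event: the agent fell off the platform.
        if (transform.position.y < -1f)
        {
            AddReward(-1f); // penalize the failure
            Done();         // marks the episode as finished so Mean Reward can be reported
        }
    }
}
```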

57
docs/Feature-Memory.md


# Memory-enhanced agents using Recurrent Neural Networks
## What are memories used for?
Have you ever entered a room to get something and immediately forgot what you
were looking for? Don't let that happen to your agents.
It is now possible to give memories to your agents. When training, the agents
will be able to store a vector of floats to be used next time they need to make
a decision.
Deciding what the agents should remember in order to solve a task is not easy to
do by hand, but our training algorithms can learn to keep track of what is
important to remember with
[LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory).
When configuring the trainer parameters in the `config/trainer_config.yaml`
file, add the following parameters to the Brain you want to use.
```yaml
# illustrative values; tune sequence_length and memory_size for your task
use_recurrent: true
sequence_length: 64
memory_size: 256
```
* `use_recurrent` is a flag that notifies the trainer that you want to use a
Recurrent Neural Network.
* `sequence_length` defines how long the sequences of experiences must be while
training. In order to use a LSTM, training requires a sequence of experiences
instead of single experiences.
* `memory_size` corresponds to the size of the memory the agent must keep. Note
that if this number is too small, the agent will not be able to remember a lot
of things. If this number is too large, the neural network will take longer to
train.
* LSTM does not work well with continuous vector action space. Please use
discrete vector action space for better results.
* Since the memories must be sent back and forth between Python and Unity, using
too large a `memory_size` will slow down training.
* Adding a recurrent layer increases the complexity of the neural network; it is
recommended to decrease `num_layers` when using a recurrent layer.
* It is required that `memory_size` be divisible by 4.

50
docs/Feature-Monitor.md


![Monitor](images/monitor.png)
The monitor allows visualizing information related to the agents or training
process within a Unity scene.
You can track many different things both related and unrelated to the agents
themselves. By default, the Monitor is only active in the *inference* phase, so
not during training. To change this behavior, you can activate or deactivate it
by calling `SetActive(boolean)`. For example to also show the monitor during
training, you can call it in the `InitializeAcademy()` method of your `Academy`:
```csharp
using MLAgents;

public class YourAcademy : Academy {
    public override void InitializeAcademy()
    {
        Monitor.SetActive(true);
    }
}
```
To add values to monitor, call the `Log` function anywhere in your code (a short
usage sketch follows the parameter list below):
* `key` is the name of the information you want to display.
* `value` is the information you want to display. `value` can have different
  types:
  * `string` - The Monitor will display the string next to the key. It can be
    useful for displaying error messages.
  * `float` - The Monitor will display a slider. Note that the values must be
    between -1 and 1. If the value is positive, the slider will be green; if the
    value is negative, the slider will be red.
  * `float[]` - The Monitor Log call can take an additional argument called
    `displayType` that can be either `INDEPENDENT` (default) or `PROPORTIONAL`:
    * `INDEPENDENT` is used to display multiple independent floats as a
      histogram. The histogram will be a sequence of vertical sliders.
    * `PROPORTION` is used to see the proportions between numbers. For each
      float in values, a rectangle with width proportional to the value divided
      by the sum of all values will be shown. It is best for visualizing values
      that sum to 1.
* `target` is the transform to which you want to attach information. If the
  transform is `null` the information will be attached to the global monitor.
* **NB:** When adding a target transform that is not the global monitor, make
  sure you have your main camera object tagged as `MainCamera` via the
  inspector. This is needed to properly display the text onto the screen.
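As a short usage sketch (the class, keys, and values below are illustrative; it assumes the `Log` overloads for `string` and `float` values with an optional target `Transform` described above):

```csharp
using UnityEngine;
using MLAgents;

public class MonitoredAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // ... apply actions and rewards as usual ...

        // Float value: drawn as a slider attached to this agent's transform.
        // The value is clamped because the Monitor expects floats between -1 and 1.
        Monitor.Log("Reward", Mathf.Clamp(GetCumulativeReward(), -1f, 1f), transform);

        // String value: drawn next to the key on the global monitor.
        Monitor.Log("Status", "training", null);
    }
}
```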

393
docs/Getting-Started-with-Balance-Ball.md


# Getting Started with the 3D Balance Ball Environment
This tutorial walks through the end-to-end process of opening a ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
Agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help
understand the different ways in which the ML-Agents toolkit can be used. These
environments can also serve as templates for new environments or as ways to test
new ML algorithms. After reading this tutorial, you should be able to explore
and build the example environments.
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **Agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.
In order to install and set up the ML-Agents toolkit, the Python dependencies
and Unity, see the [installation instructions](Installation.md).
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
entities that an agent interacts with.
**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make up
a GameObject, select the GameObject in the Scene window, and open the Inspector
window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same Brain. 3D Balance Ball does this
The Academy object for the scene is placed on the Ball3DAcademy GameObject. When
you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the **Training**
and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the Agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
Configuration** and a high graphics quality and the timescale to `1.0` for the
**Inference Configuration**.
**Note:** if you want to observe the environment during training, you can adjust
the **Inference Configuration** settings to use a larger window and a timescale
closer to 1:1. Be sure to set these parameters back when training in earnest;
otherwise, training can take a very long time.
Another aspect of an environment to look at is the Academy implementation. Since
the base Academy class is abstract, you must always define a subclass. There are
three functions you can implement, though they are all optional:
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the Agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).
The 3D Balance Ball environment does not use these functions — each Agent resets
itself when needed — but many environments do use these functions to control the
environment around the Agents.
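As a rough sketch (the class name and the bodies of the overrides are hypothetical, not taken from an example environment), an Academy that does use these functions might look like this:

```csharp
using MLAgents;

public class MyAcademy : Academy
{
    public override void InitializeAcademy()
    {
        // One-time setup when the environment launches.
    }

    public override void AcademyReset()
    {
        // Reset environment-wide state when the simulation (re)starts,
        // e.g. reposition shared objects or apply reset parameters.
    }

    public override void AcademyStep()
    {
        // Runs every simulation step before Agent.AgentAction(); useful for
        // updating anything the individual Agents do not manage themselves.
    }
}
```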
The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
child of the Academy object. (All Brain objects in a scene must be children of
the Academy.) All the Agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same
Brain, but act independently. The Brain settings tell you quite a bit about how
an Agent works.
The **Brain Type** determines how an Agent makes its decisions. The **External**
and **Internal** types work together — use **External** when training your
Agents; use **Internal** when using the trained model. The **Heuristic** Brain
allows you to hand-code the Agent's logic by extending the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. If none of these types
of Brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;
#### Vector Observation Space
Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the platform's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations()` function.)
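As an illustrative sketch only (the class and field names are hypothetical; this is not the actual Ball3DAgent source), collecting such an eight-element observation could look like this:

```csharp
using UnityEngine;
using MLAgents;

public class BalancePlatformAgent : Agent
{
    public GameObject ball; // assigned in the Inspector

    public override void CollectObservations()
    {
        // 2 floats: the platform's rotation around the z and x axes
        AddVectorObs(gameObject.transform.rotation.z);
        AddVectorObs(gameObject.transform.rotation.x);
        // 3 floats: the ball's position relative to the platform
        AddVectorObs(ball.transform.position - gameObject.transform.position);
        // 3 floats: the ball's velocity
        AddVectorObs(ball.GetComponent<Rigidbody>().velocity);
    }
}
```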
#### Vector Action Space
An Agent is given instructions from the Brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.
space. You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
The Agent is the actor that observes and takes actions in the environment. In
the 3D Balance Ball environment, the Agent components are placed on the twelve
Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
makes decisions. All the Agents in the 3D Balance Ball scene share the same
Brain.
* **Visual Observations** — Defines any Camera objects used by the Agent to
observe its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the Agent
decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an Agent starts over when it is finished.
3D Balance Ball sets this true so that the Agent restarts after reaching the
**Max Step** count or after dropping the ball.
Perhaps the more interesting aspect of an Agent is the Agent subclass
implementation. When you create an Agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
of a session. The Ball3DAgent class uses the reset function to reset the