
Release v0.5 (Develop) (#1203)

/develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: 10d2a19d
116 files changed, with 6,214 insertions and 6,394 deletions
Changed files (change count shown in parentheses):

1. .gitignore (41)
2. CODE_OF_CONDUCT.md (5)
3. CONTRIBUTING.md (60)
4. Dockerfile (7)
5. LICENSE (201)
6. README.md (104)
7. config/curricula/push-block/PushBlockBrain.json (2)
8. config/curricula/test/TestBrain.json (2)
9. config/curricula/wall-jump/BigWallBrain.json (2)
10. config/curricula/wall-jump/SmallWallBrain.json (2)
11. config/trainer_config.yaml (24)
12. docs/API-Reference.md (2)
13. docs/Background-TensorFlow.md (2)
14. docs/Basic-Guide.md (129)
15. docs/FAQ.md (33)
16. docs/Feature-Memory.md (6)
17. docs/Feature-Monitor.md (2)
18. docs/Getting-Started-with-Balance-Ball.md (96)
19. docs/Glossary.md (4)
20. docs/Installation-Windows.md (15)
21. docs/Installation.md (37)
22. docs/Learning-Environment-Best-Practices.md (4)
23. docs/Learning-Environment-Create-New.md (89)
24. docs/Learning-Environment-Design-Academy.md (6)
25. docs/Learning-Environment-Design-Agents.md (147)
26. docs/Learning-Environment-Design-Brains.md (48)
27. docs/Learning-Environment-Design-External-Internal-Brains.md (28)
28. docs/Learning-Environment-Design-Heuristic-Brains.md (14)
29. docs/Learning-Environment-Design-Player-Brains.md (56)
30. docs/Learning-Environment-Design.md (90)
31. docs/Learning-Environment-Examples.md (80)
32. docs/Learning-Environment-Executable.md (116)
33. docs/Limitations.md (4)
34. docs/ML-Agents-Overview.md (24)
35. docs/Migrating.md (64)
36. docs/Readme.md (4)
37. docs/Training-Curriculum-Learning.md (29)
38. docs/Training-Imitation-Learning.md (24)
39. docs/Training-ML-Agents.md (64)
40. docs/Training-PPO.md (2)
41. docs/Training-on-Amazon-Web-Service.md (12)
42. docs/Training-on-Microsoft-Azure.md (17)
43. docs/Using-Docker.md (18)
44. docs/Using-TensorFlow-Sharp-in-Unity.md (18)
45. docs/Using-Tensorboard.md (8)
46. docs/dox-ml-agents.conf (8)
47. docs/images/banner.png (611)
48. docs/images/player_brain.png (129)
49. docs/images/scene-hierarchy.png (79)
50. docs/images/unity-logo-rgb.png (309)
51. docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (26)
52. docs/localized/zh-CN/docs/Installation.md (4)
53. docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (4)
54. docs/localized/zh-CN/docs/Learning-Environment-Design.md (14)
55. docs/localized/zh-CN/docs/Learning-Environment-Examples.md (42)
56. docs/localized/zh-CN/docs/ML-Agents-Overview.md (2)
57. ml-agents/README.md (188)
58. ml-agents/mlagents/envs/environment.py (2)
59. ml-agents/mlagents/envs/exception.py (2)
60. ml-agents/mlagents/trainers/bc/models.py (4)
61. ml-agents/mlagents/trainers/bc/policy.py (2)
62. ml-agents/mlagents/trainers/bc/trainer.py (8)
63. ml-agents/mlagents/trainers/curriculum.py (45)
64. ml-agents/mlagents/trainers/meta_curriculum.py (64)
65. ml-agents/mlagents/trainers/models.py (38)
66. ml-agents/mlagents/trainers/policy.py (4)
67. ml-agents/mlagents/trainers/ppo/models.py (2)
68. ml-agents/mlagents/trainers/ppo/policy.py (2)
69. ml-agents/mlagents/trainers/ppo/trainer.py (15)
70. ml-agents/mlagents/trainers/trainer.py (2)
71. ml-agents/mlagents/trainers/trainer_controller.py (134)
72. ml-agents/setup.py (2)
73. ml-agents/tests/mock_communicator.py (2)
74. ml-agents/tests/trainers/test_curriculum.py (14)
75. ml-agents/tests/trainers/test_meta_curriculum.py (28)
76. ml-agents/tests/trainers/test_trainer_controller.py (2)
77. protobuf-definitions/make.bat (2)
78. UnitySDK/Assets/ML-Agents/Scripts/CoreBrainInternal.cs.meta (12)
79. UnitySDK/Assets/ML-Agents/Scripts/Academy.cs (4)
80. UnitySDK/Assets/ML-Agents/Scripts/CoreBrainInternal.cs (2)
81. UnitySDK/Assets/ML-Agents/Examples/WallJump/TFModels/WallJump.bytes (999)
82. UnitySDK/Assets/ML-Agents/Examples/WallJump/Scenes/WallJump.unity (319)
83. UnitySDK/Assets/ML-Agents/Examples/Soccer/Scenes/SoccerTwos.unity (246)
84. UnitySDK/Assets/ML-Agents/Examples/Soccer/TFModels/SoccerTwos.bytes (999)
85. UnitySDK/Assets/ML-Agents/Examples/Pyramids/TFModels/Pyramids.bytes (999)
86. UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/PushBlock.unity (257)
87. UnitySDK/Assets/ML-Agents/Examples/PushBlock/TFModels/PushBlock.bytes (999)
88. UnitySDK/Assets/ML-Agents/Examples/PushBlock/TFModels/PushBlock.bytes.meta (2)
89. UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/Hallway.unity (224)
90. UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/HallwayIL.unity (443)
91. UnitySDK/Assets/ML-Agents/Examples/Hallway/TFModels/Hallway.bytes (972)
92. UnitySDK/Assets/ML-Agents/Examples/Hallway/TFModels/Hallway.bytes.meta (4)
93. UnitySDK/Assets/ML-Agents/Examples/GridWorld/TFModels/GridWorld_3x3.bytes (999)
94. UnitySDK/Assets/ML-Agents/Examples/GridWorld/TFModels/GridWorld_5x5.bytes (998)
95. UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scenes/GridWorld.unity (154)
96. UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs (86)
97. UnitySDK/Assets/ML-Agents/Examples/Basic/Scenes/Basic.unity (102)
98. UnitySDK/Assets/ML-Agents/Examples/Basic/TFModels/Basic.bytes (175)
99. UnitySDK/Assets/ML-Agents/Examples/Basic/TFModels/Basic.bytes.meta (2)
100. UnitySDK/ProjectSettings/ProjectSettings.asset (79)

41
.gitignore


/MLAgentsSDK/[Ll]ibrary/
/MLAgentsSDK/[Tt]emp/
/MLAgentsSDK/[Oo]bj/
/MLAgentsSDK/[Bb]uild/
/MLAgentsSDK/[Bb]uilds/
/MLAgentsSDK/[Pp]ackages/
/MLAgentsSDK/[Uu]nity[Pp]ackage[Mm]anager/
/MLAgentsSDK/Assets/AssetStoreTools*
/MLAgentsSDK/Assets/Plugins*
/MLAgentsSDK/Assets/Gizmos*
/UnitySDK/[Ll]ibrary/
/UnitySDK/[Tt]emp/
/UnitySDK/[Oo]bj/
/UnitySDK/[Bb]uild/
/UnitySDK/[Bb]uilds/
/UnitySDK/[Pp]ackages/
/UnitySDK/[Uu]nity[Pp]ackage[Mm]anager/
/UnitySDK/Assets/AssetStoreTools*
/UnitySDK/Assets/Plugins*
/UnitySDK/Assets/Gizmos*
# Training environments
/envs
*MLAgentsSDK.log
*UnitySDK.log
/MLAgentsSDK/.vs/
/UnitySDK/.vs/
/MLAgentsSDKExportedObj/
/MLAgentsSDK.consulo/
/UnitySDKExportedObj/
/UnitySDK.consulo/
*.csproj
*.unityproj
*.sln

*.pidb.meta
# Unity3D Generated File On Crash Reports
/MLAgentsSDK/sysinfo.txt
/UnitySDK/sysinfo.txt
# Builds
*.apk

*.x86
# Tensorflow Sharp Files
/MLAgentsSDK/Assets/ML-Agents/Plugins/Android*
/MLAgentsSDK/Assets/ML-Agents/Plugins/iOS*
/MLAgentsSDK/Assets/ML-Agents/Plugins/Computer*
/MLAgentsSDK/Assets/ML-Agents/Plugins/System*
/UnitySDK/Assets/ML-Agents/Plugins/Android*
/UnitySDK/Assets/ML-Agents/Plugins/iOS*
/UnitySDK/Assets/ML-Agents/Plugins/Computer*
/UnitySDK/Assets/ML-Agents/Plugins/System*
# Generated doc folders
/docs/html

5
CODE_OF_CONDUCT.md


## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct/
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct/
[homepage]: https://www.contributor-covenant.org

60
CONTRIBUTING.md


# Contribution Guidelines
Thank you for your interest in contributing to the ML-Agents toolkit! We are incredibly
excited to see how members of our community will use and extend the ML-Agents toolkit.
To facilitate your contributions, we've outlined a brief set of guidelines
to ensure that your extensions can be easily integrated.
Thank you for your interest in contributing to the ML-Agents toolkit! We are
incredibly excited to see how members of our community will use and extend the
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.
### Communication
## Communication
First, please read through our [code of conduct](CODE_OF_CONDUCT.md),
as we expect all our contributors to follow it.
First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
expect all our contributors to follow it.
Second, before starting on a project that you intend to contribute
to the ML-Agents toolkit (whether environments or modifications to the codebase),
we **strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
briefly outlining the changes you plan to make. This will enable us to provide
some context that may be helpful for you. This could range from advice and
feedback on how to optimally perform your changes or reasons for not doing it.
Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
and briefly outlining the changes you plan to make. This will enable us to
provide some context that may be helpful for you. This could range from advice
and feedback on how to optimally perform your changes or reasons for not doing
it.
### Git Branches
## Git Branches
Starting with v0.3, we adopted the
Consequently, the `master` branch corresponds to the latest release of
* Corresponding changes to documentation, unit tests and sample environments
(if applicable)
* Corresponding changes to documentation, unit tests and sample environments (if
applicable)
### Environments
## Environments
We are also actively open to adding community contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
PR explaining the nature of the environment and task.
### Style Guide
## Style Guide
When performing changes to the codebase, ensure that you follow the style
guide of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/). For C#, we will soon be
adding a formal style guide for our repository.
When performing changes to the codebase, ensure that you follow the style guide
of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/).
For C#, we will soon be adding a formal style guide for our repository.

7
Dockerfile


# xvfb is used to do CPU based rendering of Unity
RUN apt-get install -y xvfb
COPY ml-agents/requirements.txt .
RUN pip install --trusted-host pypi.python.org -r requirements.txt
COPY README.md .
COPY ml-agents /ml-agents
WORKDIR /ml-agents
RUN pip install .

ENTRYPOINT ["python", "mlagents/learn.py"]
ENTRYPOINT ["mlagents-learn"]

201
LICENSE


Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2017 Unity Technologies
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

104
README.md


# Unity ML-Agents Toolkit (Beta)
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin
that enables games and simulations to serve as environments for training
intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through
a simple-to-use Python API. We also provide implementations (based on
TensorFlow) of state-of-the-art algorithms to enable game developers
and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
These trained agents can be used for multiple purposes, including
controlling NPC behavior (in a variety of settings such as multi-agent and
adversarial), automated testing of game builds and evaluating different game
design decisions pre-release. The ML-Agents toolkit is mutually beneficial for both game
developers and AI researchers as it provides a central platform where advances
in AI can be evaluated on Unity’s rich environments and then made accessible
to the wider research and game developer communities.
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source
Unity plugin that enables games and simulations to serve as environments for
training intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through a
simple-to-use Python API. We also provide implementations (based on TensorFlow)
of state-of-the-art algorithms to enable game developers and hobbyists to easily
train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
used for multiple purposes, including controlling NPC behavior (in a variety of
settings such as multi-agent and adversarial), automated testing of game builds
and evaluating different game design decisions pre-release. The ML-Agents
toolkit is mutually beneficial for both game developers and AI researchers as it
provides a central platform where advances in AI can be evaluated on Unity’s
rich environments and then made accessible to the wider research and game
developer communities.
* Train memory-enhanced Agents using deep reinforcement learning
* Train memory-enhanced agents using deep reinforcement learning
* Broadcasting of Agent behavior for supervised learning
* Broadcasting of agent behavior for supervised learning
* Flexible Agent control with On Demand Decision Making
* Flexible agent control with On Demand Decision Making
* Wrap learning environments as a gym
* For more information, in addition to installation and usage
instructions, see our [documentation home](docs/Readme.md).
* If you have
used a version of the ML-Agents toolkit prior to v0.4, we strongly recommend
our [guide on migrating from earlier versions](docs/Migrating.md).
* For more information, in addition to installation and usage instructions, see
our [documentation home](docs/Readme.md).
* If you are a researcher interested in a discussion of Unity as an AI platform, see a pre-print of our [reference paper on Unity and the ML-Agents Toolkit](https://arxiv.org/abs/1809.02627). Also, see below for instructions on citing this paper.
* If you have used a version of the ML-Agents toolkit prior to v0.5, we strongly
recommend our [guide on migrating from earlier versions](docs/Migrating.md).
## References
## Additional Resources
- Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
- [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/) announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
- [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
* Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and
[Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
* [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
* [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/)
announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
* [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
- [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
- [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
- [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
* [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
* [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
* [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
The ML-Agents toolkit is an open-source project and we encourage and welcome contributions.
If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
The ML-Agents toolkit is an open-source project and we encourage and welcome
contributions. If you wish to contribute, be sure to review our
[contribution guidelines](CONTRIBUTING.md) and
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers enthusiastic
about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in games).
* If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using the ML-Agents toolkit and Unity developers
enthusiastic about machine learning. We use that channel to surface updates
regarding the ML-Agents toolkit (and, more broadly, machine learning in
games).
* If you run into any problems using the ML-Agents toolkit,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
For any other questions or feedback, connect directly with the ML-Agents
team at ml-agents@unity3d.com.

translating more pages and to other languages. Consequently,
we welcome any enhancements and improvements from the community.
- [Chinese](docs/localized/zh-CN/)
* [Chinese](docs/localized/zh-CN/)
## Citation
If you use Unity or the ML-Agents Toolkit to conduct research, we ask that you cite the following paper as a reference:
Juliani, A., Berges, V., Vckay, E., Gao, Y., Henry, H., Mattar, M., Lange, D. (2018). Unity: A General Platform for Intelligent Agents. *arXiv preprint arXiv:1809.02627.* https://github.com/Unity-Technologies/ml-agents.

2
config/curricula/push-block/PushBlockBrain.json


{
"measure" : "reward",
"thresholds" : [0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75],
"min_lesson_length" : 2,
"min_lesson_length" : 100,
"signal_smoothing" : true,
"parameters" :
{

2
config/curricula/test/TestBrain.json


{
"measure" : "reward",
"thresholds" : [10, 20, 50],
"min_lesson_length" : 3,
"min_lesson_length" : 100,
"signal_smoothing" : true,
"parameters" :
{

2
config/curricula/wall-jump/BigWallBrain.json


{
"measure" : "progress",
"thresholds" : [0.1, 0.3, 0.5],
"min_lesson_length" : 2,
"min_lesson_length": 100,
"signal_smoothing" : true,
"parameters" :
{

2
config/curricula/wall-jump/SmallWallBrain.json


{
"measure" : "progress",
"thresholds" : [0.1, 0.3, 0.5],
"min_lesson_length" : 2,
"min_lesson_length": 100,
"signal_smoothing" : true,
"parameters" :
{
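The four curriculum files above share the same fields (`measure`, `thresholds`, `min_lesson_length`, `signal_smoothing`), and the change in each is raising `min_lesson_length` to 100. As a rough illustration of how those fields interact (this is not the actual `mlagents/trainers/curriculum.py` logic, and `signal_smoothing` is ignored), a minimal Python sketch of threshold-gated lesson advancement might look like this:

```python
# Minimal sketch of threshold-gated lesson advancement, based on the curriculum
# JSON fields shown above. Illustrative only; not the trainers' implementation.
import json


def maybe_advance_lesson(curriculum_path, lesson, measure_value, episodes_in_lesson):
    """Return the (possibly incremented) lesson index for one curriculum file."""
    with open(curriculum_path) as f:
        config = json.load(f)
    thresholds = config["thresholds"]
    if lesson >= len(thresholds):                          # already at the final lesson
        return lesson
    if episodes_in_lesson < config["min_lesson_length"]:   # too few episodes in this lesson
        return lesson
    if measure_value >= thresholds[lesson]:                # reward/progress cleared the bar
        return lesson + 1
    return lesson


# Hypothetical usage:
# maybe_advance_lesson("config/curricula/push-block/PushBlockBrain.json",
#                      lesson=0, measure_value=0.8, episodes_in_lesson=120)
```

With `min_lesson_length` at 100 instead of 2 or 3, a lesson cannot advance on a handful of lucky episodes, which is the conservative behaviour this release appears to be after.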

24
config/trainer_config.yaml


num_layers: 2
SmallWallBrain:
max_steps: 2.0e5
max_steps: 1.0e6
batch_size: 128
buffer_size: 2048
beta: 5.0e-3

normalize: false
BigWallBrain:
max_steps: 2.0e5
max_steps: 1.0e6
batch_size: 128
buffer_size: 2048
beta: 5.0e-3

normalize: false
StrikerBrain:
max_steps: 1.0e5
max_steps: 5.0e5
learning_rate: 1e-3
buffer_size: 2048
beta: 5.0e-3
num_epoch: 3
buffer_size: 2000
beta: 1.0e-2
hidden_units: 256
summary_freq: 2000
time_horizon: 128

GoalieBrain:
max_steps: 1.0e5
batch_size: 128
buffer_size: 2048
beta: 5.0e-3
max_steps: 5.0e5
learning_rate: 1e-3
batch_size: 320
num_epoch: 3
buffer_size: 2000
beta: 1.0e-2
hidden_units: 256
summary_freq: 2000
time_horizon: 128

hidden_units: 512
num_layers: 2
beta: 1.0e-2
max_steps: 2.0e5
max_steps: 5.0e5
num_epoch: 3
VisualPyramidBrain:
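The hunks above adjust per-brain hyperparameters in `config/trainer_config.yaml`, where brain-specific sections override a shared `default` section. The sketch below (assuming PyYAML is available and that the file has a top-level `default` section; this is not the trainers' actual loading code) shows how the effective settings for one brain could be resolved:

```python
# Illustrative resolution of per-brain trainer settings from trainer_config.yaml.
import yaml  # PyYAML, assumed to be installed alongside the trainers


def resolve_trainer_config(path, brain_name):
    """Merge the shared defaults with one brain's overrides (sketch only)."""
    with open(path) as f:
        config = yaml.safe_load(f)
    resolved = dict(config.get("default", {}))    # start from the shared defaults
    resolved.update(config.get(brain_name, {}))   # brain-specific keys take precedence
    return resolved


# Hypothetical usage:
# print(resolve_trainer_config("config/trainer_config.yaml", "StrikerBrain"))
```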

2
docs/API-Reference.md


# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
documented to be compatabile with
documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
documentation.

2
docs/Background-TensorFlow.md


performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
train the behavior of an Agent, the output is a TensorFlow model (.bytes) file
train the behavior of an agent, the output is a TensorFlow model (.bytes) file
that you can then embed within an Internal Brain. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.

129
docs/Basic-Guide.md


# Basic Guide
This guide will show you how to use a pretrained model in an example Unity
This guide will show you how to use a pre-trained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we

In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. Also [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
is needed for you to use pretrained model within Unity, which is based on the
plugin](https://s3.amazonaws.com/unity-ml-agents/0.5/TFSharpPlugin.unitypackage)
is needed for you to use pre-trained model within Unity, which is based on the
3. Using the file dialog that opens, locate the `MLAgentsSDK` folder
3. Using the file dialog that opens, locate the `UnitySDK` folder
within the the ML-Agents toolkit project and click **Open**.
4. Go to **Edit** > **Project Settings** > **Player**
5. For **each** of the platforms you target (**PC, Mac and Linux Standalone**,

![Project Settings](images/project-settings.png)
[Download](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
[Download](https://s3.amazonaws.com/unity-ml-agents/0.5/TFSharpPlugin.unitypackage)
the TensorFlowSharp plugin. Then import it into Unity by double clicking the
downloaded file. You can check if it was successfully imported by checking the
TensorFlow files in the Project window under **Assets** > **ML-Agents** >

`MLAgentsSDK/Assets/ML-Agents` folder under **Assets** within Project window.
`UnitySDK/Assets/ML-Agents` folder under **Assets** within Project window.
![Imported TensorFlowsharp](images/imported-tensorflowsharp.png)

`None` if you want to interact with the current scene in the Unity Editor.
More information and documentation is provided in the
[Python API](../ml-agents/README.md) page.
[Python API](Python-API.md) page.
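For readers following the Python API route mentioned above, a rough sketch of a random-action loop against the Editor is shown below. It uses `UnityEnvironment` from `mlagents/envs/environment.py` (one of the files touched by this release); the property and attribute names follow the v0.4/v0.5-era API and may differ in other releases, and the hard-coded action size of 2 comes from the 3DBall console output later in this guide, so treat it as an illustration rather than a verified recipe.

```python
# Sketch of driving the Editor scene through the Python API (v0.5-era names).
import numpy as np
from mlagents.envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)        # None connects to the scene running in the Editor
brain_name = env.external_brain_names[0]      # e.g. "Ball3DBrain" in the 3DBall scene

info = env.reset(train_mode=True)[brain_name]
for _ in range(100):
    # One random continuous action per agent; the action size of 2 matches the
    # "Vector Action space size (per agent): [2]" line shown further below.
    actions = np.random.uniform(-1.0, 1.0, size=(len(info.agents), 2))
    info = env.step(actions)[brain_name]
    print(sum(info.rewards), any(info.local_done))

env.close()
```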
## Training the Brain with Reinforcement Learning

the brain used by the agents to **External**. This allows the agents to
the Brain used by the Agents to **External**. This allows the Agents to
communicate with the external training process when making their decisions.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy

### Training the environment
1. Open a command or terminal window.
2. Nagivate to the folder where you installed the ML-Agents toolkit.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
Where:
where:
trainer configuration. The defaults used by environments in the ML-Agents
SDK can be found in `config/trainer_config.yaml`.
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- And the `--train` tells `mlagents-learn` to run a training session (rather
- `--train` tells `mlagents-learn` to run a training session (rather
4. When the message _"Start training by pressing the Play button in the Unity
4. If you cloned the ML-Agents repo, then you can simply run
```sh
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
```
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.

use an executable.
![Training command example](images/training-command-example.png)
```console
ml-agents$ mlagents-learn config/trainer_config.yaml --run-id=first-run --train
▄▄▄▓▓▓▓
╓▓▓▓▓▓▓█▓▓▓▓▓
,▄▄▄m▀▀▀' ,▓▓▓▀▓▓▄ ▓▓▓ ▓▓▌
▄▓▓▓▀' ▄▓▓▀ ▓▓▓ ▄▄ ▄▄ ,▄▄ ▄▄▄▄ ,▄▄ ▄▓▓▌▄ ▄▄▄ ,▄▄
▄▓▓▓▀ ▄▓▓▀ ▐▓▓▌ ▓▓▌ ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌ ╒▓▓▌
▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓ ▓▀ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▄ ▓▓▌
▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄ ▓▓ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▐▓▓
^█▓▓▓ ▀▓▓▄ ▐▓▓▌ ▓▓▓▓▄▓▓▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▓▄ ▓▓▓▓`
'▀▓▓▓▄ ^▓▓▓ ▓▓▓ └▀▀▀▀ ▀▀ ^▀▀ `▀▀ `▀▀ '▀▀ ▐▓▓▌
▀▀▀▀▓▄▄▄ ▓▓▓▓▓▓, ▓▓▓▓▀
`▀█▓▓▓▓▓▓▓▓▓▌
¬`▀▀▀█▓
INFO:mlagents.learn:{'--curriculum': 'None',
'--docker-target-name': 'Empty',
'--env': 'None',
'--help': False,
'--keep-checkpoints': '5',
'--lesson': '0',
'--load': False,
'--no-graphics': False,
'--num-runs': '1',
'--run-id': 'first-run',
'--save-freq': '50000',
'--seed': '-1',
'--slow': False,
'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.

![Training running](images/training-running.png)
```console
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
Number of Brains: 1
Number of External Brains : 1
Reset Parameters :
Unity brain name: Ball3DBrain
Number of Visual Observations (per agent): 0
Vector Observation space size (per agent): 8
Number of stacked Vector Observation: 1
Vector Action space type: continuous
Vector Action space size (per agent): [2]
Vector Action descriptions: ,
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain Ball3DBrain:
batch_size: 64
beta: 0.001
buffer_size: 12000
epsilon: 0.2
gamma: 0.995
hidden_units: 128
lambd: 0.99
learning_rate: 0.0003
max_steps: 5.0e4
normalize: True
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 1000
use_recurrent: False
graph_scope:
summary_path: ./summaries/first-run-0
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
```
### After training

This file corresponds to your model's latest checkpoint. You can now embed this
trained model into your internal brain by following the steps below, which is
trained model into your Internal Brain by following the steps below, which is
`MLAgentsSDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
`UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
2. Open the Unity Editor, and select the **3DBall** scene as described above.
3. Select the **Ball3DBrain** object from the Scene hierarchy.
4. Change the **Type of Brain** to **Internal**.

page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own learning environment,
- For a "Hello World" introduction to creating your own Learning Environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of Youtube video tutorials, checkout the

33
docs/FAQ.md


## TensorFlowSharp flag not turned on
If you have already imported the TensorFlowSharp plugin, but havn't set
If you have already imported the TensorFlowSharp plugin, but haven't set
You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
```
This error message occurs because the TensorFlowSharp plugin won't be usage

## Instance of CoreBrainInternal couldn't be created
If you try to use ML-Agents in Unity versions 2017.1 - 2017.3, you might
encounter an error that looks like this:
```console
Instance of CoreBrainInternal couldn't be created. The the script
class needs to derive from ScriptableObject.
UnityEngine.ScriptableObject:CreateInstance(String)
```
You can fix the error by removing `CoreBrain` from CoreBrainInternal.cs:16,
clicking on your Brain Gameobject to let the scene recompile all the changed
C# scripts, then adding the `CoreBrain` back. Make sure your brain is in
Internal mode, your TensorFlowSharp plugin is imported and the
ENABLE_TENSORFLOW flag is set. This fix is only valid locally and unstable.
If you have a graph placeholder set in the internal Brain inspector that is not
If you have a graph placeholder set in the Internal Brain inspector that is not
UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
Similarly, if you have a graph scope set in the internal Brain inspector that is
Similarly, if you have a graph scope set in the Internal Brain inspector that is
not correctly set, you will see some error like this:
```console

Solution: Make sure your Graph Scope field matches the corresponding brain
object name in your Hierachy Inspector when there is multiple brain.
Solution: Make sure your Graph Scope field matches the corresponding Brain
object name in your Hierarchy Inspector when there are multiple Brains.
## Environment Permission Error

## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a model
using PPO, this is due to the episodes of the learning environment not
using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
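As a quick illustration of why a non-terminating environment produces `Mean reward : nan`: the reported statistic is an average over completed episodes, and averaging an empty collection yields NaN. The snippet below (NumPy assumed, numbers made up) reproduces the symptom:

```python
# Why "Mean reward : nan" appears when no episode ever terminates.
import numpy as np

completed_episode_rewards = []                  # no Agent ever reported done
print(np.mean(completed_episode_rewards))       # nan (with a "Mean of empty slice" warning)

completed_episode_rewards = [1.2, 0.7, 1.5]     # once episodes start terminating...
print(np.mean(completed_episode_rewards))       # ...a real mean reward appears
```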

6
docs/Feature-Memory.md


# Memory-enhanced Agents using Recurrent Neural Networks
# Memory-enhanced agents using Recurrent Neural Networks
## What are memories for
## What are memories used for?
were looking for? Don't let that happen to your agents.
It is now possible to give memories to your agents. When training, the agents
will be able to store a vector of floats to be used next time they need to make

2
docs/Feature-Monitor.md


You can track many different things both related and unrelated to the agents
themselves. By default, the Monitor is only active in the *inference* phase, so
not during training. To change this behaviour, you can activate or deactivate it
not during training. To change this behavior, you can activate or deactivate it
by calling `SetActive(boolean)`. For example to also show the monitor during
training, you can call it in the `InitializeAcademy()` method of your `Academy`:

96
docs/Getting-Started-with-Balance-Ball.md


This tutorial walks through the end-to-end process of opening a ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
agent in it, and finally embedding the trained model into the Unity environment.
Agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help

This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent** that
horizontally or vertically. In this environment, a platform is an **Agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.

An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
entities that an agent interacts with.
![Unity Editor](images/mlagents-3DBallHierarchy.png)

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. 3D Balance Ball does this
independent agent, but they all share the same Brain. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy

and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
training. (*Inference* means that the agent is using a trained model or
training. (*Inference* means that the Agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
configuration** and a high graphics quality and the timescale to `1.0` for the

* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentAction() (and after the agents collect their observations).
agent.AgentAction() (and after the Agents collect their observations).
The 3D Balance Ball environment does not use these functions — each agent resets
The 3D Balance Ball environment does not use these functions — each Agent resets
environment around the agents.
environment around the Agents.
the Academy.) All the agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an agent, it just
routes the agent's collected observations to the decision making process and
returns the chosen action to the agent. Thus, all agents can share the same
brain, but act independently. The Brain settings tell you quite a bit about how
an agent works.
the Academy.) All the Agents in the 3D Balance Ball environment use the same
Brain instance. A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. Thus, all Agents can share the same
Brain, but act independently. The Brain settings tell you quite a bit about how
an Agent works.
The **Brain Type** determines how an agent makes its decisions. The **External**
The **Brain Type** determines how an Agent makes its decisions. The **External**
agents; use **Internal** when using the trained model. The **Heuristic** brain
allows you to hand-code the agent's logic by extending the Decision class.
Finally, the **Player** brain lets you map keyboard commands to actions, which
Agents; use **Internal** when using the trained model. The **Heuristic** Brain
allows you to hand-code the Agent's logic by extending the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
of brains do what you need, you can implement your own CoreBrain to create your
of Brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;

The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the agent's observations contains eight elements: the `x` and
vector containing the Agent's observations contains eight elements: the `x` and
defined in the agent's `CollectObservations()` function.)
defined in the Agent's `CollectObservations()` function.)
An agent is given instructions from the brain in the form of *actions*.
An Agent is given instructions from the Brain in the form of *actions*.
element of the vector means is defined by the agent logic (the PPO training
element of the vector means is defined by the Agent logic (the PPO training
element might represent a force or torque applied to a `RigidBody` in the agent.
element might represent a force or torque applied to a `Rigidbody` in the Agent.
given to the agent is an array of indeces into tables.
given to the Agent is an array of indices into tables.
The 3D Balance Ball example is programmed to use both types of vector action
space. You can try training with both settings to observe whether there is a

Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
* **Visual Observations** — Defines any Camera objects used by the agent to
* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
makes decisions. All the Agents in the 3D Balance Ball scene share the same
Brain.
* **Visual Observations** — Defines any Camera objects used by the Agent to
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an agent starts over when it is finished.
3D Balance Ball sets this true so that the agent restarts after reaching the
* **Max Step** — Defines how many simulation steps can occur before the Agent
decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an Agent starts over when it is finished.
3D Balance Ball sets this true so that the Agent restarts after reaching the
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
Perhaps the more interesting aspect of an agents is the Agent subclass
implementation. When you create an Agent, you must extend the base Agent class.
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
* agent.AgentReset() — Called when the Agent resets, including at the beginning
* Agent.CollectObservations() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous vector observation
* agent.CollectObservations() — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Brain
instance assigned to the Agent is set to the continuous vector observation
* Agent.AgentAction() — Called every simulation step. Receives the action chosen
by the brain. The Ball3DAgent example handles both the continuous and the
* agent.AgentAction() — Called every simulation step. Receives the action chosen
by the Brain. The Ball3DAgent example handles both the continuous and the
assigns a reward to the agent; in this example, an agent receives a small
assigns a reward to the Agent; in this example, an Agent receives a small
negative reward for dropping the ball. An agent is also marked as done when it
negative reward for dropping the ball. An Agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
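The reward and done behaviour described above can be summarized in a short sketch. This is illustrative Python, not the C# `Ball3DAgent` code, and the reward values are placeholders:

```python
# Illustrative per-step reward/done logic for the balance-ball Agent described above.
def step_outcome(ball_still_on_platform: bool):
    """Return (reward, done) for one simulation step of a single Agent.

    Placeholder values; the real behaviour is implemented in C# in Ball3DAgent.
    """
    if ball_still_on_platform:
        return 0.1, False   # small positive reward for keeping the ball balanced
    return -1.0, True       # negative reward, and done so the Agent resets with a new ball
```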

explaining it.
To train the agents within the Ball Balance environment, we will be using the
python package. We have provided a convenient script called `mlagents-learn`
Python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
We can use `run_id` to identify the experiment and create a folder where the

The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
follow the intructions in [Using an
Execuatble](Learning-Environment-Executable.md).
follow the intructions in
[Using an Executable](Learning-Environment-Executable.md).
### Observing Training Progress

Once the training process completes, and the training process saves the model
(denoted by the `Saved Model` message) you can add it to the Unity project and
use it with agents having an **Internal** brain type. **Note:** Do not just
use it with Agents having an **Internal** Brain type. **Note:** Do not just
close the Unity Window once the `Saved Model` message appears. Either wait for
the training process to close the window or press Ctrl+C at the command-line
prompt. If you simply close the window manually, the .bytes file containing the

To embed the trained model into Unity, follow the later part of [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
of the Basic Buides page.
of the Basic Guide page.

4
docs/Glossary.md


logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given external
brain. Contains TensorFlow graph which makes decisions for external brain.
* **Trainer** - Python class which is responsible for training a given External
Brain. Contains TensorFlow graph which makes decisions for External Brain.

15
docs/Installation-Windows.md


Next, install `tensorflow`. Install this package using `pip` - which is a
package management system used to install Python packages. Latest versions of
Tensorflow won't work, so you will need to make sure that you install version
TensorFlow won't work, so you will need to make sure that you install version
1.7.1. In the same Anaconda Prompt, type in the following command _(make sure
you are connected to the internet)_:

</p>
Once you've signed up, go back to the cuDNN
[downloads page](https://developer.nvidia.com/cudnn). You may or may not be asked to fill
out a short survey. When you get to the list cuDNN releases, __make sure you are
downloading the right version for the CUDA toolkit you installed in Step 1.__
In this guide, we are using version 7.0.5 for CUDA toolkit version 9.0 ([direct
link](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-windows10-x64-v7)).
[downloads page](https://developer.nvidia.com/cudnn).
You may or may not be asked to fill out a short survey. When you get to the list
cuDNN releases, __make sure you are downloading the right version for the CUDA
toolkit you installed in Step 1.__ In this guide, we are using version 7.0.5 for
CUDA toolkit version 9.0
([direct link](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-windows10-x64-v7)).
After you have downloaded the cuDNN files, you will need to extract the files
into the CUDA toolkit directory. In the cuDNN zip file, there are three folders

Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
following command to uninstall the tensorflow for cpu and install the tensorflow
following command to uninstall TensorFlow for cpu and install TensorFlow
for gpu _(make sure you are connected to the internet)_:
```sh

37
docs/Installation.md


install Python with additional dependencies. Each of the subsections below
overviews each step, in addition to a Docker set-up.
## Install **Unity 2017.1** or Later
## Install **Unity 2017.4** or Later
[Download](https://store.unity.com/download) and install Unity. If you would
like to use our Docker set-up (introduced later), make sure to select the _Linux

width="500" border="10" />
</p>
## Clone the Ml-Agents Repository
## Clone the ML-Agents Toolkit Repository
git clone https://github.com/Unity-Technologies/ml-agents.git
```sh
git clone https://github.com/Unity-Technologies/ml-agents.git
```
The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
It also contains many [example environments](Learning-Environment-Examples.md)
that can be used to help get you familiar with Unity.
The `ml-agents` subdirectory contains Python packages which provide
trainers and a Python API to interface with Unity.
The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.
## Install Python and mlagents Package
In order to use the ML-Agents toolkit, you need Python 3.6 along with the
dependencies listed in the [requirements file](../ml-agents/requirements.txt).

### Mac and Unix Users
[Download](https://www.python.org/downloads/) and install Python 3.6 if you do not
If your Python environment doesn't include `pip3`, see these
To install the dependencies and `mlagents` Python package, enter the
`ml-agents/` subdirectory and run from the command line:
```sh
pip3 install .
```
If you installed this correctly, you should be able to run
`mlagents-learn --help`
## Docker-based Installation

4
docs/Learning-Environment-Best-Practices.md


([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by
using a Player Brain to control the agent.
* It is often helpful to make many copies of the agent, and attach the Brain to
be trained to all of these agents. In this way the Brain can get more feedback
information from all of these agents, which helps it train faster.
## Rewards

89
docs/Learning-Environment-Create-New.md


This tutorial walks through the process of creating a Unity Environment. A Unity
Environment is an application built using the Unity Engine which can be used to
train Reinforcement Learning Agents.
![A simple ML-Agents environment](images/mlagents-NewTutSplash.png)

methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
3. Add one or more Brain objects to the scene as children of the Academy.
4. Implement your Agent subclasses. An Agent subclass defines the code an Agent
optional methods to reset the Agent when it has finished or failed its task.
in the scene that represents the Agent in the simulation. Each Agent object
[run the training process](Training-ML-Agents.md).
**Note:** If you are unfamiliar with Unity, refer to
[Learning the interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)

1. Launch the Unity Editor and create a new project named "RollerBall".
2. In a file system window, navigate to the folder containing your cloned
ML-Agents repository.
3. Drag the `ML-Agents` folder from `UnitySDK/Assets` to the Unity Editor
Project window.
Your Unity **Project** window should contain the following assets:

Next, we will create a very simple scene to act as our ML-Agents environment.
The "physical" components of the environment include a Plane to act as the floor
for the Agent to move around on, a Cube to act as the goal or target for the
agent to seek, and a Sphere to represent the Agent itself.
### Create the floor plane

leave it alone for now.
So far, these are the basic steps that you would use to add ML-Agents to any
Unity project. Next, we will add the logic that will let our Agent learn to roll
to the cube using reinforcement learning.
In this simple scenario, we don't use the Academy object to control the

### Initialization and Resetting the Agent
When the Agent reaches its target, it marks itself done and its Agent reset
function moves the target to a random location. In addition, if the Agent rolls
off the platform, the reset function puts it back onto the floor.
To move the target GameObject, we need a reference to its Transform (which

allowing you to choose which GameObject to use as the target in the Unity
Editor. To reset the Agent's velocity (and later to apply force to move the
agent) we need a reference to the Rigidbody component. A
[Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's
primary element for physics simulation. (See

public override void AgentReset()
{
    if (this.transform.position.y < -1.0)
    {
        // The Agent fell
        this.transform.position = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
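
The hunk above stops before the branch that repositions the target. A minimal
sketch of the complete reset method, assuming the public `Target` Transform field
described earlier and an assumed spawn range for the floor, might look like this:

```csharp
public Transform Target;    // assigned in the Unity Editor, as described above

public override void AgentReset()
{
    if (this.transform.position.y < -1.0)
    {
        // The Agent fell; zero its momentum and put it back at the origin.
        this.transform.position = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
    }
    else
    {
        // The Agent reached the target; move the target to a new random spot.
        // The spawn range here is an assumption for illustration only.
        Target.position = new Vector3(Random.value * 8 - 4,
                                      0.5f,
                                      Random.value * 8 - 4);
    }
}
```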

### Observing the Environment
The Agent sends the information we collect to the Brain, which uses it to make a
decision. When you train the Agent (or use a trained model), the data is fed
into a neural network as a feature vector. For an Agent to successfully learn a
In our case, the information our Agent collects includes:
training. Note that the Agent only collects the x and z coordinates since the
floor is aligned with the x-z plane and the y component of the target's
position never changes.

AddVectorObs(relativePosition.z / 5);
```
* Position of the Agent itself within the confines of the floor. This data is
collected as the Agent's distance from each edge of the floor.
```csharp
// Distance to edges of platform

AddVectorObs((this.transform.position.z - 5) / 5);
```
* The velocity of the Agent. This helps the Agent learn to control its speed so
it doesn't overshoot the target and roll off the platform.
```csharp
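// The velocity observation described in the bullet above; the original code is
// cut from this hunk. A minimal sketch (dividing by 5 mirrors the normalization
// used for the other observations and is an assumption here):
AddVectorObs(rBody.velocity.x / 5);
AddVectorObs(rBody.velocity.z / 5);
```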

`AgentAction()` function. The number of elements in this array is determined by
the `Vector Action Space Type` and `Vector Action Space Size` settings of the
agent's Brain. The RollerAgent uses the continuous vector action space and needs
two continuous control signals from the Brain. Thus, we will set the Brain
axis. (If we allowed the Agent to move in three dimensions, then we would need
axis. (If we allowed the Agent to move in three dimensions, then we would need
to set `Vector Action Size` to 3.) Each of the values returned by the network
is between `-1` and `1`. Note that the Brain really has no idea what the values in
the action array mean. The training process just adjusts the action values in
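
As a rough illustration of how the two continuous actions described above can be
applied, a minimal `AgentAction()` sketch might look like the following (the
`speed` field is a hypothetical force multiplier, not taken from this tutorial):

```csharp
public float speed = 10;    // hypothetical force multiplier

public override void AgentAction(float[] vectorAction, string textAction)
{
    // Interpret the two continuous actions as forces along the x and z axes.
    Vector3 controlSignal = Vector3.zero;
    controlSignal.x = vectorAction[0];
    controlSignal.z = vectorAction[1];
    rBody.AddForce(controlSignal * speed);
}
```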

### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
function. The learning algorithm uses the rewards assigned to the Agent at each
the Agent the optimal actions. You want to reward an Agent for completing the
assigned task (reaching the Target cube, in this case) and punish the Agent if
training with sub-rewards that encourage behavior that helps the Agent complete
the Agent moves closer to the target in a step and a small negative reward at
each step which encourages the Agent to complete its task quickly.
agent as finished by setting the Agent to done.
```csharp
float distanceToTarget = Vector3.Distance(this.transform.position,

}
```
**Note:** When you mark an Agent as done, it stops its activity until it is
reset. You can have the Agent reset immediately, by setting the
To encourage an Agent to finish its task more quickly, you can also assign a
small negative reward at each step:
```csharp
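// Sketch: a small time penalty applied every step; the exact value is an
// assumption for illustration.
AddReward(-0.05f);
```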

Finally, to punish the Agent for falling off the platform, assign a large
negative reward and, of course, set the Agent to done so that it resets itself
in the next step:
```csharp
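// Sketch: the Agent rolled off the platform, so give it a large negative reward
// and mark it done so it resets (the -1.0 value is an assumption).
if (this.transform.position.y < -1.0)
{
    SetReward(-1.0f);
    Done();
}
```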

Now that all the GameObjects and ML-Agent components are in place, it is time
to connect everything together in the Unity Editor. This involves assigning the
Brain object to the Agent, changing some of the Agent Component's properties, and
setting the Brain properties so that they are compatible with our Agent code.
1. Expand the Academy GameObject in the Hierarchy window, so that the Brain
object is visible.

It is always a good idea to test your environment manually before embarking on
an extended training run. The reason we have left the Brain set to the
**Player** type is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard to action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify
positive values and one to specify negative values for each action, for a total

`AgentAction()` function. **Value** is assigned to action[Index] when **Key** is
pressed.
Press **Play** to run the scene and use the WASD keys to move the Agent around
Console window and that the Agent resets when it reaches its target or falls
includes a convenient Monitor class that you can use to easily display Agent
status information in the Game window.
One additional test you can perform is to first ensure that your environment and

There are three kinds of game objects you need to include in your scene in order
to use Unity ML-Agents:
* Academy
* Brain
* Agents
* You can have multiple Brain game objects but they must be children of the Academy
game object.
Here is an example of what your scene hierarchy should look like:

6
docs/Learning-Environment-Design-Academy.md


# Creating an Academy
An Academy orchestrates all the Agent and Brain objects in a Unity scene. Every
scene containing Agents must contain a single Academy. To use an Academy, you
must create your own subclass. However, all the methods you can override are
optional.

## Resetting an Environment
Implement an `AcademyReset()` function to alter the environment at the start of
each episode. For example, you might want to reset an Agent to its starting
position or move a goal to a random position. An environment resets when the
Academy `Max Steps` count is reached.

## Controlling an Environment
The `AcademyStep()` function is called at every step in the simulation before
any Agents are updated. Use this function to update objects in the environment
at every step or during the episode between environment resets. For example, if
you want to add elements to the environment at random intervals, you can put the
logic for creating them in the `AcademyStep()` function.
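
To make these two hooks concrete, a minimal Academy subclass might look like the
sketch below; the class name and the environment logic are purely illustrative.

```csharp
public class ExampleAcademy : Academy
{
    public override void AcademyReset()
    {
        // Called at the start of each episode; reset or reposition
        // environment objects here.
    }

    public override void AcademyStep()
    {
        // Called every simulation step before any Agents are updated; spawn or
        // move environment objects here.
    }
}
```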

147
docs/Learning-Environment-Design-Agents.md


# Agents
An agent is an actor that can observe its environment and decide on the best
course of action using those observations. Create Agents in Unity by extending
successfully learn are the observations the agent collects for
reinforcement learning and the reward you assign to estimate the value of the
An Agent passes its observations to its Brain. The Brain, then, makes a decision
and passes the chosen action back to the agent. Your agent code must execute the
action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),

The Brain class abstracts out the decision making logic from the Agent itself so
that you can use the same Brain in multiple Agents. How a Brain makes its
decisions depends on the type of Brain it is. An **External** Brain simply
passes the observations from its Agents to an external process and then passes
the decisions made externally back to the Agents. An **Internal** Brain uses the
parameters in search of a better decision). The other types of Brains do not
directly involve training, but you might find them useful as part of a training
project. See [Brains](Learning-Environment-Design-Brains.md).

of simulation steps (the frequency defaults to once-per-step). You can also set
up an Agent to request decisions on demand. Making decisions at regular step
decisions on demand is generally appropriate for situations where Agents only
respond to specific events or take actions of variable duration. For example, an
agent in a robotic simulator that must provide fine-control of joint torques
should make its decisions every step of the simulation. On the other hand, an

To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents
using the same Brain instance can use a different frequency. During simulation
steps in which no decision is requested, the Agent receives the same action
On demand decision making allows Agents to request decisions from their Brains
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the agents commit to an action for a variable number of steps or
when the agents cannot make decisions at the same time. This is typically the case

When you turn on **On Demand Decisions** for an Agent, your agent code must call
of the observation-decision-action-reward cycle. The Brain invokes the Agent's
`AgentAction()` method. The Brain waits for the Agent to request the next
decision before starting another iteration.
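
As a rough sketch of this pattern, an Agent that should only act when some game
event occurs can call `RequestDecision()` from its own update logic (the event
flag below is hypothetical and would be set by your game code):

```csharp
bool currentActionFinished;    // hypothetical flag set by your game logic

void FixedUpdate()
{
    // Only ask the Brain for a new decision when the Agent actually needs one,
    // e.g. when its previous action has finished.
    if (currentActionFinished)
    {
        currentActionFinished = false;
        RequestDecision();
    }
}
```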
## Observations

point numbers.
* **Visual Observations** — one or more camera images.
When you use vector observations for an Agent, implement the
to implement the `CollectObservations()` method when your Agent uses visual
observations (unless it also uses vector observations).
### Vector Observation Space: Feature Vectors

class calls the `CollectObservations()` method of each of its Agents. Your
The observation must include all the information an agent needs to accomplish
its task. Without sufficient and relevant information, an agent may learn poorly
or may not learn at all. A reasonable approach for determining what information
should be included is to consider what you would need to calculate an analytical

an agent's observations to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
When you set up an Agent's Brain in the Unity Editor, set the following
properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.

### Multiple Visual Observations
Camera observations use rendered textures from one or more cameras in a scene.
The Brain vectorizes the textures into a 3D Tensor which can be fed into a
convolutional neural network (CNN). For more information on CNNs, see [this
guide](http://cs231n.github.io/convolutional-networks/). You can use camera
observations alongside vector observations.

also typically less efficient and slower to train, and sometimes don't succeed
at all.
To add a visual observation to an Agent, click on the `Add Camera` button in the