
Dev docs (#361)

New documentation structure and content.
/develop-generalizationTraining-TrainerController
Arthur Juliani, 7 years ago
Current commit
85ae912d
114 changed files with 13,867 additions and 2,390 deletions
  1. 2
      .gitignore
  2. 92
      README.md
  3. 353
      docs/Getting-Started-with-Balance-Ball.md
  4. 61
      docs/Readme.md
  5. 2
      docs/Training-on-Amazon-Web-Service.md
  6. 30
      docs/Learning-Environment-Examples.md
  7. 8
      docs/Limitations-and-Common-Issues.md
  8. 4
      docs/Python-API.md
  9. 10
      docs/Using-TensorFlow-Sharp-in-Unity.md
  10. 2
      docs/Feature-Broadcasting.md
  11. 2
      docs/Feature-Monitor.md
  12. 4
      python/unityagents/environment.py
  13. 19
      unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DDecision.cs
  14. 132
      unity-environment/Assets/ML-Agents/Examples/Crawler/Crawler.unity
  15. 2
      unity-environment/Assets/ML-Agents/Scripts/Academy.cs
  16. 2
      unity-environment/Assets/ML-Agents/Scripts/Agent.cs
  17. 547
      docs/images/mlagents-BuildWindow.png
  18. 511
      docs/images/mlagents-TensorBoard.png
  19. 345
      docs/images/brain.png
  20. 377
      docs/images/learning_environment_basic.png
  21. 16
      docs/Background-Jupyter.md
  22. 183
      docs/Background-Machine-Learning.md
  23. 37
      docs/Background-TensorFlow.md
  24. 18
      docs/Background-Unity.md
  25. 20
      docs/Contribution-Guidelines.md
  26. 3
      docs/Glossary.md
  27. 3
      docs/Installation-Docker.md
  28. 61
      docs/Installation.md
  29. 25
      docs/Learning-Environment-Best-Practices.md
  30. 398
      docs/Learning-Environment-Create-New.md
  31. 42
      docs/Learning-Environment-Design-Academy.md
  32. 277
      docs/Learning-Environment-Design-Agents.md
  33. 52
      docs/Learning-Environment-Design-Brains.md
  34. 89
      docs/Learning-Environment-Design.md
  35. 403
      docs/ML-Agents-Overview.md
  36. 68
      docs/Training-Curriculum-Learning.md
  37. 3
      docs/Training-Imitation-Learning.md
  38. 4
      docs/Training-ML-Agents.md
  39. 118
      docs/Training-PPO.md
  40. 5
      docs/Using-Tensorboard.md
  41. 1001
      docs/dox-ml-agents.conf
  42. 5
      docs/doxygen/Readme.md
  43. 1001
      docs/doxygen/doxygenbase.css
  44. 14
      docs/doxygen/footer.html
  45. 56
      docs/doxygen/header.html
  46. 18
      docs/doxygen/logo.png
  47. 146
      docs/doxygen/navtree.css
  48. 15
      docs/doxygen/splitbar.png
  49. 410
      docs/doxygen/unity.css
  50. 219
      docs/images/academy.png
  51. 119
      docs/images/broadcast.png
  52. 110
      docs/images/internal_brain.png
  53. 744
      docs/images/learning_environment.png
  54. 469
      docs/images/learning_environment_example.png
  55. 1001
      docs/images/mlagents-3DBall.png
  56. 712
      docs/images/mlagents-3DBallHierarchy.png
  57. 144
      docs/images/mlagents-NewProject.png
  58. 183
      docs/images/mlagents-NewTutAcademy.png
  59. 810
      docs/images/mlagents-NewTutAssignBrain.png
  60. 186
      docs/images/mlagents-NewTutBlock.png
  61. 184
      docs/images/mlagents-NewTutBrain.png
  62. 209
      docs/images/mlagents-NewTutFloor.png
  63. 54
      docs/images/mlagents-NewTutHierarchy.png
  64. 192
      docs/images/mlagents-NewTutSphere.png
  65. 591
      docs/images/mlagents-NewTutSplash.png
  66. 580
      docs/images/mlagents-Open3DBall.png
  67. 1001
      docs/images/mlagents-Scene.png
  68. 79
      docs/images/mlagents-SetExternalBrain.png
  69. 64
      docs/images/normalization.png
  70. 129
      docs/images/player_brain.png
  71. 111
      docs/images/rl_cycle.png
  72. 79
      docs/images/scene-hierarchy.png
  73. 15
      docs/images/splitbar.png
  74. 71
      docs/Agents-Editor-Interface.md
  75. 127
      docs/Making-a-new-Unity-Environment.md
  76. 33
      docs/Organizing-the-Scene.md
  77. 43
      docs/Unity-Agents-Overview.md
  78. 23
      docs/best-practices.md
  79. 114
      docs/best-practices-ppo.md
  80. 87
      docs/curriculum.md
  81. 29
      docs/Instantiating-Destroying-Agents.md
  82. 55
      docs/installation.md
  83. 343
      images/academy.png
  84. 241
      images/player_brain.png
  85. 58
      python/README.md
  86. 52
      unity-environment/README.md
  87. 0
      /docs/Learning-Environment-Examples.md
  88. 0
      /docs/Limitations-and-Common-Issues.md
  89. 0
      /docs/Python-API.md
  90. 0
      /docs/Using-TensorFlow-Sharp-in-Unity.md
  91. 0
      /docs/Feature-Broadcasting.md
  92. 0
      /docs/Feature-Monitor.md
  93. 0
      /docs/images/agent.png
  94. 0
      /docs/images/agents_diagram.png
  95. 0
      /docs/images/balance.png
  96. 0
      /docs/images/banner.png
  97. 0
      /docs/images/mlagents-BuildWindow.png
  98. 0
      /docs/images/mlagents-TensorBoard.png

2
.gitignore


/unity-environment/Assets/ML-Agents/Plugins/Computer*
/unity-environment/Assets/ML-Agents/Plugins/System*
# Generated doc folders
/docs/html
# Mac hidden files
*.DS_Store

92
README.md


<img src="images/unity-wide.png" align="middle" width="3000"/>
<img src="docs/images/unity-wide.png" align="middle" width="3000"/>
# Unity ML - Agents (Beta)
# Unity ML-Agents (Beta)
**Unity Machine Learning Agents** allows researchers and developers to
create games and simulations using the Unity Editor which serve as
environments where intelligent agents can be trained using
reinforcement learning, neuroevolution, or other machine learning
methods through a simple-to-use Python API. For more information, see
the [documentation page](docs).
For a walkthrough on how to train an agent in one of the provided
example environments, start
[here](docs/Getting-Started-with-Balance-Ball.md).
**Unity Machine Learning Agents** (ML-Agents) is an open-source Unity plugin
that enables games and simulations to serve as environments for training
intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through
a simple-to-use Python API. We also provide implementations (based on
TensorFlow) of state-of-the-art algorithms to enable game developers
and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
These trained agents can be used for multiple purposes, including
controlling NPC behavior (in a variety of settings such as multi-agent and
adversarial), automated testing of game builds and evaluating different game
design decisions pre-release. ML-Agents is mutually beneficial for both game
developers and AI researchers as it provides a central platform where advances
in AI can be evaluated on Unity’s rich environments and then made accessible
to the wider research and game developer communities.
* Multiple observations (cameras)
* Flexible Multi-agent support
* Flexible single-agent and multi-agent support
* Multiple visual observations (cameras)
* Python (2 and 3) control interface
* Visualizing network outputs in environment
* Tensorflow Sharp Agent Embedding _[Experimental]_
* Built-in support for Imitation Learning (coming soon)
* Visualizing network outputs within the environment
* Python control interface
* TensorFlow Sharp Agent Embedding _[Experimental]_
## Creating an Environment
## Documentation and References
The _Agents SDK_, including example environment scenes is located in
`unity-environment` folder. For requirements, instructions, and other
information, see the contained Readme and the relevant
[documentation](docs/Making-a-new-Unity-Environment.md).
For more information on ML-Agents, in addition to installation, and usage
instructions, see our [documentation home](docs).
## Training your Agents
We have also published a series of blog posts that are relevant for ML-Agents:
- Overviewing reinforcement learning concepts
([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
and [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
- [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- [Post]() announcing the winners of our
[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
- [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
overviewing how Unity can be leveraged as a simulator to design safer cities.
In addition to our own documentation, here are some additional, relevant articles:
- [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
- [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
- [Unity3D Machine Learning – Setting up the environment & TensorFlow for AgentML on Windows 10](https://unity3d.college/2017/10/25/machine-learning-in-unity3d-setting-up-the-environment-tensorflow-for-agentml-on-windows-10/)
- [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
Once you've built a Unity Environment, example Reinforcement Learning
algorithms and the Python API are available in the `python`
folder. For requirements, instructions, and other information, see the
contained Readme and the relevant
[documentation](docs/Unity-Agents---Python-API.md).
## Community and Feedback
ML-Agents is an open-source project and we encourage and welcome contributions.
If you wish to contribute, be sure to review our
[contribution guidelines](docs/Contribution-Guidelines.md) and
[code of conduct](CODE_OF_CONDUCT.md).
You can connect with us and the broader community
through Unity Connect and GitHub:
* Join our
[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
to connect with others using ML-Agents and Unity developers enthusiastic
about machine learning. We use that channel to surface updates
regarding ML-Agents (and, more broadly, machine learning in games).
* If you run into any problems using ML-Agents,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
For any other questions or feedback, connect directly with the ML-Agents
team at ml-agents@unity3d.com.
## License
[Apache License 2.0](LICENSE)

353
docs/Getting-Started-with-Balance-Ball.md


# Getting Started with the Balance Ball Example
# Getting Started with the 3D Balance Ball Example
![Balance Ball](../images/balance.png)
This tutorial walks through the end-to-end process of opening an ML-Agents
example environment in Unity, building the Unity executable, training an agent
in it, and finally embedding the trained model into the Unity environment.
This tutorial will walk through the end-to-end process of installing Unity Agents, building an example environment, training an agent in it, and finally embedding the trained model into the Unity environment.
ML-Agents includes a number of [example environments](Learning-Environment-Examples.md)
which you can examine to help understand the different ways in which ML-Agents
can be used. These environments can also serve as templates for new
environments or as ways to test new ML algorithms. After reading this tutorial,
you should be able to explore and build the example environments.
Unity ML Agents contains a number of example environments which can be used as templates for new environments, or as ways to test a new ML algorithm to ensure it is functioning correctly.
![Balance Ball](images/balance.png)
In this walkthrough we will be using the **3D Balance Ball** environment. The environment contains a number of platforms and balls. Platforms can act to keep the ball up by rotating either horizontally or vertically. Each platform is an agent which is rewarded the longer it can keep a ball balanced on it, and provided a negative reward for dropping the ball. The goal of the training process is to have the platforms learn to never drop the ball.
This walkthrough uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is
also penalized with a negative reward for dropping the ball. The goal of the
training process is to have the platforms learn to never drop the ball.
In order to install and set-up the Python and Unity environments, see the instructions [here](installation.md).
In order to install and set up ML-Agents, the Python dependencies and Unity,
see the [installation instructions](Installation.md).
## Understanding a Unity Environment (Balance Ball)
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing
an Academy and one or more Brain and Agent objects, and, of course, the other
entities that an agent interacts with.
![Unity Editor](images/mlagents-3DBallHierarchy.png)
**Note:** In Unity, the base object of everything in a scene is the
_GameObject_. The GameObject is essentially a container for everything else,
including behaviors, graphics, physics, etc. To see the components that make
up a GameObject, select the GameObject in the Scene window, and open the
Inspector window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy
The Academy object for the scene is placed on the Ball3DAcademy GameObject.
When you look at an Academy component in the inspector, you can see several
properties that control how the environment works. For example, the
**Training** and **Inference Configuration** properties set the graphics and
timescale properties for the Unity application. The Academy uses the
**Training Configuration** during training and the **Inference Configuration**
when not training. (*Inference* means that the agent is using a trained model
or heuristics or direct control — in other words, whenever **not** training.)
Typically, you set a low graphics quality and a high time scale for the
**Training Configuration** and a high graphics quality with the timescale set
to `1.0` for the **Inference Configuration**.
**Note:** if you want to observe the environment during training, you can
adjust the **Inference Configuration** settings to use a larger window and a
timescale closer to 1:1. Be sure to set these parameters back when training in
earnest; otherwise, training can take a very long time.
Another aspect of an environment to look at is the Academy implementation.
Since the base Academy class is abstract, you must always define a subclass.
There are three functions you can implement, though they are all optional:
* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
Agent.AgentStep() (and after the agents collect their state observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).
The 3D Balance Ball environment does not use these functions — each agent
resets itself when needed — but many environments do use these functions to
control the environment around the agents.
### Brain
The Ball3DBrain GameObject in the scene, which contains a Brain component,
is a child of the Academy object. (All Brain objects in a scene must be
children of the Academy.) All the agents in the 3D Balance Ball environment
use the same Brain instance. A Brain doesn't save any state about an agent,
it just routes the agent's collected state observations to the decision making
process and returns the chosen action to the agent. Thus, all agents can share
the same brain, but act independently. The Brain settings tell you quite a bit
about how an agent works.
The **Brain Type** determines how an agent makes its decisions. The
**External** and **Internal** types work together — use **External** when
training your agents; use **Internal** when using the trained model.
The **Heuristic** brain allows you to hand-code the agent's logic by extending
the Decision class. Finally, the **Player** brain lets you map keyboard
commands to actions, which can be useful when testing your agents and
environment. If none of these types of brains do what you need, you can
implement your own CoreBrain to create your own type.
In this tutorial, you will set the **Brain Type** to **External** for training;
when you embed the trained model in the Unity application, you will change the
**Brain Type** to **Internal**.
**State Observation Space**
Before making a decision, an agent collects its observation about its state
in the world. ML-Agents classifies observations into two types: **Continuous**
and **Discrete**. The **Continuous** state space collects observations in a
vector of floating point numbers. The **Discrete** state space is an index
into a table of states. Most of the example environments use a continuous
state space.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
state space with a **State Size** of 8. This means that the feature vector
containing the agent's observations contains eight elements: the `x` and `z`
components of the platform's rotation and the `x`, `y`, and `z` components of
the ball's relative position and velocity. (The state values are defined in
the agent's `CollectState()` function.)
**Action Space**
An agent is given instructions from the brain in the form of *actions*. Like
states, ML-Agents classifies actions into two types: the **Continuous** action
space is a vector of numbers that can vary continuously. What each element of
the vector means is defined by the agent logic (the PPO training process just
learns what values are better given particular state observations based on the
rewards received when it tries different values). For example, an element might
represent a force or torque applied to a Rigidbody in the agent. The
**Discrete** action space defines its actions as a table. A specific action
given to the agent is an index into this table.
The 3D Balance Ball example is programmed to use both types of action space.
You can try training with both settings to observe whether there is a
difference. (Set the `Action Size` to 4 when using the discrete action space
and 2 when using continuous.)
### Agent
The Agent is the actor that observes and takes actions in the environment.
In the 3D Balance Ball environment, the Agent components are placed on the
twelve Platform GameObjects. The base Agent object has a few properties that
affect its behavior:
* **Brain** — Every agent must have a Brain. The brain determines how an agent
makes decisions. All the agents in the 3D Balance Ball scene share the same
brain.
* **Observations** — Defines any Camera objects used by the agent to observe
its environment. 3D Balance Ball does not use camera observations.
* **Max Step** — Defines how many simulation steps can occur before the agent
decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an agent starts over when it is finished.
3D Balance Ball sets this true so that the agent restarts after reaching the
**Max Step** count or after dropping the ball.
Perhaps the more interesting aspect of an agent is the Agent subclass
implementation. When you create an agent, you must extend the base Agent class.
The Ball3DAgent subclass defines the following methods:
* Agent.AgentReset() — Called when the Agent resets, including at the beginning
of a session. The Ball3DAgent class uses the reset function to reset the
platform and ball. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and platform
attitude.
* Agent.CollectState() — Called every simulation step. Responsible for
collecting the agent's observations of the environment. Since the Brain
instance assigned to the agent is set to the continuous state space with a
state size of 8, the `CollectState()` function returns a vector (technically
a List<float> object) containing 8 elements.
* Agent.AgentStep() — Called every simulation step (unless the brain's
`Frame Skip` property is > 0). Receives the action chosen by the brain. The
Ball3DAgent example handles both the continuous and the discrete action space
types. There isn't actually much difference between the two state types in
this environment — both action spaces result in a small change in platform
rotation at each step. The `AgentStep()` function assigns a reward to the
agent; in this example, an agent receives a small positive reward for each
step it keeps the ball on the platform and a larger, negative reward for
dropping the ball. An agent is also marked as done when it drops the ball
so that it will reset with a new ball for the next simulation step.
## Building Unity Environment
Launch the Unity Editor, and log in, if necessary.
## Building the Environment
1. Open the `unity-environment` folder using the Unity editor. *(If this is not first time running Unity, you'll be able to skip most of these immediate steps, choose directly from the list of recently opened projects)*
- On the initial dialog, choose `Open` on the top options
- On the file dialog, choose `unity-environment` and click `Open` *(It is safe to ignore any warning message about non-matching editor installation)*
- Once the project is open, on the `Project` panel (bottom of the tool), navigate to the folder `Assets/ML-Agents/Examples/3DBall/`
- Double-click the `Scene` icon (Unity logo) to load all environment assets
2. Go to `Edit -> Project Settings -> Player`
- Ensure that `Resolution and Presentation -> Run in Background` is Checked.
- Ensure that `Resolution and Presentation -> Display Resolution Dialog` is set to Disabled.
3. Expand the `Ball3DAcademy` GameObject and locate its child object `Ball3DBrain` within the Scene hierarchy in the editor. Ensure Type of Brain for this object is set to `External`.
4. *File -> Build Settings*
5. Choose your target platform:
- (optional) Select “Development Build” to log debug messages.
6. Click *Build*:
- Save environment binary to the `python` sub-directory of the cloned repository *(you may need to click on the down arrow on the file chooser to be able to select that folder)*
The first step is to open the Unity scene containing the 3D Balance Ball
environment:
## Training the Brain with Reinforcement Learning
1. Launch Unity.
2. On the Projects dialog, choose the **Open** option at the top of the window.
3. Using the file dialog that opens, locate the `unity-environment` folder
within the ML-Agents project and click **Open**.
4. In the `Project` window, navigate to the folder
`Assets/ML-Agents/Examples/3DBall/`.
5. Double-click the `Scene` file to load the scene containing the Balance
Ball environment.
### Testing Python API
![3DBall Scene](images/mlagents-Open3DBall.png)
To launch jupyter, run in the command line:
Since we are going to build this environment to conduct training, we need to
set the brain used by the agents to **External**. This allows the agents to
communicate with the external training process when making their decisions.
`jupyter notebook`
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
2. Select its child object `Ball3DBrain`.
3. In the Inspector window, set **Brain Type** to `External`.
Then navigate to `localhost:8888` to access the notebooks. If you're new to jupyter, check out the [quick start guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html) before you continue.
![Set Brain to External](images/mlagents-SetExternalBrain.png)
To ensure that your environment and the Python API work as expected, you can use the `python/Basics` Jupyter notebook. This notebook contains a simple walkthrough of the functionality of the API. Within `Basics`, be sure to set `env_name` to the name of the environment file you built earlier.
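As a rough sketch (assuming the first cell of the `Basics` notebook exposes the `env_name` variable described above), that edit amounts to something like:

```python
# python/Basics.ipynb -- illustrative first cell; only file_name/env_name come from the docs above.
from unityagents import UnityEnvironment

env_name = "3DBall"   # name of the environment binary you built into the python/ folder
env = UnityEnvironment(file_name=env_name)
env.close()           # shut the environment down again once the connection succeeds
```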
Next, we want the scene we just set up to play correctly when the training
process launches our environment executable. This means:
* The environment application runs in the background
* No dialogs require interaction
* The correct scene loads automatically
1. Open Player Settings (menu: **Edit** > **Project Settings** > **Player**).
2. Under **Resolution and Presentation**:
- Ensure that **Run in Background** is Checked.
- Ensure that **Display Resolution Dialog** is set to Disabled.
3. Open the Build Settings window (menu:**File** > **Build Settings**).
4. Choose your target platform.
- (optional) Select “Development Build” to
[log debug messages](https://docs.unity3d.com/Manual/LogFiles.html).
5. If any scenes are shown in the **Scenes in Build** list, make sure that
the 3DBall Scene is the only one checked. (If the list is empty, then only the
current scene is included in the build).
6. Click *Build*:
a. In the File dialog, navigate to the `python` folder in your ML-Agents
directory.
b. Assign a file name and click **Save**.
![Build Window](images/mlagents-BuildWindow.png)
## Training the Brain with Reinforcement Learning
Now that we have a Unity executable containing the simulation environment, we
can perform the training.
In order to train an agent to correctly balance the ball, we will use a Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This is a method that has been shown to be safe, efficient, and more general purpose than many other RL algorithms, as such we have chosen it as the example algorithm for use with ML Agents. For more information on PPO, OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/) explaining it.
In order to train an agent to correctly balance the ball, we will use a
Reinforcement Learning algorithm called Proximal Policy Optimization (PPO).
This is a method that has been shown to be safe, efficient, and more general
purpose than many other RL algorithms, as such we have chosen it as the
example algorithm for use with ML-Agents. For more information on PPO,
OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
3. (optional) Set `run_path` directory to your choice.
4. Run all cells of notebook with the exception of the last one under "Export the trained Tensorflow graph."
3. (optional) In order to get the best results quickly, set `max_steps` to
50000, set `buffer_size` to 5000, and set `batch_size` to 512. For this
exercise, this will train the model in approximately ~5-10 minutes.
4. (optional) Set `run_path` directory to your choice. When using TensorBoard
to observe the training statistics, it helps to set this to a sequential value
for each training run. In other words, "BalanceBall1" for the first run,
"BalanceBall2" or the second, and so on. If you don't, the summaries for
every training run are saved to the same directory and will all be included
on the same graph.
5. Run all cells of notebook with the exception of the last one under "Export
the trained Tensorflow graph."
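For reference, here is a rough sketch of what those settings might look like in the notebook's hyperparameter cell. The values come from the steps above; the exact variable names and layout inside the PPO notebook are assumptions:

```python
### Hyperparameter cell (illustrative)
max_steps = 50000          # total training steps; roughly 5-10 minutes for 3D Balance Ball
buffer_size = 5000         # experiences gathered before each model update
batch_size = 512           # experiences used per gradient descent step
run_path = "BalanceBall1"  # unique per run so TensorBoard keeps each run's summaries separate
```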
In order to observe the training process in more detail, you can use Tensorboard.
In your command line, enter into `python` directory and then run :
In order to observe the training process in more detail, you can use
TensorBoard. From the command line, navigate to the `python` directory and run:
From Tensorboard, you will see the summary statistics of six variables:
* Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Correlates to how well the model is able to predict the value of each state. This should decrease during a succesful training session.
* Policy Loss - The mean loss of the policy function update. Correlates to how much the policy (process for deciding actions) is changing. The magnitude of this should decrease during a succesful training session.
* Episode Length - The mean length of each episode in the environment for all agents.
* Value Estimates - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
* Policy Entropy - How random the decisions of the model are. Should slowly decrease during a successful training process. If it decreases too quickly, the `beta` hyperparameter should be increased.
From TensorBoard, you will see the summary statistics:
## Embedding Trained Brain into Unity Environment _[Experimental]_
Once the training process displays an average reward of ~75 or greater, and there has been a recently saved model (denoted by the `Saved Model` message) you can choose to stop the training process by stopping the cell execution. Once this is done, you now have a trained TensorFlow model. You must now convert the saved model to a Unity-ready format which can be embedded directly into the Unity project by following the steps below.
* Lesson - only interesting when performing
[curriculum training](Training-Curriculum-Learning.md).
This is not used in the 3d Balance Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents.
Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease
during a successful training process. If it decreases too quickly, the `beta`
hyperparameter should be increased.
* Episode Length - The mean length of each episode in the environment for all
agents.
* Learning Rate - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
* Policy Loss - The mean loss of the policy function update. Correlates to how
much the policy (process for deciding actions) is changing. The magnitude of
this should decrease during a successful training session.
* Value Estimate - The mean value estimate for all states visited by the agent.
Should increase during a successful training session.
* Value Loss - The mean loss of the value function update. Correlates to how
well the model is able to predict the value of each state. This should decrease
during a successful training session.
![Example TensorBoard Run](images/mlagents-TensorBoard.png)
## Embedding the Trained Brain into the Unity Environment _[Experimental]_
Once the training process completes and the model has been saved (denoted by
the `Saved Model` message), you can add it to the Unity project and use it
with agents that have an **Internal** brain type.
Because TensorFlowSharp support is still experimental, it is disabled by default. In order to enable it, you must follow these steps. Please note that the `Internal` Brain mode will only be available once completing these steps.
1. Make sure you are using Unity 2017.1 or newer.
2. Make sure the TensorFlowSharp plugin is in your `Assets` folder. A Plugins folder which includes TF# can be downloaded [here](https://s3.amazonaws.com/unity-agents/0.2/TFSharpPlugin.unitypackage). Double click and import it once downloaded.
3. Go to `Edit` -> `Project Settings` -> `Player`
4. For each of the platforms you target (**`PC, Mac and Linux Standalone`**, **`iOS`** or **`Android`**):
1. Go into `Other Settings`.
2. Select `Scripting Runtime Version` to `Experimental (.NET 4.6 Equivalent)`
3. In `Scripting Defined Symbols`, add the flag `ENABLE_TENSORFLOW`
Because TensorFlowSharp support is still experimental, it is disabled by
default. In order to enable it, you must follow these steps. Please note that
the `Internal` Brain mode will only be available once completing these steps.
1. Make sure the TensorFlowSharp plugin is in your `Assets` folder. A Plugins
folder which includes TF# can be downloaded
[here](https://s3.amazonaws.com/unity-agents/0.2/TFSharpPlugin.unitypackage).
Double click and import it once downloaded. You can see if this was
successfully installed by checking the TensorFlow files in the Project tab
under `Assets` -> `ML-Agents` -> `Plugins` -> `Computer`
2. Go to `Edit` -> `Project Settings` -> `Player`
3. For each of the platforms you target
(**`PC, Mac and Linux Standalone`**, **`iOS`** or **`Android`**):
1. Go into `Other Settings`.
2. Select `Scripting Runtime Version` to
`Experimental (.NET 4.6 Equivalent)`
3. In `Scripting Defined Symbols`, add the flag `ENABLE_TENSORFLOW`.
After typing in, press Enter.
4. Go to `File` -> `Save Project`
1. Run the final cell of the notebook under "Export the trained TensorFlow graph" to produce an `<env_name >.bytes` file.
2. Move `<env_name>.bytes` from `python/models/ppo/` into `unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
1. Run the final cell of the notebook under "Export the trained TensorFlow
graph" to produce an `<env_name>.bytes` file.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
6. Drag the `<env_name>.bytes` file from the Project window of the Editor to the `Graph Model` placeholder in the `3DBallBrain` inspector window.
6. Drag the `<env_name>.bytes` file from the Project window of the Editor
to the `Graph Model` placeholder in the `3DBallBrain` inspector window.
If you followed these steps correctly, you should now see the trained model being used to control the behavior of the balance ball within the Editor itself. From here you can re-build the Unity binary, and run it standalone with your agent's new learned behavior built right in.
If you followed these steps correctly, you should now see the trained model
being used to control the behavior of the balance ball within the Editor
itself. From here you can re-build the Unity binary, and run it standalone
with your agent's new learned behavior built right in.

61
docs/Readme.md


# Unity ML Agents Documentation
## About
* [Unity ML Agents Overview](Unity-Agents-Overview.md)
* [Example Environments](Example-Environments.md)
# Unity ML-Agents Documentation
## Tutorials
* [Installation & Set-up](installation.md)
## Getting Started
* [ML-Agents Overview](ML-Agents-Overview.md)
* [Background: Unity](Background-Unity.md)
* [Background: Machine Learning](Background-Machine-Learning.md)
* [Background: TensorFlow](Background-TensorFlow.md)
* [Installation & Set-up](Installation.md)
* [Background: Jupyter Notebooks](Background-Jupyter.md)
* [Docker Set-up (Experimental)](Using-Docker.md)
* [Making a new Unity Environment](Making-a-new-Unity-Environment.md)
* [How to use the Python API](Unity-Agents---Python-API.md)
* [Example Environments](Learning-Environment-Examples.md)
## Features
* [Agents SDK Inspector Descriptions](Agents-Editor-Interface.md)
* [Scene Organization](Organizing-the-Scene.md)
* [Curriculum Learning](curriculum.md)
* [Broadcast](broadcast.md)
* [Monitor](monitor.md)
## Creating Learning Environments
* [Making a new Learning Environment](Learning-Environment-Create-New.md)
* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Agents](Learning-Environment-Design-Agents.md)
* [Academy](Learning-Environment-Design-Academy.md)
* [Brains](Learning-Environment-Design-Brains.md)
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
* [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)
## Training
* [Training ML-Agents](Training-ML-Agents.md)
* [Training with Proximal Policy Optimization](Training-PPO.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [TensorflowSharp in Unity [Experimental]](Using-TensorFlow-Sharp-in-Unity-(Experimental).md)
* [Instanciating and Destroying agents](Instantiating-Destroying-Agents.md)
## Best Practices
* [Best practices when creating an Environment](best-practices.md)
* [Best practices when training using PPO](best-practices-ppo.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
* [Limitations & Common Issues](Limitations-&-Common-Issues.md)
* [ML-Agents Glossary](Glossary.md)
* [Limitations & Common Issues](Limitations-and-Common-Issues.md)
## C# API and Components
* Academy
* Brain
* Agent
* CoreBrain
* Decision
* Monitor
## Python API
* [How to use the Python API](Python-API.md)

2
docs/Training-on-Amazon-Web-Service.md


env = UnityEnvironment(your_env)
```
You should receive a message confirming that the environment was loaded succesfully.
You should receive a message confirming that the environment was loaded successfully.

30
docs/Learning-Environment-Examples.md


# Example Learning Environments
### About Example Environments
Unity ML Agents contains a set of example environments which demonstrate various features of the platform. In the coming months more will be added. We are also actively open to adding community contributed environments as examples, as long as they are small, simple, demonstrate a unique feature of the platform, and provide a unique non-trivial challenge to modern RL algorithms. Feel free to submit these environments with a Pull-Request explaining the nature of the environment and task.
Unity ML-Agents contains an expanding set of example environments which
demonstrate various features of the platform. Environments are located in
`unity-environment/Assets/ML-Agents/Examples` and summarised below.
Additionally, our
[first ML Challenge](https://connect.unity.com/challenges/ml-agents-1)
contains environments created by the community.
Environments are located in `unity-environment/ML-Agents/Examples`.
This page only overviews the example environments we provide. To learn more
on how to design and build your own environments see our
[Making a new Learning Environment](Learning-Environment-Create-New.md)
page.
If you would like to contribute environments, please see our
[contribution guidelines](Contribution-Guidelines.md) page.
## Basic

## 3DBall
![Balance Ball](../images/balance.png)
![Balance Ball](images/balance.png)
* Set-up: A balance-ball task, where the agent controls the platform.
* Goal: The agent must balance the platform in order to keep the ball on it for as long as possible.

## GridWorld
![GridWorld](../images/gridworld.png)
![GridWorld](images/gridworld.png)
* Set-up: A version of the classic grid-world task. Scene contains agent, goal, and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the obstacles.

## Tennis
![Tennis](../images/tennis.png)
![Tennis](images/tennis.png)
* Set-up: Two-player game where agents control rackets to bounce ball over a net.
* Goal: The agents must bounce the ball between one another while not dropping or sending the ball out of bounds.

### Push Area
![Push](../images/push.png)
![Push](images/push.png)
* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.

### Wall Area
![Wall](../images/wall.png)
![Wall](images/wall.png)
* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.

## Reacher
![Tennis](../images/reacher.png)
![Tennis](images/reacher.png)
* Set-up: Double-jointed arm which can move to target locations.
* Goal: The agent must move its hand to the goal location, and keep it there.

## Crawler
![Crawler](../images/crawler.png)
![Crawler](images/crawler.png)
* Set-up: A creature with 4 arms and 4 forearms.
* Goal: The agent must move its body along the x axis without falling.

8
docs/Limitations-and-Common-Issues.md


# Limitations and Common Issues
# Limitations and Common Issues
## Unity SDK
### Headless Mode

### Environment Permission Error
If you directly import your Unity environment without building it in the editor, you might need to give it additionnal permissions to execute it.
If you directly import your Unity environment without building it in the editor, you might need to give it additional permissions to execute it.
If you receive such a permission error on macOS, run:

### Environment Connection Timeout
If you are able to launch the environment from `UnityEnvironment` but then recieve a timeout error, there may be a number of possible causes.
If you are able to launch the environment from `UnityEnvironment` but then receive a timeout error, there may be a number of possible causes.
* _Cause_: There may be no Brains in your environment which are set to `External`. In this case, the environment will not attempt to communicate with python. _Solution_: Set the brain you wish to externally control through the Python API to `External` from the Unity Editor, and rebuild the environment.
* _Cause_: On OSX, the firewall may be preventing communication with the environment. _Solution_: Add the built environment binary to the list of exceptions on the firewall by following instructions [here](https://support.apple.com/en-us/HT201642).

### Mean reward : nan
If you recieve a message `Mean reward : nan` when attempting to train a model using PPO, this is due to the episodes of the learning environment not terminating. In order to address this, set `Max Steps` for either the Academy or Agents within the Scene Inspector to a value greater than 0. Alternatively, it is possible to manually set `done` conditions for episodes from within scripts for custom episode-terminating events.
If you receive a message `Mean reward : nan` when attempting to train a model using PPO, this is due to the episodes of the learning environment not terminating. In order to address this, set `Max Steps` for either the Academy or Agents within the Scene Inspector to a value greater than 0. Alternatively, it is possible to manually set `done` conditions for episodes from within scripts for custom episode-terminating events.

4
docs/Python-API.md


```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name=filename, worker_num=0)
env = UnityEnvironment(file_name=filename, worker_id=0)
* `worker_num` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
* `worker_id` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
## Interacting with a Unity Environment
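To make the calls above concrete, here is a minimal interaction sketch. The dictionary-of-`BrainInfo` return convention is described in these docs; attribute names such as `brain_names`, `agents`, and `local_done` are assumptions about the Python API of this era and may differ between releases:

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall", worker_id=0)
brain_name = env.brain_names[0]                  # assumed: names of the brains set to External

info = env.reset(train_mode=True)[brain_name]    # dict of brain name -> BrainInfo
for _ in range(100):
    # One continuous action vector (size 2 for 3D Balance Ball) per agent controlled by the brain.
    actions = [0.0, 0.0] * len(info.agents)
    info = env.step(actions)[brain_name]
    if any(info.local_done):                     # assumed: per-agent done flags on BrainInfo
        info = env.reset(train_mode=True)[brain_name]
env.close()
```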

10
docs/Using-TensorFlow-Sharp-in-Unity.md


# Using TensorFlowSharp in Unity (Experimental)
# Using TensorFlowSharp in Unity _[Experimental]_
Unity now offers the possibility to use pretrained TensorFlow graphs inside of the game engine. This was made possible thanks to [the TensorFlowSharp project](https://github.com/migueldeicaza/TensorFlowSharp).
Unity now offers the possibility to use pre-trained Tensorflow graphs inside of the game engine. This was made possible thanks to [the TensorFlowSharp project](https://github.com/migueldeicaza/TensorFlowSharp).
_Notice: This feature is still experimental. While it is possible to embed trained models into Unity games, Unity Technologies does not officially support this use-case for production games at this time. As such, no guarantees are provided regarding the quality of experience. If you encounter issues regarding battery life, or general performance (especially on mobile), please let us know._

* Unity 2017.1 or above
* Unity Tensorflow Plugin ([Download here](https://s3.amazonaws.com/unity-agents/0.2/TFSharpPlugin.unitypackage))
# Using TensorflowSharp with ML-Agents
# Using TensorFlowSharp with ML-Agents
In order to bring a fully trained agent back into Unity, you will need to make sure the nodes of your graph have appropriate names. You can give names to nodes in TensorFlow:
```python
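# The original snippet is collapsed in this diff view. The lines below are an
# illustrative stand-in (an assumption, not the original docs' code), showing how
# graph nodes can be given explicit names so TensorFlowSharp can look them up later.
import tensorflow as tf

state = tf.placeholder(tf.float32, shape=[None, 8], name="state")
hidden = tf.layers.dense(state, 64, activation=tf.nn.tanh)
action = tf.identity(tf.layers.dense(hidden, 2), name="action")
```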

* Type `-force_load`
* Drag the library `libtensorflow-core.a` from the `Project Navigator` on the left under `Libraries/ML-Agents/Plugins/iOS` into the flag list.
# Using TensorflowSharp without ML-Agents
# Using TensorFlowSharp without ML-Agents
Beyond controlling an in-game agent, you may desire to use TensorFlowSharp for more general computation. The below instructions describe how to generally embed Tensorflow models without using the ML-Agents framework.

In your C# script :
At the top, add the line
```csharp
using Tensorflow;
using TensorFlow;
```
If you will be building for android, you must add this block at the start of your code :
```csharp

2
docs/Feature-Broadcasting.md


## How to use : Unity
To turn it on in Unity, simply check the `Broadcast` box as shown below:
![Broadcast](../images/broadcast.png)
![Broadcast](images/broadcast.png)
## How to use : Python
When you launch your Unity Environment from python, you can see what the agents connected to non-external brains are doing. When calling `step` or `reset` on your environment, you retrieve a dictionary mapping brain names to `BrainInfo` objects. Each non-external brain set to broadcast contributes its own `BrainInfo` entry to this dictionary.

2
docs/Feature-Monitor.md


# Using the Monitor
![Monitor](../images/monitor.png)
![Monitor](images/monitor.png)
The monitor allows visualizing information related to the agents or training process within a Unity scene.

4
python/unityagents/environment.py


            candidates = glob.glob(os.path.join(cwd, file_name + '.app', 'Contents', 'MacOS', true_filename))
            if len(candidates) == 0:
                candidates = glob.glob(os.path.join(file_name + '.app', 'Contents', 'MacOS', true_filename))
            if len(candidates) == 0:
                candidates = glob.glob(os.path.join(cwd, file_name + '.app', 'Contents', 'MacOS', '*'))
            if len(candidates) == 0:
                candidates = glob.glob(os.path.join(file_name + '.app', 'Contents', 'MacOS', '*'))
            if len(candidates) > 0:
                launch_string = candidates[0]
        elif platform == 'win32':

19
unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DDecision.cs


    {
        if (gameObject.GetComponent<Brain>().brainParameters.actionSpaceType == StateType.continuous)
        {
            return new float[4]{ 0f, 0f, 0f, 0.0f };
            List<float> ret = new List<float>();
            if (state[2] < 0 || state[5] < 0)
            {
                ret.Add(state[5]);
            }
            else
            {
                ret.Add(state[5]);
            }
            if (state[3] < 0 || state[7] < 0)
            {
                ret.Add(-state[7]);
            }
            else
            {
                ret.Add(-state[7]);
            }
            return ret.ToArray();
        }
        else

132
unity-environment/Assets/ML-Agents/Examples/Crawler/Crawler.unity


m_ReflectionIntensity: 1
m_CustomReflection: {fileID: 0}
m_Sun: {fileID: 0}
m_IndirectSpecularColor: {r: 0.4465785, g: 0.49641252, b: 0.574817, a: 1}
m_IndirectSpecularColor: {r: 0.4465934, g: 0.49642956, b: 0.5748249, a: 1}
--- !u!157 &3
LightmapSettings:
m_ObjectHideFlags: 0

m_PVRFilteringAtrousPositionSigmaDirect: 0.5
m_PVRFilteringAtrousPositionSigmaIndirect: 2
m_PVRFilteringAtrousPositionSigmaAO: 1
m_ShowResolutionOverlay: 1
m_LightingDataAsset: {fileID: 0}
m_UseShadowmask: 1
--- !u!196 &4

debug:
m_Flags: 0
m_NavMeshData: {fileID: 0}
--- !u!114 &157493879
--- !u!114 &140901381
MonoBehaviour:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}

m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 8b23992c8eb17439887f5e944bf04a40, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
m_Script: {fileID: 11500000, guid: 41e9bda8f3cf1492fa74926a530f6f70, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
graphModel: {fileID: 4900000, guid: 3569492a1961e4abe87b232f5ccaac90, type: 3}
graphScope:
graphPlaceholders: []
BatchSizePlaceholderName: batch_size
StatePlacholderName: state
RecurrentInPlaceholderName: recurrent_in
RecurrentOutPlaceholderName: recurrent_out
ObservationPlaceholderName: []
ActionPlaceholderName: action
continuousPlayerActions:
- key: 101
index: 0
value: -1
- key: 101
index: 1
value: -1
- key: 101
index: 2
value: -1
- key: 101
index: 3
value: -1
- key: 114
index: 3
value: 1
- key: 116
index: 11
value: -1
- key: 100
index: 1
value: 1
- key: 119
index: 7
value: -1
discretePlayerActions: []
defaultAction: 0
brain: {fileID: 393360180}
--- !u!1 &393360178
GameObject:

m_EditorClassIdentifier:
brainParameters:
stateSize: 117
stackedStates: 1
actionSize: 12
memorySize: 0
cameraResolutions: []

stateSpaceType: 1
brainType: 3
CoreBrains:
- {fileID: 1543732223}
- {fileID: 681646356}
- {fileID: 2072160993}
- {fileID: 157493879}
instanceID: 17776
--- !u!114 &681646356
- {fileID: 140901381}
- {fileID: 722649573}
- {fileID: 1365757417}
- {fileID: 1527314308}
instanceID: 38818
--- !u!114 &722649573
MonoBehaviour:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}

m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 943466ab374444748a364f9d6c3e2fe2, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
m_EditorClassIdentifier:
broadcast: 1
brain: {fileID: 393360180}

m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 3db4283e33af74336bfedb01d0e011bf, type: 2}
m_IsPrefabParent: 0
--- !u!114 &1365757417
MonoBehaviour:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 0}
m_GameObject: {fileID: 0}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 35813a1be64e144f887d7d5f15b963fa, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
m_EditorClassIdentifier:
brain: {fileID: 393360180}
--- !u!1 &1392866527
GameObject:
m_ObjectHideFlags: 0

m_TargetEye: 3
m_HDR: 1
m_AllowMSAA: 1
m_AllowDynamicResolution: 0
m_ForceIntoRT: 0
m_OcclusionCulling: 1
m_StereoConvergence: 10

m_PrefabParentObject: {fileID: 4491788954268586, guid: 3db4283e33af74336bfedb01d0e011bf,
type: 2}
m_PrefabInternal: {fileID: 1808602249}
--- !u!114 &1543732223
--- !u!114 &1527314308
MonoBehaviour:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}

m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 41e9bda8f3cf1492fa74926a530f6f70, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
m_Script: {fileID: 11500000, guid: 8b23992c8eb17439887f5e944bf04a40, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
continuousPlayerActions:
- key: 101
index: 0
value: -1
- key: 101
index: 1
value: -1
- key: 101
index: 2
value: -1
- key: 101
index: 3
value: -1
- key: 114
index: 3
value: 1
- key: 116
index: 11
value: -1
- key: 100
index: 1
value: 1
- key: 119
index: 7
value: -1
discretePlayerActions: []
defaultAction: 0
graphModel: {fileID: 4900000, guid: 3569492a1961e4abe87b232f5ccaac90, type: 3}
graphScope:
graphPlaceholders:
- name: epsilon
valueType: 1
minValue: -1
maxValue: 1
BatchSizePlaceholderName: batch_size
StatePlacholderName: state
RecurrentInPlaceholderName: recurrent_in
RecurrentOutPlaceholderName: recurrent_out
ObservationPlaceholderName: []
ActionPlaceholderName: action
brain: {fileID: 393360180}
--- !u!1001 &1599453071
Prefab:

m_Father: {fileID: 0}
m_RootOrder: 1
m_LocalEulerAnglesHint: {x: 50, y: -30, z: 0}
--- !u!114 &2072160993
MonoBehaviour:
m_ObjectHideFlags: 0
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 0}
m_GameObject: {fileID: 0}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 35813a1be64e144f887d7d5f15b963fa, type: 3}
m_Name: (Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)(Clone)
m_EditorClassIdentifier:
brain: {fileID: 393360180}
--- !u!1 &2095421678
GameObject:
m_ObjectHideFlags: 0

- key: steps
value: 0
done: 0
maxStepReached: 0
episodeCount: 0
currentStep: 0

2
unity-environment/Assets/ML-Agents/Scripts/Academy.cs


using System.Collections.Generic;
using UnityEngine;
/*! \mainpage ML-Agents Index Page
/**
* Welcome to Unity Machine Learning Agents documentation.
*/

2
unity-environment/Assets/ML-Agents/Scripts/Agent.cs


/** Generic functions for parent Agent class.
* Contains all logic for Brain-Agent communication and Agent-Environment
* interaction.
*
* See Also: \ref Agents.md
*/
public abstract class Agent : MonoBehaviour
{

547
docs/images/mlagents-BuildWindow.png

Width: 471 | Height: 446 | Size: 72 KiB

511
docs/images/mlagents-TensorBoard.png

Width: 921 | Height: 333 | Size: 77 KiB

345
docs/images/brain.png

Width: 457 | Height: 293 | Size: 47 KiB

377
docs/images/learning_environment_basic.png

Width: 1652 | Height: 470 | Size: 64 KiB

16
docs/Background-Jupyter.md


# Jupyter
**Work In Progress**
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
embedded visualizations. We provide several such notebooks for testing your
Python installation and training behaviors. For a walkthrough of how to use
Jupyter, see
[Running the Jupyter Notebook](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html)
in the _Jupyter/IPython Quick Start Guide_.
To launch Jupyter, run in the command line:
`jupyter notebook`
Then navigate to `localhost:8888` to access the notebooks.

183
docs/Background-Machine-Learning.md


# Background: Machine Learning
**Work In Progress**
We will not attempt to provide a thorough treatment of machine learning
as there are fantastic resources online. However, given that a number
of users of ML-Agents might not have a formal machine learning background,
this section provides an overview of terminology to facilitate the
understanding of ML-Agents.
Machine learning, a branch of artificial intelligence, focuses on learning patterns
from data. The three main classes of machine learning algorithms include:
unsupervised learning, supervised learning and reinforcement learning.
Each class of algorithm learns from a different type of data. The following paragraphs
provide an overview for each of these classes of machine learning, as well as introductory examples.
## Unsupervised Learning
The goal of unsupervised learning is to group or cluster similar items in a
data set. For example, consider the players of a game. We may want to group
the players depending on how engaged they are with the game. This would enable
us to target different groups (e.g. for highly-engaged players we might
invite them to be beta testers for new features, while for unengaged players
we might email them helpful tutorials). Say that we wish to split our players
into two groups. We would first define basic attributes of the players, such
as the number of hours played, total money spent on in-app purchases and
number of levels completed. We can then feed this data set (three attributes
for every player) to an unsupervised learning algorithm where we specify the
number of groups to be two. The algorithm would then split the data set of
players into two groups where the players within each group would be similar
to each other. Given the attributes we used to describe each player, in this
case, the output would be a split of all the players into two groups, where
one group would semantically represent the engaged players and the second
group would semantically represent the unengaged players.
With unsupervised learning, we did not provide specific examples of which
players are considered engaged and which are considered unengaged. We just
defined the appropriate attributes and relied on the algorithm to uncover
the two groups on its own. This type of data set is typically called an
unlabeled data set as it is lacking these direct labels. Consequently,
unsupervised learning can be helpful in situations where these labels can be
expensive or hard to produce. In the next paragraph, we overview supervised
learning algorithms which accept input labels in addition to attributes.
## Supervised Learning
In supervised learning, we do not want to just group similar items but directly
learn a mapping from each item to the group (or class) that it belongs to.
Returning to our earlier example of
clustering players, let's say we now wish to predict which of our players are
about to churn (that is stop playing the game for the next 30 days). We
can look into our historical records and create a data set that
contains attributes of our players in addition to a label indicating whether
they have churned or not. Note that the player attributes we use for this
churn prediction task may be different from the ones we used for our earlier
clustering task. We can then feed this data set (attributes **and** label for
each player) into a supervised learning algorithm which would learn a mapping
from the player attributes to a label indicating whether that player
will churn or not. The intuition is that the supervised learning algorithm
will learn which values of these attributes typically correspond to players
who have churned and not churned (for example, it may learn that players
who spend very little and play for very short periods will most likely churn).
Now given this learned model, we can provide it the attributes of a
new player (one that recently started playing the game) and it would output
a _predicted_ label for that player. This prediction is the algorithm's
expectation of whether the player will churn or not.
We can now use these predictions to target the players
who are expected to churn and entice them to continue playing the game.
As you may have noticed, for both supervised and unsupervised learning, there
are two tasks that need to be performed: attribute selection and model
selection. Attribute selection (also called feature selection) pertains to
selecting how we wish to represent the entity of interest, in this case, the
player. Model selection, on the other hand, pertains to selecting the
algorithm (and its parameters) that perform the task well. Both of these
tasks are active areas of machine learning research and, in practice, require
several iterations to achieve good performance.
We now switch to reinforcement learning, the third class of
machine learning algorithms, and arguably the one most relevant for ML-Agents.
## Reinforcement Learning
Reinforcement learning can be viewed as a form of learning for sequential
decision making that is commonly associated with controlling robots (but is,
in fact, much more general). Consider an autonomous firefighting robot that is
tasked with navigating into an area, finding the fire and neutralizing it. At
any given moment, the robot perceives the environment through its sensors (e.g.
camera, heat, touch), processes this information and produces an action (e.g.
move to the left, rotate the water hose, turn on the water). In other words,
it is continuously making decisions about how to interact in this environment
given its view of the world (i.e. sensor inputs) and objective (i.e.
neutralizing the fire). Teaching a robot to be a successful firefighting
machine is precisely what reinforcement learning is designed to do.
More specifically, the goal of reinforcement learning is to learn a **policy**,
which is essentially a mapping from **observations** to **actions**. An
observation is what the robot can measure from its **environment** (in this
case, all its sensory inputs) and an action, in its most raw form, is a change
to the configuration of the robot (e.g. position of its base, position of
its water hose and whether the hose is on or off).
The last remaining piece
of the reinforcement learning task is the **reward signal**. When training a
robot to be a mean firefighting machine, we provide it with rewards (positive
and negative) indicating how well it is doing on completing the task.
Note that the robot does not _know_ how to put out fires before it is trained.
It learns the objective because it receives a large positive reward when it puts
out the fire and a small negative reward for every passing second. The fact that
rewards are sparse (i.e. may not be provided at every step, but only when a
robot arrives at a success or failure situation), is a defining characteristic of
reinforcement learning and precisely why learning good policies can be difficult
(and/or time-consuming) for complex environments.
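As a purely illustrative sketch (not taken from ML-Agents, and with hypothetical names), such a sparse reward signal could be written as a tiny function: a large positive reward once the fire is out, a small negative reward for every passing second, and nothing else:

    // Illustrative only: the reward is sparse -- a large positive value when
    // the fire is extinguished, and a small time penalty otherwise.
    float ComputeReward(bool fireExtinguished, float secondsElapsed)
    {
        if (fireExtinguished)
            return 1.0f;                    // success
        return -0.01f * secondsElapsed;     // penalty for every passing second
    }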
<p align="center">
<img src="images/rl_cycle.png" alt="The reinforcement learning cycle."/>
</p>
[Learning a policy](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/)
usually requires many trials and iterative
policy updates. More specifically, the robot is placed in several
fire situations and over time learns an optimal policy which allows it
to put out fires more effectively. Obviously, we cannot expect to train a
robot repeatedly in the real world, particularly when fires are involved. This
is precisely why the use of
[Unity as a simulator](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
serves as the perfect training grounds for learning such behaviors.
While our discussion of reinforcement learning has centered around robots,
there are strong parallels between robots and characters in a game. In fact,
in many ways, one can view a non-playable character (NPC) as a virtual
robot, with its own observations about the environment, its own set of actions
and a specific objective. Thus it is natural to explore how we can
train behaviors within Unity using reinforcement learning. This is precisely
what ML-Agents offers. The video linked below includes a reinforcement
learning demo showcasing training character behaviors using ML-Agents.
<p align="center">
<a href="http://www.youtube.com/watch?feature=player_embedded&v=fiQsmdwEGT8" target="_blank">
<img src="http://img.youtube.com/vi/fiQsmdwEGT8/0.jpg" alt="RL Demo" width="400" border="10" />
</a>
</p>
Similar to both unsupervised and supervised learning, reinforcement learning
also involves two tasks: attribute selection and model selection.
Attribute selection is defining the set of observations for the robot
that best help it complete its objective, while model selection is defining
the form of the policy (mapping from observations to actions) and its
parameters. In practice, training behaviors is an iterative process that may
require changing the attribute and model choices.
## Training and Inference
One common aspect of all three branches of machine learning is that they
all involve a **training phase** and an **inference phase**. While the
details of the training and inference phases are different for each of the
three, at a high-level, the training phase involves building a model
using the provided data, while the inference phase involves applying this
model to new, previously unseen, data. More specifically:
* For our unsupervised learning
example, the training phase learns the optimal two clusters based
on the data describing existing players, while the inference phase assigns a
new player to one of these two clusters.
* For our supervised learning example, the
training phase learns the mapping from player attributes to player label
(whether they churned or not), and the inference phase predicts whether
a new player will churn or not based on that learned mapping.
* For our reinforcement learning example, the training phase learns the
optimal policy through guided trials, and in the inference phase, the agent
observes and takes actions in the wild using its learned policy.
To briefly summarize: all three classes of algorithms involve training
and inference phases in addition to attribute and model selections. What
ultimately separates them is the type of data available to learn from. In
unsupervised learning our data set was a collection of attributes, in
supervised learning our data set was a collection of attribute-label pairs,
and, lastly, in reinforcement learning our data set was a collection of
observation-action-reward tuples.
## Deep Learning
To be completed.
Link to TensorFlow background page.

37
docs/Background-TensorFlow.md


# Background: TensorFlow
**Work In Progress**
## TensorFlow
[TensorFlow](https://www.tensorflow.org/) is a deep learning library.
Link to Arthur's content?
A few words about TensorFlow and why/how it is relevant would be nice.
TensorFlow is used for training the machine learning models in ML-Agents.
Unless you are implementing new algorithms, the use of TensorFlow
is mostly abstracted away and behind the scenes.
## TensorBoard
One component of training models with TensorFlow is setting the
values of certain model attributes (called _hyperparameters_). Finding the
right values of these hyperparameters can require a few iterations.
Consequently, we leverage a visualization tool within TensorFlow called
[TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
It allows the visualization of certain agent attributes (e.g. reward)
throughout training which can be helpful in both building
intuitions for the different hyperparameters and setting the optimal values for
your Unity environment. We provide more details on setting the hyperparameters
in later parts of the documentation, but, in the meantime, if you are
unfamiliar with TensorBoard we recommend this
[tutorial](https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial).
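To launch TensorBoard during or after training, run the following from the command line (this assumes your training run writes its summary files to a `summaries` directory; substitute your own log directory if it differs):
`tensorboard --logdir=summaries`
Then navigate to `localhost:6006` in your browser to view the training curves.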
## TensorFlowSharp
A third-party library used by the Internal Brain mode.

18
docs/Background-Unity.md


# Background: Unity
If you are not familiar with the [Unity Engine](https://unity3d.com/unity),
we highly recommend the
[Unity Manual](https://docs.unity3d.com/Manual/index.html) and
[Tutorials page](https://unity3d.com/learn/tutorials). The
[Roll-a-ball tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial)
is sufficient to learn all the basic concepts of Unity to get started with
ML-Agents:
* [Editor](https://docs.unity3d.com/Manual/UsingTheEditor.html)
* [Interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)
* [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)
* [GameObjects](https://docs.unity3d.com/Manual/GameObjects.html)
* [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html)
* [Camera](https://docs.unity3d.com/Manual/Cameras.html)
* [Scripting](https://docs.unity3d.com/Manual/ScriptingSection.html)
* [Ordering of event functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
(e.g. FixedUpdate, Update)

20
docs/Contribution-Guidelines.md


# Contribution Guidelines
Reference code of conduct.
## GitHub Workflow
## Environments
We are also actively open to adding community-contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
machine learning algorithms. Feel free to submit these environments with a
Pull-Request explaining the nature of the environment and task.
TODO: above paragraph needs expansion.
## Algorithms
## Style Guide

3
docs/Glossary.md


# ML-Agents Glossary
**Work In Progress**

3
docs/Installation-Docker.md


# Docker Set-up _[Experimental]_
**Work In Progress**

61
docs/Installation.md


# Installation & Set-up
To install and use ML-Agents, you need to install Unity, clone this repository,
and install Python with additional dependencies. Each of the subsections
below covers one of these steps, in addition to an experimental Docker set-up.
## Install **Unity 2017.1** or Later
[Download](https://store.unity.com/download) and install Unity.
## Clone the ml-agents Repository
Once installed, you will want to clone the ML-Agents GitHub repository.
git clone git@github.com:Unity-Technologies/ml-agents.git
The `unity-environment` directory in this repository contains the Unity Assets
to add to your projects. The `python` directory contains the training code.
Both directories are located at the root of the repository.
## Install Python
In order to use ML-Agents, you need Python (2 or 3; 64 bit required) along with
the dependencies listed in the [requirements file](../python/requirements.txt).
Some of the primary dependencies include:
- [TensorFlow](Background-TensorFlow.md)
- [Jupyter](Background-Jupyter.md)
### Windows Users
If you are a Windows user who is new to Python and TensorFlow, follow
[this guide](https://unity3d.college/2017/10/25/machine-learning-in-unity3d-setting-up-the-environment-tensorflow-for-agentml-on-windows-10/)
to set up your Python environment.
### Mac and Unix Users
If your Python environment doesn't include `pip`, see these
[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
on installing it.
To install dependencies, go into the `python` subdirectory of the repository,
and run (depending on your Python version) from the command line:
pip install .
or
pip3 install .
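As a quick sanity check (not part of the official steps), you can confirm that the main dependency imported correctly by running:
`python -c "import tensorflow as tf; print(tf.__version__)"`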
## Docker-based Installation _[Experimental]_
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).
## Help
If you run into any problems installing ML-Agents,
[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to cite relevant information on OS, Python version, and exact error
message (whenever possible).

25
docs/Learning-Environment-Best-Practices.md


# Environment Design Best Practices
## General
* It is often helpful to start with the simplest version of the problem to ensure the agent can learn it, then increase complexity over time. This can be done either manually or via Curriculum Learning, where a set of lessons that progressively increase in difficulty is presented to the agent ([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by using a Player Brain to control the agent.
## Rewards
* The magnitude of any given reward should typically not be greater than 1.0 in order to ensure a more stable learning process.
* Positive rewards are often more helpful to shaping the desired behavior of an agent than negative rewards.
* For locomotion tasks, a small positive reward (+0.1) for forward velocity is typically used.
* If you want the agent to finish a task quickly, it is often helpful to provide a small penalty every step (-0.05) that the agent does not complete the task. In this case, completion of the task should also coincide with the end of the episode. (A short sketch combining these reward guidelines follows this list.)
* Overly-large negative rewards can cause undesirable behavior where an agent learns to avoid any behavior which might produce the negative reward, even if it is also behavior which can eventually lead to a positive reward.
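A minimal sketch of the guidelines above (the method and parameter names are illustrative, not part of the ML-Agents API, and this assumes `using UnityEngine;`):

    // Small positive reward for forward velocity, a small per-step penalty to
    // encourage finishing quickly, and a completion reward of 1.0 -- all
    // magnitudes kept well within [-1, 1].
    float ShapeReward(Rigidbody rBody, bool reachedGoal)
    {
        if (reachedGoal)
            return 1.0f;
        float forwardReward = 0.1f * Mathf.Clamp(rBody.velocity.z, 0f, 1f);
        float stepPenalty = -0.05f;
        return forwardReward + stepPenalty;
    }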
## States
* States should include all variables relevant to allowing the agent to make an optimally informed decision.
* Categorical state variables such as type of object (Sword, Shield, Bow) should be encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`).
* Besides encoding non-numeric values, all inputs should be normalized to be in the range 0 to +1 (or -1 to 1). For example, rotation information on GameObjects should be recorded as `state.Add(transform.rotation.eulerAngles.y/180.0f-1.0f);` rather than `state.Add(transform.rotation.y);`. See the equation below for one approach to normalization, and the sketch after it for a concrete example.
* Positional information of relevant GameObjects should be encoded in relative coordinates wherever possible. This is often relative to the agent position.
![normalization](images/normalization.png)
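As a concrete sketch of the two points above (the helper method, its parameters, and the three-valued category are hypothetical; this assumes `using System.Collections.Generic;`):

    // Appends a one-hot encoded category and a normalized rotation to a state list.
    void AddExampleStates(List<float> state, int itemType, float yRotationDegrees)
    {
        // One-hot encode a categorical variable with three possible values
        // (e.g. 0 = Sword, 1 = Shield, 2 = Bow) so that no ordering is implied.
        for (int i = 0; i < 3; i++)
        {
            state.Add(itemType == i ? 1.0f : 0.0f);
        }

        // Rescale a rotation from [0, 360] degrees into the [-1, 1] range.
        state.Add(yRotationDegrees / 180.0f - 1.0f);
    }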
## Actions
* When using continuous control, action values should be clipped to an appropriate range (see the sketch below).
* Be sure to set the action-space-size to the number of used actions, and not greater, as doing the latter can interfere with the efficiency of the training process.
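A minimal sketch of the clipping guideline (the parameter names are illustrative; `Mathf.Clamp` is Unity's standard clamping utility, and this assumes `using UnityEngine;`):

    // Clamp each continuous action value to [-1, 1] before converting it into a
    // physics force, so out-of-range policy outputs cannot destabilize the agent.
    void ApplyContinuousActions(float[] act, Rigidbody rBody, float speed)
    {
        float moveX = Mathf.Clamp(act[0], -1f, 1f);
        float moveZ = Mathf.Clamp(act[1], -1f, 1f);
        rBody.AddForce(new Vector3(moveX, 0f, moveZ) * speed);
    }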

398
docs/Learning-Environment-Create-New.md


# Making a new Learning Environment
This tutorial walks through the process of creating a Unity Environment. A Unity Environment is an application built using the Unity Engine which can be used to train Reinforcement Learning agents.
![A simple ML-Agents environment](images/mlagents-NewTutSplash.png)
In this example, we will train a ball to roll to a randomly placed cube. The ball also learns to avoid falling off the platform.
## Overview
Using ML-Agents in a Unity project involves the following basic steps:
1. Create an environment for your agents to live in. An environment can range from a simple physical simulation containing a few objects to an entire game or ecosystem.
2. Implement an Academy subclass and add it to a GameObject in the Unity scene containing the environment. This GameObject will serve as the parent for any Brain objects in the scene. Your Academy class can implement a few optional methods to update the scene independently of any agents. For example, you can add, move, or delete agents and other entities in the environment.
3. Add one or more Brain objects to the scene as children of the Academy.
4. Implement your Agent subclasses. An Agent subclass defines the code an agent uses to observe its environment, to carry out assigned actions, and to calculate the rewards used for reinforcement training. You can also implement optional methods to reset the agent when it has finished or failed its task.
5. Add your Agent subclasses to appropriate GameObjects, typically, the object in the scene that represents the agent in the simulation. Each Agent object must be assigned a Brain object.
6. If training, set the Brain type to External and [run the training process](Training-PPO.md).
**Note:** If you are unfamiliar with Unity, refer to [Learning the interface](https://docs.unity3d.com/Manual/LearningtheInterface.html) in the Unity Manual if an Editor task isn't explained sufficiently in this tutorial.
If you haven't already, follow the [installation instructions](Installation.md).
## Set Up the Unity Project
The first task to accomplish is simply creating a new Unity project and importing the ML-Agents assets into it:
1. Launch the Unity Editor and create a new project named "RollerBall".
2. In a file system window, navigate to the folder containing your cloned ML-Agents repository.
3. Drag the `ML-Agents` folder from `unity-environments/Assets` to the Unity Editor Project window.
Your Unity **Project** window should contain the following assets:
![Project window](images/mlagents-NewProject.png)
## Create the Environment:
Next, we will create a very simple scene to act as our ML-Agents environment. The "physical" components of the environment include a Plane to act as the floor for the agent to move around on, a Cube to act as the goal or target for the agent to seek, and a Sphere to represent the agent itself.
**Create the floor plane:**
1. Right click in Hierarchy window, select 3D Object > Plane.
2. Name the GameObject "Floor."
3. Select Plane to view its properties in the Inspector window.
4. Set Transform to Position = (0,0,0), Rotation = (0,0,0), Scale = (1,1,1).
5. On the Plane's Mesh Renderer, expand the Materials property and change the default-material to *floor*.
(To set a new material, click the small circle icon next to the current material name. This opens the **Object Picker** dialog so that you can choose a different material from the list of all materials currently in the project.)
![The Floor in the Inspector window](images/mlagents-NewTutFloor.png)
**Add the Target Cube**
1. Right click in Hierarchy window, select 3D Object > Cube.
2. Name the GameObject "Target"
3. Select Target to view its properties in the Inspector window.
4. Set Transform to Position = (3,0.5,3), Rotation = (0,0,0), Scale = (1,1,1).
5. On the Cube's Mesh Renderer, expand the Materials property and change the default-material to *block*.
![The Target Cube in the Inspector window](images/mlagents-NewTutBlock.png)
**Add the Agent Sphere**
1. Right click in Hierarchy window, select 3D Object > Sphere.
2. Name the GameObject "RollerAgent"
3. Select RollerAgent to view its properties in the Inspector window.
4. Set Transform to Position = (0,0.5,0), Rotation = (0,0,0), Scale = (1,1,1).
5. On the Sphere's Mesh Renderer, expand the Materials property and change the default-material to *checker 1*.
6. Click **Add Component**.
7. Add the Physics/Rigidbody component to the Sphere. (Adding a Rigidbody allows the agent to be moved by applying physics forces, which we will do later when implementing the agent's actions.)
![The Agent GameObject in the Inspector window](images/mlagents-NewTutSphere.png)
Note that we will create an Agent subclass to add to this GameObject as a component later in the tutorial.
**Add Empty GameObjects to Hold the Academy and Brain**
1. Right click in Hierarchy window, select Create Empty.
2. Name the GameObject "Academy"
3. Right-click on the Academy GameObject and select Create Empty.
4. Name this child of the Academy, "Brain".
![The scene hierarchy](images/mlagents-NewTutHierarchy.png)
You can adjust the camera angles to give a better view of the scene at runtime. The next steps will be to create and add the ML-Agent components.
## Implement an Academy
The Academy object coordinates the ML-Agents in the scene and drives the decision-making portion of the simulation loop. Every ML-Agents scene needs one Academy instance. Since the base Academy class is abstract, you must make your own subclass even if you don't need to use any of the methods for a particular environment.
First, add a New Script component to the Academy GameObject created earlier:
1. Select the Academy GameObject to view it in the Inspector window.
2. Click **Add Component**.
3. Click **New Script** in the list of components (at the bottom).
4. Name the script "RollerAcademy".
5. Click **Create and Add**.
Next, edit the new `RollerAcademy` script:
1. In the Unity Project window, double-click the `RollerAcademy` script to open it in your code editor. (By default new scripts are placed directly in the **Assets** folder.)
2. In the editor, change the base class from `MonoBehaviour` to `Academy`.
3. Delete the `Start()` and `Update()` methods that were added by default.
In such a basic scene, we don't need the Academy to initialize, reset, or otherwise control any objects in the environment so we have the simplest possible Academy implementation:
public class RollerAcademy : Academy { }
The default settings for the Academy properties are also fine for this environment, so we don't need to change anything for the RollerAcademy component in the Inspector window.
![The Academy properties](images/mlagents-NewTutAcademy.png)
## Add a Brain
The Brain object encapsulates the decision making process. An Agent sends its observations to its Brain and expects a decision in return. The Brain Type setting determines how the Brain makes decisions. Unlike the Academy and Agent classes, you don't make your own Brain subclasses. (You can extend CoreBrain to make your own *types* of Brain, but the four built-in brain types should cover almost all scenarios.)
To add the Brain component to the Brain GameObject created earlier:
1. Select the Brain GameObject (the child of the Academy) to show its properties in the Inspector window.
2. Click **Add Component**.
3. Select the **Scripts/Brain** component to add it to the GameObject.
We will come back to the Brain properties later, but leave the Brain Type as **Player** for now.
![The Brain default properties](images/mlagents-NewTutBrain.png)
## Implement an Agent
To create the Agent:
1. Select the RollerAgent GameObject to view it in the Inspector window.
2. Click **Add Component**.
3. Click **New Script** in the list of components (at the bottom).
4. Name the script "RollerAgent".
5. Click **Create and Add**.
Then, edit the new `RollerAgent` script:
1. In the Unity Project window, double-click the `RollerAgent` script to open it in your code editor.
2. In the editor, change the base class from `MonoBehaviour` to `Agent`.
3. Delete the `Update()` method. We will use the `Start()` method, so leave that one in place for now.
So far, these are the basic steps that you would use to add ML-Agents to any Unity project. Next, we will add the logic that will let our agent learn to roll to the cube.
In this simple scenario, we don't use the Academy object to control the environment. If we wanted to change the environment, for example change the size of the floor or add or remove agents or other objects before or during the simulation, we could implement the appropriate methods in the Academy. Instead, we will have the Agent do all the work of resetting itself and the target when it succeeds or falls trying.
**Initialization and Resetting the Agent**
When the agent reaches its target, it marks itself done and its agent reset function moves the target to a random location. In addition, if the agent rolls off the platform, the reset function puts it back onto the floor.
To move the target GameObject, we need a reference to its Transform (which stores a GameObject's position, orientation and scale in the 3D world). To get this reference, add a public field of type `Transform` to the RollerAgent class. Public fields of a component in Unity get displayed in the Inspector window, allowing you to choose which GameObject to use as the target in the Unity Editor. To reset the agent's velocity (and later to apply force to move the agent) we need a reference to the Rigidbody component. A [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's primary element for physics simulation. (See [Physics](https://docs.unity3d.com/Manual/PhysicsSection.html) for full documentation of Unity physics.) Since the Rigidbody component is on the same GameObject as our Agent script, the best way to get this reference is using `GameObject.GetComponent<T>()`, which we can call in our script's `Start()` method.
So far, our RollerAgent script looks like:
using System.Collections.Generic;
using UnityEngine;

public class RollerAgent : Agent
{
    Rigidbody rBody;
    void Start () {
        rBody = GetComponent<Rigidbody>();
    }

    public Transform Target;
    public override void AgentReset()
    {
        if (this.transform.position.y < -1.0)
        {
            // The agent fell
            this.transform.position = Vector3.zero;
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
        }
        else
        {
            // Move the target to a new spot
            Target.position = new Vector3(Random.value * 8 - 4,
                                          0.5f,
                                          Random.value * 8 - 4);
        }
    }
}
Next, let's implement the Agent.CollectState() function.
**Observing the Environment**
The Agent sends the information we collect to the Brain, which uses it to make a decision. When you train the agent using the PPO training algorithm (or use a trained PPO model), the data is fed into a neural network as a feature vector. For an agent to successfully learn a task, we need to provide the correct information. A good rule of thumb for deciding what information to collect is to consider what you would need to calculate an analytical solution to the problem.
In our case, the information our agent collects includes:
* Position of the target. In general, it is better to use the relative position of other objects rather than the absolute position for more generalizable training. Note that the agent only collects the x and z coordinates since the floor is aligned with the xz plane and the y component of the target's position never changes.
// Calculate relative position
Vector3 relativePosition = Target.position - this.transform.position;
// Relative position
observation.Add(relativePosition.x/5);
observation.Add(relativePosition.z/5);
* Position of the agent itself within the confines of the floor. This data is collected as the agent's distance from each edge of the floor.
// Distance to edges of platform
observation.Add((this.transform.position.x + 5) / 5);
observation.Add((this.transform.position.x - 5) / 5);
observation.Add((this.transform.position.z + 5) / 5);
observation.Add((this.transform.position.z - 5) / 5);
* The velocity of the agent. This helps the agent learn to control its speed so it doesn't overshoot the target and roll off the platform.
// Agent velocity
observation.Add(rBody.velocity.x/5);
observation.Add(rBody.velocity.z/5);
All the values are divided by 5 to normalize the inputs to the neural network to the range [-1,1]. (The number five is used because the platform is 10 units across.)
In total, the state observation contains 8 values and we need to use the continuous state space when we get around to setting the Brain properties:
List<float> observation = new List<float>();
public override List<float> CollectState()
{
    // Remove last step's observations from the list
    observation.Clear();

    // Calculate relative position
    Vector3 relativePosition = Target.position - this.transform.position;

    // Relative position
    observation.Add(relativePosition.x/5);
    observation.Add(relativePosition.z/5);

    // Distance to edges of platform