
Update New Environment Doc (#1404)

* Simplified rewards and observations; Determined better settings for training within a reasonable amount of time.

* Simplified Agent rewards; Added training section that discusses hyperparameters.

* Added note about DecisionFrequency.

* Updated screenshots and a small clarification in the text.

* Tested and updated using v0.6.

* Update a couple of images, minor text edit.

* Replace with more recent training stats.

* resolve a couple of minor review comments.

* Increased the recommended batch and buffer size hyperparameter values.

* Fix 2 typos.
GitHub 6 years ago
10 files changed, including 2632 insertions and 2111 deletions


1. Create an environment for your agents to live in. An environment can range
from a simple physical simulation containing a few objects to an entire game
or ecosystem.
containing the environment. Your Academy class can implement a few optional
methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
3. Create one or more Brain assets by clicking `Assets -> Create -> ML-Agents
-> Bain`. And name them appropriately.
3. Create one or more Brain assets by clicking **Assets** > **Create** >
**ML-Agents** > **Brain**, and naming them appropriately.
uses to observe its environment, to carry out assigned actions, and to
calculate the rewards used for reinforcement training. You can also implement
optional methods to reset the Agent when it has finished or failed its task.
in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.
[run the training process](Training-ML-Agents.md).
**Note:** If you are unfamiliar with Unity, refer to
[Learning the interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)

importing the ML-Agents assets into it:
1. Launch the Unity Editor and create a new project named "RollerBall".
2. In a file system window, navigate to the folder containing your cloned
ML-Agents repository.
3. Drag the `ML-Agents` folder from `UnitySDK/Assets` to the Unity Editor
Project window.
4. Setup the ML-Agents toolkit by following the instructions [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Basic-Guide.md#setting-up-the-ml-agents-toolkit-within-unity).
2. Make sure that the Scripting Runtime Version for the project is set to use
**.NET 4.x Equivalent** (This is an experimental option in Unity 2017,
but is the default as of 2018.3.)
3. In a file system window, navigate to the folder containing your cloned
ML-Agents repository.
4. Drag the `ML-Agents` folder from `UnitySDK/Assets` to the Unity Editor
Project window.
Your Unity **Project** window should contain the following assets:

1. Right click in Hierarchy window, select 3D Object > Plane.
2. Name the GameObject "Floor."
3. Select Plane to view its properties in the Inspector window.
4. Set Transform to Position = (0,0,0), Rotation = (0,0,0), Scale = (1,1,1).
4. Set Transform to Position = (0, 0, 0), Rotation = (0, 0, 0), Scale = (1, 1, 1).
default-material to *floor*.
default-material to *LightGridFloorSquare* (or any suitable material of your choice).
name. This opens the **Object Picker** dialog so that you can choose the a
name. This opens the **Object Picker** dialog so that you can choose a
different material from the list of all materials currently in the project.)
![The Floor in the Inspector window](images/mlagents-NewTutFloor.png)

1. Right click in Hierarchy window, select 3D Object > Cube.
2. Name the GameObject "Target"
3. Select Target to view its properties in the Inspector window.
4. Set Transform to Position = (3,0.5,3), Rotation = (0,0,0), Scale = (1,1,1).
4. Set Transform to Position = (3, 0.5, 3), Rotation = (0, 0, 0), Scale = (1, 1, 1).
default-material to *Block*.
![The Target Cube in the Inspector window](images/mlagents-NewTutBlock.png)

2. Name the GameObject "RollerAgent"
3. Select RollerAgent to view its properties in the Inspector window.
4. Set Transform to Position = (0,0.5,0), Rotation = (0,0,0), Scale = (1,1,1).
4. Set Transform to Position = (0, 0.5, 0), Rotation = (0, 0, 0), Scale = (1, 1, 1).
default-material to *checker 1*.
default-material to *CheckerSquare*.
7. Add the Physics/Rigidbody component to the Sphere. (Adding a Rigidbody)
7. Add the Physics/Rigidbody component to the Sphere.
![The Agent GameObject in the Inspector window](images/mlagents-NewTutSphere.png)

Next, edit the new `RollerAcademy` script:
1. In the Unity Project window, double-click the `RollerAcademy` script to open
it in your code editor. (By default new scripts are placed directly in the
**Assets** folder.)
2. In the editor, change the base class from `MonoBehaviour` to `Academy`.
3. Delete the `Start()` and `Update()` methods that were added by default.
2. In the code editor, add the statement, `using MLAgents;`.
3. Change the base class from `MonoBehaviour` to `Academy`.
4. Delete the `Start()` and `Update()` methods that were added by default.
In such a basic scene, we don't need the Academy to initialize, reset, or
otherwise control any objects in the environment so we have the simplest

![The Academy properties](images/mlagents-NewTutAcademy.png)
## Add Brains
## Add Brain Assets
The Brain object encapsulates the decision making process. An Agent sends its
observations to its Brain and expects a decision in return. The type of the Brain

1. Go to `Assets -> Create -> ML-Agents` and select the type of Brain you want to
create. In this tutorial, we will create a **Learning Brain** and
a **Player Brain**.
1. Go to **Assets** > **Create** > **ML-Agents** and select the type of Brain asset
you want to create. For this tutorial, create a **Learning Brain** and
a **Player Brain**.
![Creating a Brain Asset](images/mlagents-NewTutBrain.png)
![The Brain default properties](images/mlagents-NewTutBrain.png)
## Implement an Agent

1. In the Unity Project window, double-click the `RollerAgent` script to open it
in your code editor.
2. In the editor, change the base class from `MonoBehaviour` to `Agent`.
2. In the editor, add the `using MLAgents;` statement and then change the base
class from `MonoBehaviour` to `Agent`.
leave it alone for now.
So far, these are the basic steps that you would use to add ML-Agents to any
Unity project. Next, we will add the logic that will let our Agent learn to roll

this reference, add a public field of type `Transform` to the RollerAgent class.
Public fields of a component in Unity get displayed in the Inspector window,
allowing you to choose which GameObject to use as the target in the Unity
Editor. To reset the Agent's velocity (and later to apply force to move the
To reset the Agent's velocity (and later to apply force to move the
agent) we need a reference to the Rigidbody component. A
[Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's
primary element for physics simulation. (See

if (this.transform.position.y < 0)
// The Agent fell
this.transform.position = new Vector3(0, 0.5f, 0);
// If the Agent fell, zero its momentum
// Move the target to a new spot
Target.position = new Vector3(Random.value * 8 - 4,
Random.value * 8 - 4);
this.transform.position = new Vector3( 0, 0.5f, 0);
// Move the target to a new spot
Target.position = new Vector3(Random.value * 8 - 4,
Random.value * 8 - 4);
Next, let's implement the Agent.CollectObservations() function.
Next, let's implement the `Agent.CollectObservations()` method.
### Observing the Environment

In our case, the information our Agent collects includes:
* Position of the target. In general, it is better to use the relative position
of other objects rather than the absolute position for more generalizable
training. Note that the Agent only collects the x and z coordinates since the
floor is aligned with the x-z plane and the y component of the target's
position never changes.
* Position of the target.
// Calculate position relative to the target
Vector3 relativePosition = Target.position - this.transform.position;
// Position relative to the target
AddVectorObs(relativePosition.x / 5);
AddVectorObs(relativePosition.z / 5);
* Position of the Agent itself relative to the size of the floor (which is 10)
* Position of the Agent itself.
// Relative position
AddVectorObs(this.transform.position.x / 5);
AddVectorObs(this.transform.position.z / 5);
it doesn't overshoot the target and roll off the platform.
AddVectorObs(rBody.velocity.x / 5);
AddVectorObs(rBody.velocity.z / 5);
All the values are divided to normalize the inputs to the neural network to
the range [-1,1]. (The platform is a square which reaches from positions -5 to +5
thereby having an edge length of 10 units.)
In total, the state observation contains 6 values and we need to use the
In total, the state observation contains 8 values and we need to use the
// Calculate position relative to the target
Vector3 relativePosition = Target.position - this.transform.position;
// Position relative to the target
AddVectorObs(relativePosition.x / 5);
AddVectorObs(relativePosition.z / 5);
// Relative position
AddVectorObs(this.transform.position.x / 10);
AddVectorObs(this.transform.position.x / 10);
// Target and Agent positions
The final part of the Agent code is the Agent.AgentAction() function, which
receives the decision from the Brain.
The final part of the Agent code is the `Agent.AgentAction()` method, which
receives the decision from the Brain and assigns the reward.
the `Vector Action Space Type` and `Vector Action Space Size` settings of the
the `Vector Action` `Space Type` and `Space Size` settings of the
`Vector Action Size` to 2. The first element, `action[0]`, determines the force
`Space Size` to 2. The first element, `action[0]`, determines the force
to set `Vector Action Size` to 3. Each of these values returned by the network
are between `-1` and `1.` Note the Brain really has no idea what the values in
to set `Vector Action Size` to 3.) Note that the Brain really has no idea what the values in
The RollerAgent applies the values from the action[] array to its Rigidbody
The RollerAgent applies the values from the `action[]` array to its Rigidbody
component, `rBody`, using the `Rigidbody.AddForce` function:

### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
function. The learning algorithm uses the rewards assigned to the Agent at each
step in the simulation and learning process to determine whether it is giving
function. The learning algorithm uses the rewards assigned to the Agent during
the simulation and learning process to determine whether it is giving
assigned task (reaching the Target cube, in this case) and punish the Agent if
it irrevocably fails (falls off the platform). You can sometimes speed up
training with sub-rewards that encourage behavior that helps the Agent complete
the task. For example, the RollerAgent reward system provides a small reward if
the Agent moves closer to the target in a step and a small negative reward at
each step which encourages the Agent to complete its task quickly.
assigned task. In this case, the Agent is given a reward of 1.0 for reaching the
Target cube.
When it does, the code increments the Agent.reward variable by 1.0 and marks the
agent as finished by setting the Agent to done.
When it does, the code calls the `Agent.SetReward()` method to assign a
reward of 1.0 and marks the agent as finished by calling the `Done()` method
on the Agent.
float distanceToTarget = Vector3.Distance(this.transform.position,


`ResetOnDone` mechanism and doesn't set a `Max Steps` limit for the Academy (so
it never resets the environment).
It can also encourage an Agent to finish a task more quickly by assigning a
negative reward at each step:
// Time penalty
Finally, to punish the Agent for falling off the platform, assign a large
negative reward and, of course, set the Agent to done so that it resets itself
in the next step:
Finally, if the Agent falls off the platform, set the Agent to done so that it can reset itself:

public float speed = 10;
private float previousDistance = float.MaxValue;
// Actions, size = 2
Vector3 controlSignal = Vector3.zero;
controlSignal.x = vectorAction[0];
controlSignal.z = vectorAction[1];
rBody.AddForce(controlSignal * speed);
// Rewards
float distanceToTarget = Vector3.Distance(this.transform.position,

// Time penalty
// Actions, size = 2
Vector3 controlSignal = Vector3.zero;
controlSignal.x = vectorAction[0];
controlSignal.z = vectorAction[1];
rBody.AddForce(controlSignal * speed);
Note the `speed` and `previousDistance` class variables defined before the
Note the `speed` class variable defined before the
function. Since `speed` is public, you can set the value from the Inspector

to connect everything together in the Unity Editor. This involves assigning the
Brain asset to the Agent, changing some of the Agent Components properties, and
Brain asset to the Agent, changing some of the Agent Component's properties, and
Brains to the **Broadcast Hub**.
2. Select the RollerAgent GameObject to show its properties in the Inspector
3. Drag the Brain `RollerBallPlayer` from the Project window to the
RollerAgent `Brain` field.
4. Change `Decision Frequency` from `1` to `5`.
2. Select the **RollerAgent** GameObject to show its properties in the Inspector
3. Drag the Brain **RollerBallPlayer** from the Project window to the
RollerAgent **Brain** field.
4. Change **Decision Frequency** from `1` to `10`.
5. Drag the Target GameObject from the Hierarchy window to the RollerAgent
Target field.
Also, drag the Target GameObject from the Hierarchy window to the RollerAgent
Target field.
Finally, select the **RollerBallBrain** Asset in the **Project** window so that you can
see its properties in the Inspector window. Set the following properties:
Finally, select the `RollerBallBrain` and `RollerBallPlayer` Brain assets
so that you can edit their properties in the Inspector window. Set the following
properties on both of them:
* `Vector Observation` `Space Size` = 8
* `Vector Action` `Space Type` = **Continuous**
* `Vector Action` `Space Size` = 2
* `Vector Observation Space Size` = 6
* `Vector Action Space Type` = **Continuous**
* `Vector Action Space Size` = 2
Select the **RollerBallPlayer** Asset in the **Project** window and set the same
property values.
Now you are ready to test the environment before training.

positive values and one to specify negative values for each action, for a total
of four keys.
1. Select the `RollerBallPlayer` Brain to view its properties in the Inspector.
2. Expand the **Continuous Player Actions** dictionary (only visible when using
a **PlayerBrain**).
1. Select the `RollerBallPlayer` Asset to view its properties in the Inspector.
2. Expand the **Key Continuous Player Actions** dictionary (only visible when using
a **PlayerBrain**).
3. Set **Size** to 4.
4. Set the following mappings:

`env_name` to the name of the environment file you specify when building this
## Training the Environment
Then drag the `RollerBallBrain` into the Academy's `Broadcast Hub` and check
to `Control` checkbox for that brain. From there, the process is
Then, select the Academy GameObject and check the `Control` checkbox for
the RollerBallBrain item in the **Broadcast Hub** list. From there, the process is
The hyperparameters for training are specified in the configuration file that you
pass to the `mlagents-learn` program. Using the default settings specified
in the `config/trainer_config.yaml` file (in your ml-agents folder), the
RollerAgent takes about 300,000 steps to train. However, you can change the
following hyperparameters to speed up training considerably (to under 20,000 steps):
batch_size: 10
buffer_size: 100
Since this example creates a very simple training environment with only a few inputs
and outputs, using small batch and buffer sizes speeds up the training considerably.
However, if you add more complexity to the environment or change the reward or
observation functions, you might also find that training performs better with different
hyperparameter values.
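As a rough sketch of that change, assuming the Learning Brain asset is named `RollerBallBrain` and that `trainer_config.yaml` keeps per-brain sections that override its `default` section, the edited copy of the file might gain an entry like the following. Only `batch_size` and `buffer_size` come from the text above; the section name is an assumption based on the brain name used in this tutorial.

```yaml
# Hypothetical per-brain section; the key is assumed to match the name
# of your Learning Brain asset (RollerBallBrain in this tutorial).
RollerBallBrain:
    batch_size: 10     # value suggested above for this small environment
    buffer_size: 100   # value suggested above for this small environment
```

Under that assumption, any hyperparameter not listed in this section keeps its value from the `default` section.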
**Note:** In addition to setting these hyperparameter values, the Agent
**DecisionFrequency** parameter has a large effect on training time and success.
A larger value reduces the number of decisions the training algorithm has to consider and,
in this simple environment, speeds up training.
To train in the editor, run the following Python command from a Terminal or Console
window before pressing play:
mlagents-learn config/config.yaml --run-id=RollerBall-1 --train
(where `config.yaml` is a copy of `trainer_config.yaml` that you have edited
to change the `batch_size` and `buffer_size` hyperparameters for your brain.)
**Note:** If you get a `command not found` error when running this command, make sure
that you have followed the *Install Python and mlagents Package* section of the
ML-Agents [Installation](Installation.md) instructions.
To monitor the statistics of Agent performance during training, use
![TensorBoard statistics display](images/mlagents-RollerAgentStats.png)
In particular, the *cumulative_reward* and *value_estimate* statistics show how
well the Agent is achieving the task. In this example, the maximum reward an
Agent can earn is 1.0, so these statistics approach that value when the Agent
has successfully *solved* the problem.
**Note:** If you use TensorBoard, always increment or change the `run-id`
you pass to the `mlagents-learn` command for each training run. If you use
the same id value, the statistics for multiple runs are combined and become
difficult to interpret.
## Review: Scene Layout
This section briefly reviews how to organize your scene when using Agents in

to use Unity ML-Agents:
* Academy
* Agents
You also need to have brain assets linked appropriately to Agents and to Academy
to use Unity ML-Agents: an Academy and one or more Agents. You also need to
have brain assets linked appropriately to your Agents and to the Academy.
* You can only train Learning Brains that have been included into the Academy's
Broadcast Hub.
* You can only train Learning Brains that have been added to the Academy's Broadcast Hub list.

