Joe Ward
7 years ago
Current commit
ac5e6bc7
16 files changed, with 293 insertions and 203 deletions
5 docs/Feature-Broadcasting.md
4 docs/Feature-Memory.md
29 docs/Learning-Environment-Design-Brains.md
4 docs/Learning-Environment-Design-CoreBrains.md
23 docs/Learning-Environment-Design-Player-Brains.md
2 docs/Learning-Environment-Design.md
2 docs/Learning-Environment-Examples.md
18 docs/Python-API.md
93 docs/Training-ML-Agents.md
16 docs/Training-PPO.md
118 docs/Using-TensorFlow-Sharp-in-Unity.md
53 docs/Using-Tensorboard.md
15 docs/dox-ml-agents.conf
62 docs/Learning-Environment-Design-External-Internal-Brains.md
2 docs/Learning-Environment-Heuristic-Brains.md
50 docs/Learning-Environment-Design-Internal-Brains.md
|
|||
# Player Brain |
|||
|
|||
The **Player** brain type allows you to control an agent using keyboard commands. You can use Player brains to control a "teacher" agent that trains other agents during [imitation learning](Training-Imitation-Learning.md). You can also use Player brains to test your agents and environment before changing their brain types to **External** and running the training process. |
|||
|
|||
The **Player** brain properties allow you to assign one or more keyboard keys to each action and a unique value to send when a key is pressed. |
|||
If the action space is discrete, you must map input keys to their corresponding integer values. If the action space is continuous, you must map input keys to their corresponding indices and float values. |
|||
Note the differences between the discrete and continuous action spaces. When a brain uses the discrete action space, you can send one integer value as the action per step. In contrast, when a brain uses the continuous action space you can send any number of floating point values (up to the **Vector Action Space Size** setting). |
|||
|
|||
| **Property** | | **Description** |
| :-- | :-- | :-- |
| **Continuous Player Actions** | | The mapping for the continuous vector action space. Shown when the action space is **Continuous**. |
| | **Size** | The number of key commands defined. You can assign more than one command to the same action index in order to send different values for that action. (If you press both keys at the same time, deterministic results are not guaranteed.) |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Index** | The element of the agent's action vector to set when this key is pressed. The index value cannot exceed the size of the Action Space (minus 1, since it is an array index). |
| | **Value** | The value to send to the agent as its action for the specified index when the mapped key is pressed. All other members of the action vector are set to 0. |
| **Discrete Player Actions** | | The mapping for the discrete vector action space. Shown when the action space is **Discrete**. |
| | **Default Action** | The value to send when no keys are pressed. |
| | **Size** | The number of key commands defined. |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Value** | The value to send to the agent as its action when the mapped key is pressed. |
|||
|
|||
For more information about the Unity input system, see [Input](https://docs.unity3d.com/ScriptReference/Input.html). |
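
The key mappings described above can be sketched in plain Python. This is only an illustration of the mapping rules, not ML-Agents code: the function names, the tuple layouts, and the rule for resolving two pressed keys that target the same action are all assumptions made for the example (the documentation states only that such results are not guaranteed to be deterministic).

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-ins for the Inspector entries; not part of the ML-Agents API.
ContinuousMapping = Tuple[str, int, float]  # (key, index, value)
DiscreteMapping = Tuple[str, int]           # (key, value)


def continuous_action(pressed: Set[str],
                      mappings: Iterable[ContinuousMapping],
                      action_size: int) -> List[float]:
    """Build the action vector: each element stays 0 unless a mapped key is pressed."""
    action = [0.0] * action_size
    for key, index, value in mappings:
        if not 0 <= index < action_size:
            raise ValueError(f"index {index} is outside the action vector")
        if key in pressed:
            # Assumption: the last matching entry wins if two pressed keys share an index.
            action[index] = value
    return action


def discrete_action(pressed: Set[str],
                    mappings: Iterable[DiscreteMapping],
                    default_action: int) -> int:
    """Return one integer per step, or the default when no mapped key is pressed."""
    for key, value in mappings:
        if key in pressed:
            # Assumption: the first matching entry wins.
            return value
    return default_action


moves = [("D", 0, 1.0), ("A", 0, -1.0), ("W", 1, 1.0)]
print(continuous_action({"A", "W"}, moves, action_size=2))       # [-1.0, 1.0]
print(discrete_action(set(), [("space", 1)], default_action=0))  # 0
```

Note how the continuous case fills in several elements of the vector at once, while the discrete case always yields a single integer.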
|||
|
|
|||
# Using TensorBoard to Observe Training |
|||
|
|||
This document is still to be written. It will discuss using TensorBoard and interpreting the TensorBoard charts. |
|||
ML-Agents saves statistics during a learning session that you can view with a TensorFlow utility named [TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
|||
|
|||
The `learn.py` program saves training statistics to a folder named `summaries`, organized by the `run-id` value you assign to a training session. |
|||
|
|||
To observe the training process, either during training or afterward, start TensorBoard:

1. Open a terminal or console window.
2. Navigate to the ml-agents/python folder.
3. From the command line, run:

        tensorboard --logdir=summaries

4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
|||
|
|||
**Note:** If you don't assign a `run-id` identifier, `learn.py` uses the default string, "ppo". All the statistics will be saved to the same sub-folder and displayed as one session in TensorBoard. After a few runs, the displays can become difficult to interpret. You can delete the folders under the `summaries` directory to clear out old statistics.
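
Clearing out old statistics can also be scripted. The following is a minimal sketch that assumes each run is stored as a direct sub-folder of `summaries` named after its `run-id`; the helper names `list_runs` and `clear_runs` are hypothetical and are not part of ML-Agents.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def list_runs(summaries_dir: str) -> List[str]:
    """Return the names of the run sub-folders, sorted alphabetically."""
    root = Path(summaries_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def clear_runs(summaries_dir: str, keep: Iterable[str] = ()) -> List[str]:
    """Delete every run sub-folder except those named in `keep`."""
    keep_set = set(keep)
    removed = []
    for name in list_runs(summaries_dir):
        if name not in keep_set:
            # Permanently removes the folder and all statistics inside it.
            shutil.rmtree(Path(summaries_dir) / name)
            removed.append(name)
    return removed
```

For example, `clear_runs("summaries", keep=["first-run"])` would remove every run folder other than `first-run`. Deletion is permanent, so check the output of `list_runs` first.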
|||
|
|||
On the left side of the TensorBoard window, you can select which of the training runs you want to display. You can select multiple run-ids to compare statistics. The TensorBoard window also provides options for how to display and smooth graphs. |
|||
|
|||
When you run the training program, `learn.py`, you can use the `--save-freq` option to specify how frequently to save the statistics. |
|||
|
|||
## ML-Agents training statistics |
|||
|
|||
The ML-Agents training program saves the following statistics:
|||
|
|||
* Lesson - Plots the progress from lesson to lesson. Only interesting when performing
  [curriculum training](Training-Curriculum-Learning.md).

* Cumulative Reward - The mean cumulative episode reward over all agents.
  Should increase during a successful training session.

* Entropy - How random the decisions of the model are. Should slowly decrease
  during a successful training process. If it decreases too quickly, the `beta`
  hyperparameter should be increased.

* Episode Length - The mean length of each episode in the environment for all
  agents.

* Learning Rate - How large a step the training algorithm takes as it searches
  for the optimal policy. Should decrease over time.

* Policy Loss - The mean loss of the policy function update. Correlates to how
  much the policy (process for deciding actions) is changing. The magnitude of
  this should decrease during a successful training session.

* Value Estimate - The mean value estimate for all states visited by the agent.
  Should increase during a successful training session.

* Value Loss - The mean loss of the value function update. Correlates to how
  well the model is able to predict the value of each state. This should decrease
  during a successful training session.
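
The expected directions listed above can be turned into a rough sanity check. This sketch assumes you have already obtained a statistic as a plain list of numbers (how you export it from TensorBoard is not covered here). The comparison of the first and last quarter of the series is an arbitrary heuristic chosen for illustration, not something ML-Agents does.

```python
from statistics import mean
from typing import Sequence

# Expected direction of each statistic during successful training,
# taken from the descriptions above.
EXPECTED_TREND = {
    "Cumulative Reward": "up",
    "Entropy": "down",
    "Learning Rate": "down",
    "Policy Loss": "down",  # refers to the magnitude of the loss
    "Value Estimate": "up",
    "Value Loss": "down",
}


def observed_trend(values: Sequence[float], fraction: float = 0.25) -> str:
    """Compare the mean of the first and last `fraction` of the series."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    n = max(1, int(len(values) * fraction))
    start, end = mean(values[:n]), mean(values[-n:])
    if end > start:
        return "up"
    if end < start:
        return "down"
    return "flat"


def matches_expectation(name: str, values: Sequence[float]) -> bool:
    """Return True if the series moves in the direction listed for `name`."""
    if name == "Policy Loss":
        values = [abs(v) for v in values]  # only the magnitude matters
    return observed_trend(values) == EXPECTED_TREND[name]


print(matches_expectation("Entropy", [1.4, 1.3, 1.1, 0.9]))  # True
```

A check like this says nothing about how quickly a value changes, so it cannot tell you whether entropy is falling too fast; the charts remain the better tool for that.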
|||
|
|
|||
# External and Internal Brains |
|||
|
|||
The **External** and **Internal** types of Brains work in different phases of training. When training your agents, set their brain types to **External**; when using the trained models, set their brain types to **Internal**. |
|||
|
|||
## External Brain |
|||
|
|||
When [running an ML-Agents training algorithm](Training-ML-Agents.md), at least one Brain object in a scene must be set to **External**. This allows the training process to collect the observations of agents using that brain and give the agents their actions. |
|||
|
|||
In addition to using an External brain for training using the ML-Agents learning algorithms, you can use an External brain to control agents in a Unity environment using an external Python program. See [Python API](Python-API.md) for more information. |
|||
|
|||
Unlike the other types, the External Brain has no properties to set in the Unity Inspector window. |
|||
|
|||
## Internal Brain |
|||
|
|||
The Internal Brain type uses a [TensorFlow model](https://www.tensorflow.org/get_started/get_started_for_beginners#models_and_training) to make decisions. The Proximal Policy Optimization (PPO) and Behavioral Cloning algorithms included with the ML-Agents SDK produce trained TensorFlow models that you can use with the Internal Brain type. |
|||
|
|||
A __model__ is a mathematical relationship mapping an agent's observations to its actions. TensorFlow is a software library for performing numerical computation through data flow graphs. A TensorFlow model, then, defines the mathematical relationship between your agent's observations and its actions using a TensorFlow data flow graph. |
|||
|
|||
### Creating a graph model |
|||
|
|||
The training algorithms included in the ML-Agents SDK produce TensorFlow graph models as the end result of the training process. See [Training ML-Agents](Training-ML-Agents.md) for instructions on how to train a model.
|||
|
|||
### Using a graph model |
|||
|
|||
To use a graph model: |
|||
|
|||
1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy GameObject and must have a Brain component.)
|||
2. Set the **Brain Type** to **Internal**. |
|||
|
|||
**Note:** In order to see the **Internal** Brain Type option, you must [enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md). |
|||
|
|||
3. Import the `environment_run-id.bytes` file produced by the PPO training program. (Where `environment_run-id` is the name of the model file, which is constructed from the name of your Unity environment executable and the run-id value you assigned when running the training process.) |
|||
|
|||
You can [import assets into Unity](https://docs.unity3d.com/Manual/ImportingAssets.html) in various ways. The easiest way is to simply drag the file into the **Project** window and drop it into an appropriate folder. |
|||
|
|||
4. Once the `environment_run-id.bytes` file is imported, drag it from the **Project** window to the **Graph Model** field of the Brain component.
|||
|
|||
If you are using a model produced by the ML-Agents `learn.py` program, use the default values for the other Internal Brain parameters. |
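
The file name mentioned in step 3 can be sketched as follows. The document says only that the name is built from the environment executable name and the run-id, so stripping the directory and file extension is an assumption of this sketch, and `model_file_name` is a hypothetical helper rather than a function in `learn.py`. The default of "ppo" mirrors the default run-id string that `learn.py` uses.

```python
from pathlib import PurePath


def model_file_name(env_executable: str, run_id: str = "ppo") -> str:
    """Combine the environment name and run-id into `<environment>_<run-id>.bytes`."""
    # Assumption: the environment name is the executable's base name without its extension.
    env_name = PurePath(env_executable).stem
    return f"{env_name}_{run_id}.bytes"


print(model_file_name("envs/Ball.app", "first-run"))  # Ball_first-run.bytes
```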
|||
|
|||
### Internal Brain properties |
|||
|
|||
The default values of the TensorFlow graph parameters work with the model produced by the PPO and BC training code in the ML-Agents SDK. To use a default ML-Agents model, the only parameter that you need to set is the `Graph Model`, which must be set to the .bytes file containing the trained model itself. |
|||
|
|||
![Internal Brain Inspector](images/internal_brain.png) |
|||
|
|||
|
|||
* `Graph Model` : This must be the `bytes` file corresponding to the pretrained TensorFlow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the Inspector.)
|||
|
|||
Only change the following Internal Brain properties if you have created your own TensorFlow model and are not using an ML-Agents model: |
|||
|
|||
* `Graph Scope` : If you set a scope while training your TensorFlow model, all your placeholder names will have a prefix. You must specify that prefix here.
* `Batch Size Node Name` : If the batch size is one of the inputs of your graph, you must specify the name of the placeholder here. The brain automatically sets the batch size to the number of agents connected to the brain.
* `State Node Name` : If your graph uses the state as an input, you must specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the input placeholder here.
* `Recurrent Output Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input, you must specify the placeholder name here. Note that the number of observations is equal to the length of `Camera Resolutions` in the brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the actions of the brain in your graph. If the action space type is continuous, the output must be a one-dimensional tensor of floats of length `Action Space Size`. If the action space type is discrete, the output must be a one-dimensional tensor of ints of length 1.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed (example: noise level), you can specify them here. Note that in your graph, these must correspond to one-dimensional tensors of int or float of size 1.
    * `Name` : Corresponds to the name of the placeholder.
    * `Value Type` : Either Integer or Floating Point.
    * `Min Value` and `Max Value` : Specify the range of the value here. The value will be sampled from the uniform distribution ranging from `Min Value` to `Max Value` inclusive.
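
The sampling rule for a `Graph Placeholder` value can be illustrated in Python. This is a sketch of the rule as described above, not the Internal Brain implementation (which runs inside Unity); `sample_placeholder` and its arguments are hypothetical names chosen for the example.

```python
import random


def sample_placeholder(value_type, min_value, max_value, rng=None):
    """Draw one value uniformly from the range [min_value, max_value]."""
    rng = rng or random
    if min_value > max_value:
        raise ValueError("Min Value must not exceed Max Value")
    if value_type == "Integer":
        # randint includes both endpoints.
        return rng.randint(int(min_value), int(max_value))
    if value_type == "Floating Point":
        # Whether the upper endpoint can be returned depends on floating-point rounding.
        return rng.uniform(min_value, max_value)
    raise ValueError(f"unknown value type: {value_type}")


rng = random.Random(0)  # seeded so repeated runs draw the same values
noise_level = sample_placeholder("Floating Point", 0.0, 1.0, rng)
print(0.0 <= noise_level <= 1.0)  # True
```

Because the graph expects a one-dimensional tensor of size 1, a sampled value like this would be wrapped in a single-element array before being fed to the placeholder.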
|||
|
|
|||
# Heuristic Brain |
|||
|
|
|||
# Internal Brain |
|||
|
|||
The Internal Brain type uses a [TensorFlow model](https://www.tensorflow.org/get_started/get_started_for_beginners#models_and_training) to make decisions. The Proximal Policy Optimization (PPO) algorithm included with the ML-Agents SDK produces a trained TensorFlow model that you can use with the Internal Brain type. |
|||
|
|||
A __model__ is a mathematical relationship mapping an agent's observations to its actions. TensorFlow is a software library for performing numerical computation through data flow graphs. A TensorFlow model, then, defines the mathematical relationship between your agent's observations and its actions using a TensorFlow data flow graph. |
|||
|
|||
## Creating a graph model |
|||
|
|||
The PPO algorithm included with the ML-Agents SDK produces a TensorFlow graph model as the end result of the training process. See [Training with Proximal Policy Optimization](Training-PPO.md) for instructions on how to create and train a model using PPO.
|||
|
|||
## Using a graph model |
|||
|
|||
To use a graph model: |
|||
|
|||
1. Select the Brain GameObject in the **Hierarchy** window of the Unity Editor. (The Brain GameObject must be a child of the Academy GameObject and must have a Brain component.)
|||
2. Set the **Brain Type** to **Internal**. |
|||
|
|||
**Note:** In order to see the **Internal** Brain Type option, you must [enable TensorFlowSharp](link). |
|||
|
|||
3. Import the `environment_run-id.bytes` file produced by the PPO training program. (Where `environment_run-id` is the name of the model file, which is constructed from the name of your Unity environment executable and the run-id value you assigned when running the training process.) |
|||
|
|||
You can [import assets into Unity](https://docs.unity3d.com/Manual/ImportingAssets.html) in various ways. The easiest way is to simply drag the file into the **Project** window and drop it into an appropriate folder. |
|||
|
|||
4. Once the `environment_run-id.bytes` file is imported, drag it from the **Project** window to the **Graph Model** field of the Brain component.
|||
|
|||
If you are using a model produced by the ML-Agents PPO program, use the default values for the other Internal Brain parameters. |
|||
|
|||
## Internal Brain properties |
|||
|
|||
The default values of the TensorFlow graph parameters work with the model produced by the PPO training code in the ML-Agents SDK. To use a default PPO model, the only parameter that you need to set is the `Graph Model`, which must be set to the .bytes file containing the trained model itself. |
|||
|
|||
![Internal Brain Inspector](images/internal_brain.png) |
|||
|
|||
|
|||
* `Graph Model` : This must be the `bytes` file corresponding to the pretrained TensorFlow graph. (You must first drag this file into your Resources folder and then from the Resources folder into the Inspector.)
|||
|
|||
Only change the following Internal Brain properties if you have created your own TensorFlow model and are not using the ML-Agents PPO model: |
|||
|
|||
* `Graph Scope` : If you set a scope while training your TensorFlow model, all your placeholder names will have a prefix. You must specify that prefix here.
* `Batch Size Node Name` : If the batch size is one of the inputs of your graph, you must specify the name of the placeholder here. The brain automatically sets the batch size to the number of agents connected to the brain.
* `State Node Name` : If your graph uses the state as an input, you must specify the name of the placeholder here.
* `Recurrent Input Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the input placeholder here.
* `Recurrent Output Node Name` : If your graph uses a recurrent input / memory as input and outputs new recurrent input / memory, you must specify the name of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input, you must specify the placeholder name here. Note that the number of observations is equal to the length of `Camera Resolutions` in the brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the actions of the brain in your graph. If the action space type is continuous, the output must be a one-dimensional tensor of floats of length `Action Space Size`. If the action space type is discrete, the output must be a one-dimensional tensor of ints of length 1.
* `Graph Placeholder` : If your graph takes additional inputs that are fixed (example: noise level), you can specify them here. Note that in your graph, these must correspond to one-dimensional tensors of int or float of size 1.
    * `Name` : Corresponds to the name of the placeholder.
    * `Value Type` : Either Integer or Floating Point.
    * `Min Value` and `Max Value` : Specify the range of the value here. The value will be sampled from the uniform distribution ranging from `Min Value` to `Max Value` inclusive.
|||
|