
Merge pull request #461 from Unity-Technologies/dev-doc-fixes

Several documentation enhancements
/develop-generalizationTraining-TrainerController
GitHub · 7 years ago
Current commit: aff0ba28
21 files changed, with 172 additions and 3,180 deletions
  1. README.md (4)
  2. docs/Feature-Memory.md (2)
  3. docs/Getting-Started-with-Balance-Ball.md (4)
  4. docs/Installation.md (12)
  5. docs/Learning-Environment-Create-New.md (23)
  6. docs/Learning-Environment-Design-Agents.md (35)
  7. docs/Learning-Environment-Design-Brains.md (19)
  8. docs/Limitations-and-Common-Issues.md (2)
  9. docs/ML-Agents-Overview.md (18)
  10. docs/Python-API.md (5)
  11. docs/Readme.md (7)
  12. docs/Training-Imitation-Learning.md (2)
  13. docs/Training-ML-Agents.md (4)
  14. docs/Training-on-Amazon-Web-Service.md (2)
  15. docs/images/agent.png (136)
  16. docs/images/mlagents-3DBall.png (1001)
  17. docs/images/mlagents-Scene.png (1001)
  18. docs/images/agents_diagram.png (841)
  19. docs/images/ml-agents-ODD.png (176)
  20. docs/Feature-Broadcasting.md (19)
  21. docs/Feature-On-Demand-Decisions.md (39)

README.md (4)


## Documentation and References
**For more information, in addition to installation and usage
instructions, see our [documentation home](docs/README.md).** If you have used a version of ML-Agents prior to v0.3, we strongly recommend our [guide on migrating to v0.3](docs/Migrating-v0.3.md).
We have also published a series of blog posts that are relevant for ML-Agents:
- Overviewing reinforcement learning concepts

docs/Feature-Memory.md (2)


## Limitations
* LSTM does not work well with continuous vector action spaces. Please use a discrete vector action space for better results.
* Since the memories must be sent back and forth between Python and Unity, using too large a `memory_size` will slow down training.
* Adding a recurrent layer increases the complexity of the neural network; it is recommended to decrease `num_layers` when using recurrent layers.

docs/Getting-Started-with-Balance-Ball.md (4)


To train the agents within the Ball Balance environment, we will be using the Python package. We have provided a convenient Python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases. We will pass to this script the path of the environment executable that we just built. (Optionally) We can also pass a run-id to identify this training session, as in the command below.

```
python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
```
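For example, if the executable was saved as `3DBall` in the repository root (both the file name and the run-id below are placeholders), the command might look like:

```
python3 python/learn.py 3DBall --run-id=firstRun --train
```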

docs/Installation.md (12)


## Install Python (with Dependencies)
In order to use ML-Agents, you need Python 3 along with
### Windows Users

on installing it.
To install dependencies, go into the `python` subdirectory of the repository, and run from the command line:
pip3 install .
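For example, if the repository was cloned into the default `ml-agents` folder, the full sequence from a shell is:
cd ml-agents/python
pip3 install .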

docs/Learning-Environment-Create-New.md (23)


Press **Play** to run the scene and use the WASD keys to move the agent around the platform. Make sure that there are no errors displayed in the Unity editor Console window and that the agent resets when it reaches its target or falls from the platform. Note that for more involved debugging, the ML-Agents SDK includes a convenient Monitor class that you can use to easily display agent status information in the Game window.
One additional test you can perform is to first ensure that your environment
and the Python API work as expected using the `python/Basics`
[Jupyter notebook](Background-Jupyter.md). Within `Basics`, be sure to set
`env_name` to the name of the environment file you specify when building
this environment.
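A minimal sketch of that check outside the notebook (assuming the executable was built as `myenv`, a placeholder name, and using the v0.3 `unityagents` API):

```python
from unityagents import UnityEnvironment

# Connect to the environment executable you built; replace the placeholder name.
env = UnityEnvironment(file_name="myenv")

# reset() returns a dictionary mapping each brain name to a BrainInfo object.
info = env.reset(train_mode=True)
print(env.brain_names)
print(info[env.brain_names[0]].vector_observations)

env.close()
```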
## Review: Scene Layout
This section briefly reviews how to organize your scene when using
Agents in your Unity environment.
There are three kinds of game objects you need to include in your scene in order to use Unity ML-Agents:
* Academy
* Brain
* Agents
Keep in mind:
* There can only be one Academy game object in a scene.
* You can have multiple Brain game objects but they must be children of the Academy game object.
Here is an example of what your scene hierarchy should look like:
![Scene Hierarchy](images/scene-hierarchy.png)

docs/Learning-Environment-Design-Agents.md (35)


To control the frequency of step-based decision making, set the **Decision Frequency** value for the Agent object in the Unity Inspector window. Agents using the same Brain instance can use a different frequency. During simulation steps in which no decision is requested, the agent receives the same action chosen by the previous decision.
### On Demand Decision Making
On demand decision making allows agents to request decisions from their
brains only when needed instead of receiving decisions at a fixed
frequency. This is useful when the agents commit to an action for a
variable number of steps or when the agents cannot make decisions
at the same time. This is typically the case for turn-based games, games
where agents must react to events or games where agents can take
actions of variable duration.
When you turn on **On Demand Decisions** for an agent, your agent code must call the `Agent.RequestDecision()` function. This function call starts one iteration of the observation-decision-action-reward cycle. The Brain invokes the agent's `CollectObservations()` method, makes a decision and returns it by calling the `AgentAction()` method. The Brain waits for the agent to request the next decision before starting another iteration.
## Observations

* `Max Step` - The per-agent maximum number of steps. Once this number is reached, the agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the agent's `AgentReset()` function should be called when the agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the agent requests decisions at a fixed step interval or explicitly requests decisions by calling `RequestDecision()`.
* If not checked, the Agent will request a new
decision every `Decision Frequency` steps and
perform an action every step. In the example above,
`CollectObservations()` will be called every 5 steps and
`AgentAction()` will be called at every step. This means that the
Agent will reuse the decision the Brain has given it.
* If checked, the Agent controls when to receive decisions and take actions. To do so, the Agent may leverage one or two methods:
* `RequestDecision()` Signals that the Agent is requesting a decision.
This causes the Agent to collect its observations and ask the Brain for a
decision at the next step of the simulation. Note that when an Agent
requests a decision, it also requests an action.
This is to ensure that all decisions lead to an action during training.
* `RequestAction()` Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.
## Monitoring Agents
We created a helpful `Monitor` class that enables visualizing variables within
a Unity environment. While this was built for monitoring an Agent's value
function throughout the training process, we imagine it can be more broadly
useful. You can learn more [here](Feature-Monitor.md).
## Instantiating an Agent at Runtime

docs/Learning-Environment-Design-Brains.md (19)


* `Player` - Actions are decided using keyboard input mappings.
* `Heuristic` - Actions are decided using a custom `Decision` script, which must be attached to the Brain game object.
## Using the Broadcast Feature
The Player, Heuristic and Internal brains have been updated to support broadcast. The broadcast feature allows you to collect data from your agents using a Python program without controlling them.
### How to use: Unity
To turn it on in Unity, simply check the `Broadcast` box as shown below:
![Broadcast](images/broadcast.png)
### How to use: Python
When you launch your Unity Environment from a Python program, you can see what the agents connected to non-external brains are doing. When calling `step` or `reset` on your environment, you retrieve a dictionary mapping brain names to `BrainInfo` objects. The dictionary contains a `BrainInfo` object for each non-external brain set to broadcast as well as for any external brains.
Just like with an external brain, the `BrainInfo` object contains the fields for `visual_observations`, `vector_observations`, `text_observations`, `memories`,`rewards`, `local_done`, `max_reached`, `agents` and `previous_actions`. Note that `previous_actions` corresponds to the actions that were taken by the agents at the previous step, not the current one.
Note that when you do a `step` on the environment, you cannot provide actions for non-external brains. If there are no external brains in the scene, simply call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player, Heuristic, or Internal brain game sessions. You can then use this data to train an agent in a supervised context.
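A minimal sketch of observing a broadcasting brain from Python (the executable name `myenv` and brain name `PlayerBrain` are placeholders; the scene is assumed to contain no External brains):

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="myenv")   # placeholder executable name

info = env.reset(train_mode=False)
for _ in range(100):
    # With no External brains in the scene, step() is called with no arguments.
    info = env.step()
    player_info = info["PlayerBrain"]            # BrainInfo for the broadcasting brain
    print(player_info.vector_observations,       # what the broadcasting agents observed
          player_info.previous_actions)          # the actions they took at the previous step
env.close()
```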

docs/Limitations-and-Common-Issues.md (2)


If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker
number in the Python script when calling
`UnityEnvironment(file_name=filename, worker_id=X)`

docs/ML-Agents-Overview.md (18)


must react to events or games where agents can take actions of variable
duration. Switching between decision taking at every step and
on-demand-decision is one button click away. You can learn more about the
on-demand-decision feature [here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
* **Memory-enhanced Agents** - In some scenarios, agents must learn to
remember the past in order to take the

Player Brain are used to learn the policies of an agent through demonstration.
However, this could also be helpful for the Heuristic and Internal Brains,
particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature [here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
* **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents
without installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how
to create and run a Docker container. Due to limitations on rendering visual
observations, this feature is marked experimental.
* **Cloud Training on AWS** - To facilitate using ML-Agents on
Amazon Web Services (AWS) machines, we provide a
[guide](Training-on-Amazon-Web-Service.md)
on how to set up EC2 instances in addition to a public pre-configured Amazon
Machine Image (AMI).
## Summary and Next Steps

docs/Python-API.md (5)


These classes are all defined in the `python/unityagents` folder of the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the agent must either use an **External** brain or use a brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for agents with external brains, but can only observe broadcasting brains (the information you receive for an agent is the same in both cases). See [Using the Broadcast Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook, which opens an environment, runs a few simulation steps taking random actions, and closes the environment.

```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```
* `seed` indicates the seed to use when generating random numbers during the training process. In environments which do not involve physics calculations, setting the seed enables reproducible experimentation by ensuring that the environment and trainers utilize the same random seed.
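As a rough sketch of driving a single External brain with random actions (the `external_brain_names`, `brains`, and `vector_action_space_size` attributes follow the v0.3 `unityagents` API and should be treated as assumptions, as should the choice of random continuous actions):

```python
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
brain_name = env.external_brain_names[0]          # assumes one External brain in the scene
brain = env.brains[brain_name]

info = env.reset(train_mode=True)[brain_name]
for _ in range(100):
    # One random continuous action vector per agent using this brain.
    actions = np.random.randn(len(info.agents), brain.vector_action_space_size)
    info = env.step(actions)[brain_name]
env.close()
```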
## Interacting with a Unity Environment

docs/Readme.md (7)


* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Agents](Learning-Environment-Design-Agents.md)
* [Academy](Learning-Environment-Design-Academy.md)
* [Brains](Learning-Environment-Design-Brains.md): [Player](Learning-Environment-Design-Player-Brains.md), [Heuristic](Learning-Environment-Design-Heuristic-Brains.md), [Internal & External](Learning-Environment-Design-External-Internal-Brains.md)
* [Using the Monitor](Feature-Monitor.md)
* [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)
## Training

* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
## Help
* [Migrating to ML-Agents v0.3](Migrating-v0.3.md)
* Brain
* CoreBrain
* Decision
* Monitor

docs/Training-Imitation-Learning.md (2)


4. Link the brains to the desired agents (one agent as the teacher and at least one agent as a student).
5. Build the Unity executable for your desired platform.
6. In `trainer_config.yaml`, add an entry for the "Student" brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training is performed per update. Increase the `max_steps` option if you'd like to keep training the agents for a longer period of time (a sample entry is sketched after this list).
7. Launch the training process with `python3 python/learn.py <env_name> --train --slow`, where `<env_name>` is the path to your built Unity executable.
8. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
9. Watch as the agent(s) with the student brain attached begin to behave similarly to the demonstrations.
10. Once the Student agents are exhibiting the desired behavior, end the training process with `CTRL+C` from the command line.
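As a rough sketch of the `trainer_config.yaml` entry described in step 6 (the numeric values below are placeholders, not tuned recommendations):

```yaml
Student:
    trainer: imitation
    brain_to_imitate: Teacher
    batches_per_epoch: 5
    max_steps: 10000
```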

docs/Training-ML-Agents.md (4)


The basic command for training is:
python3 learn.py <env_file_path> --run-id=<run-identifier> --train
where `<env_file_path>` is the path to your Unity executable containing the agents to be trained and `<run-identifier>` is an optional identifier you can use to identify the results of individual training runs.

3. Navigate to the ml-agents `python` folder.
4. Run the following to launch the training process using the path to the Unity environment you built in step 1:
python3 learn.py ../../projects/Cats/CatsOnBicycles.app --run-id=cob_1 --train
During a training session, the training program prints out and saves updates at regular intervals (specified by the `summary_freq` option). The saved statistics are grouped by the `run-id` value so you should assign a unique id to each training run if you plan to view the statistics. You can view these statistics using TensorBoard during or after training by running the following command (from the ML-Agents python directory):
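Assuming the default `summaries` directory that `learn.py` writes to, that command is:
tensorboard --logdir=summaries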

docs/Training-on-Amazon-Web-Service.md (2)


## Testing
If all steps worked correctly, upload an example binary built for Linux to the instance, and test it from Python with:
```python
from unityagents import UnityEnvironment
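# The snippet is truncated here; a minimal continuation might look like the
# following, where "3DBall_linux.x86_64" is a placeholder for your uploaded
# Linux build.
env = UnityEnvironment(file_name="3DBall_linux.x86_64")
env.reset(train_mode=False)
env.close()
```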

docs/images/agent.png (136)
Width: 526 | Height: 160 | Size: 20 KiB

docs/images/mlagents-3DBall.png (1001)
File diff too large to display.

docs/images/mlagents-Scene.png (1001)
File diff too large to display.

docs/images/agents_diagram.png (841)

docs/images/ml-agents-ODD.png (176)

docs/Feature-Broadcasting.md (19)


# Using the Broadcast Feature
The Player, Heuristic and Internal brains have been updated to support broadcast. The broadcast feature allows you to collect data from your agents using a Python program without controlling them.
## How to use: Unity
To turn it on in Unity, simply check the `Broadcast` box as shown below:
![Broadcast](images/broadcast.png)
## How to use: Python
When you launch your Unity Environment from a Python program, you can see what the agents connected to non-external brains are doing. When calling `step` or `reset` on your environment, you retrieve a dictionary mapping brain names to `BrainInfo` objects. The dictionary contains a `BrainInfo` object for each non-external brain set to broadcast as well as for any external brains.
Just like with an external brain, the `BrainInfo` object contains the fields for `visual_observations`, `vector_observations`, `text_observations`, `memories`,`rewards`, `local_done`, `max_reached`, `agents` and `previous_actions`. Note that `previous_actions` corresponds to the actions that were taken by the agents at the previous step, not the current one.
Note that when you do a `step` on the environment, you cannot provide actions for non-external brains. If there are no external brains in the scene, simply call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player, Heuristic, or Internal brain game sessions. You can then use this data to train an agent in a supervised context.

docs/Feature-On-Demand-Decisions.md (39)


# On Demand Decision Making
## Description
On demand decision making allows agents to request decisions from their
brains only when needed instead of receiving decisions at a fixed
frequency. This is useful when the agents commit to an action for a
variable number of steps or when the agents cannot make decisions
at the same time. This is typically the case for turn-based games, games
where agents must react to events or games where agents can take
actions of variable duration.
## How to use
To enable or disable on demand decision making, use the checkbox called
`On Demand Decisions` in the Agent Inspector.
<p align="center">
<img src="images/ml-agents-ODD.png"
alt="On Demand Decision"
width="500" border="10" />
</p>
* If `On Demand Decisions` is not checked, the Agent will request a new
decision every `Decision Frequency` steps and
perform an action every step. In the example above,
`CollectObservations()` will be called every 5 steps and
`AgentAction()` will be called at every step. This means that the
Agent will reuse the decision the Brain has given it.
* If `On Demand Decisions` is checked, the Agent controls when to receive decisions and take actions. To do so, the Agent may leverage one or two methods:
* `RequestDecision()` Signals that the Agent is requesting a decision.
This causes the Agent to collect its observations and ask the Brain for a
decision at the next step of the simulation. Note that when an Agent
requests a decision, it also requests an action.
This is to ensure that all decisions lead to an action during training.
* `RequestAction()` Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.