Documentation 0.5 Release Check List (Part 1) (#1154)
/develop-generalizationTraining-TrainerController
Arthur Juliani, 6 years ago
Current commit: 2cd8e250
48 files changed, with 880 insertions and 822 deletions

- CODE_OF_CONDUCT.md (5)
- CONTRIBUTING.md (60)
- README.md (94)
- docs/API-Reference.md (2)
- docs/Background-TensorFlow.md (2)
- docs/Basic-Guide.md (32)
- docs/FAQ.md (18)
- docs/Feature-Memory.md (2)
- docs/Feature-Monitor.md (2)
- docs/Getting-Started-with-Balance-Ball.md (94)
- docs/Glossary.md (4)
- docs/Installation-Windows.md (4)
- docs/Installation.md (29)
- docs/Learning-Environment-Best-Practices.md (4)
- docs/Learning-Environment-Create-New.md (77)
- docs/Learning-Environment-Design-Academy.md (6)
- docs/Learning-Environment-Design-Agents.md (147)
- docs/Learning-Environment-Design-Brains.md (42)
- docs/Learning-Environment-Design-External-Internal-Brains.md (26)
- docs/Learning-Environment-Design-Heuristic-Brains.md (14)
- docs/Learning-Environment-Design-Player-Brains.md (24)
- docs/Learning-Environment-Design.md (82)
- docs/Learning-Environment-Examples.md (72)
- docs/Learning-Environment-Executable.md (19)
- docs/Limitations.md (4)
- docs/ML-Agents-Overview.md (24)
- docs/Migrating.md (19)
- docs/Readme.md (4)
- docs/Training-Curriculum-Learning.md (8)
- docs/Training-Imitation-Learning.md (24)
- docs/Training-ML-Agents.md (13)
- docs/Training-PPO.md (2)
- docs/Training-on-Amazon-Web-Service.md (2)
- docs/Training-on-Microsoft-Azure.md (15)
- docs/Using-TensorFlow-Sharp-in-Unity.md (8)
- docs/Using-Tensorboard.md (8)
- docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (26)
- docs/localized/zh-CN/docs/Installation.md (4)
- docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (4)
- docs/localized/zh-CN/docs/Learning-Environment-Design.md (14)
- docs/localized/zh-CN/docs/Learning-Environment-Examples.md (42)
- docs/localized/zh-CN/docs/ML-Agents-Overview.md (2)
- ml-agents/README.md (183)
- docs/Python-API.md (149)
- gym-unity/README.md (158)
- MLAgentsSDK/README.md (1)
- gym-unity/Readme.md (127)

# Contribution Guidelines

Thank you for your interest in contributing to the ML-Agents toolkit! We are
incredibly excited to see how members of our community will use and extend the
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.

## Communication

First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
expect all our contributors to follow it.

Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
briefly outlining the changes you plan to make. This will enable us to provide
some context that may be helpful for you, ranging from advice and feedback on
how best to implement your changes to reasons for not pursuing them.

## Git Branches

Starting with v0.3, we adopted the
Consequently, the `master` branch corresponds to the latest release of

* Corresponding changes to documentation, unit tests and sample environments (if
  applicable)

## Environments

We are also actively open to adding community contributed environments as
examples, as long as they are small, simple, demonstrate a unique feature of
the platform, and provide a unique non-trivial challenge to modern
PR explaining the nature of the environment and task.

## Style Guide

When performing changes to the codebase, ensure that you follow the style guide
of the file you're modifying. For Python, we follow
[PEP 8](https://www.python.org/dev/peps/pep-0008/).
For C#, we will soon be adding a formal style guide for our repository.

# Unity ML-Agents Python Interface and Trainers

The `mlagents` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents` provides a Python API that allows direct interaction with the Unity
game engine as well as a collection of trainers and algorithms to train agents
in Unity environments.

The `mlagents` Python package contains two components: the low level API, which
allows you to interact directly with a Unity Environment (`mlagents.envs`), and
an entry point to train (`mlagents-learn`), which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning.

Install `mlagents` by running the following from the `ml-agents` directory of
the repository:

```sh
pip install .
```

## `mlagents.envs`

The ML-Agents Toolkit provides a Python API for controlling the Agent simulation
loop of an environment or game built with Unity. This API is used by the
ML-Agents training algorithms (run with `mlagents-learn`), but you can also
write your own Python programs using this API.

The key objects in the Python API include:

- **UnityEnvironment** — the main interface between the Unity application and
  your code. Use UnityEnvironment to start and control a simulation or training
  session.
- **BrainInfo** — contains all the data from Agents in the simulation, such as
  observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
  example, provides the array length of an observation in BrainInfo.

These classes are all defined in the `ml-agents/mlagents/envs` folder of
the ML-Agents SDK.

To communicate with an Agent in a Unity environment from a Python program, the
Agent must either use an **External** Brain or use a Brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for Agents with external Brains, but can only observe broadcasting
Brains (the information you receive for an Agent is the same in both cases).

_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._

### Loading a Unity Environment

Python-side communication happens through `UnityEnvironment`, which is located
in `ml-agents/mlagents/envs`. To load a Unity environment from a built binary
file, put the file in the same directory as `envs`. For example, if the
filename of your Unity environment is 3DBall.app, run the following in Python:

```python
from mlagents.envs import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```

- `file_name` is the name of the environment binary (located in the root
  directory of the python project).
- `worker_id` indicates which port to use for communication with the
  environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
  training process. In environments which do not involve physics calculations,
  setting the seed enables reproducible experimentation by ensuring that the
  environment and trainers utilize the same random seed.

If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.

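For instance, a minimal sketch of connecting to the Editor rather than a built
binary (the `worker_id` and `seed` values are only illustrative):

```python
from mlagents.envs import UnityEnvironment

# Connect to the Unity Editor; press Play in the Editor when prompted.
env = UnityEnvironment(file_name=None, worker_id=0, seed=1)
print(str(env))  # prints the environment parameters and the external Brains
env.close()
```
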
### Interacting with a Unity Environment

A BrainInfo object contains the following fields:

- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
  the list corresponds to the n<sup>th</sup> observation of the Brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
  size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the Agents' text
  observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
  memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of Agents using the Brain
  containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of Agents using the Brain
  containing `done` flags (whether or not the Agent is done).
- **`max_reached`** : A list as long as the number of Agents using the Brain
  containing true if the Agents reached their max steps.
- **`agents`** : A list of the unique ids of the Agents using the Brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
  size, vector action size)` if the vector action space is continuous and
  `(batch size, number of branches)` if the vector action space is discrete.

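As a shape-oriented sketch of reading these fields (the Brain name
`3DBallBrain` and the sizes mentioned in the comments are illustrative, not
part of the API):

```python
# info is the dictionary returned by env.reset() or env.step(), keyed by Brain name.
info = env.reset()
brain_info = info["3DBallBrain"]        # hypothetical Brain name

n_agents = len(brain_info.agents)       # number of Agents using this Brain
obs = brain_info.vector_observations    # numpy array, shape (n_agents, obs_size)
rewards = brain_info.rewards            # list of floats, length n_agents
dones = brain_info.local_done           # list of bools, length n_agents
```
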
Once loaded, the UnityEnvironment object (referenced by the variable `env` in
this example) can be used in the following ways; a combined sketch follows this
list:

- **Print : `print(str(env))`**
  Prints all parameters relevant to the loaded environment and the external
  Brains.
- **Reset : `env.reset(train_model=True, config=None)`**
  Sends a reset signal to the environment, and provides a dictionary mapping
  Brain names to BrainInfo objects.
  - `train_model` indicates whether to run the environment in train (`True`) or
    test (`False`) mode.
  - `config` is an optional dictionary of configuration flags specific to the
    environment. For generic environments, `config` can be ignored. `config` is
    a dictionary of strings to floats where the keys are the names of the
    `resetParameters` and the values are their corresponding float values.
    Define the reset parameters on the Academy Inspector window in the Unity
    Editor.
- **Step : `env.step(action, memory=None, text_action=None)`**
  Sends a step signal to the environment using the actions. For each Brain:
  - `action` can be a one dimensional array, or a two dimensional array if you
    have multiple Agents per Brain.
  - `memory` is an optional input that can be used to send a list of floats per
    Agent to be retrieved at the next step.
  - `text_action` is an optional input that can be used to send a single string
    per Agent.

  Returns a dictionary mapping Brain names to BrainInfo objects.

  For example, to access the BrainInfo belonging to a Brain called
  'brain_name', and the BrainInfo field 'vector_observations':

  ```python
  info = env.step()
  brainInfo = info['brain_name']
  observations = brainInfo.vector_observations
  ```

  Note that if you have more than one external Brain in the environment, you
  must provide dictionaries from Brain names to arrays for `action`, `memory`
  and `value`. For example: if you have two external Brains named `brain1` and
  `brain2`, each with one Agent taking two continuous actions, then you can
  have:

  ```python
  action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
  ```

- **Close : `env.close()`**
  Sends a shutdown signal to the environment and closes the communication
  socket.

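Putting the calls above together, a minimal interaction loop might look like
the following sketch, continuing from the `env` loaded earlier (the Brain name
`3DBallBrain` and the action size of 2 are illustrative; use `print(str(env))`
to see the real values for your environment):

```python
import numpy as np

brain_name = "3DBallBrain"               # hypothetical Brain name
info = env.reset()                        # dict: Brain name -> BrainInfo
brain_info = info[brain_name]

for _ in range(100):
    # One random continuous action of size 2 per Agent (sizes are illustrative).
    actions = np.random.randn(len(brain_info.agents), 2)
    info = env.step(actions)
    brain_info = info[brain_name]
    if all(brain_info.local_done):        # reset once every Agent is done
        info = env.reset()
        brain_info = info[brain_name]

env.close()
```
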
## `mlagents.trainers`

1. Open a command or terminal window.
2. Run:

   ```sh
   mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train <environment-name>
   ```

Where:

- `<trainer-config-path>` is the relative or absolute filepath of the trainer
  configuration. The defaults used by environments in the ML-Agents SDK can be
  found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
  training runs.
- The `--train` flag tells `mlagents-learn` to run a training session (rather
  than inference).
- `<environment-name>` __(Optional)__ is the path to the Unity executable you
  want to use for training. __Note:__ If this argument is not passed, training
  will be done through the Editor.

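For example, filling in the template above (the run id and environment path are
placeholders, not values from this repository):

```sh
mlagents-learn config/trainer_config.yaml --run-id=first-run --train ./envs/3DBall
```
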
For more detailed documentation, check out the
[ML-Agents Toolkit documentation](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md).

## mlagents-learn

For more detailed documentation on using `mlagents-learn`, check out
[Training ML-Agents](Training-ML-Agents.md).

# Unity ML-Agents Gym Wrapper

A common way in which machine learning researchers interact with simulation
environments is via a wrapper provided by OpenAI called `gym`. For more
information on the gym interface, see [here](https://github.com/openai/gym).

We provide a gym wrapper and instructions for using it with existing machine
learning algorithms which utilize gyms. The wrapper provides an interface on
top of our `UnityEnvironment` class, which is the default way of interfacing
with a Unity environment via Python.

## Installation

The gym wrapper can be installed using:

```sh
pip install gym_unity
```

or by running the following from the `/gym-unity` directory of the repository:

```sh
pip install .
```

## Using the Gym Wrapper

The gym interface is available from `gym_unity.envs`. To launch an environment
from the root of the project repository use:

```python
from gym_unity.envs import UnityEnv

env = UnityEnv(environment_filename, worker_id, use_visual, multiagent)
```

* `environment_filename` refers to the path to the Unity environment.
* `worker_id` refers to the port to use for communication with the environment.
  Defaults to `0`.
* `use_visual` refers to whether to use visual observations (True) or vector
  observations (False) as the default observation provided by the `reset` and
  `step` functions. Defaults to `False`.
* `multiagent` refers to whether you intend to launch an environment which
  contains more than one agent. Defaults to `False`.

The returned environment `env` will function as a gym.

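For instance, a minimal sketch of driving the wrapped environment through the
standard gym calls (the environment path is a placeholder, and a single-agent,
non-visual environment is assumed):

```python
from gym_unity.envs import UnityEnv

# Path to a built single-agent Unity environment (placeholder).
env = UnityEnv("./envs/YourEnvironment", worker_id=0, use_visual=False)

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()           # random action from the gym action space
    obs, reward, done, info = env.step(action)   # info also carries the underlying BrainInfo
    if done:
        obs = env.reset()
env.close()
```
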
For more on using the gym interface, see our
[Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).

## Limitations

* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if
  present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the
  `info` provided by `env.step(action)`.
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.

## Running OpenAI Baselines Algorithms

OpenAI provides a set of open-source, maintained and tested Reinforcement
Learning algorithms called [Baselines](https://github.com/openai/baselines).

Using the provided Gym wrapper, it is possible to train ML-Agents environments
using these algorithms. This requires the creation of custom training scripts
to launch each algorithm. In most cases these scripts can be created by making
slight modifications to the ones provided for Atari and Mujoco environments.

### Example - DQN Baseline

In order to train an agent to play the `GridWorld` environment using the
Baselines DQN algorithm, create a file called `train_unity.py` within the
`baselines/deepq/experiments` subfolder of the baselines repository. This file
will be a modification of the `run_atari.py` file within the same folder. Then
create an `/envs/` directory within the repository, and build the GridWorld
environment to that directory. For more information on building Unity
environments, see [here](../docs/Learning-Environment-Executable.md). Add the
following code to the `train_unity.py` file:

```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv

def main():
    # Launch the GridWorld environment built to ./envs/, using visual observations.
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[256],
        dueling=True,
    )
    act = deepq.learn(
        env,
        q_func=model,
        lr=1e-3,
        max_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()
```

To start the training process, run the following from the root of the baselines
repository:

```sh
python -m baselines.deepq.experiments.train_unity
```

### Other Algorithms

Other algorithms in the Baselines repository can be run using scripts similar
to the example provided above. In most cases, the primary changes needed to use
a Unity environment are to import `UnityEnv`, and to replace the environment
creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)`
passing the environment binary path.

A typical rule of thumb is that for vision-based environments, modification
should be done to Atari training scripts, and for vector observation
environments, modification should be done to Mujoco scripts.

Some algorithms will make use of `make_atari_env()` or `make_mujoco_env()`
functions. These are defined in `baselines/common/cmd_util.py`. In order to use
Unity environments for these algorithms, add the following import statement and
function to `cmd_util.py`:

```python
from gym_unity.envs import UnityEnv

# Note: this relies on names (Monitor, logger, os, SubprocVecEnv, MPI) that
# cmd_util.py already imports, since the function is meant to be added there.
def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank):  # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        rank = MPI.COMM_WORLD.Get_rank()
        env = UnityEnv(env_directory, rank, use_visual=False)
        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
        return env
```

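A possible way to call this helper from a training script (the environment path
and the number of parallel environments are illustrative):

```python
# Four parallel visual GridWorld environments, wrapped for Baselines training.
env = make_unity_env("./envs/GridWorld", 4, True)
```
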
# ML-Agents SDK
