
Added a README for the ml-agents package (#1136)

* Added a README for the ml-agents package

* address comments

* addressed comments

* Silencing linter complaints.

* Missed a couple of long lines.
/develop-generalizationTraining-TrainerController
GitHub 7 years ago
Current commit
a776c668
9 files changed, with 184 insertions and 144 deletions
  1. docs/Basic-Guide.md (2 lines changed)
  2. docs/Learning-Environment-Design-External-Internal-Brains.md (2 lines changed)
  3. docs/Learning-Environment-Design.md (2 lines changed)
  4. docs/Learning-Environment-Executable.md (2 lines changed)
  5. docs/ML-Agents-Overview.md (2 lines changed)
  6. docs/Readme.md (2 lines changed)
  7. ml-agents/setup.py (2 lines changed)
  8. ml-agents/README.md (177 lines changed)
  9. docs/Python-API.md (137 lines changed)

2
docs/Basic-Guide.md


`None` if you want to interact with the current scene in the Unity Editor.
More information and documentation is provided in the
[Python API](Python-API.md) page.
[Python API](../ml-agents/README.md) page.
## Training the Brain with Reinforcement Learning

2
docs/Learning-Environment-Design-External-Internal-Brains.md


In addition to using an External brain for training using the ML-Agents learning
algorithms, you can use an External brain to control agents in a Unity
environment using an external Python program. See [Python API](Python-API.md)
environment using an external Python program. See [Python API](../ml-agents/README.md)
for more information.
Unlike the other types, the External Brain has no properties to set in the Unity

2
docs/Learning-Environment-Design.md


**Note:** The API used by the Python PPO training process to communicate with
and control the Academy during training can be used for other purposes as well.
For example, you could use the API to use Unity as the simulation engine for
your own machine learning algorithms. See [Python API](Python-API.md) for more
your own machine learning algorithms. See [Python API](../ml-agents/README.md) for more
information.
## Organizing the Unity Scene

2
docs/Learning-Environment-Executable.md


## Interacting with the Environment
If you want to use the [Python API](Python-API.md) to interact with your
If you want to use the [Python API](../ml-agents/README.md) to interact with your
executable, you can pass the name of the executable with the argument
'file_name' of the `UnityEnvironment`. For instance:

2
docs/ML-Agents-Overview.md


the scene will be controlled within Python.
We do not currently have a tutorial highlighting this mode, but you can
learn more about the Python API [here](Python-API.md).
learn more about the Python API [here](../ml-agents/README.md).
### Curriculum Learning

2
docs/Readme.md


## API Docs
* [API Reference](API-Reference.md)
* [How to use the Python API](Python-API.md)
* [How to use the Python API](../ml-agents/README.md)
* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)

2
ml-agents/setup.py


here = path.abspath(path.dirname(__file__))
# Get the long description from the README file
with open(path.join(here, '../README.md'), encoding='utf-8') as f:
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
long_description = f.read()
setup(

177
ml-agents/README.md


# Unity ml-agents interface and trainers
The `mlagents` package contains two components: the low level API, which allows
you to interact directly with a Unity Environment, and a training component, which
allows you to train agents in Unity Environments using our implementations of
reinforcement learning or imitation learning.
## Installation
The `mlagents` package can be installed using:
```sh
pip install mlagents
```
or by running the following from the `ml-agents` directory of the repository:
```sh
pip install .
```
## `mlagents.envs`
The ML-Agents toolkit provides a Python API for controlling the agent simulation
loop of an environment or game built with Unity. This API is used by the ML-Agents
training algorithms (run with `mlagents-learn`), but you can also write your
own Python programs using this API.
The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `ml-agents/mlagents/envs` folder of
the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the
agent must either use an **External** brain or use a brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for agents with external brains, but can only observe broadcasting
brains (the information you receive for an agent is the same in both cases).
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
### Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in
`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
file, put the file in the same directory as `envs`. For example, if the filename
of your Unity environment is 3DBall.app, in Python, run:
```python
from mlagents.envs import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```
- `file_name` is the name of the environment binary (located in the root
directory of the python project).
- `worker_id` indicates which port to use for communication with the
environment. It is intended for parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
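The `worker_id` parameter above selects the communication port. A minimal sketch of one way such a mapping could work is to offset a base port by the worker id. The helper name `port_for_worker` and the base port value are assumptions made for illustration; they are not stated in this README.

```python
# Hypothetical helper: illustrates how a worker id could select a port.
ASSUMED_BASE_PORT = 5005  # assumption for illustration, not taken from the text above


def port_for_worker(worker_id, base_port=ASSUMED_BASE_PORT):
    """Return a distinct port per worker by offsetting from a base port."""
    if worker_id < 0:
        raise ValueError("worker_id must be non-negative")
    return base_port + worker_id


# Four parallel workers each get their own port, so they do not collide.
ports = [port_for_worker(i) for i in range(4)]
```

Under this scheme, giving each parallel environment a different `worker_id` is enough to keep their sockets separate.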
If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
### Interacting with a Unity Environment
A BrainInfo object contains the following fields:
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the agents' text
observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of agents using the brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of agents using the brain
containing `done` flags (whether or not the agent is done).
- **`max_reached`** : A list as long as the number of agents using the brain
containing true if the agents reached their max steps.
- **`agents`** : A list of the unique ids of the agents using the brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
size, vector action size)` if the vector action space is continuous and
`(batch size, number of branches)` if the vector action space is discrete.
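To make the per-agent layout of these fields concrete, here is a small sketch that pairs each agent id with its reward and done flag. `FakeBrainInfo` is a hypothetical stand-in built from plain lists (the real object holds numpy arrays for the observation fields), and the sketch assumes the per-agent lists are aligned by index with `agents`.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors a few BrainInfo fields with plain lists.
FakeBrainInfo = namedtuple(
    "FakeBrainInfo", ["vector_observations", "rewards", "local_done", "agents"]
)


def summarize(brain_info):
    """Pair each agent id with its reward, done flag and observation size."""
    return {
        agent_id: {"reward": reward, "done": done, "obs_size": len(obs)}
        for agent_id, reward, done, obs in zip(
            brain_info.agents,
            brain_info.rewards,
            brain_info.local_done,
            brain_info.vector_observations,
        )
    }


info = FakeBrainInfo(
    vector_observations=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # (batch size, obs size)
    rewards=[0.0, 1.0],
    local_done=[False, True],
    agents=[11, 12],  # made-up agent ids
)
summary = summarize(info)
```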
Once loaded, your UnityEnvironment object, which is referenced by a
variable named `env` in this example, can be used in the following way:
- **Print : `print(str(env))`**
Prints all parameters relevant to the loaded environment and the external
brains.
- **Reset : `env.reset(train_model=True, config=None)`**
Sends a reset signal to the environment and returns a dictionary mapping
brain names to BrainInfo objects.
- `train_model` indicates whether to run the environment in train (`True`) or
test (`False`) mode.
- `config` is an optional dictionary of configuration flags specific to the
environment. For generic environments, `config` can be ignored. `config` is
a dictionary of strings to floats where the keys are the names of the
`resetParameters` and the values are their corresponding float values.
Define the reset parameters on the Academy Inspector window in the Unity
Editor.
- **Step : `env.step(action, memory=None, text_action=None)`**
Sends a step signal to the environment using the actions. For each brain :
- `action` can be a one dimensional array, or a two dimensional array if you
have multiple agents per brain.
- `memory` is an optional input that can be used to send a list of floats per
agent to be retrieved at the next step.
- `text_action` is an optional input that can be used to send a single string per
agent.
Returns a dictionary mapping brain names to BrainInfo objects.
For example, to access the BrainInfo belonging to a brain called
'brain_name', and the BrainInfo field 'vector_observations':
```python
info = env.step()
brainInfo = info['brain_name']
observations = brainInfo.vector_observations
```
Note that if you have more than one external brain in the environment, you
must provide dictionaries from brain names to arrays for `action`, `memory`
and `value`. For example: If you have two external brains named `brain1` and
`brain2` each with one agent taking two continuous actions, then you can
have:
```python
action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
```
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication
socket.
## `mlagents.trainers`
1. Open a command or terminal window.
2. Run
```sh
mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train <environment-name>
```
Where:
- `<trainer-config-path>` is the relative or absolute filepath of the trainer
configuration. The defaults used by environments in the ML-Agents SDK can be
found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs.
- The `--train` flag tells `mlagents-learn` to run a training session (rather
than inference).
- `<environment-name>` __(Optional)__ is the path to the Unity executable you
want to train. __Note:__ If this argument is not passed, training
will run through the Editor.
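If you launch training from a script rather than a terminal, the command above can be assembled as an argument list. This is a minimal sketch that follows the argument order shown in this README; `build_learn_command` is a hypothetical helper, and the run id `first-run` is a made-up value.

```python
import shlex


def build_learn_command(trainer_config_path, run_id, environment_name=None, train=True):
    """Assemble the mlagents-learn argument list in the order shown above."""
    command = ["mlagents-learn", trainer_config_path, "--run-id={}".format(run_id)]
    if train:
        command.append("--train")
    if environment_name is not None:
        # Optional: when omitted, training goes through the Editor.
        command.append(environment_name)
    return command


command = build_learn_command("config/trainer_config.yaml", "first-run", "3DBall")
# Shell-quoted form, useful for logging what will be executed.
printable = " ".join(shlex.quote(part) for part in command)
```

The resulting list can be handed to `subprocess.run(command, check=True)`; passing a list avoids shell quoting problems with paths that contain spaces.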
For more detailed documentation, check out the
[ML-Agents toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)

137
docs/Python-API.md


# Python API
The ML-Agents toolkit provides a Python API for controlling the agent simulation
loop of an environment or game built with Unity. This API is used by the ML-Agents
training algorithms (run with `mlagents-learn`), but you can also write your
own Python programs using this API.
The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `ml-agents/mlagents/envs` folder of
the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the
agent must either use an **External** brain or use a brain that is broadcasting
(has its **Broadcast** property set to true). Your code is expected to return
actions for agents with external brains, but can only observe broadcasting
brains (the information you receive for an agent is the same in both cases). See
[Using the Broadcast
Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
For a simple example of using the Python API to interact with a Unity
environment, see the Basic [Jupyter](Background-Jupyter.md) notebook
(`notebooks/getting-started.ipynb`), which opens an environment, runs a few
simulation steps taking random actions, and closes the environment.
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
## Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in
`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
file, put the file in the same directory as `envs`. For example, if the filename
of your Unity environment is 3DBall.app, in Python, run:
```python
from mlagents.envs import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```
- `file_name` is the name of the environment binary (located in the root
directory of the python project).
- `worker_id` indicates which port to use for communication with the
environment. It is intended for parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
## Interacting with a Unity Environment
A BrainInfo object contains the following fields:
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`text_observations`** : A list of strings corresponding to the agents' text
observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of agents using the brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of agents using the brain
containing `done` flags (whether or not the agent is done).
- **`max_reached`** : A list as long as the number of agents using the brain
containing true if the agents reached their max steps.
- **`agents`** : A list of the unique ids of the agents using the brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
size, vector action size)` if the vector action space is continuous and
`(batch size, number of branches)` if the vector action space is discrete.
Once loaded, your UnityEnvironment object, which is referenced by a
variable named `env` in this example, can be used in the following way:
- **Print : `print(str(env))`**
Prints all parameters relevant to the loaded environment and the external
brains.
- **Reset : `env.reset(train_model=True, config=None)`**
Sends a reset signal to the environment and returns a dictionary mapping
brain names to BrainInfo objects.
- `train_model` indicates whether to run the environment in train (`True`) or
test (`False`) mode.
- `config` is an optional dictionary of configuration flags specific to the
environment. For generic environments, `config` can be ignored. `config` is
a dictionary of strings to floats where the keys are the names of the
`resetParameters` and the values are their corresponding float values.
Define the reset parameters on the [Academy
Inspector](Learning-Environment-Design-Academy.md#academy-properties) window
in the Unity Editor.
- **Step : `env.step(action, memory=None, text_action=None)`**
Sends a step signal to the environment using the actions. For each brain :
- `action` can be a one dimensional array, or a two dimensional array if you
have multiple agents per brain.
- `memory` is an optional input that can be used to send a list of floats per
agent to be retrieved at the next step.
- `text_action` is an optional input that can be used to send a single string per
agent.
Returns a dictionary mapping brain names to BrainInfo objects.
For example, to access the BrainInfo belonging to a brain called
'brain_name', and the BrainInfo field 'vector_observations':
```python
info = env.step()
brainInfo = info['brain_name']
observations = brainInfo.vector_observations
```
Note that if you have more than one external brain in the environment, you
must provide dictionaries from brain names to arrays for `action`, `memory`
and `value`. For example: If you have two external brains named `brain1` and
`brain2` each with one agent taking two continuous actions, then you can
have:
```python
action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
```
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication
socket.