
Updating docs and moving learn.py to right place.

/develop-generalizationTraining-TrainerController
Deric Pang 7 years ago
Current commit
bdaf7a1e
16 files changed, with 58 insertions and 50 deletions
  1. docs/Basic-Guide.md (9 changes)
  2. docs/Getting-Started-with-Balance-Ball.md (9 changes)
  3. docs/Installation.md (4 changes)
  4. docs/Learning-Environment-Design-External-Internal-Brains.md (2 changes)
  5. docs/Learning-Environment-Executable.md (15 changes)
  6. docs/Python-API.md (11 changes)
  7. docs/Readme.md (2 changes)
  8. docs/Training-Curriculum-Learning.md (11 changes)
  9. docs/Training-Imitation-Learning.md (2 changes)
  10. docs/Training-ML-Agents.md (25 changes)
  11. docs/Using-Docker.md (8 changes)
  12. docs/Using-Tensorboard.md (6 changes)
  13. python/gym-unity/setup.py (2 changes)
  14. python/mlagents/setup.py (2 changes)
  15. python/mlagents/README.md (0 changes)
  16. /python/mlagents/mlagents/learn.py (0 changes)

9
docs/Basic-Guide.md


1. Open a command or terminal window.
2. Navigate to the folder where you installed the ML-Agents toolkit.
3. Run `learn <trainer-config-path> --run-id=<run-identifier> --train` Where:
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train`
Where:
- And the `--train` tells learn.py to run a training session (rather than
inference)
- And the `--train` tells `mlagents-learn` to run a training session (rather
than inference)
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.

**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.
If the learn.py runs correctly and starts training, you should see something
If `mlagents-learn` runs correctly and starts training, you should see something
like this:
![Training running](images/training-running.png)

9
docs/Getting-Started-with-Balance-Ball.md


explaining it.
To train the agents within the Ball Balance environment, we will be using the
python package. We have provided a convenient Python wrapper script called
`learn.py` which accepts arguments used to configure both training and inference
phases.
python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
We can use `run_id` to identify the experiment and create a folder where the
model and summary statistics are stored. When using TensorBoard to observe the

To summarize, go to your command line, enter the `ml-agents` directory and type:
```shell
learn trainer_config.yaml --run-id=<run-identifier> --train
mlagents-learn trainer_config.yaml --run-id=<run-identifier> --train
```
When the message _"Start training by pressing the Play button in the Unity

### Observing Training Progress
Once you start training using `learn.py` in the way described in the previous
Once you start training using `mlagents-learn` in the way described in the previous
section, the `ml-agents` directory will contain a `summaries` directory. In
order to observe the training process in more detail, you can use TensorBoard.
From the command line run:

4
docs/Installation.md


## Install Python (with Dependencies)
In order to use ML-Agents toolkit, you need Python 3.6 along with
the dependencies listed in the [requirements file](../requirements.txt).
the dependencies listed in the [requirements file](../python/mlagents/requirements.txt).
Some of the primary dependencies include:
- [TensorFlow](Background-TensorFlow.md)

[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
on installing it.
To install dependencies, enter the top level ml-agents directory and run from
To install dependencies, enter the `python/mlagents/` directory and run from
the command line:
pip install -r requirements.txt

2
docs/Learning-Environment-Design-External-Internal-Brains.md


4. Once the `environment.bytes` file is imported, drag it from the **Project** window to the **Graph Model** field of the Brain component.
If you are using a model produced by the ML-Agents `learn.py` program, use the default values for the other Internal Brain parameters.
If you are using a model produced by the ML-Agents `mlagents-learn` command, use the default values for the other Internal Brain parameters.
### Internal Brain properties

15
docs/Learning-Environment-Executable.md


1. Open a command or terminal window.
2. Navigate to the folder where you installed ML-Agents.
3. Change to the python directory.
4. Run `python3 learn.py <env_name> --run-id=<run-identifier> --train`
4. Run `mlagents-learn <trainer-config-file> <env_name> --run-id=<run-identifier> --train`
- `<env_name>` is the name and path to the executable you exported from Unity (without extension)
- `<run-identifier>` is a string used to separate the results of different training runs
- And the `--train` tells learn.py to run a training session (rather than inference)
- `<trainer-config-file>` is the filepath of the trainer configuration yaml.
- `<env_name>` is the name and path to the executable you exported from Unity (without extension)
- `<run-identifier>` is a string used to separate the results of different training runs
- And the `--train` tells `mlagents-learn` to run a training session (rather than inference)
```
python3 learn.py 3DBall --run-id=firstRun --train
```shell
mlagents-learn 3DBall --run-id=firstRun --train
```
![Training command example](images/training-command-example.png)

If the learn.py runs correctly and starts training, you should see something like this:
If `mlagents-learn` runs correctly and starts training, you should see something like this:
![Training running](images/training-running.png)

11
docs/Python-API.md


The ML-Agents toolkit provides a Python API for controlling the agent simulation
loop of an environment or game built with Unity. This API is used by the ML-Agents
training algorithms (run with `learn.py`), but you can also write your Python
training algorithms (run with `mlagents-learn`), but you can also write your Python
programs using this API.
The key objects in the Python API include:

- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
These classes are all defined in the `mlagents/envs` folder of the ML-Agents SDK.
These classes are all defined in the `python/mlagents/mlagents/envs` folder of
the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the
agent must either use an **External** brain or use a brain that is broadcasting

## Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in
`mlagents/envs`. To load a Unity environment from a built binary file, put the
file in the same directory as `envs`. For example, if the filename of your Unity
environment is 3DBall.app, in python, run:
`python/mlagents/mlagents/envs`. To load a Unity environment from a built binary
file, put the file in the same directory as `envs`. For example, if the filename
of your Unity environment is 3DBall.app, in python, run:
```python
from mlagents.envs import UnityEnvironment
```

2
docs/Readme.md


* [API Reference](API-Reference.md)
* [How to use the Python API](Python-API.md)
* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
* [Wrapping Learning Environment as a Gym](../python/gym-unity/Readme.md)

11
docs/Training-Curriculum-Learning.md


### Training with a Curriculum
Once we have specified our metacurriculum and curriculums, we can launch
`learn.py` using the `--curriculum` flag to point to the metacurriculum folder
and PPO will train using Curriculum Learning. For example, to train agents in
the Wall Jump environment with curriculum learning, we can run `python learn.py
--curriculum=curricula/wall-jump/ --run-id=wall-jump-curriculum --train`. We can
then keep track of the current lessons and progresses via TensorBoard.
`mlagents-learn` using the `--curriculum` flag to point to the metacurriculum
folder and PPO will train using Curriculum Learning. For example, to train
agents in the Wall Jump environment with curriculum learning, we can run
`mlagents-learn --curriculum=curricula/wall-jump/ --run-id=wall-jump-curriculum
--train`. We can then keep track of the current lessons and progresses via
TensorBoard.

2
docs/Training-Imitation-Learning.md


3. Set the "Student" brain to External mode.
4. Link the brains to the desired agents (one agent as the teacher and at least one agent as a student).
5. In `trainer_config.yaml`, add an entry for the "Student" brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training to do each moment. Increase the `max_steps` option if you'd like to keep training the agents for a longer period of time.
6. Launch the training process with `learn --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen
6. Launch the training process with `mlagents-learn --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen
7. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
8. Watch as the agent(s) with the student brain attached begin to behave similarly to the demonstrations.
9. Once the Student agents are exhibiting the desired behavior, end the training process with `CTRL+C` from the command line.

25
docs/Training-ML-Agents.md


The output of the training process is a model file containing the optimized policy. This model file is a TensorFlow data graph containing the mathematical operations and the optimized weights selected during the training process. You can use the generated model file with the Internal Brain type in your Unity project to decide the best course of action for an agent.
Use the Python program, `learn.py` to train your agents. This program can be found in the `python` directory of the ML-Agents SDK. The [configuration file](#training-config-file), `trainer_config.yaml` specifies the hyperparameters used during training. You can edit this file with a text editor to add a specific configuration for each brain.
Use the command `mlagents-learn` to train your agents. This command is installed with the `mlagents` package
and its implementation can be found at `python/mlagents/learn.py`. The [configuration file](#training-config-file), `trainer_config.yaml` specifies the hyperparameters used during training. You can edit this file with a text editor to add a specific configuration for each brain.
## Training with learn.py
## Training with mlagents-learn
Use the Python `learn.py` program to train agents. `learn.py` supports training with [reinforcement learning](Background-Machine-Learning.md#reinforcement-learning), [curriculum learning](Training-Curriculum-Learning.md), and [behavioral cloning imitation learning](Training-Imitation-Learning.md).
Use the `mlagents-learn` command to train agents. `mlagents-learn` supports training with [reinforcement learning](Background-Machine-Learning.md#reinforcement-learning), [curriculum learning](Training-Curriculum-Learning.md), and [behavioral cloning imitation learning](Training-Imitation-Learning.md).
Run `learn.py` from the command line to launch the training process. Use the command line patterns and the `trainer_config.yaml` file to control training options.
Run `mlagents-learn` from the command line to launch the training process. Use the command line patterns and the `trainer_config.yaml` file to control training options.
python3 learn.py <env_name> --run-id=<run-identifier> --train
```shell
mlagents-learn <trainer-config-file> <env_name> --run-id=<run-identifier> --train
```
where
where
* `<trainer-config-file>` is the filepath of the trainer configuration yaml.
* `<env_name>`__(Optional)__ is the name (including path) of your Unity executable containing the agents to be trained. If `<env_name>` is not passed, the training will happen in the Editor. Press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
* `<run-identifier>` is an optional identifier you can use to identify the results of individual training runs.
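
As a rough illustration of the command pattern above, the following sketch assembles the argument list in Python. The helper name `build_learn_command` is hypothetical and not part of ML-Agents; it only uses the positional arguments and flags that appear in this diff.

```python
from typing import List, Optional


def build_learn_command(
    trainer_config: str,
    env_name: Optional[str] = None,
    run_id: Optional[str] = None,
    train: bool = True,
) -> List[str]:
    """Assemble an `mlagents-learn` argument list (hypothetical helper)."""
    cmd = ["mlagents-learn", trainer_config]
    if env_name is not None:
        # Optional: when omitted, training happens in the Unity Editor.
        cmd.append(env_name)
    if run_id is not None:
        cmd.append(f"--run-id={run_id}")
    if train:
        cmd.append("--train")
    return cmd


command = build_learn_command("trainer_config.yaml", "3DBall", run_id="firstRun")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run(command)` once the `mlagents` package is installed; the sketch itself does not launch anything.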

3. Navigate to the ml-agents `python` folder.
4. Run the following to launch the training process using the path to the Unity environment you built in step 1:
python3 learn.py ../../projects/Cats/CatsOnBicycles.app --run-id=cob_1 --train
mlagents-learn ../../projects/Cats/CatsOnBicycles.app --run-id=cob_1 --train
During a training session, the training program prints out and saves updates at regular intervals (specified by the `summary_freq` option). The saved statistics are grouped by the `run-id` value so you should assign a unique id to each training run if you plan to view the statistics. You can view these statistics using TensorBoard during or after training by running the following command (from the ML-Agents python directory):

### Command line training options
In addition to passing the path of the Unity executable containing your training environment, you can set the following command line options when invoking `learn.py`:
In addition to passing the path of the Unity executable containing your training environment, you can set the following command line options when invoking `mlagents-learn`:
* `--curriculum=<file>` – Specify a curriculum JSON file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
* `--keep-checkpoints=<n>` – Specify the maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `save-freq` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.

* `--seed=<n>` – Specifies a number to use as a seed for the random number generator used by the training code.
* `--slow` – Specify this option to run the Unity environment at normal, game speed. The `--slow` mode uses the **Time Scale** and **Target Frame Rate** specified in the Academy's **Inference Configuration**. By default, training runs using the speeds specified in your Academy's **Training Configuration**. See [Academy Properties](Learning-Environment-Design-Academy.md#academy-properties).
* `--train` – Specifies whether to train model or only run in inference mode. When training, **always** use the `--train` option.
* `--worker-id=<n>` – When you are running more than one training environment at the same time, assign each a unique worker-id number. The worker-id is added to the communication port opened between the current instance of learn.py and the ExternalCommunicator object in the Unity environment. Defaults to 0.
* `--worker-id=<n>` – When you are running more than one training environment at the same time, assign each a unique worker-id number. The worker-id is added to the communication port opened between the current instance of `mlagents-learn` and the ExternalCommunicator object in the Unity environment. Defaults to 0.
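
To make the `--worker-id` behavior concrete, here is a minimal sketch of how a per-worker port could be derived. The base port of 5005 is an assumption for illustration, and `communication_port` is a hypothetical helper rather than an ML-Agents function; check your installed version for the actual value.

```python
BASE_PORT = 5005  # assumed base port, for illustration only


def communication_port(worker_id: int, base_port: int = BASE_PORT) -> int:
    """Return the port a worker would use: the base port plus the worker id."""
    if worker_id < 0:
        raise ValueError("worker_id must be non-negative")
    return base_port + worker_id


# Three concurrent training environments each get a distinct port.
ports = [communication_port(worker_id) for worker_id in range(3)]
print(ports)
```

Giving each concurrent run a unique worker id keeps the derived ports from colliding.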
The training config file, `trainer_config.yaml` specifies the training method, the hyperparameters, and a few additional values to use during training. The file is divided into sections. The **default** section defines the default values for all the available settings. You can also add new sections to override these defaults to train specific Brains. Name each of these override sections after the GameObject containing the Brain component that should use these settings. (This GameObject will be a child of the Academy in your scene.) Sections for the example environments are included in the provided config file. `Learn.py` finds the config file by name and looks for it in the same directory as itself.
The training config file, `trainer_config.yaml` specifies the training method, the hyperparameters, and a few additional values to use during training. The file is divided into sections. The **default** section defines the default values for all the available settings. You can also add new sections to override these defaults to train specific Brains. Name each of these override sections after the GameObject containing the Brain component that should use these settings. (This GameObject will be a child of the Academy in your scene.) Sections for the example environments are included in the provided config file.
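
The default-plus-override layout described above can be sketched with plain dictionaries. This is not the loader used by `mlagents-learn`: the brain names and setting values are invented for illustration, and `settings_for` is a hypothetical helper.

```python
from typing import Any, Dict

# Illustrative values only; they are not taken from the shipped trainer_config.yaml.
trainer_config: Dict[str, Dict[str, Any]] = {
    "default": {"trainer": "ppo", "max_steps": 50000},
    "StudentBrain": {"trainer": "imitation", "brain_to_imitate": "TeacherBrain"},
}


def settings_for(brain_name: str, config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the default settings overlaid with a brain's own section."""
    merged = dict(config.get("default", {}))
    merged.update(config.get(brain_name, {}))
    return merged


print(settings_for("StudentBrain", trainer_config))
print(settings_for("UnknownBrain", trainer_config))
```

A brain without its own section simply falls back to the values in **default**.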
| **Setting** | **Description** | **Applies To Trainer**|
| :-- | :-- | :-- |

8
docs/Using-Docker.md


disk where it can read the Unity executable and store the graph. **This should
therefore be identical to `target`.**
- `trainer-config-path`, `train`, `run-id`: ML-Agents arguments passed to
`learn.py`. `trainer-config-path` is the filepath of the trainer config file,
`train` trains the algorithm, and `run-id` is used to tag each experiment with
a unique identifier. We recommend placing the trainer-config file inside
`unity-volume` so that the container has access to the file.
`mlagents-learn`. `trainer-config-path` is the filepath of the trainer config
file, `train` trains the algorithm, and `run-id` is used to tag each
experiment with a unique identifier. We recommend placing the trainer-config
file inside `unity-volume` so that the container has access to the file.
To train with a `3DBall` environment executable, the command would be:

6
docs/Using-Tensorboard.md


The ML-Agents toolkit saves statistics during a learning session that you can view with a TensorFlow utility named [TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
The `learn.py` program saves training statistics to a folder named `summaries`, organized by the `run-id` value you assign to a training session.
The `mlagents-learn` command saves training statistics to a folder named `summaries`, organized by the `run-id` value you assign to a training session.
In order to observe the training process, either during training or afterward,
start TensorBoard:

4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
**Note:** If you don't assign a `run-id` identifier, `learn.py` uses the default string, "ppo". All the statistics will be saved to the same sub-folder and displayed as one session in TensorBoard. After a few runs, the displays can become difficult to interpret in this situation. You can delete the folders under the `summaries` directory to clear out old statistics.
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the default string, "ppo". All the statistics will be saved to the same sub-folder and displayed as one session in TensorBoard. After a few runs, the displays can become difficult to interpret in this situation. You can delete the folders under the `summaries` directory to clear out old statistics.
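
The effect of the default run id can be illustrated with a small sketch. The exact folder layout under `summaries` is an assumption here, and `summary_dir` is a hypothetical helper; only the folder name `summaries` and the default string "ppo" come from the text above.

```python
import posixpath
from typing import Optional

DEFAULT_RUN_ID = "ppo"  # default string named in the note above


def summary_dir(run_id: Optional[str] = None, root: str = "summaries") -> str:
    """Return the sub-folder a run's statistics would land in (assumed layout)."""
    return posixpath.join(root, run_id or DEFAULT_RUN_ID)


# Two runs without a run id share one folder; a named run gets its own.
print(summary_dir())
print(summary_dir())
print(summary_dir("firstRun"))
```

Because unnamed runs resolve to the same folder, their statistics end up displayed together in TensorBoard.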
When you run the training program, `learn.py`, you can use the `--save-freq` option to specify how frequently to save the statistics.
When you run the training program, `mlagents-learn`, you can use the `--save-freq` option to specify how frequently to save the statistics.
## The ML-Agents toolkit training statistics

2
python/gym-unity/setup.py


author_email='ML-Agents@unity3d.com',
url='https://github.com/Unity-Technologies/ml-agents',
packages=find_packages(),
install_requires = ['gym', 'unityagents']
install_requires = ['gym', 'mlagents']
)

2
python/mlagents/setup.py


here = path.abspath(path.dirname(__file__))
# Get the long description from the README file
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
with open(path.join(here, '../../README.md'), encoding='utf-8') as f:
    long_description = f.read()
# Arguments marked as "Required" below must be included for upload to PyPI.

0
python/mlagents/README.md

/python/mlagents/learn.py → /python/mlagents/mlagents/learn.py
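
Moving `learn.py` inside the `mlagents` package is what allows it to be exposed as an installed command. The sketch below shows how a setuptools console-script declaration of that kind is structured. The declaration string, and in particular the `main` function name, is an assumption, since the relevant part of `setup.py` is not shown in this diff; `parse_console_script` is a hypothetical helper.

```python
from typing import Tuple

# Assumed declaration; the actual entry in python/mlagents/setup.py is not shown here.
ENTRY_POINTS = {"console_scripts": ["mlagents-learn=mlagents.learn:main"]}


def parse_console_script(spec: str) -> Tuple[str, str, str]:
    """Split 'command=package.module:function' into its three parts."""
    command, target = spec.split("=", 1)
    module, _, function = target.partition(":")
    return command.strip(), module.strip(), function.strip()


command, module, function = parse_console_script(ENTRY_POINTS["console_scripts"][0])
print(command, module, function)
```

With a declaration like this, installing the package generates a `mlagents-learn` executable that imports the named module and calls the named function.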
