4. Link the brains to the desired agents (one agent as the teacher and at least one agent as a student).
5. In `trainer_config.yaml`, add an entry for the "Student" brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training is performed on each update. Increase the `max_steps` option if you'd like to keep training the agents for a longer period of time. An example entry is shown after this list.
6. Launch the training process with `mlagents-learn trainer_config.yaml --train --slow`, and press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
7. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
8. Watch as the agent(s) with the Student brain attached begin to behave similarly to the demonstrations.
9. Once the Student agents are exhibiting the desired behavior, end the training process with `CTRL+C` from the command line.
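A minimal `trainer_config.yaml` entry for this setup might look like the following sketch. The numeric values are placeholders to tune for your scene; the brain names match the "Teacher" and "Student" names used in the steps above:

```yaml
Student:                      # entry for the "Student" brain
    trainer: imitation        # use the imitation learning trainer
    brain_to_imitate: Teacher # name of the teacher brain
    batches_per_epoch: 5      # placeholder: how much training per update
    max_steps: 10000          # placeholder: increase to train longer
```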
The output of the training process is a model file containing the optimized policy. This model file is a TensorFlow data graph containing the mathematical operations and the optimized weights selected during the training process. You can use the generated model file with the Internal Brain type in your Unity project to decide the best course of action for an agent.
Use the command `mlagents-learn` to train your agents. This command is installed with the `mlagents` package and its implementation can be found at `python/mlagents/learn.py`. The [configuration file](#training-config-file), `trainer_config.yaml`, specifies the hyperparameters used during training. You can edit this file with a text editor to add a specific configuration for each brain.
## Training with mlagents-learn
Use the `mlagents-learn` command to train agents. `mlagents-learn` supports training with [reinforcement learning](Background-Machine-Learning.md#reinforcement-learning), [curriculum learning](Training-Curriculum-Learning.md), and [behavioral cloning imitation learning](Training-Imitation-Learning.md).
Run `mlagents-learn` from the command line to launch the training process. Use the command line patterns and the `trainer_config.yaml` file to control training options.
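The basic invocation follows this pattern (restored here for reference; each placeholder is described in the list below):

```sh
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier> --train
```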
* `<trainer-config-file>` is the file path of the trainer configuration YAML.
* `<env_name>` __(Optional)__ is the name (including path) of your Unity executable containing the agents to be trained. If `<env_name>` is not passed, the training will happen in the Editor. Press the :arrow_forward: button in Unity when the message _"Start training by pressing the Play button in the Unity Editor"_ is displayed on the screen.
* `<run-identifier>` is an optional identifier you can use to identify the results of individual training runs.
3. Navigate to the ml-agents `python` folder.
4. Run the following to launch the training process using the path to the Unity environment you built in step 1:
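    ```sh
    # Run from the ml-agents python folder. <env_name> is the path to the
    # executable built in step 1; "firstRun" is just an example run identifier.
    mlagents-learn trainer_config.yaml --env=<env_name> --run-id=firstRun --train
    ```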
During a training session, the training program prints out and saves updates at regular intervals (specified by the `summary_freq` option). The saved statistics are grouped by the `run-id` value so you should assign a unique id to each training run if you plan to view the statistics. You can view these statistics using TensorBoard during or after training by running the following command (from the ML-Agents python directory):
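```sh
# Standard TensorBoard invocation; "summaries" is the folder where the
# statistics described above are saved.
tensorboard --logdir=summaries
```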
### Command line training options
In addition to passing the path of the Unity executable containing your training environment, you can set the following command line options when invoking `mlagents-learn`:
* `--curriculum=<file>` – Specify a curriculum JSON file for defining the lessons for curriculum training. See [Curriculum Training](Training-Curriculum-Learning.md) for more information.
* `--keep-checkpoints=<n>` – Specify the maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `save-freq` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--seed=<n>` – Specify a number to use as a seed for the random number generator used by the training code.
* `--slow` – Specify this option to run the Unity environment at normal, game speed. The `--slow` mode uses the **Time Scale** and **Target Frame Rate** specified in the Academy's **Inference Configuration**. By default, training runs using the speeds specified in your Academy's **Training Configuration**. See [Academy Properties](Learning-Environment-Design-Academy.md#academy-properties).
* `--train` – Specify whether to train the model or only run in inference mode. When training, **always** use the `--train` option.
* `--worker-id=<n>` – When you are running more than one training environment at the same time, assign each a unique worker-id number. The worker-id is added to the communication port opened between the current instance of `mlagents-learn` and the ExternalCommunicator object in the Unity environment. Defaults to 0.
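For instance, the following invocation (all values here are arbitrary examples) trains in the Editor, since no environment path is passed, at normal game speed and with a fixed seed:

```sh
mlagents-learn trainer_config.yaml --run-id=firstRun --train --slow --seed=42
```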
### Training config file

The training config file, `trainer_config.yaml`, specifies the training method, the hyperparameters, and a few additional values to use during training. The file is divided into sections. The **default** section defines the default values for all the available settings. You can also add new sections to override these defaults to train specific Brains. Name each of these override sections after the GameObject containing the Brain component that should use these settings. (This GameObject will be a child of the Academy in your scene.) Sections for the example environments are included in the provided config file.
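As a sketch of that layout, using placeholder values and a hypothetical Brain GameObject named `MyBrain` (the individual settings are described in the table below):

```yaml
default:              # defaults applied to every brain
    trainer: ppo
    max_steps: 5.0e4
    summary_freq: 1000

MyBrain:              # overrides applied only to the Brain named "MyBrain"
    max_steps: 1.0e5
```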
| **Setting** | **Description** | **Applies To Trainer** |
The ML-Agents toolkit saves statistics during learning sessions that you can view with a TensorFlow utility named [TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
The `mlagents-learn` command saves training statistics to a folder named `summaries`, organized by the `run-id` value you assign to a training session.
In order to observe the training process, either during training or afterward,
start TensorBoard:

1. Open a terminal or console window.
2. Navigate to the ML-Agents `python` directory (the directory containing the `summaries` folder).
3. From the command line run: `tensorboard --logdir=summaries`
4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the default string, "ppo". All the statistics will be saved to the same sub-folder and displayed as one session in TensorBoard. After a few runs, the displays can become difficult to interpret in this situation. You can delete the folders under the `summaries` directory to clear out old statistics.
When you run `mlagents-learn`, you can use the `--save-freq` option to specify how frequently to save the statistics.