
Fixing learn.py, trainer_controller.py, and Docker (#1164)

* Fixing learn.py, trainer_controller.py, and Docker

- learn.py has been moved under trainers.
    - this was a two line change
- learn.py will no longer be run as a main method
- docopt arguments are strings by default. learn.py now uses
  this assumption to correctly parse arguments (see the sketch
  after this commit message).
- trainer_controller.py now considers the Docker volume when
  accepting a trainer config file path.
- the Docker container now uses mlagents-learn.

* Removing extraneous unity-volume ref.
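
The docopt behavior called out above is the key detail: docopt returns every option value as a string, so a `[default: None]` arrives as the literal string `'None'` rather than Python's `None`, and numeric options need an explicit cast. A minimal standalone sketch of that behavior (the `demo` usage string below is made up for illustration and is not part of this commit):

```python
from docopt import docopt

_USAGE = '''
Usage:
  demo <trainer-config-path> [options]

Options:
  --env=<file>    Name of the Unity executable [default: None].
  --seed=<n>      Random seed used for training [default: -1].
'''

# Parse a fake command line: only the positional argument is supplied.
options = docopt(_USAGE, argv=['config/trainer_config.yaml'])

assert options['<trainer-config-path>'] == 'config/trainer_config.yaml'
assert options['--env'] == 'None'   # the default is the *string* 'None'

# This is the convention the new learn.py relies on:
env_path = options['--env'] if options['--env'] != 'None' else None
seed = int(options['--seed'])       # numeric options must be cast explicitly
```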
/develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: a6f45b76
7 files changed, with 129 insertions and 137 deletions
  1. Dockerfile (7 changed lines)
  2. docs/Training-ML-Agents.md (2 changed lines)
  3. docs/Using-Docker.md (18 changed lines)
  4. ml-agents/mlagents/trainers/trainer_controller.py (9 changed lines)
  5. ml-agents/setup.py (2 changed lines)
  6. ml-agents/mlagents/trainers/learn.py (110 changed lines)
  7. ml-agents/mlagents/learn.py (118 changed lines)

Dockerfile (7 changed lines)


# xvfb is used to do CPU based rendering of Unity
RUN apt-get install -y xvfb
COPY ml-agents/requirements.txt .
RUN pip install --trusted-host pypi.python.org -r requirements.txt
COPY README.md .
COPY ml-agents /ml-agents
WORKDIR /ml-agents
RUN pip install .

- ENTRYPOINT ["python", "mlagents/learn.py"]
+ ENTRYPOINT ["mlagents-learn"]

docs/Training-ML-Agents.md (2 changed lines)


Use the command `mlagents-learn` to train your agents. This command is installed
with the `mlagents` package and its implementation can be found at
- `ml-agents/learn.py`. The [configuration file](#training-config-file),
+ `ml-agents/mlagents/trainers/learn.py`. The [configuration file](#training-config-file),
`config/trainer_config.yaml` specifies the hyperparameters used during training.
You can edit this file with a text editor to add a specific configuration for
each brain.

docs/Using-Docker.md (18 changed lines)


- Since Docker runs a container in an environment that is isolated from the host
machine, a mounted directory in your host machine is used to share data, e.g.
- the Unity executable, curriculum files and TensorFlow graph. For convenience,
- we created an empty `unity-volume` directory at the root of the repository for
- this purpose, but feel free to use any other directory. The remainder of this
- guide assumes that the `unity-volume` directory is the one used.
+ the trainer configuration file, Unity executable, curriculum files and
+ TensorFlow graph. For convenience, we created an empty `unity-volume`
+ directory at the root of the repository for this purpose, but feel free to use
+ any other directory. The remainder of this guide assumes that the
+ `unity-volume` directory is the one used.
## Usage

-p 5005:5005 \
<image-name>:latest \
--docker-target-name=unity-volume \
- <trainer-config-path> \
+ <trainer-config-file> \
--env=<environment-name> \
--train \
--run-id=<run-id>

- `docker-target-name`: Tells the ML-Agents Python package what the name of the
disk where it can read the Unity executable and store the graph. **This should
therefore be identical to `target`.**
- - `trainer-config-path`, `train`, `run-id`: ML-Agents arguments passed to
-   `mlagents-learn`. `trainer-config-path` is the filepath of the trainer config
+ - `trainer-config-file`, `train`, `run-id`: ML-Agents arguments passed to
+   `mlagents-learn`. `trainer-config-file` is the filename of the trainer config
file, `train` trains the algorithm, and `run-id` is used to tag each
experiment with a unique identifier. We recommend placing the trainer-config
file inside `unity-volume` so that the container has access to the file.

-p 5005:5005 \
balance.ball.v0.1:latest 3DBall \
--docker-target-name=unity-volume \
- <trainer-config-path> \
+ trainer_config.yaml \
--env=3DBall \
--train \
--run-id=3dball_first_trial
```

ml-agents/mlagents/trainers/trainer_controller.py (9 changed lines)


        :param no_graphics: Whether to run the Unity simulator in no-graphics
                            mode.
        """
        self.trainer_config_path = trainer_config_path
        if env_path is not None:
            # Strip out executable extensions if passed
            env_path = (env_path.strip()
                        .replace('.x86', ''))
        # Recognize and use docker volume if one is passed as an argument
-       if docker_target_name == '':
+       if not docker_target_name:
            self.trainer_config_path = trainer_config_path
            self.trainer_config_path = \
                '/{docker_target_name}/{trainer_config_path}'.format(
                    docker_target_name=docker_target_name,
                    trainer_config_path = trainer_config_path)
            self.model_path = '/{docker_target_name}/models/{run_id}'.format(
                docker_target_name=docker_target_name,
                run_id=run_id)
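
Since the hunk above adds the Docker-volume handling for the trainer config path described in the commit message, here is a compact sketch of that logic in isolation. `resolve_trainer_config_path` is a hypothetical helper for illustration only (not a function in the repository), and it assumes the plain assignment belongs to the non-Docker branch while the volume-prefixed path belongs to the Docker branch:

```python
def resolve_trainer_config_path(trainer_config_path, docker_target_name=None):
    """Sketch: pick the config path depending on whether training runs in Docker."""
    if not docker_target_name:
        # Local training: use the path exactly as given on the command line.
        return trainer_config_path
    # Docker training: the config file is expected inside the mounted volume,
    # so the path is re-rooted as /<volume>/<file>.
    return '/{docker_target_name}/{trainer_config_path}'.format(
        docker_target_name=docker_target_name,
        trainer_config_path=trainer_config_path)


# e.g. resolve_trainer_config_path('trainer_config.yaml', 'unity-volume')
#      -> '/unity-volume/trainer_config.yaml'
```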

ml-agents/setup.py (2 changed lines)


    entry_points={
        'console_scripts': [
-           'mlagents-learn=mlagents.learn:main',
+           'mlagents-learn=mlagents.trainers.learn:main',
        ],
    },
)
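
This `console_scripts` change is also what makes the Dockerfile's new `ENTRYPOINT ["mlagents-learn"]` work: when `pip install .` runs, setuptools generates an `mlagents-learn` executable that imports `mlagents.trainers.learn` and calls its `main()`. Roughly, the generated wrapper behaves like the following illustrative script (an approximation, not a file in the repository):

```python
#!/usr/bin/env python
# Approximate behaviour of the script setuptools generates for the
# 'mlagents-learn=mlagents.trainers.learn:main' entry point.
import sys

from mlagents.trainers.learn import main

if __name__ == '__main__':
    sys.exit(main())
```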

ml-agents/mlagents/trainers/learn.py (110 changed lines)


# # Unity ML-Agents Toolkit
import logging
import os
import multiprocessing
import numpy as np
from docopt import docopt
from .trainer_controller import TrainerController
from .exception import TrainerError


def run_training(sub_id, run_seed, run_options):
    """
    Launches training session.
    :param sub_id: Unique id for training session.
    :param run_seed: Random seed used for training.
    :param run_options: Command line arguments for training.
    """
    # Docker Parameters
    docker_target_name = (run_options['--docker-target-name']
                          if run_options['--docker-target-name'] != 'None' else None)

    # General parameters
    env_path = (run_options['--env']
                if run_options['--env'] != 'None' else None)
    run_id = run_options['--run-id']
    load_model = run_options['--load']
    train_model = run_options['--train']
    save_freq = int(run_options['--save-freq'])
    keep_checkpoints = int(run_options['--keep-checkpoints'])
    worker_id = int(run_options['--worker-id'])
    curriculum_file = (run_options['--curriculum']
                       if run_options['--curriculum'] != 'None' else None)
    lesson = int(run_options['--lesson'])
    fast_simulation = not bool(run_options['--slow'])
    no_graphics = run_options['--no-graphics']
    trainer_config_path = run_options['<trainer-config-path>']

    # Create controller and begin training.
    tc = TrainerController(env_path, run_id + '-' + str(sub_id),
                           save_freq, curriculum_file, fast_simulation,
                           load_model, train_model, worker_id + sub_id,
                           keep_checkpoints, lesson, run_seed,
                           docker_target_name, trainer_config_path, no_graphics)
    tc.start_learning()


def main():
    try:
        print('''
            (Unity ML-Agents ASCII-art banner)
        ''')
    except:
        print('\n\n\tUnity Technologies\n')

    logger = logging.getLogger('mlagents.trainers')

    _USAGE = '''
    Usage:
      mlagents-learn <trainer-config-path> [options]
      mlagents-learn --help

    Options:
      --env=<file>               Name of the Unity executable [default: None].
      --curriculum=<directory>   Curriculum json directory for environment [default: None].
      --keep-checkpoints=<n>     How many model checkpoints to keep [default: 5].
      --lesson=<n>               Start learning from this lesson [default: 0].
      --load                     Whether to load the model or randomly initialize [default: False].
      --run-id=<path>            The directory name for model and summary statistics [default: ppo].
      --num-runs=<n>             Number of concurrent training sessions [default: 1].
      --save-freq=<n>            Frequency at which to save model [default: 50000].
      --seed=<n>                 Random seed used for training [default: -1].
      --slow                     Whether to run the game at training speed [default: False].
      --train                    Whether to train model, or only run inference [default: False].
      --worker-id=<n>            Number to add to communication port (5005) [default: 0].
      --docker-target-name=<dt>  Docker volume to store training-specific files [default: None].
      --no-graphics              Whether to run the environment in no-graphics mode [default: False].
    '''

    options = docopt(_USAGE)
    logger.info(options)

    num_runs = int(options['--num-runs'])
    seed = int(options['--seed'])

    if options['--env'] == 'None' and num_runs > 1:
        raise TrainerError('It is not possible to launch more than one concurrent training session '
                           'when training from the editor.')

    jobs = []
    run_seed = seed

    for i in range(num_runs):
        if seed == -1:
            run_seed = np.random.randint(0, 10000)
        p = multiprocessing.Process(target=run_training, args=(i, run_seed, options))
        jobs.append(p)
        p.start()
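
One detail of `main()` that is easy to miss: when `--num-runs` is greater than one, each spawned `run_training` process gets its own run id suffix (`run_id + '-' + str(sub_id)`) and its own worker id (`worker_id + sub_id`), which keeps model directories and communication ports from colliding. A small standalone illustration (not part of the diff; the port arithmetic follows the `--worker-id` help text above):

```python
# Each concurrent session gets a unique run id and a unique port offset.
base_run_id = 'ppo'    # --run-id
base_worker_id = 0     # --worker-id
num_runs = 3           # --num-runs

for sub_id in range(num_runs):
    run_id = base_run_id + '-' + str(sub_id)   # 'ppo-0', 'ppo-1', 'ppo-2'
    worker_id = base_worker_id + sub_id        # 0, 1, 2
    port = 5005 + worker_id                    # 5005, 5006, 5007
    print(run_id, worker_id, port)
```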

ml-agents/mlagents/learn.py (118 changed lines)


# # Unity ML-Agents Toolkit
import logging
import os
import multiprocessing
import numpy as np
from docopt import docopt
from mlagents.trainers.trainer_controller import TrainerController
from mlagents.trainers.exception import TrainerError


def run_training(sub_id, run_seed, run_options):
    """
    Launches training session.
    :param sub_id: Unique id for training session.
    :param run_seed: Random seed used for training.
    :param run_options: Command line arguments for training.
    """
    # Docker Parameters
    if run_options['--docker-target-name'] == 'Empty':
        docker_target_name = ''
    else:
        docker_target_name = run_options['--docker-target-name']

    # General parameters
    env_path = run_options['--env']
    if env_path == 'None':
        env_path = None
    run_id = run_options['--run-id']
    load_model = run_options['--load']
    train_model = run_options['--train']
    save_freq = int(run_options['--save-freq'])
    keep_checkpoints = int(run_options['--keep-checkpoints'])
    worker_id = int(run_options['--worker-id'])
    curriculum_file = str(run_options['--curriculum'])
    if curriculum_file == 'None':
        curriculum_file = None
    lesson = int(run_options['--lesson'])
    fast_simulation = not bool(run_options['--slow'])
    no_graphics = run_options['--no-graphics']
    trainer_config_path = run_options['<trainer-config-path>']

    # Create controller and begin training.
    tc = TrainerController(env_path, run_id + '-' + str(sub_id),
                           save_freq, curriculum_file, fast_simulation,
                           load_model, train_model, worker_id + sub_id,
                           keep_checkpoints, lesson, run_seed,
                           docker_target_name, trainer_config_path, no_graphics)
    tc.start_learning()


def main():
    try:
        print('''
            (Unity ML-Agents ASCII-art banner)
        ''')
    except:
        print('\n\n\tUnity Technologies\n')

    logger = logging.getLogger('mlagents.learn')

    _USAGE = '''
    Usage:
      mlagents-learn <trainer-config-path> [options]
      mlagents-learn --help

    Options:
      --env=<file>               Name of the Unity executable [default: None].
      --curriculum=<directory>   Curriculum json directory for environment [default: None].
      --keep-checkpoints=<n>     How many model checkpoints to keep [default: 5].
      --lesson=<n>               Start learning from this lesson [default: 0].
      --load                     Whether to load the model or randomly initialize [default: False].
      --run-id=<path>            The directory name for model and summary statistics [default: ppo].
      --num-runs=<n>             Number of concurrent training sessions [default: 1].
      --save-freq=<n>            Frequency at which to save model [default: 50000].
      --seed=<n>                 Random seed used for training [default: -1].
      --slow                     Whether to run the game at training speed [default: False].
      --train                    Whether to train model, or only run inference [default: False].
      --worker-id=<n>            Number to add to communication port (5005) [default: 0].
      --docker-target-name=<dt>  Docker volume to store training-specific files [default: Empty].
      --no-graphics              Whether to run the environment in no-graphics mode [default: False].
    '''

    options = docopt(_USAGE)
    logger.info(options)

    num_runs = int(options['--num-runs'])
    seed = int(options['--seed'])

    if options['--env'] == 'None' and num_runs > 1:
        raise TrainerError('It is not possible to launch more than one concurrent training session '
                           'when training from the editor.')

    jobs = []
    run_seed = seed

    for i in range(num_runs):
        if seed == -1:
            run_seed = np.random.randint(0, 10000)
        p = multiprocessing.Process(target=run_training, args=(i, run_seed, options))
        jobs.append(p)
        p.start()


if __name__ == '__main__':
    main()