Merge pull request #1589 from Unity-Technologies/hotfix-0.6.0a

Hotfix 0.6.0a to develop
6 years ago · 8b1f0a38
--- a/UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/ModelParamLoader.cs
+++ b/UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/ModelParamLoader.cs
            var widthBp = resolutionBp.width;
            var heightBp = resolutionBp.height;
            var pixelBp = resolutionBp.blackAndWhite ? 1 : 3;
-            var widthT = tensor.Shape[1];
-            var heightT = tensor.Shape[2];
+            var heightT = tensor.Shape[1];
+            var widthT = tensor.Shape[2];
            var pixelT = tensor.Shape[3];
            if  ((widthBp != widthT) || (heightBp != heightT) || (pixelBp != pixelT))
            {
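
The swap above implies that the visual observation tensor is indexed as [batch, height, width, channels]. A minimal Python sketch of the same comparison under that assumption; `check_resolution` and its parameters are hypothetical names used for illustration only, not part of the ML-Agents API:

```python
def check_resolution(width_bp, height_bp, black_and_white, tensor_shape):
    """Return True when the Brain Parameters resolution matches the tensor shape.

    Assumes tensor_shape is (batch, height, width, channels).
    """
    pixel_bp = 1 if black_and_white else 3
    _, height_t, width_t, pixel_t = tensor_shape
    return (width_bp, height_bp, pixel_bp) == (width_t, height_t, pixel_t)


# A camera 84 wide and 64 high, in color, matches a (batch, 64, 84, 3) tensor.
print(check_resolution(84, 64, False, (1, 64, 84, 3)))
# With width and height swapped, the same non-square tensor no longer matches.
print(check_resolution(64, 84, False, (1, 64, 84, 3)))
```

For square resolutions the two index orders are indistinguishable, so the difference only shows up with non-square cameras.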
--- a/docs/Training-Imitation-Learning.md
+++ b/docs/Training-Imitation-Learning.md

 ## Recording Demonstrations

-It is possible to record demonstrations of agent behavior from the Unity Editor, and save them as assets. These demonstrations contain information on the observations, actions, and rewards for a given agent during the recording session. They can be managed from the Editor, as well as used for training with Offline Behavioral Cloning (see below).
+It is possible to record demonstrations of agent behavior from the Unity Editor, 
+and save them as assets. These demonstrations contain information on the 
+observations, actions, and rewards for a given agent during the recording session. 
+They can be managed from the Editor, as well as used for training with Offline 
+Behavioral Cloning (see below).
-In order to record demonstrations from an agent, add the `Demonstration Recorder` component to a GameObject in the scene which contains an `Agent` component. Once added, it is possible to name the demonstration that will be recorded from the agent.
+In order to record demonstrations from an agent, add the `Demonstration Recorder` 
+component to a GameObject in the scene which contains an `Agent` component. 
+Once added, it is possible to name the demonstration that will be recorded 
+from the agent.

 <p align="center">
  <img src="images/demo_component.png"

-When `Record` is checked, a demonstration will be created whenever the scene is played from the Editor. Depending on the complexity of the task, anywhere from a few minutes or a few hours of demonstration data may be necessary to be useful for imitation learning. When you have recorded enough data, end the Editor play session, and a `.demo` file will be created in the `Assets/Demonstrations` folder. This file contains the demonstrations. Clicking on the file will provide metadata about the demonstration in the inspector.
+When `Record` is checked, a demonstration will be created whenever the scene 
+is played from the Editor. Depending on the complexity of the task, anywhere 
+from a few minutes to a few hours of demonstration data may be necessary to 
+be useful for imitation learning. When you have recorded enough data, end 
+the Editor play session, and a `.demo` file will be created in the 
+`Assets/Demonstrations` folder. This file contains the demonstrations. 
+Clicking on the file will provide metadata about the demonstration in the 
+inspector.

 <p align="center">
  <img src="images/demo_inspector.png"

 ## Training with Behavioral Cloning

-There are a variety of possible imitation learning algorithms which can be used,
-the simplest one of them is Behavioral Cloning. It works by collecting demonstrations from a teacher, and then simply uses them to directly learn a policy, in the
-same way the supervised learning for image classification or other traditional
-Machine Learning tasks work.
+There are a variety of possible imitation learning algorithms which can 
+be used; the simplest one of them is Behavioral Cloning. It works by collecting 
+demonstrations from a teacher, and then simply uses them to directly learn a 
+policy, in the same way that supervised learning for image classification 
+or other traditional Machine Learning tasks works.
-With offline behavioral cloning, we can use demonstrations (`.demo` files) generated using the `Demonstration Recorder` as the dataset used to train a behavior.
+With offline behavioral cloning, we can use demonstrations (`.demo` files) 
+generated using the `Demonstration Recorder` as the dataset used to train a behavior.
-2. Record a set of demonstration using the `Demonstration Recorder` (see above). For illustrative purposes we will refer to this file as `AgentRecording.demo`. 
-3. Build the scene, assigning the agent a Learning Brain, and set the Brain to Control in the Broadcast Hub. For more information on Brains, see [here](Learning-Environment-Design-Brains.md).
+2. Record a set of demonstrations using the `Demonstration Recorder` (see above). 
+   For illustrative purposes we will refer to this file as `AgentRecording.demo`. 
+3. Build the scene, assigning the agent a Learning Brain, and set the Brain to 
+   Control in the Broadcast Hub. For more information on Brains, see 
+   [here](Learning-Environment-Design-Brains.md).
-5. Modify the `demo_path` parameter in the file to reference the path to the demonstration file recorded in step 2. In our case this is: `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
-6. Launch `mlagent-learn`, providing `./config/offline_bc_config.yaml` as the config parameter, and include the `--run-id` and `--train` as usual. Provide your environment as the `--env` parameter if it has been compiled as standalone, or omit to train in the editor.
+5. Modify the `demo_path` parameter in the file to reference the path to the 
+   demonstration file recorded in step 2. In our case this is: 
+   `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
+6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml` 
+   as the config parameter, and include the `--run-id` and `--train` as usual. 
+   Provide your environment as the `--env` parameter if it has been compiled 
+   as standalone, or omit to train in the editor.
-This will use the demonstration file to train a neural network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
+This will use the demonstration file to train a neural network driven agent 
+to directly imitate the actions provided in the demonstration. The environment 
+will launch and be used for evaluating the agent's performance during training.
-It is also possible to provide demonstrations in realtime during training, without pre-recording a demonstration file. The steps to do this are as follows:
+It is also possible to provide demonstrations in realtime during training, 
+without pre-recording a demonstration file. The steps to do this are as follows:

 1. First create two Brains, one which will be the "Teacher," and the other which
   will be the "Student." We will assume that the names of the Brain
 3. The "Student" Brain must be a **Learning Brain**.
 4. The Brain Parameters of both the "Teacher" and "Student" Brains must be 
   compatible with the agent.
-5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
+5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub` 
-4. Link the Brains to the desired Agents (one Agent as the teacher and at least
+6. Link the Brains to the desired Agents (one Agent as the teacher and at least
-5. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
-   the `trainer` parameter of this entry to `imitation`, and the
+7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
+   the `trainer` parameter of this entry to `online_bc`, and the
-6. Launch the training process with `mlagents-learn config/online_bc_config.yaml
+8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
-7. From the Unity window, control the Agent with the Teacher Brain by providing
+9. From the Unity window, control the Agent with the Teacher Brain by providing
-8. Watch as the Agent(s) with the student Brain attached begin to behave
+10. Watch as the Agent(s) with the student Brain attached begin to behave
-9. Once the Student Agents are exhibiting the desired behavior, end the training
+11. Once the Student Agents are exhibiting the desired behavior, end the training
-10. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the
+12. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory within Assets of your choosing), and use
    it with a `Learning` Brain.

--- a/ml-agents/mlagents/envs/rpc_communicator.py
+++ b/ml-agents/mlagents/envs/rpc_communicator.py
            self.server = grpc.server(ThreadPoolExecutor(max_workers=10))
            self.unity_to_external = UnityToExternalServicerImplementation()
            add_UnityToExternalServicer_to_server(self.unity_to_external, self.server)
-            self.server.add_insecure_port('localhost:' + str(self.port))
+            # Using unspecified address, which means that grpc is communicating on all IPs
+            # This is so that the docker container can connect.
+            self.server.add_insecure_port('[::]:' + str(self.port))
            self.server.start()
            self.is_open = True
        except:
--- a/ml-agents/mlagents/trainers/buffer.py
+++ b/ml-agents/mlagents/trainers/buffer.py
            AgentBufferField with the append method.
            """

+            def __init__(self):
+                self.padding_value = 0
+                super(Buffer.AgentBuffer.AgentBufferField, self).__init__()
+
+            def append(self, element, padding_value=0):
+                """
+                Adds an element to this list. Also lets you change the padding 
+                type, so that it can be set on append (e.g. action_masks should
+                be padded with 1.) 
+                :param element: The element to append to the list.
+                :param padding_value: The value used to pad when get_batch is called.
+                """
+                super(Buffer.AgentBuffer.AgentBufferField, self).append(element)
+                self.padding_value = padding_value
+
-                Ads a list of np.arrays to the end of the list of np.arrays.
+                Adds a list of np.arrays to the end of the list of np.arrays.
                :param data: The np.array list to append.
                """
                self += list(np.array(data))
                            raise BufferException("The batch size and training length requested for get_batch were"
                                                  " too large given the current number of data points.")
                        tmp_list = []
-                        padding = np.array(self[-1]) * 0
+                        padding = np.array(self[-1]) * self.padding_value
                        # The padding is the last element scaled by padding_value (zeros by default), so it has that element's shape
                        for end in range(len(self), len(self) % training_length, -training_length)[:batch_size]:
                            tmp_list += [np.array(self[end - training_length:end])]
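
The padding change can be illustrated with a small standalone sketch. `PaddedField` is a hypothetical stand-in for `AgentBufferField`, and plain Python lists stand in for numpy arrays; only the rule "padding is the last element multiplied by `padding_value`" is taken from the diff above.

```python
class PaddedField(list):
    """Hypothetical stand-in for AgentBufferField: a list that remembers a padding value."""

    def __init__(self):
        super().__init__()
        self.padding_value = 0

    def append(self, element, padding_value=0):
        # Each append overwrites the padding value, as in the diff.
        super().append(element)
        self.padding_value = padding_value

    def padding(self):
        # Mirrors `np.array(self[-1]) * self.padding_value` for a flat list.
        return [x * self.padding_value for x in self[-1]]


rewards = PaddedField()
rewards.append([0.5, 2.0])
print(rewards.padding())  # zeros, with the shape of the last element

masks = PaddedField()
masks.append([1, 0, 1], padding_value=1)
print(masks.padding())  # the last mask, unchanged
```

Because the padding is a product, a `padding_value` of 1 repeats the last appended element rather than producing an array of ones.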
--- a/ml-agents/mlagents/trainers/learn.py
+++ b/ml-agents/mlagents/trainers/learn.py
 import numpy as np
 from docopt import docopt

-from .trainer_controller import TrainerController
-from .exception import TrainerError
+from mlagents.trainers.trainer_controller import TrainerController
+from mlagents.trainers.exception import TrainerError


 def run_training(sub_id, run_seed, run_options, process_queue):

    jobs = []
    run_seed = seed
-    for i in range(num_runs):
+
+    if num_runs == 1:
-        process_queue = Queue()
-        p = Process(target=run_training, args=(i, run_seed, options, process_queue))
-        jobs.append(p)
-        p.start()
-        # Wait for signal that environment has successfully launched
-        while process_queue.get() is not True:
-            continue
+        run_training(0, run_seed, options, Queue())
+    else:
+        for i in range(num_runs):
+            if seed == -1:
+                run_seed = np.random.randint(0, 10000)
+            process_queue = Queue()
+            p = Process(target=run_training, args=(i, run_seed, options, process_queue))
+            jobs.append(p)
+            p.start()
+            # Wait for signal that environment has successfully launched
+            while process_queue.get() is not True:
+                continue
+
+# For python debugger to directly run this script
+if __name__ == "__main__":
+    main()
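
The seed handling in the new branch can be sketched in isolation as follows. `seeds_for_runs` is a hypothetical helper, and the standard-library `random.randrange` stands in for `np.random.randint` so that the example has no dependencies; both draw from the range 0 to 9999.

```python
import random


def seeds_for_runs(seed, num_runs, rng=random):
    """Return the seed each run would receive.

    A single run gets the seed unchanged (it is started in-process);
    with several runs, a seed of -1 means a fresh random seed per run.
    """
    if num_runs == 1:
        return [seed]
    return [rng.randrange(0, 10000) if seed == -1 else seed
            for _ in range(num_runs)]


print(seeds_for_runs(7, 3))       # every run shares the fixed seed
print(len(seeds_for_runs(-1, 4)))  # one independently drawn seed per run
```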
--- a/ml-agents/mlagents/trainers/policy.py
+++ b/ml-agents/mlagents/trainers/policy.py
                clear_devices=True, initializer_nodes='', input_saver='',
                restore_op_name='save/restore_all',
                filename_tensor_name='save/Const:0')
+            logger.info('Exported ' + self.model_path + '.bytes file')

    def _process_graph(self):
        """
--- a/ml-agents/mlagents/trainers/ppo/trainer.py
+++ b/ml-agents/mlagents/trainers/ppo/trainer.py
                            epsilons[idx])
                    else:
                        self.training_buffer[agent_id]['action_mask'].append(
-                            stored_info.action_masks[idx])
+                            stored_info.action_masks[idx], padding_value=1)
                    a_dist = stored_take_action_outputs['log_probs']
                    value = stored_take_action_outputs['value']
                    self.training_buffer[agent_id]['actions'].append(actions[idx])
--- a/ml-agents/mlagents/trainers/trainer_controller.py
+++ b/ml-agents/mlagents/trainers/trainer_controller.py
 import glob
 import logging
 import shutil
+import sys
+if sys.platform.startswith('win'):
+    import win32api
+    import win32con

 import yaml
 import re
        self.keep_checkpoints = keep_checkpoints
        self.trainers = {}
        self.seed = seed
+        self.global_step = 0
        np.random.seed(self.seed)
        tf.set_random_seed(self.seed)
        self.env = UnityEnvironment(file_name=env_path,
            self.trainers[brain_name].save_model()
        self.logger.info('Saved Model')

+    def _save_model_when_interrupted(self, steps=0):
+        self.logger.info('Learning was interrupted. Please wait '
+                         'while the graph is generated.')
+        self._save_model(steps)
+
+    def _win_handler(self, event):
+        """
+        This function gets triggered after ctrl-c or ctrl-break is pressed
+        under Windows platform.
+        """
+        if event in (win32con.CTRL_C_EVENT, win32con.CTRL_BREAK_EVENT):
+            self._save_model_when_interrupted(self.global_step)
+            self._export_graph()
+            sys.exit()
+            return True
+        return False
+
    def _export_graph(self):
        """
        Exports latest saved models to .bytes format for Unity embedding.
        self._initialize_trainers(trainer_config)
        for _, t in self.trainers.items():
            self.logger.info(t)
-        global_step = 0  # This is only for saving the model
+            if sys.platform.startswith('win'):
+                # Add the _win_handler function to the windows console's handler function list
+                win32api.SetConsoleCtrlHandler(self._win_handler, True)
        try:
            while any([t.get_step <= t.get_max_steps \
                       for k, t in self.trainers.items()]) \
                    # Write training statistics to Tensorboard.
                    if self.meta_curriculum is not None:
                        trainer.write_summary(
-                            global_step,
+                            self.global_step,
-                        trainer.write_summary(global_step)
+                        trainer.write_summary(self.global_step)
-                global_step += 1
-                if global_step % self.save_freq == 0 and global_step != 0 \
+                self.global_step += 1
+                if self.global_step % self.save_freq == 0 and self.global_step != 0 \
-                    self._save_model(steps=global_step)
+                    self._save_model(steps=self.global_step)
-            if global_step != 0 and self.train_model:
-                self._save_model(steps=global_step)
+            if self.global_step != 0 and self.train_model:
+                self._save_model(steps=self.global_step)
-            print('--------------------------Now saving model--------------'
-                  '-----------')
-                self.logger.info('Learning was interrupted. Please wait '
-                                 'while the graph is generated.')
-                self._save_model(steps=global_step)
+                self._save_model_when_interrupted(steps=self.global_step)
            pass
        self.env.close()
        if self.train_model:
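
Moving `global_step` onto the instance lets code outside the training loop, such as the Windows console handler, see the current step when saving. A simplified sketch of that pattern, assuming a hypothetical `TrainingLoop` class in place of `TrainerController` and a simulated ctrl-c:

```python
class TrainingLoop:
    """Hypothetical, simplified stand-in for TrainerController."""

    def __init__(self, save_freq):
        self.save_freq = save_freq
        self.global_step = 0  # instance attribute, readable by interrupt handlers
        self.saved_at = []    # step of every save, recorded for illustration

    def _save_model(self, steps=0):
        self.saved_at.append(steps)

    def _save_model_when_interrupted(self, steps=0):
        print('Learning was interrupted. Please wait '
              'while the graph is generated.')
        self._save_model(steps)

    def run(self, max_steps, interrupt_at=None):
        try:
            while self.global_step < max_steps:
                if self.global_step == interrupt_at:
                    raise KeyboardInterrupt  # simulates ctrl-c
                self.global_step += 1
                if self.global_step % self.save_freq == 0:
                    self._save_model(steps=self.global_step)
            if self.global_step != 0:
                self._save_model(steps=self.global_step)
        except KeyboardInterrupt:
            self._save_model_when_interrupted(steps=self.global_step)


loop = TrainingLoop(save_freq=5)
loop.run(max_steps=20, interrupt_at=7)
print(loop.saved_at)
```

The interrupted run records a periodic save at step 5 and a final save at step 7, the step at which the interrupt arrived.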
--- a//config/curricula/wall-jump/SmallWallJumpLearning.json
+++ b//config/curricula/wall-jump/SmallWallJumpLearning.json
--- a//config/curricula/wall-jump/BigWallJumpLearning.json
+++ b//config/curricula/wall-jump/BigWallJumpLearning.json