
Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure

/develop-generalizationTraining-TrainerController
Deric Pang, 7 years ago
Current commit
e0e02ae6
8 changed files, with 2,570 additions and 58 deletions
1. README.md (2 changes)
2. docs/Feature-Monitor.md (18 changes)
3. docs/Learning-Environment-Create-New.md (20 changes)
4. python/mlagents/mlagents/learn.py (86 changes)
5. python/mlagents/mlagents/trainers/ppo/trainer.py (3 changes)
6. docs/images/3dballhard.png (497 changes)
7. docs/images/bananaimitation.png (1001 changes)
8. docs/images/image-banner.png (1001 changes)

README.md (2 changes)


<img src="docs/images/unity-wide.png" align="middle" width="3000"/>
<img src="docs/images/image-banner.png" align="middle" width="3000"/>
# Unity ML-Agents Toolkit (Beta)
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin

docs/Feature-Monitor.md (18 changes)


The monitor allows visualizing information related to the agents or training process within a Unity scene.
You can track many different things, both related and unrelated to the agents themselves. By default, the Monitor is only active during the *inference* phase, not during training. To change this behaviour, activate or deactivate it by calling `SetActive(boolean)`. For example, to also show the Monitor during training, call it in the `InitializeAcademy()` method of your `Academy`:
```csharp
using MLAgents;

public class YourAcademy : Academy {
    public override void InitializeAcademy()
    {
        Monitor.SetActive(true);
    }
}
```
To add values to the Monitor, call the `Log` function anywhere in your code (a short example follows the argument list below):
```csharp
Monitor.Log(key, value, target)
```
* *`float[]`* - The Monitor `Log` call can take an additional argument called `displayType` that can be either `INDEPENDENT` (default) or `PROPORTION`:
  * *`INDEPENDENT`* is used to display multiple independent floats as a histogram. The histogram will be a sequence of vertical sliders.
  * *`PROPORTION`* is used to see the proportions between numbers. For each float in values, a rectangle whose width is that value divided by the sum of all values will be shown. It is best for visualizing values that sum to 1.
* *`target`* is the transform to which you want to attach information. If the transform is `null`, the information will be attached to the global monitor.
* **NB:** When adding a target transform that is not the global monitor, make sure your main camera object is tagged as `MainCamera` via the Inspector. This is needed to properly display the text on the screen.
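As a minimal sketch of the patterns above (the exact overloads and the position of the `displayType` argument are assumptions drawn from this description, not a definitive API reference), the following component logs one value to the global monitor and others attached to a specific agent:
```csharp
using MLAgents;
using UnityEngine;

public class MonitorLoggingExample : MonoBehaviour
{
    // Illustrative field: assign the agent's transform in the Inspector.
    public Transform agentTransform;

    void Update()
    {
        // A null target attaches the value to the global monitor.
        Monitor.Log("Episode Reward", 0.42f, null);

        // A non-null target attaches the value above that object in the scene
        // (remember to tag the main camera as MainCamera).
        Monitor.Log("Status", "exploring", agentTransform);

        // float[] values are drawn as a histogram; the displayType argument
        // (INDEPENDENT or PROPORTION) is assumed here to come last.
        Monitor.Log("Action Values", new float[] {0.1f, 0.3f, 0.6f},
                    agentTransform, Monitor.DisplayType.PROPORTION);
    }
}
```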

docs/Learning-Environment-Create-New.md (20 changes)


**Note:** When you mark an agent as done, it stops its activity until it is reset. You can have the agent reset immediately by setting the Agent.ResetOnDone property to true in the Inspector, or you can wait for the Academy to reset the environment. This RollerBall environment relies on the `ResetOnDone` mechanism and doesn't set a `Max Steps` limit for the Academy (so it never resets the environment).
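Because the environment relies on `ResetOnDone`, the reset logic lives in the agent itself. A minimal sketch of what such an `AgentReset()` override might look like is shown below; the `Target` and `rBody` field names are illustrative assumptions, not the tutorial's exact code:
```csharp
public override void AgentReset()
{
    if (this.transform.position.y < -1.0f)
    {
        // The agent fell off the platform: recover it at the origin with no momentum.
        this.transform.position = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;
    }
    else
    {
        // The agent reached the target: move the target to a new random spot.
        Target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }
}
```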
To encourage the agent along, we also reward it for getting closer to the target (saving the previous distance measurement between steps):
```csharp
// Getting closer
if (distanceToTarget < previousDistance)
{
    AddReward(0.1f);
}
```
It can also encourage an agent to finish a task more quickly if you assign a small negative reward at each step:
```csharp

    Done();
}

// Getting closer
if (distanceToTarget < previousDistance)
{
    AddReward(0.1f);
}

// Time penalty
AddReward(-0.05f);

// Fell off the platform
AddReward(-1.0f);
Done();
}

previousDistance = distanceToTarget;

// Actions, size = 2
Vector3 controlSignal = Vector3.zero;
```

## Final Editor Setup
Now that all the GameObjects and ML-Agent components are in place, it is time to connect everything together in the Unity Editor. This involves assigning the Brain object to the Agent, changing some of the Agent component's properties, and setting the Brain properties so that they are compatible with our agent code.
4. Change `Decision Frequency` from `1` to `5`.
![Assign the Brain to the RollerAgent](images/mlagents-NewTutAssignBrain.png)

python/mlagents/mlagents/learn.py (86 changes)


```python
# # Unity ML-Agents Toolkit
# ## ML-Agent Learning
import logging

from docopt import docopt


def run_training(sub_id, run_seed, run_options):
    """
    Launches training session.
    :param sub_id: Unique id for training session.
    :param run_seed: Random seed used for training.
    :param run_options: Command line arguments for training.
    """
    # Docker Parameters
    if run_options['--docker-target-name'] == 'Empty':
        docker_target_name = ''
    else:
        docker_target_name = run_options['--docker-target-name']

    # General parameters
    run_id = run_options['--run-id']
    load_model = run_options['--load']
    train_model = run_options['--train']
    save_freq = int(run_options['--save-freq'])
    keep_checkpoints = int(run_options['--keep-checkpoints'])
    worker_id = int(run_options['--worker-id'])
    curriculum_file = str(run_options['--curriculum'])
    if curriculum_file == "None":
        curriculum_file = None
    lesson = int(run_options['--lesson'])
    fast_simulation = not bool(run_options['--slow'])
    no_graphics = run_options['--no-graphics']

    # Constants
    # Assumption that this yaml is present in same dir as this file
    base_path = os.path.dirname(__file__)
    trainer_config_path = os.path.abspath(os.path.join(base_path, "trainer_config.yaml"))

    # Create controller and begin training.
    tc = TrainerController(run_options['<env>'], run_id + "-" + str(sub_id),
                           save_freq, curriculum_file, fast_simulation,
                           load_model, train_model, worker_id + sub_id,
                           keep_checkpoints, lesson, run_seed,
                           docker_target_name, trainer_config_path, no_graphics)
    tc.start_learning()
```
```python
def main():
    print('''

    --keep-checkpoints=<n>     How many model checkpoints to keep [default: 5].
    --lesson=<n>               Start learning from this lesson [default: 0].
    --load                     Whether to load the model or randomly initialize [default: False].
    --run-id=<path>            The directory name for model and summary statistics [default: ppo].
    --worker-id=<n>            Number to add to communication port (5005). Used for multi-environment [default: 0].
    --docker-target-name=<dt>  Docker Volume to store curriculum, executable and model files [default: Empty].
    --no-graphics              Whether to run the Unity simulator in no-graphics mode [default: False].

    # Docker Parameters
    if options['--docker-target-name'] == 'Empty':
        docker_target_name = ''
    else:
        docker_target_name = options['--docker-target-name']

    # General parameters
    run_id = options['--run-id']
    load_model = options['--load']
    train_model = options['--train']
    save_freq = int(options['--save-freq'])
    keep_checkpoints = int(options['--keep-checkpoints'])
    worker_id = int(options['--worker-id'])
    curriculum_file = str(options['--curriculum'])
    if curriculum_file == "None":
        curriculum_file = None
    lesson = int(options['--lesson'])
    fast_simulation = not bool(options['--slow'])
    no_graphics = options['--no-graphics']
    trainer_config_path = options['<trainer-config-path>']

    if env_path is None and num_runs > 1:
        raise TrainerError("It is not possible to launch more than one concurrent training session "

    for i in range(num_runs):
        if seed == -1:
            seed = np.random.randint(0, 9999)
        p = multiprocessing.Process(target=run_training, args=(i, seed, options))
        jobs.append(p)
        p.start()
```

python/mlagents/mlagents/trainers/ppo/trainer.py (3 changes)


```python
if curr_info.agents != next_info.agents:
    curr_info = self.construct_curr_info(next_info)
if len(curr_info.agents) == 0:
    return []
if self.use_visual_obs:
    for i in range(len(curr_info.visual_observations)):
        feed_dict[self.model.visual_in[i]] = curr_info.visual_observations[i]
```

docs/images/3dballhard.png (497 changes)
File diff is too large to display.

docs/images/bananaimitation.png (1001 changes)
File diff is too large to display.

docs/images/image-banner.png (1001 changes)
File diff is too large to display.
