Merge branch 'internal-policy-ghost' into soccer-2v1

5 年前 · 80469267
--- a/Project/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs
+++ b/Project/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs

    public override float[] Heuristic()
    {
-        var action = new float[2];
+        var action = new float[3];
-        action[0] = Input.GetAxis("Horizontal");
-        action[1] = Input.GetKey(KeyCode.Space) ? 1f : 0f;
+        action[0] = Input.GetAxis("Horizontal");    // Racket Movement
+        action[1] = Input.GetKey(KeyCode.Space) ? 1f : 0f;   // Racket Jumping
+        action[2] = Input.GetAxis("Vertical");   // Racket Rotation  
        return action;
    }

--- a/com.unity.ml-agents/CHANGELOG.md
+++ b/com.unity.ml-agents/CHANGELOG.md
 - The Jupyter notebooks have been removed from the repository.
 - Introduced the `SideChannelUtils` to register, unregister and access side channels.
 - `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
+ - Added ability to start training (initialize model weights) from a previous run ID. (#3710)

 ### Minor Changes
 - Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
 specified, you will not be able to continue with training. Use `--force` to force ML-Agents to
 overwrite the existing data.

+Alternatively, you might want to start a new training run but _initialize_ it using an already-trained
+model. You may want to do this, for instance, if your environment changed and you want
+a new model, but the old behavior is still better than random. You can do this by specifying `--initialize-from=<run-identifier>`, where `<run-identifier>` is the old run ID.
+
 ### Command Line Training Options

 In addition to passing the path of the Unity executable containing your training
  as the current agents in your scene.
 * `--force`: Attempting to train a model with a run-id that has been used before will
  throw an error. Use `--force` to force-overwrite this run-id's summary and model data.
+* `--initialize-from=<run-identifier>`: Specify an old run-id here to initialize your model from
+  a previously trained model. Note that the previously saved models _must_ have the same behavior
+  parameters as your current environment.
 * `--no-graphics`: Specify this option to run the Unity executable in
  `-batchmode` and doesn't initialize the graphics driver. Use this only if your
  training doesn't involve visual observations (reading from Pixels). See
 | train_interval       | How often to update the agent.                                                                                                                                                          | SAC                      |
 | num_update           | Number of mini-batches to update the agent with during each update.                                                                                                                     | SAC                      |
 | use_recurrent        | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md).                                                                                       | PPO, SAC             |
+| init_path        | Initialize trainer from a previously saved model.                                                                                       | PPO, SAC             |

 \*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral Cloning (Imitation), GAIL = Generative Adversarial Imitaiton Learning

--- a/docs/Training-PPO.md
+++ b/docs/Training-PPO.md

 Typical Range: Approximately equal to PPO's `buffer_size`

+### (Optional) Advanced: Initialize Model Path
+
+`init_path` can be specified to initialize your model from a previous run before starting.
+Note that the prior run should have used the same trainer configurations as the current run,
+and have been saved with the same version of ML-Agents. You should provide the full path
+to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
+
+This option is provided in case you want to initialize different behaviors from different runs;
+in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
+all models from the same run.
+
 ## Training Statistics

 To view training statistics, use TensorBoard. For information on launching and
--- a/docs/Training-SAC.md
+++ b/docs/Training-SAC.md

 Typical Range (Discrete): `32` - `512`

+### (Optional) Advanced: Initialize Model Path
+
+`init_path` can be specified to initialize your model from a previous run before starting.
+Note that the prior run should have used the same trainer configurations as the current run,
+and have been saved with the same version of ML-Agents. You should provide the full path
+to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
+
+This option is provided in case you want to initialize different behaviors from different runs;
+in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
+all models from the same run.
+
 ## Training Statistics

 To view training statistics, use TensorBoard. For information on launching and
--- a/ml-agents/mlagents/trainers/learn.py
+++ b/ml-agents/mlagents/trainers/learn.py
        default=False,
        dest="force",
        action="store_true",
-        help="Force-overwrite existing models and summaries for a run-id that has been used "
+        help="Force-overwrite existing models and summaries for a run ID that has been used "
-        help="The directory name for model and summary statistics",
+        help="The run identifier for model and summary statistics.",
+    )
+    argparser.add_argument(
+        "--initialize-from",
+        metavar="RUN_ID",
+        default=None,
+        help="Specify a previously saved run ID from which to initialize the model from. "
+        "This can be used, for instance, to fine-tune an existing model on a new environment. ",
    )
    argparser.add_argument(
        "--save-freq", default=50000, type=int, help="Frequency at which to save model"
        dest="inference",
        action="store_true",
        help="Run in Python inference mode (don't train). Use with --resume to load a model trained with an "
-        "existing run-id.",
+        "existing run ID.",
    )
    argparser.add_argument(
        "--base-port",
    seed: int = parser.get_default("seed")
    env_path: Optional[str] = parser.get_default("env_path")
    run_id: str = parser.get_default("run_id")
+    initialize_from: str = parser.get_default("initialize_from")
    load_model: bool = parser.get_default("load_model")
    resume: bool = parser.get_default("resume")
    force: bool = parser.get_default("force")
    """
    with hierarchical_timer("run_training.setup"):
        model_path = f"./models/{options.run_id}"
+        maybe_init_path = (
+            f"./models/{options.initialize_from}" if options.initialize_from else None
+        )
        summaries_dir = "./summaries"
        port = options.base_port

            ],
        )
        handle_existing_directories(
-            model_path, summaries_dir, options.resume, options.force
+            model_path, summaries_dir, options.resume, options.force, maybe_init_path
        )
        tb_writer = TensorboardWriter(summaries_dir, clear_past_data=not options.resume)
        gauge_write = GaugeWriter()
            not options.inference,
            options.resume,
            run_seed,
+            maybe_init_path,
            maybe_meta_curriculum,
            options.multi_gpu,
        )
--- a/ml-agents/mlagents/trainers/policy/tf_policy.py
+++ b/ml-agents/mlagents/trainers/policy/tf_policy.py
        if self.use_continuous_act:
            self.num_branches = self.brain.vector_action_space_size[0]
        self.model_path = trainer_parameters["model_path"]
+        self.initialize_path = trainer_parameters.get("init_path", None)
        self.keep_checkpoints = trainer_parameters.get("keep_checkpoints", 5)
        self.graph = tf.Graph()
        self.sess = tf.Session(
            init = tf.global_variables_initializer()
            self.sess.run(init)

-    def _load_graph(self):
+    def _load_graph(self, model_path: str, reset_global_steps: bool = False) -> None:
-            logger.info("Loading Model for brain {}".format(self.brain.brain_name))
-            ckpt = tf.train.get_checkpoint_state(self.model_path)
+            logger.info(
+                "Loading model for brain {} from {}.".format(
+                    self.brain.brain_name, model_path
+                )
+            )
+            ckpt = tf.train.get_checkpoint_state(model_path)
-                    "--run-id. and that the previous run you are resuming from had the same "
-                    "behavior names.".format(self.model_path)
+                    "--run-id and that the previous run you are loading from had the same "
+                    "behavior names.".format(model_path)
-            self.saver.restore(self.sess, ckpt.model_checkpoint_path)
+            try:
+                self.saver.restore(self.sess, ckpt.model_checkpoint_path)
+            except tf.errors.NotFoundError:
+                raise UnityPolicyException(
+                    "The model {0} was found but could not be loaded. Make "
+                    "sure the model is from the same version of ML-Agents, has the same behavior parameters, "
+                    "and is using the same trainer configuration as the current run.".format(
+                        model_path
+                    )
+                )
+            if reset_global_steps:
+                logger.info(
+                    "Starting training from step 0 and saving to {}.".format(
+                        self.model_path
+                    )
+                )
+            else:
+                logger.info(
+                    "Resuming training from step {}.".format(self.get_current_step())
+                )
-        if self.load:
-            self._load_graph()
+        # If there is an initialize path, load from that. Else, load from the set model path.
+        # If load is set to True, don't reset steps to 0. Else, do. This allows a user to,
+        # e.g., resume from an initialize path.
+        reset_steps = not self.load
+        if self.initialize_path is not None:
+            self._load_graph(self.initialize_path, reset_global_steps=reset_steps)
+        elif self.load:
+            self._load_graph(self.model_path, reset_global_steps=reset_steps)
        else:
            self._initialize_graph()

        """
        step = self.sess.run(self.global_step)
        return step
+
+    def _set_step(self, step: int) -> int:
+        """
+        Sets current model step to step without creating additional ops.
+        :param step: Step to set the current model step to.
+        :return: The step the model was set to.
+        """
+        current_step = self.get_current_step()
+        # Increment a positive or negative number of steps.
+        return self.increment_step(step - current_step)

    def increment_step(self, n_steps):
        """
--- a/ml-agents/mlagents/trainers/tests/test_learn.py
+++ b/ml-agents/mlagents/trainers/tests/test_learn.py
                None,
            )
            handle_dir_mock.assert_called_once_with(
-                "./models/ppo", "./summaries", False, False
+                "./models/ppo", "./summaries", False, False, None
            )
    StatsReporter.writers.clear()  # make sure there aren't any writers as added by learn.py

--- a/ml-agents/mlagents/trainers/tests/test_nn_policy.py
+++ b/ml-agents/mlagents/trainers/tests/test_nn_policy.py
 import pytest
+import os
+from typing import Dict, Any

 import numpy as np
 from mlagents.tf_utils import tf
 NUM_AGENTS = 12


-def create_policy_mock(dummy_config, use_rnn, use_discrete, use_visual):
+def create_policy_mock(
+    dummy_config: Dict[str, Any],
+    use_rnn: bool = False,
+    use_discrete: bool = True,
+    use_visual: bool = False,
+    load: bool = False,
+    seed: int = 0,
+) -> NNPolicy:
    mock_brain = mb.setup_mock_brain(
        use_discrete,
        use_visual,
    trainer_parameters = dummy_config
    trainer_parameters["keep_checkpoints"] = 3
    trainer_parameters["use_recurrent"] = use_rnn
-    policy = NNPolicy(0, mock_brain, trainer_parameters, False, False)
+    policy = NNPolicy(seed, mock_brain, trainer_parameters, False, load)
+
+
+def test_load_save(dummy_config, tmp_path):
+    path1 = os.path.join(tmp_path, "runid1")
+    path2 = os.path.join(tmp_path, "runid2")
+    trainer_params = dummy_config
+    trainer_params["model_path"] = path1
+    policy = create_policy_mock(trainer_params)
+    policy.initialize_or_load()
+    policy.save_model(2000)
+
+    assert len(os.listdir(tmp_path)) > 0
+
+    # Try load from this path
+    policy2 = create_policy_mock(trainer_params, load=True, seed=1)
+    policy2.initialize_or_load()
+    _compare_two_policies(policy, policy2)
+
+    # Try initialize from path 1
+    trainer_params["model_path"] = path2
+    trainer_params["init_path"] = path1
+    policy3 = create_policy_mock(trainer_params, load=False, seed=2)
+    policy3.initialize_or_load()
+
+    _compare_two_policies(policy2, policy3)
+
+
+def _compare_two_policies(policy1: NNPolicy, policy2: NNPolicy) -> None:
+    """
+    Make sure two policies have the same output for the same input.
+    """
+    step = mb.create_batchedstep_from_brainparams(policy1.brain, num_agents=1)
+    run_out1 = policy1.evaluate(step, list(step.agent_id))
+    run_out2 = policy2.evaluate(step, list(step.agent_id))
+
+    np.testing.assert_array_equal(run_out2["log_probs"], run_out1["log_probs"])


@pytest.mark.parametrize("discrete", [True, False], ids=["discrete", "continuous"])
--- a/ml-agents/mlagents/trainers/tests/test_trainer_util.py
+++ b/ml-agents/mlagents/trainers/tests/test_trainer_util.py
    trainer_util.handle_existing_directories(model_path, summary_path, True, False)
    # Test try to train w/ force - should work
    trainer_util.handle_existing_directories(model_path, summary_path, False, True)
+
+    # Test initialize option
+    init_path = os.path.join(tmp_path, "runid2")
+    with pytest.raises(UnityTrainerException):
+        trainer_util.handle_existing_directories(
+            model_path, summary_path, False, True, init_path
+        )
+    os.mkdir(init_path)
+    # Should pass since the directory exists now.
+    trainer_util.handle_existing_directories(
+        model_path, summary_path, False, True, init_path
+    )
--- a/ml-agents/mlagents/trainers/trainer_util.py
+++ b/ml-agents/mlagents/trainers/trainer_util.py
        train_model: bool,
        load_model: bool,
        seed: int,
+        init_path: str = None,
        meta_curriculum: MetaCurriculum = None,
        multi_gpu: bool = False,
    ):
        self.model_path = model_path
+        self.init_path = init_path
        self.keep_checkpoints = keep_checkpoints
        self.train_model = train_model
        self.load_model = load_model
            self.load_model,
            self.ghost_controller,
            self.seed,
+            self.init_path,
            self.meta_curriculum,
            self.multi_gpu,
        )
    load_model: bool,
    ghost_controller: GhostController,
    seed: int,
+    init_path: str = None,
    meta_curriculum: MetaCurriculum = None,
    multi_gpu: bool = False,
 ) -> Trainer:
    :param load_model: Whether to load the model or randomly initialize
    :param ghost_controller: The object that coordinates ghost trainers
    :param seed: The random seed to use
+    :param init_path: Path from which to load model, if different from model_path.
    :param meta_curriculum: Optional meta_curriculum, used to determine a reward buffer length for PPOTrainer
    :return:
    """
    trainer_parameters["model_path"] = "{basedir}/{name}".format(
        basedir=model_path, name=brain_name
    )
+    if init_path is not None:
+        trainer_parameters["init_path"] = "{basedir}/{name}".format(
+            basedir=init_path, name=brain_name
+        )
    trainer_parameters["keep_checkpoints"] = keep_checkpoints
    if brain_name in trainer_config:
        _brain_key: Any = brain_name


 def handle_existing_directories(
-    model_path: str, summary_path: str, resume: bool, force: bool
+    model_path: str, summary_path: str, resume: bool, force: bool, init_path: str = None
 ) -> None:
    """
    Validates that if the run_id model exists, we do not overwrite it unless --force is specified.
    if model_path_exists:
        if not resume and not force:
            raise UnityTrainerException(
-                "Previous data from this run-id was found. "
-                "Either specify a new run-id, use --resume to resume this run, "
+                "Previous data from this run ID was found. "
+                "Either specify a new run ID, use --resume to resume this run, "
-                "Previous data from this run-id was not found. "
+                "Previous data from this run ID was not found. "
+
+    # Verify init path if specified.
+    if init_path is not None:
+        if not os.path.isdir(init_path):
+            raise UnityTrainerException(
+                "Could not initialize from {}. "
+                "Make sure models have already been saved with that run ID.".format(
+                    init_path
+                )
+            )