Compare commits

...
This merge request has changes that conflict with the target branch.
/config/ppo/3DBall.yaml
/config/sac/3DBall.yaml
/com.unity.ml-agents/Documentation~/com.unity.ml-agents.md
/com.unity.ml-agents/Runtime/Academy.cs
/com.unity.ml-agents/Runtime/Agent.cs
/com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs
/docs/Installation-Anaconda-Windows.md
/docs/Installation.md
/docs/Training-on-Amazon-Web-Service.md
/ml-agents/mlagents/trainers/learn.py
/ml-agents/mlagents/trainers/trainer_controller.py
/ml-agents/mlagents/trainers/stats.py
/ml-agents/mlagents/trainers/ppo/trainer.py
/ml-agents/mlagents/trainers/sac/trainer.py
/ml-agents/mlagents/trainers/trainer/rl_trainer.py
/README.md
/com.unity.ml-agents/Runtime/DiscreteActionMasker.cs
/ml-agents/mlagents/tf_utils/tf.py
/ml-agents/mlagents/trainers/optimizer/tf_optimizer.py
/ml-agents/mlagents/trainers/policy/tf_policy.py

8 commits

Author  SHA1  Message  Commit date
Anupam Bhatnagar  392a84f1  [skip ci] fixing property decorator in sac  4 years ago
Anupam Bhatnagar  8b6c19ae  [skip ci] adding should_still_train method to ppo  4 years ago
Anupam Bhatnagar  0aedad7c  fixing should_still_train call in rl_trainer.py  4 years ago
Anupam Bhatnagar  26dc42e5  [skip ci]  4 years ago
Anupam Bhatnagar  a4567f27  [skip ci] restore process trajectory super calls  4 years ago
Anupam Bhatnagar  f7a3c06e  [skip ci] updating sac  4 years ago
Anupam Bhatnagar  beb7cb74  [skip ci] adding gpu config settings  4 years ago
Anupam Bhatnagar  4afd8f92  first commit  4 years ago
20 files changed, 87 insertions(+), 40 deletions(-)
  1. README.md (6)
  2. config/ppo/3DBall.yaml (2)
  3. config/sac/3DBall.yaml (2)
  4. docs/Installation-Anaconda-Windows.md (4)
  5. docs/Installation.md (8)
  6. docs/Training-on-Amazon-Web-Service.md (2)
  7. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (2)
  8. com.unity.ml-agents/Runtime/Academy.cs (4)
  9. com.unity.ml-agents/Runtime/Agent.cs (26)
  10. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (2)
  11. com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (2)
  12. ml-agents/mlagents/tf_utils/tf.py (2)
  13. ml-agents/mlagents/trainers/stats.py (6)
  14. ml-agents/mlagents/trainers/learn.py (2)
  15. ml-agents/mlagents/trainers/optimizer/tf_optimizer.py (16)
  16. ml-agents/mlagents/trainers/policy/tf_policy.py (4)
  17. ml-agents/mlagents/trainers/trainer_controller.py (7)
  18. ml-agents/mlagents/trainers/ppo/trainer.py (7)
  19. ml-agents/mlagents/trainers/sac/trainer.py (16)
  20. ml-agents/mlagents/trainers/trainer/rl_trainer.py (7)

README.md (6)


# Unity ML-Agents Toolkit
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/docs/)
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/docs/)
[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)

## Releases & Documentation
**Our latest, stable release is `Release 2`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/docs/Readme.md)
**Our latest, stable release is `Release 3`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/docs/Readme.md)
to get started with the latest release of ML-Agents.**
The table below lists all our releases, including our `master` branch which is

config/ppo/3DBall.yaml (2)


gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 500000
max_steps: 100000
time_horizon: 1000
summary_freq: 12000
threaded: true

config/sac/3DBall.yaml (2)


gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 500000
max_steps: 100000
time_horizon: 1000
summary_freq: 12000
threaded: true

docs/Installation-Anaconda-Windows.md (4)


the ml-agents Conda environment by typing `activate ml-agents`)_:
```sh
git clone --branch release_2 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_3 https://github.com/Unity-Technologies/ml-agents.git
```
The `--branch release_2` option will switch to the tag of the latest stable
The `--branch release_3` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

docs/Installation.md (8)


of our tutorials / guides assume you have access to our example environments).
```sh
git clone --branch release_2 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_3 https://github.com/Unity-Technologies/ml-agents.git
```
The `--branch release_2` option will switch to the tag of the latest stable
The `--branch release_3` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

ML-Agents Toolkit for your purposes. If you plan to contribute those changes
back, make sure to clone the `master` branch (by omitting `--branch release_2`
back, make sure to clone the `master` branch (by omitting `--branch release_3`
from the command above). See our
[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more
information on contributing to the ML-Agents Toolkit.

#### Advanced: Local Installation for Development
You can [add the local](https://docs.unity3d.com/Manual/upm-ui-local.html)
`com.unity.ml-agents` package (from the repository that you just cloned) to our
`com.unity.ml-agents` package (from the repository that you just cloned) to your
project by:
1. navigating to the menu `Window` -> `Package Manager`.

docs/Training-on-Amazon-Web-Service.md (2)


2. Clone the ML-Agents repo and install the required Python packages
```sh
git clone --branch release_2 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_3 https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents/
pip3 install -e .
```

com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (2)


[unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[unity inference engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[package manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Installation.md
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Installation.md
[github repository]: https://github.com/Unity-Technologies/ml-agents
[python package]: https://github.com/Unity-Technologies/ml-agents
[execution order of event functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html

com.unity.ml-agents/Runtime/Academy.cs (4)


* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a
* Unity scene, please browse our documentation pages on GitHub:
* https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/docs/
* https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/docs/
*/
namespace Unity.MLAgents

/// fall back to inference or heuristic decisions. (You can also set agents to always use
/// inference or heuristics.)
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/" +
"docs/Learning-Environment-Design.md")]
public class Academy : IDisposable
{

com.unity.ml-agents/Runtime/Agent.cs (26)


/// [OnDisable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnDisable.html]
/// [OnBeforeSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnBeforeSerialize.html
/// [OnAfterSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnAfterSerialize.html
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design.md
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Readme.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Readme.md
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/" +
"docs/Learning-Environment-Design-Agents.md")]
[Serializable]
[RequireComponent(typeof(BehaviorParameters))]

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// </remarks>
/// <param name="reward">The new value of the reward.</param>
public void SetReward(float reward)

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
///</remarks>
/// <param name="increment">Incremental reward value.</param>
public void AddReward(float increment)

/// implementing a simple heuristic function can aid in debugging agent actions and interactions
/// with its environment.
///
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// <example>

/// For more information about observations, see [Observations and Sensors].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// </remarks>
public virtual void CollectObservations(VectorSensor sensor)
{

///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <seealso cref="OnActionReceived(float[])"/>
public virtual void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)

///
/// For more information about implementing agent actions see [Agents - Actions].
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="vectorAction">
/// An array containing the action vector. The length of the array is specified

com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (2)


/// See [Imitation Learning - Recording Demonstrations] for more information.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// </remarks>
[RequireComponent(typeof(Agent))]
[AddComponentMenu("ML Agents/Demonstration Recorder", (int)MenuGroup.Default)]

com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (2)


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>

ml-agents/mlagents/tf_utils/tf.py (2)


# Everywhere else is caught by the banned-modules setting for flake8
import tensorflow as tf # noqa I201
from distutils.version import LooseVersion
import horovod.tensorflow as hvd
# LooseVersion handles things "1.2.3a" or "4.5.6-rc7" fairly sensibly.

"""
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.visible_device_list = str(hvd.local_rank())
# For multi-GPU training, set allow_soft_placement to True to allow
# placing the operation into an alternative device automatically
# to prevent from exceptions if the device doesn't suppport the operation

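The tf.py hunk above pins each worker to the GPU that matches its local Horovod rank. A minimal sketch of that pattern, assuming the TensorFlow 1.x API used on this branch; `make_session` is a hypothetical helper, not part of mlagents.tf_utils:

```python
import tensorflow as tf              # TF 1.x API, as used on this branch
import horovod.tensorflow as hvd

hvd.init()  # one call per process, before any session is created


def make_session() -> tf.Session:
    # Hypothetical helper: build a session whose process only sees the single
    # GPU matching its local Horovod rank on this node.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.visible_device_list = str(hvd.local_rank())
    config.allow_soft_placement = True  # fall back if an op lacks a GPU kernel
    return tf.Session(config=config)
```
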
ml-agents/mlagents/trainers/stats.py (6)


from mlagents_envs.logging_util import get_logger
from mlagents_envs.timers import set_gauge
from mlagents.tf_utils import tf, generate_session_config
import horovod.tensorflow as hvd
logger = get_logger(__name__)

) -> None:
is_training = "Not Training."
if "Is Training" in values:
stats_summary = stats_summary = values["Is Training"]
stats_summary = values["Is Training"]
rank = hvd.rank()
"Horovod Rank: {}. "
rank,
category,
step,
time.time() - self.training_start_time,

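The stats.py hunk adds the worker's Horovod rank to the console status line. A rough, standalone sketch of that idea (hypothetical `log_training_status` helper; assumes `hvd.init()` has already been called):

```python
import logging
import time

import horovod.tensorflow as hvd

logger = logging.getLogger(__name__)


def log_training_status(category: str, step: int, start_time: float) -> None:
    # Prefix the status line with the Horovod rank so output from different
    # workers can be told apart when their logs interleave.
    logger.info(
        "Horovod Rank: {}. {}: Step: {}. Time Elapsed: {:.3f} s.".format(
            hvd.rank(), category, step, time.time() - start_time
        )
    )
```
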
ml-agents/mlagents/trainers/learn.py (2)


add_metadata as add_timer_metadata,
)
from mlagents_envs import logging_util
import horovod.tensorflow as hvd
logger = logging_util.get_logger(__name__)

sampler_manager, resampling_interval = create_sampler_manager(
options.parameter_randomization, run_seed
)
hvd.init()
trainer_factory = TrainerFactory(
options.behaviors,
write_path,

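For context, `hvd.init()` above follows the usual Horovod start-up sequence: initialize once per process before any graph or session is built, then launch one process per GPU with a launcher such as `horovodrun` (e.g. something like `horovodrun -np 4 mlagents-learn config/ppo/3DBall.yaml`; the exact launch command is an assumption, not stated in this diff). A minimal sketch:

```python
import horovod.tensorflow as hvd


def main() -> None:
    # Must run once per process before any TensorFlow graph or session is
    # built; learn.py above calls it just before constructing TrainerFactory.
    hvd.init()
    print("worker", hvd.rank(), "of", hvd.size())
    # ... construct TrainerFactory / TrainerController and start training ...


if __name__ == "__main__":
    main()
```
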
ml-agents/mlagents/trainers/optimizer/tf_optimizer.py (16)


from mlagents.trainers.settings import TrainerSettings, RewardSignalType
from mlagents.trainers.components.bc.module import BCModule
try:
import horovod.tensorflow as hvd
except ImportError:
hvd = None
class TFOptimizer(Optimizer): # pylint: disable=W0223
def __init__(self, policy: TFPolicy, trainer_params: TrainerSettings):

def create_optimizer_op(
self, learning_rate: tf.Tensor, name: str = "Adam"
) -> tf.train.Optimizer:
return tf.train.AdamOptimizer(learning_rate=learning_rate, name=name)
if hvd is not None:
adam_optimizer = tf.train.AdamOptimizer(
learning_rate=learning_rate, name=name
)
horovod_optimizer = hvd.DistributedOptimizer(adam_optimizer)
else:
adam_optimizer = tf.train.AdamOptimizer(
learning_rate=learning_rate, name=name
)
return horovod_optimizer if hvd is not None else adam_optimizer
def _execute_model(
self, feed_dict: Dict[tf.Tensor, np.ndarray], out_dict: Dict[str, tf.Tensor]

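Reassembled, the intent of the `create_optimizer_op` change above appears to be roughly the following (a sketch, not the verbatim file; the `self` parameter is dropped so the snippet stands alone):

```python
import tensorflow as tf  # TF 1.x, as used by this branch

try:
    import horovod.tensorflow as hvd
except ImportError:
    hvd = None  # Horovod not installed: fall back to plain single-process Adam


def create_optimizer_op(learning_rate: tf.Tensor, name: str = "Adam") -> tf.train.Optimizer:
    adam_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, name=name)
    if hvd is not None:
        # DistributedOptimizer averages gradients across workers via allreduce
        # before they are applied, so every worker sees the same update.
        return hvd.DistributedOptimizer(adam_optimizer)
    return adam_optimizer
```
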
ml-agents/mlagents/trainers/policy/tf_policy.py (4)


from mlagents.trainers.settings import TrainerSettings, NetworkSettings
from mlagents.trainers.brain import BrainParameters
from mlagents.trainers import __version__
import horovod.tensorflow as hvd
logger = get_logger(__name__)

self._load_graph(self.model_path, reset_global_steps=reset_steps)
else:
self._initialize_graph()
self.sess.run(hvd.broadcast_global_variables(0))
def get_weights(self):
with self.graph.as_default():

:param steps: The number of steps the model was trained for
:return:
"""
if hvd.rank() != 0:
return
with self.graph.as_default():
last_checkpoint = os.path.join(self.model_path, f"model-{steps}.ckpt")
self.saver.save(self.sess, last_checkpoint)

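The tf_policy.py hunk does two things: broadcast rank 0's weights after graph initialization so all workers start from identical parameters, and restrict checkpoint writes to rank 0. A sketch of both patterns under the same TF 1.x / Horovod assumptions (standalone functions are hypothetical, for illustration):

```python
import os

import tensorflow as tf
import horovod.tensorflow as hvd


def sync_initial_weights(sess: tf.Session) -> None:
    # After variables are created/initialized, rank 0 broadcasts its values so
    # every worker begins training from the same parameters.
    sess.run(hvd.broadcast_global_variables(0))


def save_checkpoint(sess: tf.Session, saver: tf.train.Saver,
                    model_path: str, steps: int) -> None:
    # Only the chief worker writes checkpoints; other ranks return early.
    if hvd.rank() != 0:
        return
    saver.save(sess, os.path.join(model_path, f"model-{steps}.ckpt"))
```
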
ml-agents/mlagents/trainers/trainer_controller.py (7)


from mlagents.trainers.agent_processor import AgentManager
from mlagents.trainers.settings import CurriculumSettings
from mlagents.trainers.training_status import GlobalTrainingStatus, StatusType
import horovod.tensorflow as hvd
class TrainerController(object):

"""
Saves current model to checkpoint folder.
"""
if hvd.rank() != 0:
return
for brain_name in self.trainers.keys():
for name_behavior_id in self.brain_name_to_identifier[brain_name]:
self.trainers[brain_name].save_model(name_behavior_id)

"""
Exports latest saved models to .nn format for Unity embedding.
"""
if hvd.rank() != 0:
return
for brain_name in self.trainers.keys():
for name_behavior_id in self.brain_name_to_identifier[brain_name]:
self.trainers[brain_name].export_model(name_behavior_id)

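The same rank-0 guard appears in both `save_model` and `export_model` above. A small sketch of that guard, written with an optional-import fallback so a non-Horovod (single-process) run would still save and export; `is_chief_worker` is a hypothetical helper, not part of this PR:

```python
try:
    import horovod.tensorflow as hvd
except ImportError:
    hvd = None


def is_chief_worker() -> bool:
    # True when this process should perform one-off side effects such as
    # saving or exporting models. Without Horovod there is only one process,
    # so the check always passes.
    return hvd is None or hvd.rank() == 0
```
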
ml-agents/mlagents/trainers/ppo/trainer.py (7)


Uses demonstration_buffer to update the policy.
The reward signal generators must be updated in this method at their own pace.
"""
buffer_length = self.update_buffer.num_experiences
self._maybe_write_summary(self.get_step + self.hyperparameters.buffer_size)
self._maybe_save_model(self.get_step + self.hyperparameters.buffer_size)
self._increment_step(self.hyperparameters.buffer_size, self.brain_name)
# Make sure batch_size is a multiple of sequence length. During training, we
# will need to reshape the data into a batch_size x sequence_length tensor.

for _ in range(num_epoch):
self.update_buffer.shuffle(sequence_length=self.policy.sequence_length)
buffer = self.update_buffer
max_num_batch = buffer_length // batch_size
max_num_batch = self.hyperparameters.buffer_size // batch_size
for i in range(0, max_num_batch * batch_size, batch_size):
update_stats = self.optimizer.update(
buffer.make_mini_batch(i, i + batch_size), n_sequences

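The PPO change above derives the number of mini-batches from the configured `buffer_size` rather than from the local buffer's actual length, so every worker performs the same number of updates per epoch. A worked example (the numbers are illustrative, not taken from the config files in this PR):

```python
# Illustrative numbers only. With buffer_size = 2048 and batch_size = 512,
# every worker now runs exactly buffer_size // batch_size = 4 mini-batch
# updates per epoch, instead of a count that depends on how many experiences
# its local buffer happened to hold.
buffer_size = 2048
batch_size = 512
max_num_batch = buffer_size // batch_size      # == 4 on every worker
for i in range(0, max_num_batch * batch_size, batch_size):
    start, end = i, i + batch_size             # mini-batch covers [start, end)
```
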
ml-agents/mlagents/trainers/sac/trainer.py (16)


)
)
@property
def should_still_train(self) -> bool:
"""
Returns whether or not the trainer should train. A Trainer could
stop training if it wasn't training to begin with, or if max_steps
is reached.
"""
return self.is_training and self.steps_per_update * self.update_steps <= \
self.get_max_steps
super()._process_trajectory(trajectory)
self._maybe_write_summary(self.get_step + len(trajectory.steps))
self._maybe_save_model(self.get_step + len(trajectory.steps))
self._increment_step(len(trajectory.steps), trajectory.behavior_id)
last_step = trajectory.steps[-1]
agent_id = trajectory.agent_id # All the agents should have the same ID

while (
self.step - self.hyperparameters.buffer_init_steps
) / self.update_steps > self.steps_per_update:
logger.debug("Updating SAC policy at step {}".format(self.step))
buffer = self.update_buffer
if self.update_buffer.num_experiences >= self.hyperparameters.batch_size:

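The `should_still_train` property added above keeps training while `steps_per_update * update_steps` has not yet passed `max_steps`, i.e. it bounds the number of policy updates by the step budget. A tiny numeric sketch of that relationship (all values are made up):

```python
# Made-up values: one policy update per 10 environment steps, 100k step budget.
steps_per_update = 10
max_steps = 100_000

update_steps = 1
while steps_per_update * update_steps <= max_steps:   # should_still_train
    update_steps += 1                                  # perform one policy update
print(update_steps - 1)                                # -> 10000 updates in total
```
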
ml-agents/mlagents/trainers/trainer/rl_trainer.py (7)


"""
return False
@abc.abstractmethod
def _update_policy(self) -> bool:
"""
Uses demonstration_buffer to update model.

Takes a trajectory and processes it, putting it into the update buffer.
:param trajectory: The Trajectory tuple containing the steps to be processed.
"""
self._maybe_write_summary(self.get_step + len(trajectory.steps))
self._maybe_save_model(self.get_step + len(trajectory.steps))
self._increment_step(len(trajectory.steps), trajectory.behavior_id)
pass
def _maybe_write_summary(self, step_after_process: int) -> None:
"""

"""
if self._next_summary_step == 0: # Don't write out the first one
self._next_summary_step = self._get_next_interval_step(self.summary_freq)
if step_after_process >= self._next_summary_step and self.get_step != 0:
if step_after_process >= self._next_summary_step:
self._write_summary(self._next_summary_step)
def _maybe_save_model(self, step_after_process: int) -> None:

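The rl_trainer.py hunk changes when summaries fire: a summary is written whenever the processed-step counter crosses the next multiple of `summary_freq`, skipping the very first interval. A rough standalone sketch of that logic (hypothetical function, simplified from `RLTrainer._maybe_write_summary`):

```python
def maybe_write_summary(step_after_process: int, next_summary_step: int,
                        summary_freq: int) -> int:
    # Returns the updated summary boundary after possibly writing a summary.
    if next_summary_step == 0:  # don't write one for the very first interval
        next_summary_step = ((step_after_process // summary_freq) + 1) * summary_freq
    if step_after_process >= next_summary_step:
        print(f"writing summary at step {next_summary_step}")
        next_summary_step += summary_freq
    return next_summary_step
```
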