
Rename Generalization -> Environment Parameter Randomization (#3646)

* Rename generalization to Environment Parameter Randomization
/bug-failed-api-check
GitHub · 4 years ago
Current commit
3a771afa
8 files changed, with 181 additions and 184 deletions
1. com.unity.ml-agents/CHANGELOG.md (1)
2. docs/ML-Agents-Overview.md (9)
3. docs/Readme.md (2)
4. docs/Training-Curriculum-Learning.md (4)
5. docs/Training-ML-Agents.md (5)
6. docs/Training-Environment-Parameter-Randomization.md (171)
7. docs/Training-Generalized-Reinforcement-Learning-Agents.md (173)
8. /config/3dball_randomize.yaml (0)

com.unity.ml-agents/CHANGELOG.md (1 change)


### Minor Changes
- Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
+ - Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
## [0.15.0-preview] - 2020-03-18
### Major Changes

docs/ML-Agents-Overview.md (9 changes)


learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
- - **Training with Reset Parameter Sampling** - To train agents to be adapt
- to changes in its environment (i.e., generalization), the agent should be exposed
- to several variations of the environment. Similar to Curriculum Learning,
+ - **Training with Environment Parameter Randomization** - If an agent is exposed to several variations of an environment, it will be more robust (i.e. generalize better) to
+ unseen variations of the environment. Similar to Curriculum Learning,
- a way to randomly sample Reset Parameters of the environment during training. See
- [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+ a way to randomly sample parameters of the environment during training. See
+ [Training With Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
to learn more about this feature.
- **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on

docs/Readme.md (2 changes)


* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
- * [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+ * [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
## Inference

docs/Training-Curriculum-Learning.md (4 changes)


measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).
* `parameters` (dictionary of key:string, value:float array) - Corresponds to
- Academy reset parameters to control. Length of each array should be one
+ Environment parameters to control. Length of each array should be one
- Once our curriculum is defined, we have to use the reset parameters we defined
+ Once our curriculum is defined, we have to use the environment parameters we defined
and modify the environment from the Agent's `OnEpisodeBegin()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example.

docs/Training-ML-Agents.md (5 changes)


lessons for curriculum training. See [Curriculum
Training](Training-Curriculum-Learning.md) for more information.
* `--sampler=<file>`: Specify a sampler YAML file for defining the
- sampler for generalization training. See [Generalization
- Training](Training-Generalized-Reinforcement-Learning-Agents.md) for more information.
+ sampler for parameter randomization. See [Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md) for more information.
* `--keep-checkpoints=<n>`: Specify the maximum number of model checkpoints to
keep. Checkpoints are saved after the number of steps specified by the
`save-freq` option. Once the maximum number of checkpoints has been reached,

* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
- * [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+ * [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
You can also compare the
[example environments](Learning-Environment-Examples.md)

docs/Training-Environment-Parameter-Randomization.md (171 additions, new file)


# Training With Environment Parameter Randomization
One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are instantiated with varying objects or properties.
To make agents robust and generalizable to changes in the environment, the agent
can be trained over multiple variations of a given environment. We refer to this approach as **Environment Parameter Randomization**. For those familiar with Reinforcement Learning research, this approach is based on the concept of Domain Randomization (you can read more about it [here](https://arxiv.org/abs/1703.06907)). By using parameter randomization
during training, the agent can be better suited to adapt (with higher performance)
to future unseen variations of the environment.
_Example of variations of the 3D Ball environment._
Ball scale of 0.5 | Ball scale of 4
:-------------------------:|:-------------------------:
![](images/3dball_small.png) | ![](images/3dball_big.png)
To enable variations in the environments, we implemented `Environment Parameters`.
`Environment Parameters` are `Academy.Instance.FloatProperties` that can be read when setting
up the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Environment Parameter`. In the 3D ball environment example displayed
in the figure above, the environment parameters are `gravity`, `ball_mass` and `ball_scale`.
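For illustration, these properties can also be read or set from the Python side through a side channel. The sketch below is an assumption, not part of this doc: it presumes the `mlagents_envs` low-level API (whose exact module paths may differ across releases) and a local `3DBall` build.
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.float_properties_channel import FloatPropertiesChannel

# Open a channel for reading/writing the environment's FloatProperties
# (import path and build name are assumptions for this sketch).
channel = FloatPropertiesChannel()
env = UnityEnvironment(file_name="3DBall", side_channels=[channel])

# Override one Environment Parameter before the next reset.
channel.set_property("gravity", 9.8)
env.reset()
env.close()
```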
## How to Enable Environment Parameter Randomization
We first need to provide a way to modify the environment, by supplying a set of `Environment Parameters`
and varying them over time. This can be done either deterministically or randomly.
This is done by assigning each `Environment Parameter` a `sampler-type` (such as a uniform sampler),
which determines how to sample an `Environment
Parameter`. If a `sampler-type` isn't provided for an
`Environment Parameter`, the parameter keeps its default value throughout
training. The samplers for all the `Environment Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the environment parameters when needed.
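As a mental model, the Sampler Manager can be pictured as a mapping from parameter names to samplers that is asked for fresh values once per resampling interval. The following is a purely illustrative sketch; the class and method names are hypothetical, and the actual implementation lives in `ml-agents-envs/mlagents_envs/sampler_class.py`.
```python
# Illustrative sketch only, not the toolkit's actual class.
class SamplerManager:
    def __init__(self, samplers, resampling_interval):
        self.samplers = samplers              # dict: parameter name -> sampler
        self.resampling_interval = resampling_interval

    def maybe_resample(self, step_count):
        # Redraw every managed Environment Parameter once per interval.
        if step_count % self.resampling_interval == 0:
            return {name: s.sample_all() for name, s in self.samplers.items()}
        return None
```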
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Environment Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which environment parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
resampling-interval: 5000
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```
Below is the explanation of the fields in the above example.
* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of `Environment Parameters`.
* `Environment Parameter` - Name of the `Environment Parameter`, such as `mass`, `gravity`, or `scale`. This should match the name
specified in the `FloatProperties` of the environment being trained. If a parameter specified in the file doesn't exist in the
environment, it will be ignored. Within each `Environment Parameter`, specify:
* `sampler-type` - Specify the sampler type to use for the `Environment Parameter`.
This is a string that should exist in the `Sampler Factory` (explained
below).
* `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
In the example above, this would correspond to the `intervals`
under the `sampler-type` `"multirange_uniform"` for the `Environment Parameter` called `gravity`.
The key name should match the name of the corresponding argument in the sampler definition.
(See below)
The Sampler Manager allocates a sampler type for each `Environment Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects. The sampler types
available for each `Environment Parameter` are listed in the Sampler Factory.
### Included Sampler Types
Below is a list of the `sampler-type` values included in the toolkit.
* `uniform` - Uniform sampler
* Uniformly samples a single float value between defined endpoints.
The sub-arguments listed below specify the interval
endpoints. The sampling is done over the half-open range
[`min_value`, `max_value`).
* **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
* Samples a single float value from a Gaussian distribution characterized by
a mean and standard deviation. The sub-arguments below specify the
distribution to use.
* **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange uniform sampler
* Uniformly samples a single float value from a set of specified intervals.
It first picks an interval from the list (weighted by interval
width), then samples uniformly from the selected interval (half-open
interval, same as the uniform sampler); see the sketch after this
list. This sampler can take an arbitrary number of intervals in a
list in the following format:
[[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
* **sub-arguments** - `intervals`
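To make that weighting scheme concrete, here is a minimal sketch; it is not the toolkit's implementation, which is referenced below.
```python
import numpy as np

def multirange_uniform(intervals):
    # Pick an interval with probability proportional to its width...
    widths = np.array([hi - lo for lo, hi in intervals], dtype=float)
    chosen = np.random.choice(len(intervals), p=widths / widths.sum())
    lo, hi = intervals[chosen]
    # ...then sample uniformly from the half-open interval [lo, hi).
    return np.random.uniform(lo, hi)

# Example: the `gravity` intervals from the sampler file above.
print(multirange_uniform([[7, 10], [15, 20]]))
```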
The implementation of the samplers can be found at `ml-agents-envs/mlagents_envs/sampler_class.py`.
### Defining a New Sampler Type
If you want to define your own sampler type, you must first inherit from the *Sampler*
base class (included in the `sampler_class` file) and preserve its interface.
Once the class with the required methods is defined, it must be registered with the Sampler Factory
by calling the *register_sampler* method of the SamplerFactory. The call
is as follows:
`SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
Once the Sampler Factory reflects the new registration, the new sampler type can be used to sample any
`Environment Parameter`. For example, let's say a new sampler type was implemented as below and we register
the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
```python
import numpy as np

from mlagents_envs.sampler_class import Sampler  # base-class import path assumed from above


class CustomSampler(Sampler):
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        # Return one of the three configured values, chosen uniformly at random.
        return np.random.choice(self.possible_vals)
```
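Registering it could then look like the following snippet; the import path is an assumption based on the `sampler_class` module mentioned above.
```python
from mlagents_envs.sampler_class import SamplerFactory

# Makes "custom-sampler" resolvable from the sampler YAML file.
SamplerFactory.register_sampler("custom-sampler", CustomSampler)
```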
Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
sampler type for the `Environment Parameter` *mass*.
```yaml
mass:
    sampler-type: "custom-sampler"
    argB: 1
    argA: 2
    argC: 3
```
### Training with Environment Parameter Randomization
After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specify
our configured sampler file with the `--sampler` flag. For example, to train the
3D ball agent with parameter randomization using the sampling setup in
`config/3dball_randomize.yaml`, we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_randomize.yaml --run-id=3D-Ball-randomize --train
```
We can observe progress and metrics via TensorBoard.

docs/Training-Generalized-Reinforcement-Learning-Agents.md (173 deletions, file removed)


# Training Generalized Reinforcement Learning Agents
One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are randomly instantiated with varying objects or properties.
To make agents robust and generalizable to different environments, the agent
should be trained over multiple variations of the environment. Using this approach
for training, the agent will be better suited to adapt (with higher performance)
to future unseen variations of the environment.
_Example of variations of the 3D Ball environment._
Ball scale of 0.5 | Ball scale of 4
:-------------------------:|:-------------------------:
![](images/3dball_small.png) | ![](images/3dball_big.png)
## Introducing Generalization Using Reset Parameters
To enable variations in the environments, we implemented `Reset Parameters`.
`Reset Parameters` are `Academy.Instance.FloatProperties` that are used only when
resetting the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Reset Parameter`. In the 3D ball environment example displayed
in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
## How to Enable Generalization Using Reset Parameters
We first need to provide a way to modify the environment, by supplying a set of `Reset Parameters`
and varying them over time. This can be done either deterministically or randomly.
This is done by assigning each `Reset Parameter` a `sampler-type` (such as a uniform sampler),
which determines how to sample a `Reset
Parameter`. If a `sampler-type` isn't provided for a
`Reset Parameter`, the parameter keeps its default value throughout
training. The samplers for all the `Reset Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the reset parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Reset Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which reset parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
resampling-interval: 5000
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```
Below is the explanation of the fields in the above example.
* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of `Reset Parameters`.
* `Reset Parameter` - Name of the `Reset Parameter`, such as `mass`, `gravity`, or `scale`. This should match the name
specified in the academy of the intended environment for which the agent is
being trained. If a parameter specified in the file doesn't exist in the
environment, it will be ignored. Within each `Reset Parameter`, specify:
* `sampler-type` - Specify the sampler type to use for the `Reset Parameter`.
This is a string that should exist in the `Sampler Factory` (explained
below).
* `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
In the example above, this would correspond to the `intervals`
under the `sampler-type` `"multirange_uniform"` for the `Reset Parameter` called `gravity`.
The key name should match the name of the corresponding argument in the sampler definition.
(See below)
The Sampler Manager allocates a sampler type for each `Reset Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects. The sampler types
available for each `Reset Parameter` are listed in the Sampler Factory.
### Included Sampler Types
Below is a list of the `sampler-type` values included in the toolkit.
* `uniform` - Uniform sampler
* Uniformly samples a single float value between defined endpoints.
The sub-arguments listed below specify the interval
endpoints. The sampling is done over the half-open range
[`min_value`, `max_value`).
* **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
* Samples a single float value from a Gaussian distribution characterized by
a mean and standard deviation. The sub-arguments below specify the
distribution to use.
* **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange uniform sampler
* Uniformly samples a single float value from a set of specified intervals.
It first picks an interval from the list (weighted by interval
width), then samples uniformly from the selected interval (half-open
interval, same as the uniform sampler). This sampler can take an
arbitrary number of intervals in a list in the following format:
[[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
* **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents_envs/sampler_class.py`.
### Defining a New Sampler Type
If you want to define your own sampler type, you must first inherit from the *Sampler*
base class (included in the `sampler_class` file) and preserve its interface.
Once the class with the required methods is defined, it must be registered with the Sampler Factory
by calling the *register_sampler* method of the SamplerFactory. The call
is as follows:
`SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
Once the Sampler Factory reflects the new registration, the new sampler type can be used to sample any
`Reset Parameter`. For example, let's say a new sampler type was implemented as below and we register
the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
```python
class CustomSampler(Sampler):
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        return np.random.choice(self.possible_vals)
```
Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
sampler type for the `Reset Parameter` *mass*.
```yaml
mass:
    sampler-type: "custom-sampler"
    argB: 1
    argA: 2
    argC: 3
```
### Training with Generalization Using Reset Parameters
After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specify
our configured sampler file with the `--sampler` flag. For example, to train the
3D ball agent with generalization using the sampling setup in
`config/3dball_generalize.yaml`, we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml --run-id=3D-Ball-generalization --train
```
We can observe progress and metrics via TensorBoard.

/config/3dball_generalize.yaml → /config/3dball_randomize.yaml
