
Fix docs for Generalization (#2334)

* Fix naming conventions for consistency

* Add generalization link to ML-Agents Overview

* Add generalization to main Readme

* Include types of samplers available for use
Branch: develop-generalizationTraining-TrainerController
Ervin T · 5 years ago
Current commit 00a3b592
5 files changed, with 56 additions and 14 deletions
  1. README.md (6 lines changed)
  2. docs/ML-Agents-Overview.md (11 lines changed)
  3. docs/Training-Generalization-Learning.md (51 lines changed)
  4. docs/Training-ML-Agents.md (2 lines changed)
  5. /config/3dball_generalize.yaml (renamed, 0 lines changed)

README.md (6 lines changed)


* 10+ sample Unity environments
* Support for multiple environment configurations and training scenarios
* Train memory-enhanced agents using deep reinforcement learning
* Easily definable Curriculum Learning scenarios
* Easily definable Curriculum Learning and Generalization scenarios
* Broadcasting of agent behavior for supervised learning
* Built-in support for Imitation Learning
* Flexible agent control with On Demand Decision Making

[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to include as much detail as possible.
Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to [let us know about it](https://github.com/Unity-Technologies/ml-agents/issues/1454).
team at ml-agents@unity3d.com.
## Translations

docs/ML-Agents-Overview.md (11 lines changed)


Link](https://youtu.be/kpb8ZkMBFYs).
ML-Agents provides ways to both learn directly from demonstrations as well as
use demonstrations to help speed up reward-based training. The
use demonstrations to help speed up reward-based training, and two algorithms to do
so (Generative Adversarial Imitation Learning and Behavioral Cloning). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
covers these features in more depth.

particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
- **Training with Environment Parameter Sampling** - To train an agent to be robust
to changes in its environment (i.e., generalization), the agent should be exposed
to a variety of environment variations. Similar to Curriculum Learning, which
allows environments to get more difficult as the agent learns, we also provide
a way to randomly resample aspects of the environment during training. See
[Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
to learn more about this feature.
- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
installing Python or TensorFlow directly, we provide a

docs/Training-Generalization-Learning.md (51 lines changed)


_Variations of the 3D Ball environment._
To vary environments, we first decide what parameters to vary in an
environment. These parameters are known as `Reset Parameters`. In the 3D ball
environment example displayed in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
environment. We call these parameters `Reset Parameters`. In the 3D ball
environment example displayed in the figure above, the reset parameters are
`gravity`, `ball_mass` and `ball_scale`.
## How-to

This is done by assigning each reset parameter a sampler (such as a uniform
sampler), which samples a value for that reset parameter. If a sampler isn't provided for a
reset parameter, the parameter maintains the default value throughout the
training, remaining unchanged. The samplers for all the reset parameters are
handled by a **Sampler Manager**, which also handles the generation of new
training procedure, remaining unchanged. The samplers for all the reset parameters
are handled by a **Sampler Manager**, which also handles the generation of new
`resampling-duration` (number of simulation steps after which reset parameters are
`resampling-interval` (number of simulation steps after which reset parameters are
episode-length: 5000
resampling-interval: 5000
mass:
    sampler-type: "uniform"

```
* `resampling-duration` (int) - Specifies the number of steps for the agent to
* `resampling-interval` (int) - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of reset parameters.
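
Putting the pieces together, a complete sampler file might look like the following sketch. The reset parameter names `mass` and `gravity` follow the 3D Ball excerpt above; the specific bounds and intervals are illustrative, not prescriptive.

```yaml
# Number of simulation steps between re-samples of the reset parameters.
resampling-interval: 5000

# Each top-level key names a reset parameter; the nested keys select a
# sampler-type and supply its sub-arguments (values here are illustrative).
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10

gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
```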

key under the `multirange_uniform` sampler for the gravity reset parameter.
The key name should match the name of the corresponding argument in the sampler definition (see the "Defining a new sampler method" section below).
#### Possible Sampler Types
The currently implemented samplers that can be used with the `sampler-type` argument are listed below; an illustrative configuration entry follows the list.

* `uniform` - Uniform sampler
    * Uniformly samples a single float value between defined endpoints. The
      sub-arguments specify the interval endpoints; sampling is done over the
      half-open range [`min_value`, `max_value`).
    * **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
    * Samples a single float value from a Gaussian distribution characterized
      by the given mean and standard deviation.
    * **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange Uniform sampler
    * Uniformly samples a single float value from a set of specified intervals.
      It first performs a weighted pick of an interval from the list of
      intervals (weighted by interval width) and then samples uniformly from
      the selected interval (half-open, as in the `uniform` sampler). This
      sampler can take an arbitrary number of intervals in a list of the
      following format:
      [[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
    * **sub-arguments** - `intervals`
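
For instance, a `gaussian` entry in the sampler file could look like the sketch below. The reset parameter name `ball_scale` is taken from the 3D Ball example above; the numeric values are purely illustrative.

```yaml
# Illustrative only: re-samples ball_scale from a normal distribution with
# the given mean and standard deviation at each resampling interval.
ball_scale:
    sampler-type: "gaussian"
    mean: 1.0
    st_dev: 0.3
```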
The implementation of the samplers can be found at `ml-agents-envs/mlagents/envs/sampler_class.py`.
### Defining a new sampler method
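
As a hypothetical illustration of the key-matching rule mentioned above: once a new sampler has been added to `sampler_class.py` (say one registered as `my_custom_sampler` that takes an argument named `spread`; both names are invented here), it is referenced from the sampler file like any built-in type, with sub-argument keys matching the argument names in its definition.

```yaml
# Hypothetical example: "my_custom_sampler" and its "spread" argument are
# invented for illustration; the sub-argument key must match the argument
# name in the sampler's definition in sampler_class.py.
ball_mass:
    sampler-type: "my_custom_sampler"
    spread: 2.5
```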

### Training with Generalization Learning
We first set up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specifying our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `config/generalization-test.yaml` sampling setup, we can run
We first set up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specifying our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `config/3dball_generalize.yaml` sampling setup, we can run
mlagents-learn config/trainer_config.yaml --sampler=config/generalize_test.yaml --run-id=3D-Ball-generalization --train
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml --run-id=3D-Ball-generalization --train
```
We can observe progress and metrics via Tensorboard.

docs/Training-ML-Agents.md (2 lines changed)


* [Training with PPO](Training-PPO.md)
* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Generalization](Training-Generalization-Learning.md)
* [Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
You can also compare the

/config/generalize_test.yaml → /config/3dball_generalize.yaml
