
Rename Generalization -> Environment Parameter Randomization (#3646)

* Rename generalization to Environment Parameter Randomization
/bug-failed-api-check
GitHub · 5 years ago
Current commit 3a771afa
8 files changed, 181 insertions(+), 184 deletions(-)
1. com.unity.ml-agents/CHANGELOG.md (1 change)
2. docs/ML-Agents-Overview.md (9 changes)
3. docs/Readme.md (2 changes)
4. docs/Training-Curriculum-Learning.md (4 changes)
5. docs/Training-ML-Agents.md (5 changes)
6. docs/Training-Environment-Parameter-Randomization.md (171 changes)
7. docs/Training-Generalized-Reinforcement-Learning-Agents.md (173 changes)
8. /config/3dball_randomize.yaml (0 changes)

com.unity.ml-agents/CHANGELOG.md (1 change)


### Minor Changes
- Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
+- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
## [0.15.0-preview] - 2020-03-18
### Major Changes

docs/ML-Agents-Overview.md (9 changes)


learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).
-- **Training with Reset Parameter Sampling** - To train agents to adapt
-  to changes in their environment (i.e., generalization), the agent should be exposed
-  to several variations of the environment. Similar to Curriculum Learning,
+- **Training with Environment Parameter Randomization** - If an agent is exposed to several variations of an environment, it will be more robust (i.e. generalize better) to
+  unseen variations of the environment. Similar to Curriculum Learning,
-  a way to randomly sample Reset Parameters of the environment during training. See
-  [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+  a way to randomly sample parameters of the environment during training. See
+  [Training With Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
to learn more about this feature.
- **Cloud Training on AWS** - To facilitate using the ML-Agents toolkit on

docs/Readme.md (2 changes)


* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
-* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
## Inference

docs/Training-Curriculum-Learning.md (4 changes)


measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).
* `parameters` (dictionary of key:string, value:float array) - Corresponds to
-  Academy reset parameters to control. Length of each array should be one
+  Environment parameters to control. Length of each array should be one
-Once our curriculum is defined, we have to use the reset parameters we defined
+Once our curriculum is defined, we have to use the environment parameters we defined
and modify the environment from the Agent's `OnEpisodeBegin()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example.

docs/Training-ML-Agents.md (5 changes)


lessons for curriculum training. See [Curriculum
Training](Training-Curriculum-Learning.md) for more information.
* `--sampler=<file>`: Specify a sampler YAML file for defining the
-  sampler for generalization training. See [Generalization
-  Training](Training-Generalized-Reinforcement-Learning-Agents.md) for more information.
+  sampler for parameter randomization. See [Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md) for more information.
* `--keep-checkpoints=<n>`: Specify the maximum number of model checkpoints to
keep. Checkpoints are saved after the number of steps specified by the
`save-freq` option. Once the maximum number of checkpoints has been reached,

* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
-* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
+* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
You can also compare the
[example environments](Learning-Environment-Examples.md)

docs/Training-Environment-Parameter-Randomization.md (171 changes, new file)


# Training With Environment Parameter Randomization
One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are instantiated with varying objects or properties.
To make agents robust and generalizable to changes in the environment, the agent
can be trained over multiple variations of a given environment. We refer to this approach as **Environment Parameter Randomization**. For those familiar with Reinforcement Learning research, this approach is based on the concept of Domain Randomization (you can read more about it [here](https://arxiv.org/abs/1703.06907)). By using parameter randomization
during training, the agent can be better suited to adapt (with higher performance)
to future unseen variations of the environment.
_Example of variations of the 3D Ball environment._
Ball scale of 0.5 | Ball scale of 4
:-------------------------:|:-------------------------:
![](images/3dball_small.png) | ![](images/3dball_big.png)
To enable variations in the environments, we implemented `Environment Parameters`.
`Environment Parameters` are `Academy.Instance.FloatProperties` that can be read when setting
up the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Environment Parameter`. In the 3D ball environment example displayed
in the figure above, the environment parameters are `gravity`, `ball_mass` and `ball_scale`.
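
For reference, these properties can also be set from the Python side when connecting to an environment directly. Below is a minimal sketch, assuming the low-level `mlagents_envs` API of this release; the executable name `3DBall` is a placeholder:

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.float_properties_channel import FloatPropertiesChannel

# Connect to a build of the environment with a float-properties side channel.
channel = FloatPropertiesChannel()
env = UnityEnvironment(file_name="3DBall", side_channels=[channel])  # "3DBall" is a placeholder

# Set two of the environment parameters shown in the figure above;
# the environment reads them when it is set up / reset.
channel.set_property("gravity", 9.81)
channel.set_property("ball_scale", 2.0)

env.reset()
```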
## How to Enable Environment Parameter Randomization
We first need to provide a way to modify the environment by supplying a set of `Environment Parameters`
and vary them over time. This can be done either deterministically or randomly.
This is done by assigning each `Environment Parameter` a `sampler-type` (such as a uniform sampler),
which determines how to sample an `Environment
Parameter`. If a `sampler-type` isn't provided for an
`Environment Parameter`, the parameter keeps its default value throughout
training. The samplers for all the `Environment Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the environment parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Environment Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which environment parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
resampling-interval: 5000
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```
Below is an explanation of the fields in the above example.

* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of `Environment Parameters`.
* `Environment Parameter` - Name of the `Environment Parameter` like `mass`, `gravity` and `scale`. This should match the name
specified in the `FloatProperties` of the environment being trained. If a parameter specified in the file doesn't exist in the
environment, then this parameter will be ignored. Within each `Environment Parameter`:
    * `sampler-type` - Specify the sampler type to use for the `Environment Parameter`.
    This is a string that should exist in the `Sampler Factory` (explained below).
    * `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
    In the example above, this would correspond to the `intervals`
    under the `sampler-type` `"multirange_uniform"` for the `Environment Parameter` called `gravity`.
    The key name should match the name of the corresponding argument in the sampler definition (see below).
The Sampler Manager allocates a sampler type for each `Environment Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects. The sampler types
available for each `Environment Parameter` are listed in the Sampler Factory.
### Included Sampler Types
Below is a list of the `sampler-type` values included in the toolkit.

* `uniform` - Uniform sampler
    * Uniformly samples a single float value between defined endpoints.
    The sub-arguments for this sampler specify the interval
    endpoints, as listed below. The sampling is done over the half-open range
    [`min_value`, `max_value`).
    * **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
    * Samples a single float value from the distribution characterized by
    the mean and standard deviation. The sub-arguments specifying the
    Gaussian distribution to use are listed below.
    * **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange uniform sampler
    * Uniformly samples a single float value from the specified intervals.
    Samples by first performing a weighted pick of an interval from the list
    of intervals (weighted based on interval width) and then sampling uniformly
    from the selected interval (half-open interval, same as the uniform
    sampler); a short illustrative sketch of this logic follows below. This sampler can take an arbitrary number of intervals in a
    list in the following format:
    [[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
    * **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents_envs/sampler_class.py`.
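
To make the interval weighting concrete, here is an illustrative sketch of the `multirange_uniform` logic described above (a simplified stand-in, not the toolkit's actual implementation from `sampler_class.py`):

```python
import numpy as np


def multirange_uniform(intervals):
    # Pick an interval with probability proportional to its width,
    # then sample uniformly from the half-open interval [lo, hi).
    widths = np.array([hi - lo for lo, hi in intervals], dtype=float)
    idx = np.random.choice(len(intervals), p=widths / widths.sum())
    lo, hi = intervals[idx]
    return np.random.uniform(lo, hi)


# With intervals [[7, 10], [15, 20]] the second interval is chosen
# 5/8 of the time, since it is wider (width 5 vs. width 3).
print(multirange_uniform([[7, 10], [15, 20]]))
```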
### Defining a New Sampler Type
If you want to define your own sampler type, you must first inherit from the *Sampler*
base class (included in the `sampler_class` file) and preserve the interface.
Once the class is defined, it must be registered in the Sampler Factory.
This is done by calling the *register_sampler* method of the SamplerFactory. The call
is as follows:

`SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
Once the Sampler Factory reflects the new registration, the new sampler type can be used to sample any
`Environment Parameter`. For example, let's say a new sampler type was implemented as below and we register
the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
```python
import numpy as np


class CustomSampler(Sampler):
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        # Return one of the three configured values, chosen uniformly at random.
        return np.random.choice(self.possible_vals)
```
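
With the class defined, registration might look like the following sketch; the import path is an assumption based on the `ml-agents-envs/mlagents_envs/sampler_class.py` location mentioned above:

```python
from mlagents_envs.sampler_class import SamplerFactory  # assumed path, see above

# Register the class under the same string key used in the sampler YAML file.
SamplerFactory.register_sampler("custom-sampler", CustomSampler)
```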
Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
sampler type for the `Environment Parameter` *mass*.
```yaml
mass:
    sampler-type: "custom-sampler"
    argB: 1
    argA: 2
    argC: 3
```
### Training with Environment Parameter Randomization
After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specifying
our sampler file with the `--sampler` flag. For example, to train the
3D Ball agent with parameter randomization using the sampling setup in `config/3dball_randomize.yaml`,
we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_randomize.yaml --run-id=3D-Ball-randomize --train
```
We can observe progress and metrics via TensorBoard.

docs/Training-Generalized-Reinforcement-Learning-Agents.md (173 changes, deleted)


# Training Generalized Reinforcement Learning Agents
One of the challenges of training and testing agents on the same
environment is that the agents tend to overfit. The result is that the
agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are randomly instantiated with varying objects or properties.
To make agents robust and generalizable to different environments, the agent
should be trained over multiple variations of the environment. Using this approach
for training, the agent will be better suited to adapt (with higher performance)
to future unseen variations of the environment.
_Example of variations of the 3D Ball environment._
Ball scale of 0.5 | Ball scale of 4
:-------------------------:|:-------------------------:
![](images/3dball_small.png) | ![](images/3dball_big.png)
## Introducing Generalization Using Reset Parameters
To enable variations in the environments, we implemented `Reset Parameters`.
`Reset Parameters` are `Academy.Instance.FloatProperties` that are used only when
resetting the environment. We
also included different sampling methods and the ability to create new kinds of
sampling methods for each `Reset Parameter`. In the 3D ball environment example displayed
in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
## How to Enable Generalization Using Reset Parameters
We first need to provide a way to modify the environment by supplying a set of `Reset Parameters`
and vary them over time. This can be done either deterministically or randomly.
This is done by assigning each `Reset Parameter` a `sampler-type` (such as a uniform sampler),
which determines how to sample a `Reset
Parameter`. If a `sampler-type` isn't provided for a
`Reset Parameter`, the parameter keeps its default value throughout
training. The samplers for all the `Reset Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the reset parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Reset Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which reset parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
resampling-interval: 5000
mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```
Below is an explanation of the fields in the above example.

* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
environment with a new sample of `Reset Parameters`.
* `Reset Parameter` - Name of the `Reset Parameter` like `mass`, `gravity` and `scale`. This should match the name
specified in the academy of the intended environment for which the agent is
being trained. If a parameter specified in the file doesn't exist in the
environment, then this parameter will be ignored. Within each `Reset Parameter`:
    * `sampler-type` - Specify the sampler type to use for the `Reset Parameter`.
    This is a string that should exist in the `Sampler Factory` (explained below).
    * `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
    In the example above, this would correspond to the `intervals`
    under the `sampler-type` `"multirange_uniform"` for the `Reset Parameter` called `gravity`.
    The key name should match the name of the corresponding argument in the sampler definition (see below).
The Sampler Manager allocates a sampler type for each `Reset Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects. The sampler types
available for each `Reset Parameter` are listed in the Sampler Factory.
### Included Sampler Types
Below is a list of the `sampler-type` values included in the toolkit.

* `uniform` - Uniform sampler
    * Uniformly samples a single float value between defined endpoints.
    The sub-arguments for this sampler specify the interval
    endpoints, as listed below. The sampling is done over the half-open range
    [`min_value`, `max_value`).
    * **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
    * Samples a single float value from the distribution characterized by
    the mean and standard deviation. The sub-arguments specifying the
    Gaussian distribution to use are listed below.
    * **sub-arguments** - `mean`, `st_dev`
* `multirange_uniform` - Multirange uniform sampler
    * Uniformly samples a single float value from the specified intervals.
    Samples by first performing a weighted pick of an interval from the list
    of intervals (weighted based on interval width) and then sampling uniformly
    from the selected interval (half-open interval, same as the uniform
    sampler). This sampler can take an arbitrary number of intervals in a
    list in the following format:
    [[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
    * **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents_envs/sampler_class.py`.
### Defining a New Sampler Type
If you want to define your own sampler type, you must first inherit from the *Sampler*
base class (included in the `sampler_class` file) and preserve the interface.
Once the class is defined, it must be registered in the Sampler Factory.
This is done by calling the *register_sampler* method of the SamplerFactory. The call
is as follows:

`SamplerFactory.register_sampler(*custom_sampler_string_key*, *custom_sampler_object*)`
Once the Sampler Factory reflects the new registration, the new sampler type can be used to sample any
`Reset Parameter`. For example, let's say a new sampler type was implemented as below and we register
the `CustomSampler` class with the string `custom-sampler` in the Sampler Factory.
```python
import numpy as np


class CustomSampler(Sampler):
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        # Return one of the three configured values, chosen uniformly at random.
        return np.random.choice(self.possible_vals)
```
Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
sampler type for the `Reset Parameter` *mass*.
```yaml
mass:
    sampler-type: "custom-sampler"
    argB: 1
    argA: 2
    argC: 3
```
### Training with Generalization Using Reset Parameters
After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specifying
our sampler file with the `--sampler` flag. For example, to train the
3D Ball agent with generalization using the sampling setup in `config/3dball_generalize.yaml`,
we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml --run-id=3D-Ball-generalization --train
```
We can observe progress and metrics via TensorBoard.

/config/3dball_generalize.yaml → /config/3dball_randomize.yaml
