
[refactor] Move configuration files to single YAML file (#3791)

/whitepaper-experiments
GitHub, 5 years ago
Current commit
f86fc81d
76 files changed, with 1650 insertions and 641 deletions
  1. com.unity.ml-agents/CHANGELOG.md (3 changed lines)
  2. docs/Feature-Memory.md (2 changed lines)
  3. docs/Getting-Started.md (10 changed lines)
  4. docs/Learning-Environment-Create-New.md (43 changed lines)
  5. docs/Learning-Environment-Examples.md (4 changed lines)
  6. docs/Learning-Environment-Executable.md (6 changed lines)
  7. docs/Migrating.md (15 changed lines)
  8. docs/Reward-Signals.md (4 changed lines)
  9. docs/Training-Curriculum-Learning.md (61 changed lines)
  10. docs/Training-Environment-Parameter-Randomization.md (52 changed lines)
  11. docs/Training-Imitation-Learning.md (3 changed lines)
  12. docs/Training-ML-Agents.md (36 changed lines)
  13. docs/Training-Using-Concurrent-Unity-Instances.md (2 changed lines)
  14. gym-unity/README.md (2 changed lines)
  15. ml-agents/mlagents/trainers/learn.py (51 changed lines)
  16. ml-agents/mlagents/trainers/tests/test_learn.py (45 changed lines)
  17. ml-agents/mlagents/trainers/tests/test_trainer_util.py (41 changed lines)
  18. ml-agents/mlagents/trainers/trainer_util.py (25 changed lines)
  19. ml-agents/tests/yamato/training_int_tests.py (2 changed lines)
  20. ml-agents/tests/yamato/yamato_utils.py (3 changed lines)
  21. config/imitation/CrawlerStatic.yaml (29 changed lines)
  22. config/imitation/FoodCollector.yaml (29 changed lines)
  23. config/imitation/Hallway.yaml (28 changed lines)
  24. config/imitation/PushBlock.yaml (25 changed lines)
  25. config/imitation/Pyramids.yaml (36 changed lines)
  26. config/ppo/3DBall.yaml (25 changed lines)
  27. config/ppo/3DBallHard.yaml (25 changed lines)
  28. config/ppo/3DBall_randomize.yaml (40 changed lines)
  29. config/ppo/Basic.yaml (25 changed lines)
  30. config/ppo/Bouncer.yaml (25 changed lines)
  31. config/ppo/CrawlerDynamic.yaml (25 changed lines)
  32. config/ppo/CrawlerStatic.yaml (25 changed lines)
  33. config/ppo/FoodCollector.yaml (25 changed lines)
  34. config/ppo/GridWorld.yaml (25 changed lines)
  35. config/ppo/Hallway.yaml (25 changed lines)
  36. config/ppo/PushBlock.yaml (25 changed lines)
  37. config/ppo/Pyramids.yaml (29 changed lines)
  38. config/ppo/Reacher.yaml (25 changed lines)
  39. config/ppo/SoccerTwos.yaml (38 changed lines)
  40. config/ppo/StrikersVsGoalie.yaml (62 changed lines)
  41. config/ppo/Tennis.yaml (31 changed lines)
  42. config/ppo/VisualHallway.yaml (25 changed lines)
  43. config/ppo/VisualPushBlock.yaml (25 changed lines)
  44. config/ppo/VisualPyramids.yaml (29 changed lines)
  45. config/ppo/Walker.yaml (25 changed lines)
  46. config/ppo/WallJump.yaml (50 changed lines)
  47. config/ppo/WallJump_curriculum.yaml (65 changed lines)
  48. config/ppo/WormDynamic.yaml (25 changed lines)
  49. config/ppo/WormStatic.yaml (25 changed lines)
  50. config/sac/3DBall.yaml (25 changed lines)
  51. config/sac/3DBallHard.yaml (25 changed lines)
  52. config/sac/Basic.yaml (25 changed lines)
  53. config/sac/Bouncer.yaml (25 changed lines)
  54. config/sac/CrawlerDynamic.yaml (25 changed lines)
  55. config/sac/CrawlerStatic.yaml (25 changed lines)
  56. config/sac/FoodCollector.yaml (25 changed lines)
  57. config/sac/GridWorld.yaml (25 changed lines)
  58. config/sac/Hallway.yaml (25 changed lines)
  59. config/sac/PushBlock.yaml (25 changed lines)
  60. config/sac/Pyramids.yaml (31 changed lines)
  61. config/sac/Reacher.yaml (25 changed lines)
  62. config/sac/Tennis.yaml (30 changed lines)
  63. config/sac/VisualHallway.yaml (26 changed lines)
  64. config/sac/VisualPushBlock.yaml (26 changed lines)
  65. config/sac/VisualPyramids.yaml (31 changed lines)
  66. config/sac/Walker.yaml (25 changed lines)
  67. config/sac/WallJump.yaml (50 changed lines)
  68. config/gail_config.yaml (129 changed lines)
  69. config/3dball_randomize.yaml (16 changed lines)
  70. config/trainer_config.yaml (351 changed lines)

3
com.unity.ml-agents/CHANGELOG.md


C# style conventions. All public fields and properties now use "PascalCase"
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- Curriculum and Parameter Randomization configurations have been merged
into the main training configuration file. Note that this means training
configuration files are now environment-specific. (#3791)
- Update Barracuda to 0.7.0-preview which has breaking namespace and assembly name changes.
- Training artifacts (trained models, summaries) are now found in the `results/`
directory. (#3829)

2
docs/Feature-Memory.md


## How to use
- When configuring the trainer parameters in the `config/trainer_config.yaml`
+ When configuring the trainer parameters in the config YAML
file, add the following parameters to the Behavior you want to use.
```json
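# The example block is cut off by this hunk. As a hedged sketch, the memory
# settings match the recurrent options used by configs elsewhere in this commit
# (e.g. config/imitation/Hallway.yaml); the exact values here are illustrative:
use_recurrent: true
sequence_length: 64
memory_size: 256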

10
docs/Getting-Started.md


1. Navigate to the folder where you cloned the `ml-agents` repository. **Note**:
If you followed the default [installation](Installation.md), then you should
be able to run `mlagents-learn` from any directory.
- 1. Run `mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun`.
-    - `config/trainer_config.yaml` is the path to a default training
-      configuration file that we provide. It includes training configurations for
-      all our example environments, including 3DBall.
+ 1. Run `mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun`.
+    - `config/ppo/3DBall.yaml` is the path to a default training
+      configuration file that we provide. The `config/ppo` folder includes training configuration
+      files for all our example environments, including 3DBall.
- `run-id` is a unique name for this training session.
1. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button

the same command again, appending the `--resume` flag:
```sh
- mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
+ mlagents-learn config/ppo/3DBall.yaml --run-id=firstRun --resume
```
Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where

43
docs/Learning-Environment-Create-New.md


and include the following hyperparameter values:
```yml
behaviors:
  RollerBall:
    trainer: ppo
    batch_size: 10
    beta: 5.0e-3
    buffer_size: 100
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.95
    learning_rate: 3.0e-4
    learning_rate_schedule: linear
    max_steps: 5.0e4
    normalize: false
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    summary_freq: 10000
    use_recurrent: false
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
```
Since this example creates a very simple training environment with only a few

4
docs/Learning-Environment-Examples.md


does not train with the provided default training parameters.**
- Float Properties: None
- Benchmark Mean Reward: 0.7
- - To speed up training, you can enable curiosity by adding the `curiosity`
-   reward signal in `config/trainer_config.yaml`
+ - To train this environment, you can enable curiosity by adding the `curiosity` reward signal
+   in `config/ppo/Hallway.yaml`
## Bouncer

6
docs/Learning-Environment-Executable.md


the directory where you installed the ML-Agents Toolkit, run:
```sh
- mlagents-learn ../config/trainer_config.yaml --env=3DBall --run-id=firstRun
+ mlagents-learn ../config/ppo/3DBall.yaml --env=3DBall --run-id=firstRun
- ml-agents$ mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run
+ ml-agents$ mlagents-learn config/ppo/3DBall.yaml --env=3DBall --run-id=first-run
▄▄▄▓▓▓▓

latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you terminate training early; it's
recommended to wait until Step has reached the max_steps parameter you set in
- trainer_config.yaml.) You can now embed this trained model into your Agent by
+ your config YAML.) You can now embed this trained model into your Agent by
following the steps below:
1. Move your model file into

15
docs/Migrating.md


- `WriteAdapter` was renamed to `ObservationWriter`. (#3834)
- Training artifacts (trained models, summaries) are now found under `results/`
instead of `summaries/` and `models/`.
- Trainer configuration, curriculum configuration, and parameter randomization
configuration have all been moved to a single YAML file. (#3791)
### Steps to Migrate

- Update uses of "camelCase" fields and properties to "PascalCase".
- If you have a custom `ISensor` implementation, you will need to change the signature of
its `Write()` method to use `ObservationWriter` instead of `WriteAdapter`.
- Before upgrading, copy your `Behavior Name` sections from `trainer_config.yaml` into
a separate trainer configuration file, under a `behaviors` section. You can move the `default` section too
if it's being used. This file should be specific to your environment, and not contain configurations for
multiple environments (unless they have the same Behavior Names).
- If your training uses [curriculum](Training-Curriculum-Learning.md), move those configurations under
the `Behavior Name` section.
- If your training uses [parameter randomization](Training-Environment-Parameter-Randomization.md), move
the contents of the sampler config to `parameter_randomization` in the main trainer configuration
(see the sketch after this list).
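A minimal sketch of the merged layout those steps describe, using a hypothetical Behavior Name `MyBehavior` and placeholder values; the real keys come from your existing `trainer_config.yaml`, curriculum, and sampler files:

```yaml
behaviors:
  MyBehavior:                 # copied from your old trainer_config.yaml section
    trainer: ppo
    batch_size: 1024
    # ... rest of the hyperparameters ...
    curriculum:               # moved here from the old curriculum file, if used
      measure: progress
      thresholds: [0.1, 0.3, 0.5]
      min_lesson_length: 100
      signal_smoothing: true
      parameters:
        my_environment_parameter: [0.0, 4.0, 6.0]

parameter_randomization:      # moved here from the old sampler file, if used
  resampling-interval: 5000
  mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
```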
## Migrating from 0.14 to 0.15

- Multiply `max_steps` and `summary_freq` in your `trainer_config.yaml` by the
number of Agents in the scene.
- Combine curriculum configs into a single file. See
- [the WallJump curricula](../config/curricula/wall_jump.yaml) for an example of
+ [the WallJump curricula](https://github.com/Unity-Technologies/ml-agents/blob/0.14.1/config/curricula/wall_jump.yaml) for an example of
the new curriculum config format. A tool like https://www.json2yaml.com may be
useful to help with the conversion.
- If you have a model trained which uses RayPerceptionSensor and has non-1.0

- It is now required to specify the path to the yaml trainer configuration file
when running `mlagents-learn`. For an example trainer configuration file, see
- [trainer_config.yaml](../config/trainer_config.yaml). An example of passing a
+ [trainer_config.yaml](https://github.com/Unity-Technologies/ml-agents/blob/0.5.0a/config/trainer_config.yaml). An example of passing a
trainer configuration to `mlagents-learn` is shown above.
- The environment name is now passed through the `--env` option.
- Curriculum learning has been changed. Refer to the

4
docs/Reward-Signals.md


## Enabling Reward Signals
- Reward signals, like other hyperparameters, are defined in the trainer config `.yaml` file. An
- example is provided in `config/trainer_config.yaml` and `config/gail_config.yaml`. To enable a reward signal, add it to the
+ Reward signals, like other hyperparameters, are defined in the trainer config `.yaml` file. Examples of config files
+ are provided in `config/ppo/` and `config/imitation/`. To enable a reward signal, add it to the
`reward_signals:` section under the behavior name. For instance, to enable the extrinsic signal
in addition to a small curiosity reward and a GAIL reward signal, you would define your `reward_signals` as follows:
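The example block that follows in the original page is cut off by this hunk. As a hedged sketch consistent with `config/imitation/Pyramids.yaml` elsewhere in this commit (the strengths and sizes are illustrative):

```yaml
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
  curiosity:
    strength: 0.02
    gamma: 0.99
    encoding_size: 256
  gail:
    strength: 0.01
    gamma: 0.99
    encoding_size: 128
    demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
```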

61
docs/Training-Curriculum-Learning.md


the height of the wall is what varies. We define this as a `Environment Parameters`
that can be accessed in `Academy.Instance.EnvironmentParameters`, and by doing
so it becomes adjustable via the Python API.
- Rather than adjusting it by hand, we will create a YAML file which defines the
- curricula for the Wall Jump environment.
+ Rather than adjusting it by hand, we will add a section to our YAML configuration file that
+ defines the curricula for the Wall Jump environment. You can find the full file in
+ `config/ppo/WallJump_curriculum.yaml`.
behaviors:
  BigWallJump:
    trainer: ppo
    ... # The rest of the hyperparameters
    vis_encode_type: simple
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
    curriculum: # Add this section for curriculum
      measure: progress
      thresholds: [0.1, 0.3, 0.5]
      min_lesson_length: 100
      signal_smoothing: true
      parameters:
        big_wall_min_height: [0.0, 4.0, 6.0, 8.0]
        big_wall_max_height: [4.0, 7.0, 8.0, 8.0]
  SmallWallJump:
    trainer: ppo
    ... # The rest of the hyperparameters
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
    curriculum: # Add this section for curriculum
      measure: progress
      thresholds: [0.1, 0.3, 0.5]
      min_lesson_length: 100
      signal_smoothing: true
      parameters:
        small_wall_height: [1.5, 2.0, 2.5, 4.0]
- At the top level of the config is the behavior name. Note that this must be the
+ For each Behavior Name described in your training configuration file, we can specify a curriculum
+ by adding a `curriculum:` section under that particular Behavior Name. Note that these must be the
The curriculum for each
behavior has the following parameters:
* `measure` - What to measure learning progress, and advancement in lessons by.
* `reward` - Uses a measure received reward.

cumulative reward of the last `100` episodes exceeds the current threshold.
The mean reward logged to the console is dictated by the `summary_freq`
parameter in the
- [trainer configuration file](Training-ML-Agents.md#training-config-file).
+ [training configuration file](Training-ML-Agents.md#training-config-file).
* `signal_smoothing` (true/false) - Whether to weight the current progress
measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).

to train agents in the Wall Jump environment with curriculum learning, you can run:
```sh
- mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/wall_jump.yaml --run-id=wall-jump-curriculum
+ mlagents-learn config/ppo/WallJump_curriculum.yaml --run-id=wall-jump-curriculum
```
You can then keep track of the current lessons and progresses via TensorBoard.

52
docs/Training-Environment-Parameter-Randomization.md


are handled by a **Sampler Manager**, which also handles the generation of new
values for the environment parameters when needed.
- To setup the Sampler Manager, we create a YAML file that specifies how we wish to
- generate new samples for each `Environment Parameters`. In this file, we specify the samplers and the
+ To setup the Sampler Manager, we edit our [training configuration file](Training-ML-Agents.md#training-config-file).
+ Add a `parameter_randomization` section that specifies how we wish to generate new samples for each `Environment
+ Parameters`. In this section, we specify the samplers and the
- resampled). Below is an example of a sampler file for the 3D ball environment.
+ resampled). Below is an example of a sampler file for the 3D ball environment. The full file is provided in
+ `config/ppo/3DBall_randomize.yaml`.
```yaml
behaviors:
  # Trainer hyperparameters

# New section
parameter_randomization:
  resampling-interval: 5000
  mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
  gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 10], [15, 20]]
  scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```

return np.random.choice(self.possible_vals)
```
- Now we need to specify the new sampler type in the sampler YAML file. For example, we use this new
+ Now we need to specify the new sampler type in the trainer configuration file. For example, we use this new
sampler type for the `Environment Parameter` *mass*.
```yaml
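# The example is cut off by this hunk. A hedged sketch, assuming the new sampler
# class was registered under the hypothetical key "custom-sampler"; it would be
# referenced from the parameter_randomization section like any built-in sampler:
mass:
  sampler-type: "custom-sampler"
  # ...any arguments expected by the custom sampler's constructor...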

### Training with Environment Parameter Randomization
- After the sampler YAML file is defined, we proceed by launching `mlagents-learn` and specify
- our configured sampler file with the `--sampler` flag. For example, if we wanted to train the
- 3D ball agent with parameter randomization using `Environment Parameters` with `config/3dball_randomize.yaml`
- sampling setup, we would run
+ After the parameter variations are defined in the training config file, we proceed by launching
+ `mlagents-learn` as usual. For example, if we wanted to train the
+ 3D ball agent with parameter randomization using `Environment Parameters` as specified in
+ `config/ppo/3DBall_randomize.yaml` sampling setup, we would run

- mlagents-learn config/trainer_config.yaml --sampler=config/3dball_randomize.yaml
-     --run-id=3D-Ball-randomize
+ mlagents-learn config/ppo/3DBall_randomize.yaml --run-id=3D-Ball-randomize

- We can observe progress and metrics via Tensorboard.
+ We can observe progress and metrics via Tensorboard as usual.

3
docs/Training-Imitation-Learning.md


width="375" border="10" />
</p>
- You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
+ You can then specify the path to this file as the `demo_path` in your
+ [training configuration file](Training-ML-Agents.md#training-config-file)
when using BC or GAIL. For instance, for BC:
```
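# The example is cut off by this hunk. As a hedged sketch, a BC section mirrors
# the behavioral_cloning block in config/imitation/CrawlerStatic.yaml from this
# same commit:
behavioral_cloning:
  demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
  strength: 0.5
  steps: 50000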

36
docs/Training-ML-Agents.md


This section offers a detailed guide into how to manage the different training
set-ups within the toolkit.
- The training config files `config/trainer_config.yaml`,
- `config/sac_trainer_config.yaml`, `config/gail_config.yaml` and
- `config/offline_bc_config.yaml` specify the training method, the
- hyperparameters, and a few additional values to use when training with Proximal
- Policy Optimization (PPO), Soft Actor-Critic (SAC), GAIL (Generative Adversarial
- Imitation Learning) with PPO/SAC, and Behavioral Cloning (BC)/Imitation with
- PPO/SAC. These files are divided into sections. The **default** section defines
- the default values for all the available settings. You can also add new
- sections to override these defaults to train specific Behaviors. Name each of
- these override sections after the appropriate `Behavior Name`. Sections for the
- example environments are included in the provided config file.
+ For each training run, create a YAML file that contains the training method and the
+ hyperparameters for each of the Behaviors found in your environment. Example files for
+ Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are provided in `config/ppo/` and
+ `config/sac/`, respectively. Examples for imitation learning through GAIL (Generative Adversarial
+ Imitation Learning) and Behavioral Cloning (BC) can be found in `config/imitation/`.
+ Each file is divided into sections. The `behaviors` section defines the hyperparameters
+ for each Behavior found in your environment. A section should be created for each `Behavior Name`.
+ The available parameters for PPO and SAC are listed below. Alternatively, if there are many
+ different Behaviors that all use similar hyperparameters, you can create a `default` behavior name
+ that specifies all hyperparameters that are not specified in the Behavior-specific sections.
To use [Curriculum Learning](Training-Curriculum-Learning.md) for a particular Behavior, add a
section under that `Behavior Name` called `curriculum`.
See the [Curriculum Learning](Training-Curriculum-Learning.md) page for more information.
To use Parameter Randomization, add a `parameter_randomization` section in the configuration
file. See the [Parameter Randomization](Training-Environment-Parameter-Randomization.md) docs
for more information.
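A hedged sketch of the resulting file layout; the behavior names (`default`, `MyBehavior`) and the values are illustrative, and the real parameter names are the PPO/SAC settings documented below:

```yaml
behaviors:
  default:                  # optional: shared hyperparameters for all Behaviors
    trainer: ppo
    batch_size: 1024
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
  MyBehavior:               # overrides for a specific Behavior Name
    max_steps: 5.0e5
    curriculum:             # optional, see Training-Curriculum-Learning.md
      measure: progress
      thresholds: [0.1, 0.3]
      parameters:
        my_environment_parameter: [0.0, 4.0, 6.0]

parameter_randomization:    # optional, see Training-Environment-Parameter-Randomization.md
  mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
```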
\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral
Cloning (Imitation), GAIL = Generative Adversarial Imitation Learning

You can also compare the
[example environments](Learning-Environment-Examples.md) to the corresponding
- sections of the `config/trainer_config.yaml` file for each example to see how
- the hyperparameters and other configuration variables have been changed from the
- defaults.
+ files in the `config/ppo/` folder for each example to see how
+ the hyperparameters and other configuration variables have been changed from environment to environment.

2
docs/Training-Using-Concurrent-Unity-Instances.md


### Buffer Size
- If you are having trouble getting an agent to train, even with multiple concurrent Unity instances, you could increase `buffer_size` in the `config/trainer_config.yaml` file. A common practice is to multiply `buffer_size` by `num-envs`.
+ If you are having trouble getting an agent to train, even with multiple concurrent Unity instances, you could increase `buffer_size` in the [training configuration file](Training-ML-Agents.md#training-config-file). A common practice is to multiply `buffer_size` by `num-envs`.
### Resource Constraints

2
gym-unity/README.md


We provide results from our PPO implementation and the DQN from Baselines as reference.
Note that all runs used the same greyscale GridWorld as Dopamine. For PPO, `num_layers`
- was set to 2, and all other hyperparameters are the default for GridWorld in `trainer_config.yaml`.
+ was set to 2, and all other hyperparameters are the default for GridWorld in `config/ppo/GridWorld.yaml`.
For Baselines DQN, the provided hyperparameters in the previous section are used. Note
that Baselines implements certain features (e.g. dueling-Q) that are not enabled
in Dopamine DQN.

51
ml-agents/mlagents/trainers/learn.py


load_config,
TrainerFactory,
handle_existing_directories,
assemble_curriculum_config,
)
from mlagents.trainers.stats import (
TensorboardWriter,

)
from mlagents_envs.environment import UnityEnvironment
from mlagents.trainers.sampler_class import SamplerManager
- from mlagents.trainers.exception import SamplerException
+ from mlagents.trainers.exception import SamplerException, TrainerConfigError
from mlagents_envs.base_env import BaseEnv
from mlagents.trainers.subprocess_env_manager import SubprocessEnvManager
from mlagents_envs.side_channel.side_channel import SideChannel

help="Path to the Unity executable to train",
)
argparser.add_argument(
"--curriculum",
default=None,
dest="curriculum_config_path",
help="YAML file for defining the lessons for curriculum training",
)
argparser.add_argument(
)
argparser.add_argument(
"--sampler",
default=None,
dest="sampler_file_path",
help="YAML file for defining the sampler for environment parameter randomization",
)
argparser.add_argument(
"--keep-checkpoints",

class RunOptions(NamedTuple):
- trainer_config: Dict
+ behaviors: Dict
debug: bool = parser.get_default("debug")
seed: int = parser.get_default("seed")
env_path: Optional[str] = parser.get_default("env_path")

lesson: int = parser.get_default("lesson")
no_graphics: bool = parser.get_default("no_graphics")
multi_gpu: bool = parser.get_default("multi_gpu")
- sampler_config: Optional[Dict] = None
+ parameter_randomization: Optional[Dict] = None
env_args: Optional[List[str]] = parser.get_default("env_args")
cpu: bool = parser.get_default("cpu")
width: int = parser.get_default("width")

configs loaded from files.
"""
argparse_args = vars(args)
- trainer_config_path = argparse_args["trainer_config_path"]
- curriculum_config_path = argparse_args["curriculum_config_path"]
- argparse_args["trainer_config"] = load_config(trainer_config_path)
- if curriculum_config_path is not None:
-     argparse_args["curriculum_config"] = load_config(curriculum_config_path)
- if argparse_args["sampler_file_path"] is not None:
-     argparse_args["sampler_config"] = load_config(
-         argparse_args["sampler_file_path"]
-     )
+ config_path = argparse_args["trainer_config_path"]
+ # Load YAML and apply overrides as needed
+ yaml_config = load_config(config_path)
+ try:
+     argparse_args["behaviors"] = yaml_config["behaviors"]
+ except KeyError:
+     raise TrainerConfigError(
+         "Trainer configurations not found. Make sure your YAML file has a section for behaviors."
+     )
+ argparse_args["parameter_randomization"] = yaml_config.get(
+     "parameter_randomization", None
+ )
argparse_args.pop("sampler_file_path")
argparse_args.pop("curriculum_config_path")
return RunOptions(**vars(args))

capture_frame_rate=options.capture_frame_rate,
)
env_manager = SubprocessEnvManager(env_factory, engine_config, options.num_envs)
+ curriculum_config = assemble_curriculum_config(options.behaviors)
- options.curriculum_config, env_manager, options.lesson
+ curriculum_config, env_manager, options.lesson
- options.sampler_config, run_seed
+ options.parameter_randomization, run_seed
- options.trainer_config,
+ options.behaviors,
options.run_id,
write_path,
options.keep_checkpoints,

def try_create_meta_curriculum(
curriculum_config: Optional[Dict], env: SubprocessEnvManager, lesson: int
) -> Optional[MetaCurriculum]:
- if curriculum_config is None:
+ if curriculum_config is None or len(curriculum_config) <= 0:
return None
else:
meta_curriculum = MetaCurriculum(curriculum_config)

45
ml-agents/mlagents/trainers/tests/test_learn.py


import pytest
import yaml
from unittest.mock import MagicMock, patch, mock_open
from mlagents.trainers import learn
from mlagents.trainers.trainer_controller import TrainerController

return parse_command_line(args)
MOCK_YAML = """
    behaviors:
        {}
    """

MOCK_SAMPLER_CURRICULUM_YAML = """
    behaviors:
        behavior1:
            curriculum:
                curriculum1
        behavior2:
            curriculum:
                curriculum2
    parameter_randomization:
        sampler1
    """
@patch("mlagents.trainers.learn.write_timing_tree")
@patch("mlagents.trainers.learn.write_run_options")
@patch("mlagents.trainers.learn.handle_existing_directories")

mock_env.external_brain_names = []
mock_env.academy_name = "TestAcademyName"
create_environment_factory.return_value = mock_env
- trainer_config_mock = MagicMock()
- load_config.return_value = trainer_config_mock
+ load_config.return_value = yaml.safe_load(MOCK_YAML)
mock_init = MagicMock(return_value=None)
with patch.object(TrainerController, "__init__", mock_init):

)
- @patch("builtins.open", new_callable=mock_open, read_data="{}")
+ @patch("builtins.open", new_callable=mock_open, read_data=MOCK_YAML)
def test_commandline_args(mock_file):
# No args raises

# Test with defaults
opt = parse_command_line(["mytrainerpath"])
- assert opt.trainer_config == {}
+ assert opt.behaviors == {}
- assert opt.curriculum_config is None
- assert opt.sampler_config is None
+ assert opt.parameter_randomization is None
assert opt.keep_checkpoints == 5
assert opt.lesson == 0
assert opt.resume is False

full_args = [
"mytrainerpath",
"--env=./myenvfile",
"--curriculum=./mycurriculum",
"--sampler=./mysample",
"--keep-checkpoints=42",
"--lesson=3",
"--resume",

]
opt = parse_command_line(full_args)
- assert opt.trainer_config == {}
+ assert opt.behaviors == {}
- assert opt.curriculum_config == {}
- assert opt.sampler_config == {}
+ assert opt.parameter_randomization is None
assert opt.keep_checkpoints == 42
assert opt.lesson == 3
assert opt.run_id == "myawesomerun"

assert opt.resume is True
- @patch("builtins.open", new_callable=mock_open, read_data="{}")
+ @patch("builtins.open", new_callable=mock_open, read_data=MOCK_SAMPLER_CURRICULUM_YAML)
def test_sampler_configs(mock_file):
opt = parse_command_line(["mytrainerpath"])
assert opt.parameter_randomization == "sampler1"
@patch("builtins.open", new_callable=mock_open, read_data=MOCK_YAML)
def test_env_args(mock_file):
full_args = [
"mytrainerpath",

41
ml-agents/mlagents/trainers/tests/test_trainer_util.py


from unittest.mock import patch
from mlagents.trainers import trainer_util
- from mlagents.trainers.trainer_util import load_config, _load_config
+ from mlagents.trainers.trainer_util import (
+     load_config,
+     _load_config,
+     assemble_curriculum_config,
+ )
from mlagents.trainers.ppo.trainer import PPOTrainer
from mlagents.trainers.exception import TrainerConfigError, UnityTrainerException
from mlagents.trainers.brain import BrainParameters

with pytest.raises(TrainerConfigError):
fp = io.StringIO(file_contents)
_load_config(fp)

def test_assemble_curriculum_config():
    file_contents = """
    behavior1:
        curriculum:
            foo: 5
    behavior2:
        curriculum:
            foo: 6
    """
    trainer_config = _load_config(file_contents)
    curriculum_config = assemble_curriculum_config(trainer_config)
    assert curriculum_config == {"behavior1": {"foo": 5}, "behavior2": {"foo": 6}}

    # Check that nothing is returned if no curriculum.
    file_contents = """
    behavior1:
        foo: 3
    behavior2:
        foo: 4
    """
    trainer_config = _load_config(file_contents)
    curriculum_config = assemble_curriculum_config(trainer_config)
    assert curriculum_config == {}

    # Check that method doesn't break if 1st level entity isn't a dict.
    # Note: this is a malformed configuration.
    file_contents = """
    behavior1: 3
    behavior2: 4
    """
    trainer_config = _load_config(file_contents)
    curriculum_config = assemble_curriculum_config(trainer_config)
    assert curriculum_config == {}

def test_existing_directories(tmp_path):

25
ml-agents/mlagents/trainers/trainer_util.py


"""
if "default" not in trainer_config and brain_name not in trainer_config:
raise TrainerConfigError(
- f'Trainer config must have either a "default" section, or a section for the brain name ({brain_name}). '
- "See config/trainer_config.yaml for an example."
+ f'Trainer config must have either a "default" section, or a section for the brain name {brain_name}. '
+ "See the config/ directory for examples."
)
trainer_parameters = trainer_config.get("default", {}).copy()

while not isinstance(trainer_config[_brain_key], dict):
_brain_key = trainer_config[_brain_key]
trainer_parameters.update(trainer_config[_brain_key])
if init_path is not None:
trainer_parameters["init_path"] = "{basedir}/{name}".format(
basedir=init_path, name=brain_name
)
min_lesson_length = 1
if meta_curriculum:

"Error parsing yaml file. Please check for formatting errors. "
"A tool such as http://www.yamllint.com/ can be helpful with this."
) from e

def assemble_curriculum_config(trainer_config: Dict[str, Any]) -> Dict[str, Any]:
    """
    Assembles a curriculum config Dict from a trainer config. The resulting
    dictionary should have a mapping of {brain_name: config}, where config is another
    Dict that contains the curriculum config for that brain.
    :param trainer_config: Dict of trainer configurations (keys are brain_names).
    :return: Dict of curriculum configurations. Returns empty dict if none are found.
    """
    curriculum_config: Dict[str, Any] = {}
    for behavior_name, behavior_config in trainer_config.items():
        # Don't try to iterate non-Dicts. This probably means your config is malformed.
        if isinstance(behavior_config, dict) and "curriculum" in behavior_config:
            curriculum_config[behavior_name] = behavior_config["curriculum"]
    return curriculum_config

def handle_existing_directories(

2
ml-agents/tests/yamato/training_int_tests.py


# Copy the default training config but override the max_steps parameter,
# and reduce the batch_size and buffer_size enough to ensure an update step happens.
override_config_file(
-     "config/trainer_config.yaml",
+     "config/ppo/3DBall.yaml",
"override.yaml",
max_steps=100,
batch_size=10,

3
ml-agents/tests/yamato/yamato_utils.py


"""
with open(src_path) as f:
    configs = yaml.safe_load(f)
+ behavior_configs = configs["behaviors"]
- for config in configs.values():
+ for config in behavior_configs.values():
    config.update(**kwargs)
with open(dest_path, "w") as f:

29
config/imitation/CrawlerStatic.yaml


behaviors:
CrawlerStatic:
trainer: ppo
batch_size: 2024
beta: 0.005
buffer_size: 20240
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
max_steps: 1e7
memory_size: 256
normalize: true
num_epoch: 3
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
use_recurrent: false
reward_signals:
gail:
strength: 1.0
gamma: 0.99
encoding_size: 128
demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
behavioral_cloning:
demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
strength: 0.5
steps: 50000

29
config/imitation/FoodCollector.yaml


behaviors:
FoodCollector:
trainer: ppo
batch_size: 64
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
max_steps: 2.0e6
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 32
summary_freq: 10000
use_recurrent: false
reward_signals:
gail:
strength: 0.1
gamma: 0.99
encoding_size: 128
demo_path: Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
behavioral_cloning:
demo_path: Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
strength: 1.0
steps: 0

28
config/imitation/Hallway.yaml


behaviors:
Hallway:
trainer: ppo
batch_size: 128
beta: 0.01
buffer_size: 1024
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
max_steps: 1.0e7
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
use_recurrent: true
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
gail:
strength: 0.1
gamma: 0.99
encoding_size: 128
demo_path: Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo

25
config/imitation/PushBlock.yaml


behaviors:
PushBlock:
trainer: ppo
batch_size: 128
beta: 0.01
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
max_steps: 1.5e7
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 60000
use_recurrent: false
reward_signals:
gail:
strength: 1.0
gamma: 0.99
encoding_size: 128
demo_path: Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo

36
config/imitation/Pyramids.yaml


behaviors:
Pyramids:
trainer: ppo
batch_size: 128
beta: 0.01
buffer_size: 2048
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
max_steps: 1.0e7
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 30000
use_recurrent: false
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
curiosity:
strength: 0.02
gamma: 0.99
encoding_size: 256
gail:
strength: 0.01
gamma: 0.99
encoding_size: 128
demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
behavioral_cloning:
demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
strength: 0.5
steps: 150000

25
config/ppo/3DBall.yaml


behaviors:
  3DBall:
    trainer: ppo
    batch_size: 64
    beta: 0.001
    buffer_size: 12000
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.99
    learning_rate: 0.0003
    learning_rate_schedule: linear
    max_steps: 5.0e5
    memory_size: 128
    normalize: true
    num_epoch: 3
    num_layers: 2
    time_horizon: 1000
    sequence_length: 64
    summary_freq: 12000
    use_recurrent: false
    vis_encode_type: simple
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99

25
config/ppo/3DBallHard.yaml


behaviors:
3DBallHard:
trainer: ppo
batch_size: 1200
beta: 0.001
buffer_size: 12000
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5.0e6
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 12000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

40
config/ppo/3DBall_randomize.yaml


behaviors:
  3DBall:
    trainer: ppo
    batch_size: 64
    beta: 0.001
    buffer_size: 12000
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.99
    learning_rate: 3.0e-4
    learning_rate_schedule: linear
    max_steps: 5.0e5
    memory_size: 128
    normalize: true
    num_epoch: 3
    num_layers: 2
    time_horizon: 1000
    sequence_length: 64
    summary_freq: 12000
    use_recurrent: false
    vis_encode_type: simple
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99

parameter_randomization:
  resampling-interval: 500
  mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10
  gravity:
    sampler-type: "uniform"
    min_value: 7
    max_value: 12
  scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3

25
config/ppo/Basic.yaml


behaviors:
Basic:
trainer: ppo
batch_size: 32
beta: 0.005
buffer_size: 256
epsilon: 0.2
hidden_units: 20
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5.0e5
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 1
time_horizon: 3
sequence_length: 64
summary_freq: 2000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.9

25
config/ppo/Bouncer.yaml


behaviors:
Bouncer:
trainer: ppo
batch_size: 1024
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 64
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 4.0e6
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/ppo/CrawlerDynamic.yaml


behaviors:
CrawlerDynamic:
trainer: ppo
batch_size: 2024
beta: 0.005
buffer_size: 20240
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 1e7
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

25
config/ppo/CrawlerStatic.yaml


behaviors:
CrawlerStatic:
trainer: ppo
batch_size: 2024
beta: 0.005
buffer_size: 20240
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 1e7
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

25
config/ppo/FoodCollector.yaml


behaviors:
FoodCollector:
trainer: ppo
batch_size: 1024
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2.0e6
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/ppo/GridWorld.yaml


behaviors:
GridWorld:
trainer: ppo
batch_size: 32
beta: 0.005
buffer_size: 256
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 500000
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 1
time_horizon: 5
sequence_length: 64
summary_freq: 20000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.9

25
config/ppo/Hallway.yaml


behaviors:
Hallway:
trainer: ppo
batch_size: 128
beta: 0.01
buffer_size: 1024
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 1.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
use_recurrent: true
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/ppo/PushBlock.yaml


behaviors:
PushBlock:
trainer: ppo
batch_size: 128
beta: 0.01
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2.0e6
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 60000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

29
config/ppo/Pyramids.yaml


behaviors:
Pyramids:
trainer: ppo
batch_size: 128
beta: 0.01
buffer_size: 2048
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 1.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 30000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
curiosity:
strength: 0.02
gamma: 0.99
encoding_size: 256

25
config/ppo/Reacher.yaml


behaviors:
Reacher:
trainer: ppo
batch_size: 2024
beta: 0.005
buffer_size: 20240
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2e7
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 60000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

38
config/ppo/SoccerTwos.yaml


behaviors:
SoccerTwos:
trainer: ppo
batch_size: 2048
beta: 0.005
buffer_size: 20480
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
self_play:
window: 10
play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 50000
team_change: 200000
curriculum:
measure: progress
thresholds: [0.05, 0.1]
min_lesson_length: 100
signal_smoothing: true
parameters:
ball_touch: [1.0, 0.5, 0.0]

62
config/ppo/StrikersVsGoalie.yaml


behaviors:
Goalie:
trainer: ppo
batch_size: 2048
beta: 0.005
buffer_size: 20480
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
self_play:
window: 10
play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 25000
team_change: 200000
Striker:
trainer: ppo
batch_size: 2048
beta: 0.005
buffer_size: 20480
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
self_play:
window: 10
play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 100000
team_change: 200000

31
config/ppo/Tennis.yaml


behaviors:
Tennis:
trainer: ppo
batch_size: 1024
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e7
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
self_play:
window: 10
play_against_latest_model_ratio: 0.5
save_steps: 50000
swap_steps: 50000
team_change: 100000

25
config/ppo/VisualHallway.yaml


behaviors:
VisualHallway:
trainer: ppo
batch_size: 64
beta: 0.01
buffer_size: 1024
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 1.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 1
time_horizon: 64
sequence_length: 64
summary_freq: 10000
use_recurrent: true
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/ppo/VisualPushBlock.yaml


behaviors:
VisualPushBlock:
trainer: ppo
batch_size: 64
beta: 0.01
buffer_size: 1024
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 3.0e6
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 1
time_horizon: 64
sequence_length: 32
summary_freq: 60000
use_recurrent: true
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

29
config/ppo/VisualPyramids.yaml


behaviors:
VisualPyramids:
trainer: ppo
batch_size: 64
beta: 0.01
buffer_size: 2024
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 1.0e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 1
time_horizon: 128
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
curiosity:
strength: 0.01
gamma: 0.99
encoding_size: 256

25
config/ppo/Walker.yaml


behaviors:
Walker:
trainer: ppo
batch_size: 2048
beta: 0.005
buffer_size: 20480
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2e7
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

50
config/ppo/WallJump.yaml


behaviors:
BigWallJump:
trainer: ppo
batch_size: 128
beta: 0.005
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
SmallWallJump:
trainer: ppo
batch_size: 128
beta: 0.005
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5e6
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

65
config/ppo/WallJump_curriculum.yaml


behaviors:
BigWallJump:
trainer: ppo
batch_size: 128
beta: 0.005
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2e7
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
curriculum:
measure: progress
thresholds: [0.1, 0.3, 0.5]
min_lesson_length: 100
signal_smoothing: true
parameters:
big_wall_min_height: [0.0, 4.0, 6.0, 8.0]
big_wall_max_height: [4.0, 7.0, 8.0, 8.0]
SmallWallJump:
trainer: ppo
batch_size: 128
beta: 0.005
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5e6
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
curriculum:
measure: progress
thresholds: [0.1, 0.3, 0.5]
min_lesson_length: 100
signal_smoothing: true
parameters:
small_wall_height: [1.5, 2.0, 2.5, 4.0]

25
config/ppo/WormDynamic.yaml


behaviors:
WormDynamic:
trainer: ppo
batch_size: 2024
beta: 0.005
buffer_size: 20240
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 3.5e6
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

25
config/ppo/WormStatic.yaml


behaviors:
WormStatic:
trainer: ppo
batch_size: 2024
beta: 0.005
buffer_size: 20240
epsilon: 0.2
hidden_units: 512
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 3.5e6
memory_size: 128
normalize: true
num_epoch: 3
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

25
config/sac/3DBall.yaml


behaviors:
3DBall:
trainer: sac
batch_size: 64
buffer_size: 12000
buffer_init_steps: 0
hidden_units: 64
init_entcoef: 0.5
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e5
memory_size: 128
normalize: true
steps_per_update: 10
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 12000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/sac/3DBallHard.yaml


behaviors:
3DBallHard:
trainer: sac
batch_size: 256
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 128
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e5
memory_size: 128
normalize: true
steps_per_update: 10
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 12000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/sac/Basic.yaml


behaviors:
Basic:
trainer: sac
batch_size: 64
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 20
init_entcoef: 0.01
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e5
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 10
sequence_length: 64
summary_freq: 2000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/sac/Bouncer.yaml


behaviors:
Bouncer:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 64
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 1.0e6
memory_size: 128
normalize: true
steps_per_update: 10
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 20000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/sac/CrawlerDynamic.yaml


behaviors:
CrawlerDynamic:
trainer: sac
batch_size: 256
buffer_size: 500000
buffer_init_steps: 0
hidden_units: 512
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5e6
memory_size: 128
normalize: true
steps_per_update: 20
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

25
config/sac/CrawlerStatic.yaml


behaviors:
CrawlerStatic:
trainer: sac
batch_size: 256
buffer_size: 500000
buffer_init_steps: 2000
hidden_units: 512
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 3e6
memory_size: 128
normalize: true
steps_per_update: 20
num_layers: 3
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

25
config/sac/FoodCollector.yaml


behaviors:
FoodCollector:
trainer: sac
batch_size: 256
buffer_size: 500000
buffer_init_steps: 0
hidden_units: 128
init_entcoef: 0.05
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 2.0e6
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/sac/GridWorld.yaml


behaviors:
GridWorld:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 1000
hidden_units: 128
init_entcoef: 0.5
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 500000
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 1
time_horizon: 5
sequence_length: 64
summary_freq: 20000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.9

25
config/sac/Hallway.yaml


behaviors:
Hallway:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 128
init_entcoef: 0.1
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5.0e6
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 64
sequence_length: 32
summary_freq: 10000
tau: 0.005
use_recurrent: true
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

25
config/sac/PushBlock.yaml


behaviors:
PushBlock:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 256
init_entcoef: 0.05
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 2e6
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 100000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

31
config/sac/Pyramids.yaml


behaviors:
Pyramids:
trainer: sac
batch_size: 128
buffer_size: 500000
buffer_init_steps: 10000
hidden_units: 256
init_entcoef: 0.01
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 1.0e7
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 128
sequence_length: 16
summary_freq: 30000
tau: 0.01
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 2.0
gamma: 0.99
gail:
strength: 0.02
gamma: 0.99
encoding_size: 128
use_actions: true
demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo

25
config/sac/Reacher.yaml


behaviors:
Reacher:
trainer: sac
batch_size: 128
buffer_size: 500000
buffer_init_steps: 0
hidden_units: 128
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 2e7
memory_size: 128
normalize: true
steps_per_update: 20
num_layers: 2
time_horizon: 1000
sequence_length: 64
summary_freq: 60000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

30
config/sac/Tennis.yaml


behaviors:
Tennis:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 256
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 2e7
memory_size: 128
normalize: true
steps_per_update: 10
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
self_play:
window: 10
play_against_current_self_ratio: 0.5
save_steps: 50000
swap_steps: 50000

26
config/sac/VisualHallway.yaml


behaviors:
VisualHallway:
trainer: sac
batch_size: 64
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 128
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 1.0e7
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 1
time_horizon: 64
sequence_length: 32
summary_freq: 10000
tau: 0.005
use_recurrent: true
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
gamma: 0.99

26
config/sac/VisualPushBlock.yaml


behaviors:
VisualPushBlock:
trainer: sac
batch_size: 64
buffer_size: 1024
buffer_init_steps: 0
hidden_units: 128
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 3.0e6
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 1
time_horizon: 64
sequence_length: 32
summary_freq: 60000
tau: 0.005
use_recurrent: true
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
gamma: 0.99

31
config/sac/VisualPyramids.yaml


behaviors:
VisualPyramids:
trainer: sac
batch_size: 64
buffer_size: 500000
buffer_init_steps: 1000
hidden_units: 256
init_entcoef: 0.01
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 1.0e7
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 1
time_horizon: 128
sequence_length: 64
summary_freq: 10000
tau: 0.01
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 2.0
gamma: 0.99
gail:
strength: 0.02
gamma: 0.99
encoding_size: 128
use_actions: true
demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo

25
config/sac/Walker.yaml


behaviors:
Walker:
trainer: sac
batch_size: 256
buffer_size: 500000
buffer_init_steps: 0
hidden_units: 512
init_entcoef: 1.0
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 2e7
memory_size: 128
normalize: true
steps_per_update: 30
num_layers: 4
time_horizon: 1000
sequence_length: 64
summary_freq: 30000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995

50
config/sac/WallJump.yaml


behaviors:
BigWallJump:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 256
init_entcoef: 0.1
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 2e7
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
SmallWallJump:
trainer: sac
batch_size: 128
buffer_size: 50000
buffer_init_steps: 0
hidden_units: 256
init_entcoef: 0.1
learning_rate: 0.0003
learning_rate_schedule: constant
max_steps: 5e6
memory_size: 128
normalize: false
steps_per_update: 10
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99

129
config/gail_config.yaml


default:
  trainer: ppo
  batch_size: 1024
  beta: 5.0e-3
  buffer_size: 10240
  epsilon: 0.2
  hidden_units: 128
  lambd: 0.95
  learning_rate: 3.0e-4
  max_steps: 5.0e5
  memory_size: 256
  normalize: false
  num_epoch: 3
  num_layers: 2
  time_horizon: 64
  sequence_length: 64
  summary_freq: 10000
  use_recurrent: false
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99

Pyramids:
  summary_freq: 30000
  time_horizon: 128
  batch_size: 128
  buffer_size: 2048
  hidden_units: 512
  num_layers: 2
  beta: 1.0e-2
  max_steps: 1.0e7
  num_epoch: 3
  behavioral_cloning:
    demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo
    strength: 0.5
    steps: 150000
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    curiosity:
      strength: 0.02
      gamma: 0.99
      encoding_size: 256
    gail:
      strength: 0.01
      gamma: 0.99
      encoding_size: 128
      demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo

CrawlerStatic:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2024
  buffer_size: 20240
  max_steps: 1e7
  summary_freq: 30000
  num_layers: 3
  hidden_units: 512
  behavioral_cloning:
    demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
    strength: 0.5
    steps: 50000
  reward_signals:
    gail:
      strength: 1.0
      gamma: 0.99
      encoding_size: 128
      demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo

PushBlock:
  max_steps: 1.5e7
  batch_size: 128
  buffer_size: 2048
  beta: 1.0e-2
  hidden_units: 256
  summary_freq: 60000
  time_horizon: 64
  num_layers: 2
  reward_signals:
    gail:
      strength: 1.0
      gamma: 0.99
      encoding_size: 128
      demo_path: Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo

Hallway:
  use_recurrent: true
  sequence_length: 64
  num_layers: 2
  hidden_units: 128
  memory_size: 256
  beta: 1.0e-2
  num_epoch: 3
  buffer_size: 1024
  batch_size: 128
  max_steps: 1.0e7
  summary_freq: 10000
  time_horizon: 64
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    gail:
      strength: 0.1
      gamma: 0.99
      encoding_size: 128
      demo_path: Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo

FoodCollector:
  batch_size: 64
  max_steps: 2.0e6
  use_recurrent: false
  hidden_units: 128
  learning_rate: 3.0e-4
  num_layers: 2
  sequence_length: 32
  reward_signals:
    gail:
      strength: 0.1
      gamma: 0.99
      encoding_size: 128
      demo_path: Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
  behavioral_cloning:
    demo_path: Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
    strength: 1.0
    steps: 0
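
For contrast, gail_config.yaml above is still the legacy multi-behavior format: every behavior's overrides sit next to a shared default section, and the effective settings for a behavior are the defaults overlaid with its own block. A rough sketch of that resolution, assuming PyYAML; resolve_legacy() mirrors the idea only, not the actual trainer_util implementation:

import yaml

def resolve_legacy(config_path: str, behavior_name: str) -> dict:
    with open(config_path) as f:
        sections = yaml.safe_load(f)
    # Start from the shared defaults, then overlay the behavior-specific block.
    settings = dict(sections.get("default", {}))
    settings.update(sections.get(behavior_name, {}))
    return settings

pyramids = resolve_legacy("config/gail_config.yaml", "Pyramids")
print(pyramids["max_steps"], sorted(pyramids["reward_signals"]))  # 1.0e7 ['curiosity', 'extrinsic', 'gail']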

16
config/3dball_randomize.yaml


resampling-interval: 5000

mass:
  sampler-type: "uniform"
  min_value: 0.5
  max_value: 10

gravity:
  sampler-type: "uniform"
  min_value: 7
  max_value: 12

scale:
  sampler-type: "uniform"
  min_value: 0.75
  max_value: 3
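
This legacy sampler file is read on its own: resampling-interval is the number of training steps between redraws, and each remaining key names one environment parameter with a sampler-type and its bounds (after this refactor the same keys move into the single per-environment file, under a parameter_randomization section if I recall the new schema correctly). A purely illustrative sketch of what a uniform entry amounts to at resample time; draw() is a hypothetical helper, not the mlagents sampler classes:

import random
import yaml

def draw(sampler_path: str) -> dict:
    with open(sampler_path) as f:
        spec = yaml.safe_load(f)
    values = {}
    for key, entry in spec.items():
        if key == "resampling-interval":
            continue  # steps between redraws, not a parameter itself
        if entry.get("sampler-type") == "uniform":
            values[key] = random.uniform(entry["min_value"], entry["max_value"])
    return values

print(draw("config/3dball_randomize.yaml"))  # e.g. {'mass': 4.2, 'gravity': 9.8, 'scale': 1.7}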

351
config/trainer_config.yaml


default:
  trainer: ppo
  batch_size: 1024
  beta: 5.0e-3
  buffer_size: 10240
  epsilon: 0.2
  hidden_units: 128
  lambd: 0.95
  learning_rate: 3.0e-4
  learning_rate_schedule: linear
  max_steps: 5.0e5
  memory_size: 128
  normalize: false
  num_epoch: 3
  num_layers: 2
  time_horizon: 64
  sequence_length: 64
  summary_freq: 10000
  use_recurrent: false
  vis_encode_type: simple
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99

FoodCollector:
  normalize: false
  beta: 5.0e-3
  batch_size: 1024
  buffer_size: 10240
  max_steps: 2.0e6

Bouncer:
  normalize: true
  max_steps: 4.0e6
  num_layers: 2
  hidden_units: 64

PushBlock:
  max_steps: 2.0e6
  batch_size: 128
  buffer_size: 2048
  beta: 1.0e-2
  hidden_units: 256
  summary_freq: 60000
  time_horizon: 64
  num_layers: 2

SmallWallJump:
  max_steps: 5e6
  batch_size: 128
  buffer_size: 2048
  beta: 5.0e-3
  hidden_units: 256
  summary_freq: 20000
  time_horizon: 128
  num_layers: 2
  normalize: false

BigWallJump:
  max_steps: 2e7
  batch_size: 128
  buffer_size: 2048
  beta: 5.0e-3
  hidden_units: 256
  summary_freq: 20000
  time_horizon: 128
  num_layers: 2
  normalize: false

Pyramids:
  summary_freq: 30000
  time_horizon: 128
  batch_size: 128
  buffer_size: 2048
  hidden_units: 512
  num_layers: 2
  beta: 1.0e-2
  max_steps: 1.0e7
  num_epoch: 3
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    curiosity:
      strength: 0.02
      gamma: 0.99
      encoding_size: 256

VisualPyramids:
  time_horizon: 128
  batch_size: 64
  buffer_size: 2024
  hidden_units: 256
  num_layers: 1
  beta: 1.0e-2
  max_steps: 1.0e7
  num_epoch: 3
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    curiosity:
      strength: 0.01
      gamma: 0.99
      encoding_size: 256

3DBall:
  normalize: true
  batch_size: 64
  buffer_size: 12000
  summary_freq: 12000
  time_horizon: 1000
  lambd: 0.99
  beta: 0.001

3DBallHard:
  normalize: true
  batch_size: 1200
  buffer_size: 12000
  summary_freq: 12000
  time_horizon: 1000
  max_steps: 5.0e6
  beta: 0.001
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

Tennis:
  normalize: true
  max_steps: 5.0e7
  learning_rate_schedule: constant
  batch_size: 1024
  buffer_size: 10240
  hidden_units: 256
  time_horizon: 1000
  self_play:
    window: 10
    play_against_latest_model_ratio: 0.5
    save_steps: 50000
    swap_steps: 50000
    team_change: 100000

Goalie:
  normalize: false
  max_steps: 5.0e7
  learning_rate_schedule: constant
  batch_size: 2048
  buffer_size: 20480
  hidden_units: 512
  time_horizon: 1000
  num_layers: 2
  self_play:
    window: 10
    play_against_latest_model_ratio: 0.5
    save_steps: 50000
    swap_steps: 25000
    team_change: 200000

Striker:
  normalize: false
  max_steps: 5.0e7
  learning_rate_schedule: constant
  batch_size: 2048
  buffer_size: 20480
  hidden_units: 512
  time_horizon: 1000
  num_layers: 2
  self_play:
    window: 10
    play_against_latest_model_ratio: 0.5
    save_steps: 50000
    swap_steps: 100000
    team_change: 200000

SoccerTwos:
  normalize: false
  max_steps: 5.0e7
  learning_rate_schedule: constant
  batch_size: 2048
  buffer_size: 20480
  hidden_units: 512
  time_horizon: 1000
  num_layers: 2
  self_play:
    window: 10
    play_against_latest_model_ratio: 0.5
    save_steps: 50000
    swap_steps: 50000
    team_change: 200000

CrawlerStatic:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2024
  buffer_size: 20240
  max_steps: 1e7
  summary_freq: 30000
  num_layers: 3
  hidden_units: 512
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

CrawlerDynamic:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2024
  buffer_size: 20240
  max_steps: 1e7
  summary_freq: 30000
  num_layers: 3
  hidden_units: 512
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

WormDynamic:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2024
  buffer_size: 20240
  max_steps: 3.5e6
  summary_freq: 30000
  num_layers: 3
  hidden_units: 512
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

WormStatic:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2024
  buffer_size: 20240
  max_steps: 3.5e6
  summary_freq: 30000
  num_layers: 3
  hidden_units: 512
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

Walker:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2048
  buffer_size: 20480
  max_steps: 2e7
  summary_freq: 30000
  num_layers: 3
  hidden_units: 512
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

Reacher:
  normalize: true
  num_epoch: 3
  time_horizon: 1000
  batch_size: 2024
  buffer_size: 20240
  max_steps: 2e7
  summary_freq: 60000
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.995

Hallway:
  use_recurrent: true
  sequence_length: 64
  num_layers: 2
  hidden_units: 128
  memory_size: 128
  beta: 1.0e-2
  num_epoch: 3
  buffer_size: 1024
  batch_size: 128
  max_steps: 1.0e7
  summary_freq: 10000
  time_horizon: 64

VisualHallway:
  use_recurrent: true
  sequence_length: 64
  num_layers: 1
  hidden_units: 128
  memory_size: 128
  beta: 1.0e-2
  num_epoch: 3
  buffer_size: 1024
  batch_size: 64
  max_steps: 1.0e7
  summary_freq: 10000
  time_horizon: 64

VisualPushBlock:
  use_recurrent: true
  sequence_length: 32
  num_layers: 1
  hidden_units: 128
  memory_size: 128
  beta: 1.0e-2
  num_epoch: 3
  buffer_size: 1024
  batch_size: 64
  max_steps: 3.0e6
  summary_freq: 60000
  time_horizon: 64

GridWorld:
  batch_size: 32
  normalize: false
  num_layers: 1
  hidden_units: 256
  beta: 5.0e-3
  buffer_size: 256
  max_steps: 500000
  summary_freq: 20000
  time_horizon: 5
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.9

Basic:
  batch_size: 32
  normalize: false
  num_layers: 1
  hidden_units: 20
  beta: 5.0e-3
  buffer_size: 256
  max_steps: 5.0e5
  summary_freq: 2000
  time_horizon: 3
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.9

Some files were not shown because too many files changed in this diff.
