GAIL and Pretraining (#2118)
Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes:

- A new GAILRewardSignal and GAILModel for GAIL/VAIL
- A BCModule component (not a reward signal) to do pretraining during RL
- Documentation for both of these
- Change to Demo Loader that lets you load multiple demo files in a folder
- Example Demo files for all of our tested sample environments (for future regression testing)

Branch: /develop-generalizationTraining-TrainerController
GitHub · 5 years ago
Current commit: 9c50abcf
44 files changed, with 15,563 additions and 155 deletions
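For context, the sketch below shows what a PPO trainer-config entry enabling both new components might look like, written with `yaml.safe_load` in the same style as the new test fixtures. The `pretraining` keys mirror the dummy config in `test_bcmodule.py`; the `gail` keys and all numeric values are illustrative assumptions, not a verbatim excerpt from the shipped configs.

```python
import yaml

# Sketch only: enabling both the BC pretrainer and the GAIL reward signal for PPO.
# `pretraining` drives the BCModule; `gail` under `reward_signals` drives the
# GAILRewardSignal. The gail keys/values below are assumptions for illustration.
config = yaml.safe_load(
    """
    trainer: ppo
    pretraining:
        demo_path: ./demos/ExpertPyramid.demo
        strength: 1.0
        steps: 10000000
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        gail:
            strength: 0.01
            gamma: 0.99
            demo_path: ./demos/ExpertPyramid.demo
    """
)
print(sorted(config["reward_signals"]))  # ['extrinsic', 'gail']
```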
Changed files (number of changed lines in parentheses):

- docs/Training-Imitation-Learning.md (125)
- docs/Training-PPO.md (70)
- docs/Training-RewardSignals.md (99)
- ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py (7)
- ml-agents/mlagents/trainers/components/reward_signals/reward_signal_factory.py (2)
- ml-agents/mlagents/trainers/demo_loader.py (37)
- ml-agents/mlagents/trainers/ppo/policy.py (14)
- ml-agents/mlagents/trainers/ppo/trainer.py (4)
- ml-agents/mlagents/trainers/tests/mock_brain.py (49)
- ml-agents/mlagents/trainers/tests/test_demo_loader.py (13)
- ml-agents/mlagents/trainers/tests/test_reward_signals.py (154)
- docs/Training-BehavioralCloning.md (92)
- docs/images/mlagents-ImitationAndRL.png (80)
- ml-agents/mlagents/trainers/tests/test_bcmodule.py (158)
- ml-agents/mlagents/trainers/tests/testdcvis.demo (1001)
- demos/Expert3DBall.demo (442)
- demos/Expert3DBallHard.demo (1001)
- demos/ExpertBanana.demo (1001)
- demos/ExpertBasic.demo (171)
- demos/ExpertBouncer.demo (198)
- demos/ExpertCrawlerSta.demo (1001)
- demos/ExpertGrid.demo (1001)
- demos/ExpertHallway.demo (1001)
- demos/ExpertPush.demo (1001)
- demos/ExpertPyramid.demo (1001)
- demos/ExpertReacher.demo (1001)
- demos/ExpertSoccerGoal.demo (1001)
- demos/ExpertSoccerStri.demo (1001)
- demos/ExpertTennis.demo (1001)
- demos/ExpertWalker.demo (1001)
- ml-agents/mlagents/trainers/components/bc/__init__.py (1)
- ml-agents/mlagents/trainers/components/bc/model.py (101)
- ml-agents/mlagents/trainers/components/bc/module.py (172)
- ml-agents/mlagents/trainers/components/reward_signals/gail/__init__.py (1)
- ml-agents/mlagents/trainers/components/reward_signals/gail/model.py (265)
- ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py (270)
- ml-agents/mlagents/trainers/tests/test_demo_dir/test.demo (60)
- ml-agents/mlagents/trainers/tests/test_demo_dir/test2.demo (60)
- ml-agents/mlagents/trainers/tests/test_demo_dir/test3.demo (60)
docs/Training-BehavioralCloning.md

# Training with Behavioral Cloning

There are a variety of possible imitation learning algorithms that can be
used; the simplest of them is Behavioral Cloning. It works by collecting
demonstrations from a teacher, and then simply using them to directly learn a
policy, in the same way that supervised learning works for image
classification or other traditional machine learning tasks.
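To make the supervised-learning analogy concrete, below is a minimal, self-contained sketch of a behavioral-cloning update on a toy linear policy in NumPy. It is not the BCModule implementation from this PR, just the core idea: regress the student's actions onto the teacher's recorded actions.

```python
import numpy as np

# Toy behavioral cloning: fit a linear policy to (observation, action) pairs
# recorded by a teacher, exactly like ordinary supervised regression.
rng = np.random.default_rng(0)
obs = rng.normal(size=(256, 8))                   # demonstration observations
teacher_actions = obs @ rng.normal(size=(8, 2))   # demonstrated actions

W = np.zeros((8, 2))                              # student policy parameters
lr = 0.1
for _ in range(200):
    pred = obs @ W                                # student actions for the same observations
    grad = obs.T @ (pred - teacher_actions) / len(obs)
    W -= lr * grad                                # minimize mean squared imitation error

print(float(np.mean((obs @ W - teacher_actions) ** 2)))  # loss approaches 0
```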

## Offline Training

With offline behavioral cloning, we can use demonstrations (`.demo` files)
generated using the `Demonstration Recorder` as the dataset used to train a behavior.

1. Choose an agent you would like to have learn to imitate a set of demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see [here](Training-Imitation-Learning.md)).
   For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
   Control in the Broadcast Hub. For more information on Brains, see
   [here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
   demonstration file recorded in step 2. In our case this is:
   `./UnitySDK/Assets/Demonstrations/AgentRecording.demo` (a sketch of such a
   config entry appears at the end of this section).
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
   as the config parameter, and include the `--run-id` and `--train` flags as usual.
   Provide your environment as the `--env` parameter if it has been compiled
   as a standalone, or omit it to train in the Editor.
7. (Optional) Observe training performance using TensorBoard.

This will use the demonstration file to train a neural-network-driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.
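As referenced in step 5, here is a minimal sketch of the kind of config entry involved, expressed with `yaml.safe_load` so it can be inspected in Python. Only `demo_path` comes directly from the steps above; the `trainer: offline_bc` value, the other keys, and all numeric values are illustrative assumptions rather than the shipped `offline_bc_config.yaml`.

```python
import yaml

# Sketch only: the piece of config that steps 4-6 ask you to edit.
offline_bc_entry = yaml.safe_load(
    """
    default:
        trainer: offline_bc                                             # assumed trainer name
        demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo # path from step 5
        batches_per_epoch: 5                                            # illustrative value
        max_steps: 5.0e4                                                # illustrative value
    """
)
print(offline_bc_entry["default"]["demo_path"])
```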

## Online Training

It is also possible to provide demonstrations in real time during training,
without pre-recording a demonstration file. The steps to do this are as follows:

1. First create two Brains, one that will be the "Teacher" and one that
   will be the "Student." We will assume that the names of the Brain
   Assets are "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**. You must properly
   configure the inputs to map to the corresponding actions.
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
   compatible with the agent.
5. Drag both the "Teacher" and "Student" Brains into the Academy's `Broadcast Hub`
   and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at least
   one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
   the `trainer` parameter of this entry to `online_bc`, and the
   `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
   Additionally, set `batches_per_epoch`, which controls how much training is done
   per update. Increase the `max_steps` option if you'd like to keep training
   the Agents for a longer period of time. (A sketch of such an entry appears
   after this list.)
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
   --train --slow`, and press the :arrow_forward: button in Unity when the
   message _"Start training by pressing the Play button in the Unity Editor"_ is
   displayed on the screen.
9. From the Unity window, control the Agent with the Teacher Brain by providing
   "teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the Student Brain attached begin to behave
    similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the training
    process with `CTRL+C` from the command line.
12. Move the resulting `*.nn` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory of your choosing within Assets), and use
    it with a `Learning` Brain.
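As referenced in step 7, here is a minimal sketch of the "Student" entry, again via `yaml.safe_load`. The key names come from step 7; the numeric values are illustrative assumptions, not the shipped `online_bc_config.yaml`.

```python
import yaml

# Sketch of an online BC entry for the "Student" Brain (values are illustrative).
student_entry = yaml.safe_load(
    """
    Student:
        trainer: online_bc          # use the online behavioral-cloning trainer
        brain_to_imitate: Teacher   # name of the Player Brain providing demonstrations
        batches_per_epoch: 10       # how much training to do per update
        max_steps: 5.0e4            # raise to keep training longer
    """
)
assert student_entry["Student"]["trainer"] == "online_bc"
```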

**BC Teacher Helper**

We provide a convenience utility, the `BC Teacher Helper` component, which you can add
to the Teacher Agent.

<p align="center">
  <img src="images/bc_teacher_helper.png"
       alt="BC Teacher Helper"
       width="375" border="10" />
</p>

This utility enables you to use keyboard shortcuts to do the following:

1. Start and stop recording experiences. This is useful in case you'd like to
   interact with the game _but not have the agents learn from these
   interactions_. The default command to toggle this is to press `R` on the
   keyboard.

2. Reset the training buffer. This enables you to instruct the agents to forget
   their buffer of recent experiences. This is useful if you'd like to get them
   to quickly learn a new behavior. The default command to reset the buffer is
   to press `C` on the keyboard.

ml-agents/mlagents/trainers/tests/test_bcmodule.py
import unittest.mock as mock
import pytest
import mlagents.trainers.tests.mock_brain as mb

import numpy as np
import yaml
import os

from mlagents.trainers.ppo.policy import PPOPolicy


@pytest.fixture
def dummy_config():
    # PPO trainer config with a `pretraining` section so that the policy
    # builds a BCModule alongside the usual reward signals.
    return yaml.safe_load(
        """
        trainer: ppo
        batch_size: 32
        beta: 5.0e-3
        buffer_size: 512
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        max_steps: 5.0e4
        normalize: true
        num_epoch: 5
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 1000
        use_recurrent: false
        memory_size: 8
        pretraining:
            demo_path: ./demos/ExpertPyramid.demo
            strength: 1.0
            steps: 10000000
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
        """
    )


def create_mock_3dball_brain():
    mock_brain = mb.create_mock_brainparams(
        vector_action_space_type="continuous",
        vector_action_space_size=[2],
        vector_observation_space_size=8,
    )
    return mock_brain


def create_mock_banana_brain():
    mock_brain = mb.create_mock_brainparams(
        number_visual_observations=1,
        vector_action_space_type="discrete",
        vector_action_space_size=[3, 3, 3, 2],
        vector_observation_space_size=0,
    )
    return mock_brain


def create_ppo_policy_with_bc_mock(
    mock_env, mock_brain, dummy_config, use_rnn, demo_file
):
    mock_braininfo = mb.create_mock_braininfo(num_agents=12, num_vector_observations=8)
    mb.setup_mock_unityenvironment(mock_env, mock_brain, mock_braininfo)
    env = mock_env()

    trainer_parameters = dummy_config
    model_path = env.brain_names[0]
    trainer_parameters["model_path"] = model_path
    trainer_parameters["keep_checkpoints"] = 3
    trainer_parameters["use_recurrent"] = use_rnn
    # Point the pretrainer at a demo file that lives next to this test module.
    trainer_parameters["pretraining"]["demo_path"] = (
        os.path.dirname(os.path.abspath(__file__)) + "/" + demo_file
    )
    policy = PPOPolicy(0, mock_brain, trainer_parameters, False, False)
    return env, policy


# Test default values
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_defaults(mock_env, dummy_config):
    # See if default values match
    mock_brain = create_mock_3dball_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "test.demo"
    )
    assert policy.bc_module.num_epoch == dummy_config["num_epoch"]
    assert policy.bc_module.batch_size == dummy_config["batch_size"]
    env.close()
    # Assign strange values and see if it overrides properly
    dummy_config["pretraining"]["num_epoch"] = 100
    dummy_config["pretraining"]["batch_size"] = 10000
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "test.demo"
    )
    assert policy.bc_module.num_epoch == 100
    assert policy.bc_module.batch_size == 10000
    env.close()


# Test with continuous control env and vector actions
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_update(mock_env, dummy_config):
    mock_brain = create_mock_3dball_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "test.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


# Test with RNN
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_rnn_update(mock_env, dummy_config):
    mock_brain = create_mock_3dball_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, True, "test.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


# Test with discrete control and visual observations
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_dc_visual_update(mock_env, dummy_config):
    mock_brain = create_mock_banana_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "testdcvis.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


# Test with discrete control, visual observations and RNN
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_rnn_dc_update(mock_env, dummy_config):
    mock_brain = create_mock_banana_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, True, "testdcvis.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


if __name__ == "__main__":
    pytest.main()

ml-agents/mlagents/trainers/tests/testdcvis.demo (1001)

File diff too large to display.
(Binary .demo content: a "BallDemo" demonstration recorded with the 3DBallBrain; raw bytes omitted as they are not human-readable.)