
GAIL and Pretraining (#2118)

Based on the new reward signals architecture, add BC pretrainer and GAIL for PPO. Main changes:

- A new GAILRewardSignal and GAILModel for GAIL/VAIL
- A BCModule component (not a reward signal) to do pretraining during RL
- Documentation for both of these
- Change to Demo Loader that lets you load multiple demo files in a folder
- Example Demo files for all of our tested sample environments (for future regression testing)
/develop-generalizationTraining-TrainerController
GitHub, 6 years ago
Current commit
9c50abcf
44 files changed, with 15,563 additions and 155 deletions
  1. docs/Training-Imitation-Learning.md (125)
  2. docs/Training-PPO.md (70)
  3. docs/Training-RewardSignals.md (99)
  4. ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py (7)
  5. ml-agents/mlagents/trainers/components/reward_signals/reward_signal_factory.py (2)
  6. ml-agents/mlagents/trainers/demo_loader.py (37)
  7. ml-agents/mlagents/trainers/ppo/policy.py (14)
  8. ml-agents/mlagents/trainers/ppo/trainer.py (4)
  9. ml-agents/mlagents/trainers/tests/mock_brain.py (49)
  10. ml-agents/mlagents/trainers/tests/test_demo_loader.py (13)
  11. ml-agents/mlagents/trainers/tests/test_reward_signals.py (154)
  12. docs/Training-BehavioralCloning.md (92)
  13. docs/images/mlagents-ImitationAndRL.png (80)
  14. ml-agents/mlagents/trainers/tests/test_bcmodule.py (158)
  15. ml-agents/mlagents/trainers/tests/testdcvis.demo (1001)
  16. demos/Expert3DBall.demo (442)
  17. demos/Expert3DBallHard.demo (1001)
  18. demos/ExpertBanana.demo (1001)
  19. demos/ExpertBasic.demo (171)
  20. demos/ExpertBouncer.demo (198)
  21. demos/ExpertCrawlerSta.demo (1001)
  22. demos/ExpertGrid.demo (1001)
  23. demos/ExpertHallway.demo (1001)
  24. demos/ExpertPush.demo (1001)
  25. demos/ExpertPyramid.demo (1001)
  26. demos/ExpertReacher.demo (1001)
  27. demos/ExpertSoccerGoal.demo (1001)
  28. demos/ExpertSoccerStri.demo (1001)
  29. demos/ExpertTennis.demo (1001)
  30. demos/ExpertWalker.demo (1001)
  31. ml-agents/mlagents/trainers/components/bc/__init__.py (1)
  32. ml-agents/mlagents/trainers/components/bc/model.py (101)
  33. ml-agents/mlagents/trainers/components/bc/module.py (172)
  34. ml-agents/mlagents/trainers/components/reward_signals/gail/__init__.py (1)
  35. ml-agents/mlagents/trainers/components/reward_signals/gail/model.py (265)
  36. ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py (270)
  37. ml-agents/mlagents/trainers/tests/test_demo_dir/test.demo (60)
  38. ml-agents/mlagents/trainers/tests/test_demo_dir/test2.demo (60)
  39. ml-agents/mlagents/trainers/tests/test_demo_dir/test3.demo (60)

125
docs/Training-Imitation-Learning.md


Imitation Learning uses pairs of observations and actions from
a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
Imitation learning can also be used to help reinforcement learning. Especially in
environments with sparse (i.e., infrequent or rare) rewards, the agent may never see
the reward and thus not learn from it. Curiosity helps the agent explore, but in some cases
it is easier to just show the agent how to achieve the reward. In these cases,
imitation learning can dramatically reduce the time it takes to solve the environment.
For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
just 6 episodes of demonstrations can reduce the number of training steps required by more than a factor of 4.
<p align="center">
<img src="images/mlagents-ImitationAndRL.png"
alt="Using Demonstrations with Reinforcement Learning"
width="350" border="0" />
</p>
ML-Agents provides several ways to learn from demonstrations. For most situations,
[GAIL](Training-RewardSignals.md#the-gail-reward-signal) is the preferred approach.
* To train using GAIL (Generative Adversarial Imitation Learning) you can add the
[GAIL reward signal](Training-RewardSignals.md#the-gail-reward-signal). GAIL can be
used with or without environment rewards, and works well when there are a limited
number of demonstrations.
* To help bootstrap reinforcement learning, you can enable
[pretraining](Training-PPO.md#optional-pretraining-using-demonstrations)
on the PPO trainer, in addition to using a small GAIL reward signal; a sample
configuration combining the two is sketched after this list.
* To train an agent to exactly mimic demonstrations, you can use the
[Behavioral Cloning](Training-BehavioralCloning.md) trainer. Behavioral Cloning can be
used offline and online (in-editor), and learns very quickly. However, it is usually ineffective
on more complex environments without a large number of demonstrations.
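For example, a minimal trainer configuration sketch that combines a small GAIL
reward signal with pretraining might look like the following. The demo path and
values are illustrative only; see [Training with PPO](Training-PPO.md) and
[Reward Signals](Training-RewardSignals.md) for the full set of options.
```
reward_signals:
    extrinsic:
        strength: 1.0
        gamma: 0.99
    gail:
        strength: 0.01
        gamma: 0.9
        demo_path: ./demos/ExpertPyramid.demo
pretraining:
    demo_path: ./demos/ExpertPyramid.demo
    strength: 0.5
    steps: 10000
```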
## Recording Demonstrations
It is possible to record demonstrations of agent behavior from the Unity Editor,

alt="BC Teacher Helper"
width="375" border="10" />
</p>
## Training with Behavioral Cloning
There are a variety of possible imitation learning algorithms that can
be used; the simplest of these is Behavioral Cloning. It works by collecting
demonstrations from a teacher, and then simply using them to directly learn a
policy, in the same way supervised learning for image classification
or other traditional machine learning tasks works.
### Offline Training
With offline behavioral cloning, we can use demonstrations (`.demo` files)
generated using the `Demonstration Recorder` as the dataset used to train a behavior.
1. Choose an agent you would like to have learn to imitate a set of demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see above).
For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
Control in the Broadcast Hub. For more information on Brains, see
[here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
demonstration file recorded in step 2. In our case this is:
`./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
as the config parameter, and include the `--run-id` and `--train` as usual.
Provide your environment as the `--env` parameter if it has been compiled
as standalone, or omit to train in the editor.
7. (Optional) Observe training performance using TensorBoard.
This will use the demonstration file to train a neural network driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.
### Online Training
It is also possible to provide demonstrations in realtime during training,
without pre-recording a demonstration file. The steps to do this are as follows:
1. First create two Brains, one which will be the "Teacher," and the other which
will be the "Student." We will assume that the names of the Brain
Assets are "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**. You must properly
configure the inputs to map to the corresponding actions.
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
compatible with the agent.
5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at least
one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
the `trainer` parameter of this entry to `online_bc`, and the
`brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
Additionally, set `batches_per_epoch`, which controls how much training to do
each moment. Increase the `max_steps` option if you'd like to keep training
the Agents for a longer period of time.
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
--train --slow`, and press the :arrow_forward: button in Unity when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
9. From the Unity window, control the Agent with the Teacher Brain by providing
"teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the student Brain attached begin to behave
similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the training
process with `CTRL+C` from the command line.
12. Move the resulting `*.nn` file into the `TFModels` subdirectory of the
Assets folder (or a subdirectory within Assets of your choosing), and use it
with a `Learning` Brain.
**BC Teacher Helper**
We provide a convenience utility, the `BC Teacher Helper` component, which you can add
to the Teacher Agent.
<p align="center">
<img src="images/bc_teacher_helper.png"
alt="BC Teacher Helper"
width="375" border="10" />
</p>
This utility enables you to use keyboard shortcuts to do the following:
1. Start and stop recording experiences. This is useful in case you'd like to
interact with the game _but not have the agents learn from these
interactions_. The default command to toggle this is to press `R` on the
keyboard.
2. Reset the training buffer. This enables you to instruct the agents to forget
their buffer of recent experiences. This is useful if you'd like to get them
to quickly learn a new behavior. The default command to reset the buffer is
to press `C` on the keyboard.

70
docs/Training-PPO.md


presented to an agent, see [Training with Curriculum
Learning](Training-Curriculum-Learning.md).
For information about imitation learning from demonstrations, see
[Training with Imitation Learning](Training-Imitation-Learning.md).
## Best Practices when training with PPO

the agent will need to remember in order to successfully complete the task.
Typical Range: `64` - `512`
## (Optional) Pretraining Using Demonstrations
In some cases, you might want to bootstrap the agent's policy using behavior recorded
from a player. This can help guide the agent towards the reward. Pretraining adds
training operations that mimic a demonstration rather than attempting to maximize reward.
It is essentially equivalent to running [behavioral cloning](./Training-BehavioralCloning.md)
in-line with PPO.
To use pretraining, add a `pretraining` section to the trainer_config. For instance:
```
pretraining:
    demo_path: ./demos/ExpertPyramid.demo
    strength: 0.5
    steps: 10000
```
Below are the available hyperparameters for pretraining.
### Strength
`strength` corresponds to the learning rate of the imitation relative to the learning
rate of PPO, and roughly corresponds to how strongly we allow the behavioral cloning
to influence the policy.
Typical Range: `0.1` - `0.5`
### Demo Path
`demo_path` is the path to your `.demo` file or directory of `.demo` files.
See the [imitation learning guide](Training-Imitation-Learning.md) for more on `.demo` files.
### Steps
During pretraining, it is often desirable to stop using demonstrations after the agent has
"seen" rewards, and allow it to optimize past the available demonstrations and/or generalize
outside of the provided demonstrations. `steps` corresponds to the training steps over which
pretraining is active. The learning rate of the pretrainer will anneal over the steps. Set
the steps to 0 for constant imitation over the entire training run.
### (Optional) Batch Size
`batch_size` is the number of demonstration experiences used for one iteration of a gradient
descent update. If not specified, it will default to the `batch_size` defined for PPO.
Typical Range (Continuous): `512` - `5120`
Typical Range (Discrete): `32` - `512`
### (Optional) Number of Epochs
`num_epoch` is the number of passes through the experience buffer during
gradient descent. If not specified, it will default to the number of epochs set for PPO.
Typical Range: `3` - `10`
### (Optional) Samples Per Update
`samples_per_update` is the maximum number of samples
to use during each imitation update. You may want to lower this if your demonstration
dataset is very large to avoid overfitting the policy on demonstrations. Set to 0
to train over all of the demonstrations at each update step.
Default Value: `0` (all)
Typical Range: Approximately equal to PPO's `buffer_size`
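Putting these hyperparameters together, a `pretraining` section that overrides the
optional values might look like the following sketch (the values shown are
illustrative, not recommendations):
```
pretraining:
    demo_path: ./demos/ExpertPyramid.demo
    strength: 0.5
    steps: 10000
    batch_size: 512
    num_epoch: 3
    samples_per_update: 0
```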
## Training Statistics

99
docs/Training-RewardSignals.md


observation, but also not too small to prevent it from learning to differentiate between
demonstrated and actual behavior.
Default Value: `64`
Typical Range: `64` - `256`
#### Learning Rate

Default Value: `3e-4`
### The GAIL Reward Signal
GAIL, or [Generative Adversarial Imitation Learning](https://arxiv.org/abs/1606.03476), is an
imitation learning algorithm that uses an adversarial approach, in a similar vein to GANs
(Generative Adversarial Networks). In this framework, a second neural network, the
discriminator, is taught to distinguish whether an observation/action is from a demonstration, or
produced by the agent. This discriminator can then examine a new observation/action and provide it a
reward based on how close it believes this new observation/action is to the provided demonstrations.
At each training step, the agent tries to learn how to maximize this reward. Then, the
discriminator is trained to better distinguish between demonstrations and agent state/actions.
In this way, while the agent gets better and better at mimicking the demonstrations, the
discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.
This approach, when compared to [Behavioral Cloning](Training-BehavioralCloning.md), requires
far fewer demonstrations to be provided. After all, we are still learning a policy that happens
to be similar to the demonstration, not directly copying the behavior of the demonstrations. It
is especially effective when combined with an Extrinsic signal, but can also be used
independently to learn purely from demonstrations.
Using GAIL requires recorded demonstrations from your Unity environment. See the
[imitation learning guide](Training-Imitation-Learning.md) to learn more about recording demonstrations.
#### Strength
`strength` is the factor by which to multiply the raw reward. Note that when using GAIL
with an Extrinsic Signal, this value should be set lower if your demonstrations are
suboptimal (e.g. from a human), so that a trained agent will focus on receiving extrinsic
rewards instead of exactly copying the demonstrations. Keep the strength below about 0.1 in those cases.
Typical Range: `0.01` - `1.0`
#### Gamma
`gamma` corresponds to the discount factor for future rewards.
Typical Range: `0.8` - `0.9`
#### Demo Path
`demo_path` is the path to your `.demo` file or directory of `.demo` files. See the
[imitation learning guide](Training-Imitation-Learning.md).
#### Encoding Size
`encoding_size` corresponds to the size of the hidden layer used by the discriminator.
This value should be small enough to encourage the discriminator to compress the original
observation, but also not too small to prevent it from learning to differentiate between
demonstrated and actual behavior. Dramatically increasing this size will also negatively affect
training times.
Default Value: `64`
Typical Range: `64` - `256`
#### Learning Rate
`learning_rate` is the learning rate used to update the discriminator.
This should typically be decreased if training is unstable and the GAIL loss fluctuates.
Default Value: `3e-4`
Typical Range: `1e-5` - `1e-3`
#### Use Actions
`use_actions` determines whether the discriminator should discriminate based on both
observations and actions, or just observations. Set to `True` if you want the agent to
mimic the actions from the demonstrations, and `False` if you'd rather have the agent
visit the same states as in the demonstrations but with possibly different actions.
Setting to `False` is more likely to be stable, especially with imperfect demonstrations,
but may learn slower.
Default Value: `false`
#### (Optional) Samples Per Update
`samples_per_update` is the maximum number of samples to use during each discriminator update. You may
want to lower this if your buffer size is very large to avoid overfitting the discriminator on current data.
If set to 0, we will use the minimum of buffer size and the number of demonstration samples.
Default Value: `0`
Typical Range: Approximately equal to [`buffer_size`](Training-PPO.md)
#### (Optional) Variational Discriminator Bottleneck
`use_vail` enables a [variational bottleneck](https://arxiv.org/abs/1810.00821) within the
GAIL discriminator. This forces the discriminator to learn a more general representation
and reduces its tendency to be "too good" at discriminating, making learning more stable.
However, it does increase training time. Enable this if you notice your imitation learning is
unstable, or unable to learn the task at hand.
Default Value: `false`
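For reference, a `gail` entry under `reward_signals` that sets the hyperparameters
described above might look like the sketch below. The demo path is illustrative,
and `demo_path` may also point to a directory containing multiple `.demo` files.
```
reward_signals:
    gail:
        strength: 0.01
        gamma: 0.9
        demo_path: ./demos/ExpertPyramid.demo
        encoding_size: 64
        learning_rate: 3.0e-4
        use_actions: false
        samples_per_update: 0
        use_vail: false
```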

7
ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py


"""
Creates the Curiosity reward generator
:param policy: The Learning Policy
:param strength: The scaling parameter for the reward. The scaled reward will be the unscaled
:param gamma: The time discounting factor used for this reward.
:param encoding_size: The size of the hidden encoding layer for the ICM
:param learning_rate: The learning rate for the ICM.
:param num_epoch: The number of epochs to train over the training buffer for the ICM.
"""
super().__init__(policy, strength, gamma)
self.model = CuriosityModel(

2
ml-agents/mlagents/trainers/components/reward_signals/reward_signal_factory.py


from mlagents.trainers.components.reward_signals.extrinsic.signal import (
ExtrinsicRewardSignal,
)
from mlagents.trainers.components.reward_signals.gail.signal import GAILRewardSignal
from mlagents.trainers.components.reward_signals.curiosity.signal import (
CuriosityRewardSignal,
)

NAME_TO_CLASS: Dict[str, Type[RewardSignal]] = {
"extrinsic": ExtrinsicRewardSignal,
"curiosity": CuriosityRewardSignal,
"gail": GAILRewardSignal,
}
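The keys of `NAME_TO_CLASS` are the names used under the `reward_signals` section of
the trainer configuration; each configured entry is instantiated through this factory.
A hedged configuration sketch that would select all three classes (values are
illustrative only):
```
reward_signals:
    extrinsic:
        strength: 1.0
        gamma: 0.99
    curiosity:
        strength: 0.01
        gamma: 0.99
    gail:
        strength: 0.01
        gamma: 0.9
        demo_path: ./demos/ExpertPyramid.demo
```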

37
ml-agents/mlagents/trainers/demo_loader.py


import pathlib
import logging
import os
from typing import List, Tuple
from mlagents.envs.communicator_objects import (
AgentInfoProto,
BrainParametersProto,
DemonstrationMetaProto,
)
from google.protobuf.internal.decoder import _DecodeVarint32 # type: ignore

def make_demo_buffer(
brain_infos: List[BrainInfo], brain_params: BrainParameters, sequence_length: int
) -> Buffer:
# Create and populate buffer using experiences
demo_buffer = Buffer()
for idx, experience in enumerate(brain_infos):

return demo_buffer
def demo_to_buffer(
file_path: str, sequence_length: int
) -> Tuple[BrainParameters, Buffer]:
"""
Loads demonstration file and uses it to fill training buffer.
:param file_path: Location of demonstration file (.demo).

return brain_params, demo_buffer
def load_demonstration(file_path: str) -> Tuple[BrainParameters, List[BrainInfo], int]:
"""
Loads and parses a demonstration file.
:param file_path: Location of demonstration file (.demo).

all_files = os.listdir(file_path)
for _file in all_files:
if _file.endswith(".demo"):
file_paths.append(os.path.join(file_path, _file))
if not all_files:
raise ValueError("There are no '.demo' files in the provided directory.")
file_extension = pathlib.Path(file_path).suffix
if file_extension != ".demo":
raise ValueError(
"The file is not a '.demo' file. Please provide a file with the "
"correct extension."
)
total_expected = 0
total_expected += meta_data_proto.number_steps
pos = INITIAL_POS
if obs_decoded == 1:
brain_param_proto = BrainParametersProto()

14
ml-agents/mlagents/trainers/ppo/policy.py


from mlagents.trainers.components.reward_signals.reward_signal_factory import (
create_reward_signal,
)
from mlagents.trainers.components.bc.module import BCModule
logger = logging.getLogger("mlagents.trainers")

self.reward_signals[reward_signal] = create_reward_signal(
self, reward_signal, config
)
# Create pretrainer if needed
if "pretraining" in trainer_params:
BCModule.check_config(trainer_params["pretraining"])
self.bc_module = BCModule(
self,
policy_learning_rate=trainer_params["learning_rate"],
default_batch_size=trainer_params["batch_size"],
default_num_epoch=trainer_params["num_epoch"],
**trainer_params["pretraining"],
)
else:
self.bc_module = None
if load:
self._load_graph()

4
ml-agents/mlagents/trainers/ppo/trainer.py


)
for stat, val in update_stats.items():
self.stats[stat].append(val)
if self.policy.bc_module:
update_stats = self.policy.bc_module.update()
for stat, val in update_stats.items():
self.stats[stat].append(val)
self.training_buffer.reset_update_buffer()
self.trainer_metrics.end_policy_update()

49
ml-agents/mlagents/trainers/tests/mock_brain.py


import pytest
import numpy as np
from mlagents.trainers.buffer import Buffer
def create_mock_brainparams(
number_visual_observations=0,

mock_env.return_value.brain_names = ["MockBrain"]
mock_env.return_value.reset.return_value = {"MockBrain": mock_braininfo}
mock_env.return_value.step.return_value = {"MockBrain": mock_braininfo}
def simulate_rollout(env, policy, buffer_init_samples):
brain_info_list = []
for i in range(buffer_init_samples):
brain_info_list.append(env.step()[env.brain_names[0]])
buffer = create_buffer(brain_info_list, policy.brain, policy.sequence_length)
return buffer
def create_buffer(brain_infos, brain_params, sequence_length):
buffer = Buffer()
# Make a buffer
for idx, experience in enumerate(brain_infos):
if idx > len(brain_infos) - 2:
break
current_brain_info = brain_infos[idx]
next_brain_info = brain_infos[idx + 1]
buffer[0].last_brain_info = current_brain_info
buffer[0]["done"].append(next_brain_info.local_done[0])
buffer[0]["rewards"].append(next_brain_info.rewards[0])
for i in range(brain_params.number_visual_observations):
buffer[0]["visual_obs%d" % i].append(
current_brain_info.visual_observations[i][0]
)
buffer[0]["next_visual_obs%d" % i].append(
current_brain_info.visual_observations[i][0]
)
if brain_params.vector_observation_space_size > 0:
buffer[0]["vector_obs"].append(current_brain_info.vector_observations[0])
buffer[0]["next_vector_in"].append(
current_brain_info.vector_observations[0]
)
buffer[0]["actions"].append(next_brain_info.previous_vector_actions[0])
buffer[0]["prev_action"].append(current_brain_info.previous_vector_actions[0])
buffer[0]["masks"].append(1.0)
buffer[0]["advantages"].append(1.0)
buffer[0]["action_probs"].append(np.ones(buffer[0]["actions"][0].shape))
buffer[0]["actions_pre"].append(np.ones(buffer[0]["actions"][0].shape))
buffer[0]["random_normal_epsilon"].append(
np.ones(buffer[0]["actions"][0].shape)
)
buffer[0]["action_mask"].append(np.ones(buffer[0]["actions"][0].shape))
buffer[0]["memory"].append(np.ones(8))
buffer.append_update_buffer(0, batch_size=None, training_length=sequence_length)
return buffer

13
ml-agents/mlagents/trainers/tests/test_demo_loader.py


demo_buffer = make_demo_buffer(brain_infos, brain_parameters, 1)
assert len(demo_buffer.update_buffer["actions"]) == total_expected - 1
def test_load_demo_dir():
path_prefix = os.path.dirname(os.path.abspath(__file__))
brain_parameters, brain_infos, total_expected = load_demonstration(
path_prefix + "/test_demo_dir"
)
assert brain_parameters.brain_name == "Ball3DBrain"
assert brain_parameters.vector_observation_space_size == 8
assert len(brain_infos) == total_expected
demo_buffer = make_demo_buffer(brain_infos, brain_parameters, 1)
assert len(demo_buffer.update_buffer["actions"]) == total_expected - 1

154
ml-agents/mlagents/trainers/tests/test_reward_signals.py


from mlagents.trainers.ppo.models import PPOModel
from mlagents.trainers.ppo.trainer import discount_rewards
from mlagents.trainers.ppo.policy import PPOPolicy
from mlagents.trainers.demo_loader import make_demo_buffer
from mlagents.envs import UnityEnvironment
from mlagents.envs.mock_communicator import MockCommunicator

@pytest.fixture
def gail_dummy_config():
return {
"gail": {
"strength": 0.1,
"gamma": 0.9,
"encoding_size": 128,
"demo_path": os.path.dirname(os.path.abspath(__file__)) + "/test.demo",
}
}
@pytest.fixture
VECTOR_ACTION_SPACE = [2]
VECTOR_OBS_SPACE = 8
DISCRETE_ACTION_SPACE = [2]
BUFFER_INIT_SAMPLES = 20
NUM_AGENTS = 12
def create_ppo_policy_mock(

if not use_visual:
mock_brain = mb.create_mock_brainparams(
vector_action_space_type="discrete" if use_discrete else "continuous",
vector_action_space_size=[2],
vector_observation_space_size=8,
vector_action_space_size=DISCRETE_ACTION_SPACE
if use_discrete
else VECTOR_ACTION_SPACE,
vector_observation_space_size=VECTOR_OBS_SPACE,
num_agents=12,
num_vector_observations=8,
num_vector_acts=2,
num_agents=NUM_AGENTS,
num_vector_observations=VECTOR_OBS_SPACE,
num_vector_acts=sum(
DISCRETE_ACTION_SPACE if use_discrete else VECTOR_ACTION_SPACE
),
vector_action_space_size=[2],
vector_action_space_size=DISCRETE_ACTION_SPACE
if use_discrete
else VECTOR_ACTION_SPACE,
num_agents=12,
num_agents=NUM_AGENTS,
num_vector_acts=2,
num_vector_acts=sum(
DISCRETE_ACTION_SPACE if use_discrete else VECTOR_ACTION_SPACE
),
discrete=use_discrete,
)
mb.setup_mock_unityenvironment(mock_env, mock_brain, mock_braininfo)

return env, policy
@mock.patch("mlagents.envs.UnityEnvironment")
def test_curiosity_cc_evaluate(mock_env, dummy_config, curiosity_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, curiosity_dummy_config, False, False, False
)
def reward_signal_eval(env, policy, reward_signal_name):
# Test evaluate
rsig_result = policy.reward_signals[reward_signal_name].evaluate(
assert rsig_result.scaled_reward.shape == (NUM_AGENTS,)
assert rsig_result.unscaled_reward.shape == (NUM_AGENTS,)
def reward_signal_update(env, policy, reward_signal_name):
buffer = mb.simulate_rollout(env, policy, BUFFER_INIT_SAMPLES)
out = policy.reward_signals[reward_signal_name].update(buffer.update_buffer, 2)
assert type(out) is dict
def test_gail_cc(mock_env, dummy_config, gail_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, gail_dummy_config, False, False, False
)
reward_signal_eval(env, policy, "gail")
reward_signal_update(env, policy, "gail")
@mock.patch("mlagents.envs.UnityEnvironment")
def test_gail_dc(mock_env, dummy_config, gail_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, gail_dummy_config, False, True, False
)
reward_signal_eval(env, policy, "gail")
reward_signal_update(env, policy, "gail")
@mock.patch("mlagents.envs.UnityEnvironment")
def test_gail_visual(mock_env, dummy_config, gail_dummy_config):
gail_dummy_config["gail"]["demo_path"] = (
os.path.dirname(os.path.abspath(__file__)) + "/testdcvis.demo"
)
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, gail_dummy_config, False, True, True
)
reward_signal_eval(env, policy, "gail")
reward_signal_update(env, policy, "gail")
@mock.patch("mlagents.envs.UnityEnvironment")
def test_gail_rnn(mock_env, dummy_config, gail_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, gail_dummy_config, True, False, False
)
reward_signal_eval(env, policy, "gail")
reward_signal_update(env, policy, "gail")
@mock.patch("mlagents.envs.UnityEnvironment")
def test_curiosity_cc(mock_env, dummy_config, curiosity_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, curiosity_dummy_config, False, False, False
)
reward_signal_eval(env, policy, "curiosity")
reward_signal_update(env, policy, "curiosity")
@mock.patch("mlagents.envs.UnityEnvironment")
def test_curiosity_dc(mock_env, dummy_config, curiosity_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, curiosity_dummy_config, False, True, False
)
reward_signal_eval(env, policy, "curiosity")
reward_signal_update(env, policy, "curiosity")
def test_curiosity_visual(mock_env, dummy_config, curiosity_dummy_config):
reward_signal_eval(env, policy, "curiosity")
reward_signal_update(env, policy, "curiosity")
def test_curiosity_rnn(mock_env, dummy_config, curiosity_dummy_config):
reward_signal_eval(env, policy, "curiosity")
reward_signal_update(env, policy, "curiosity")
@mock.patch("mlagents.envs.UnityEnvironment")
def test_extrinsic(mock_env, dummy_config, curiosity_dummy_config):
env, policy = create_ppo_policy_mock(
mock_env, dummy_config, curiosity_dummy_config, False, False, False
)
reward_signal_eval(env, policy, "extrinsic")
reward_signal_update(env, policy, "extrinsic")
if __name__ == "__main__":

92
docs/Training-BehavioralCloning.md


# Training with Behavioral Cloning
There are a variety of possible imitation learning algorithms that can
be used; the simplest of these is Behavioral Cloning. It works by collecting
demonstrations from a teacher, and then simply using them to directly learn a
policy, in the same way supervised learning for image classification
or other traditional machine learning tasks works.
## Offline Training
With offline behavioral cloning, we can use demonstrations (`.demo` files)
generated using the `Demonstration Recorder` as the dataset used to train a behavior.
1. Choose an agent you would like to have learn to imitate a set of demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see [here](Training-Imitation-Learning.md)).
For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
Control in the Broadcast Hub. For more information on Brains, see
[here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
demonstration file recorded in step 2. In our case this is:
`./UnitySDK/Assets/Demonstrations/AgentRecording.demo` (an example entry is sketched after these instructions).
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
as the config parameter, and include the `--run-id` and `--train` as usual.
Provide your environment as the `--env` parameter if it has been compiled
as standalone, or omit to train in the editor.
7. (Optional) Observe training performance using TensorBoard.
This will use the demonstration file to train a neural network driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.
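For reference, the entry edited in steps 4 and 5 might look roughly like the sketch
below. The brain name is illustrative, the `trainer: offline_bc` value is assumed by
analogy with the `online_bc` value described under Online Training, and the other
hyperparameters in `config/offline_bc_config.yaml` are omitted here.
```
AgentBrain:
    trainer: offline_bc
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
```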
## Online Training
It is also possible to provide demonstrations in realtime during training,
without pre-recording a demonstration file. The steps to do this are as follows:
1. First create two Brains, one which will be the "Teacher," and the other which
will be the "Student." We will assume that the names of the Brain
Assets are "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**. You must properly
configure the inputs to map to the corresponding actions.
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
compatible with the agent.
5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at least
one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
the `trainer` parameter of this entry to `online_bc`, and the
`brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
Additionally, set `batches_per_epoch`, which controls how much training to do
each moment. Increase the `max_steps` option if you'd like to keep training
the Agents for a longer period of time. A sample "Student" entry is sketched
after these steps.
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
--train --slow`, and press the :arrow_forward: button in Unity when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen.
9. From the Unity window, control the Agent with the Teacher Brain by providing
"teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the student Brain attached begin to behave
similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the training
process with `CTRL+C` from the command line.
12. Move the resulting `*.nn` file into the `TFModels` subdirectory of the
Assets folder (or a subdirectory within Assets of your choosing), and use it
with a `Learning` Brain.
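As referenced in step 7, a sketch of the "Student" entry in
`config/online_bc_config.yaml` might look like the following; the `batches_per_epoch`
and `max_steps` values are illustrative only:
```
Student:
    trainer: online_bc
    brain_to_imitate: Teacher
    batches_per_epoch: 5
    max_steps: 5.0e4
```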
**BC Teacher Helper**
We provide a convenience utility, the `BC Teacher Helper` component, which you can add
to the Teacher Agent.
<p align="center">
<img src="images/bc_teacher_helper.png"
alt="BC Teacher Helper"
width="375" border="10" />
</p>
This utility enables you to use keyboard shortcuts to do the following:
1. Start and stop recording experiences. This is useful in case you'd like to
interact with the game _but not have the agents learn from these
interactions_. The default command to toggle this is to press `R` on the
keyboard.
2. Reset the training buffer. This enables you to instruct the agents to forget
their buffer of recent experiences. This is useful if you'd like to get them
to quickly learn a new behavior. The default command to reset the buffer is
to press `C` on the keyboard.

80
docs/images/mlagents-ImitationAndRL.png

Width: 600 | Height: 371 | Size: 23 KiB

158
ml-agents/mlagents/trainers/tests/test_bcmodule.py


import unittest.mock as mock
import pytest
import mlagents.trainers.tests.mock_brain as mb
import numpy as np
import yaml
import os
from mlagents.trainers.ppo.policy import PPOPolicy
@pytest.fixture
def dummy_config():
return yaml.safe_load(
"""
trainer: ppo
batch_size: 32
beta: 5.0e-3
buffer_size: 512
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 3.0e-4
max_steps: 5.0e4
normalize: true
num_epoch: 5
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: false
memory_size: 8
pretraining:
demo_path: ./demos/ExpertPyramid.demo
strength: 1.0
steps: 10000000
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
"""
)
def create_mock_3dball_brain():
mock_brain = mb.create_mock_brainparams(
vector_action_space_type="continuous",
vector_action_space_size=[2],
vector_observation_space_size=8,
)
return mock_brain
def create_mock_banana_brain():
mock_brain = mb.create_mock_brainparams(
number_visual_observations=1,
vector_action_space_type="discrete",
vector_action_space_size=[3, 3, 3, 2],
vector_observation_space_size=0,
)
return mock_brain
def create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, use_rnn, demo_file
):
mock_braininfo = mb.create_mock_braininfo(num_agents=12, num_vector_observations=8)
mb.setup_mock_unityenvironment(mock_env, mock_brain, mock_braininfo)
env = mock_env()
trainer_parameters = dummy_config
model_path = env.brain_names[0]
trainer_parameters["model_path"] = model_path
trainer_parameters["keep_checkpoints"] = 3
trainer_parameters["use_recurrent"] = use_rnn
trainer_parameters["pretraining"]["demo_path"] = (
os.path.dirname(os.path.abspath(__file__)) + "/" + demo_file
)
policy = PPOPolicy(0, mock_brain, trainer_parameters, False, False)
return env, policy
# Test default values
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_defaults(mock_env, dummy_config):
# See if default values match
mock_brain = create_mock_3dball_brain()
env, policy = create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, False, "test.demo"
)
assert policy.bc_module.num_epoch == dummy_config["num_epoch"]
assert policy.bc_module.batch_size == dummy_config["batch_size"]
env.close()
# Assign strange values and see if it overrides properly
dummy_config["pretraining"]["num_epoch"] = 100
dummy_config["pretraining"]["batch_size"] = 10000
env, policy = create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, False, "test.demo"
)
assert policy.bc_module.num_epoch == 100
assert policy.bc_module.batch_size == 10000
env.close()
# Test with continuous control env and vector actions
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_update(mock_env, dummy_config):
mock_brain = create_mock_3dball_brain()
env, policy = create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, False, "test.demo"
)
stats = policy.bc_module.update()
for _, item in stats.items():
assert isinstance(item, np.float32)
env.close()
# Test with RNN
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_rnn_update(mock_env, dummy_config):
mock_brain = create_mock_3dball_brain()
env, policy = create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, True, "test.demo"
)
stats = policy.bc_module.update()
for _, item in stats.items():
assert isinstance(item, np.float32)
env.close()
# Test with discrete control and visual observations
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_dc_visual_update(mock_env, dummy_config):
mock_brain = create_mock_banana_brain()
env, policy = create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, False, "testdcvis.demo"
)
stats = policy.bc_module.update()
for _, item in stats.items():
assert isinstance(item, np.float32)
env.close()
# Test with discrete control, visual observations and RNN
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_rnn_dc_update(mock_env, dummy_config):
mock_brain = create_mock_banana_brain()
env, policy = create_ppo_policy_with_bc_mock(
mock_env, mock_brain, dummy_config, True, "testdcvis.demo"
)
stats = policy.bc_module.update()
for _, item in stats.items():
assert isinstance(item, np.float32)
env.close()
if __name__ == "__main__":
pytest.main()

1001
ml-agents/mlagents/trainers/tests/testdcvis.demo
File diff is too large to display.

442
demos/Expert3DBall.demo


(binary demonstration file; content not shown)

1001
demos/Expert3DBallHard.demo
Binary file; diff too large to display.

1001
demos/ExpertBanana.demo
Binary file; diff too large to display.

171
demos/ExpertBasic.demo
Binary file; contents not shown.

198
demos/ExpertBouncer.demo
Binary file; contents not shown.

1001
demos/ExpertCrawlerSta.demo
Binary file; diff too large to display.

1001
demos/ExpertGrid.demo
Binary file; diff too large to display.

1001
demos/ExpertHallway.demo
Binary file; diff too large to display.

1001
demos/ExpertPush.demo
Binary file; diff too large to display.

1001
demos/ExpertPyramid.demo
Binary file; diff too large to display.

1001
demos/ExpertReacher.demo
Binary file; diff too large to display.

1001
demos/ExpertSoccerGoal.demo
Binary file; diff too large to display.

1001
demos/ExpertSoccerStri.demo
Binary file; diff too large to display.

1001
demos/ExpertTennis.demo
Binary file; diff too large to display.

1001
demos/ExpertWalker.demo
Binary file; diff too large to display.

1
ml-agents/mlagents/trainers/components/bc/__init__.py


from .module import BCModule
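For orientation before the code below, here is a minimal sketch of the pretraining configuration the trainer is expected to hand to this component, written as a Python dict. The required keys mirror BCModule.check_config and the optional keys mirror the BCModule constructor; the concrete values and the demo path are illustrative assumptions, not recommended settings.

from mlagents.trainers.components.bc import BCModule

# Hypothetical "pretraining" section of a trainer config, as the trainer would pass it along.
# strength, demo_path and steps are required; the rest fall back to the trainer's own defaults.
pretraining_config = {
    "strength": 0.5,                          # BC learning rate = strength * policy learning rate
    "demo_path": "demos/Expert3DBall.demo",   # illustrative path
    "steps": 10000,                           # anneal BC to zero over 10k steps (0 = no annealing)
    "batch_size": 64,                         # optional
    "num_epoch": 3,                           # optional
}

# Raises UnityTrainerException if strength, demo_path or steps is missing.
BCModule.check_config(pretraining_config)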

101
ml-agents/mlagents/trainers/components/bc/model.py


import tensorflow as tf
import numpy as np
from mlagents.trainers.models import LearningModel
class BCModel(object):
def __init__(
self,
policy_model: LearningModel,
learning_rate: float = 3e-4,
anneal_steps: int = 0,
):
"""
Tensorflow operations to perform Behavioral Cloning on a Policy model
:param policy_model: The policy of the learning algorithm
:param learning_rate: The initial learning rate for behavioral cloning
:param anneal_steps: Number of steps over which to anneal BC training
"""
self.policy_model = policy_model
self.expert_visual_in = self.policy_model.visual_in
self.obs_in_expert = self.policy_model.vector_in
self.make_inputs()
self.create_loss(learning_rate, anneal_steps)
def make_inputs(self) -> None:
"""
Creates the input layers for the discriminator
"""
self.done_expert = tf.placeholder(shape=[None, 1], dtype=tf.float32)
self.done_policy = tf.placeholder(shape=[None, 1], dtype=tf.float32)
if self.policy_model.brain.vector_action_space_type == "continuous":
action_length = self.policy_model.act_size[0]
self.action_in_expert = tf.placeholder(
shape=[None, action_length], dtype=tf.float32
)
self.expert_action = tf.identity(self.action_in_expert)
else:
action_length = len(self.policy_model.act_size)
self.action_in_expert = tf.placeholder(
shape=[None, action_length], dtype=tf.int32
)
self.expert_action = tf.concat(
[
tf.one_hot(
self.action_in_expert[:, i], self.policy_model.act_size[i]
)
for i in range(len(self.policy_model.act_size))
],
axis=1,
)
def create_loss(self, learning_rate: float, anneal_steps: int) -> None:
"""
Creates the loss and update nodes for the BC module
:param learning_rate: The learning rate for the optimizer
:param anneal_steps: Number of steps over which to anneal the learning_rate
"""
selected_action = self.policy_model.output
action_size = self.policy_model.act_size
if self.policy_model.brain.vector_action_space_type == "continuous":
self.loss = tf.reduce_mean(
tf.squared_difference(selected_action, self.expert_action)
)
else:
log_probs = self.policy_model.all_log_probs
action_idx = [0] + list(np.cumsum(action_size))
entropy = tf.reduce_sum(
(
tf.stack(
[
tf.nn.softmax_cross_entropy_with_logits_v2(
labels=tf.nn.softmax(
log_probs[:, action_idx[i] : action_idx[i + 1]]
),
logits=log_probs[:, action_idx[i] : action_idx[i + 1]],
)
for i in range(len(action_size))
],
axis=1,
)
),
axis=1,
)
self.loss = tf.reduce_mean(
-tf.log(tf.nn.softmax(log_probs) + 1e-7) * self.expert_action
)
if anneal_steps > 0:
self.annealed_learning_rate = tf.train.polynomial_decay(
learning_rate,
self.policy_model.global_step,
anneal_steps,
0.0,
power=1.0,
)
else:
# Wrap in a tf.constant so the rate can still be fetched with sess.run when annealing is disabled.
self.annealed_learning_rate = tf.constant(learning_rate)
optimizer = tf.train.AdamOptimizer(learning_rate=self.annealed_learning_rate)
self.update_batch = optimizer.minimize(self.loss)
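When anneal_steps is positive, tf.train.polynomial_decay with an end value of 0.0 and power 1.0 decays the BC learning rate linearly with the policy's global step; roughly,

    lr(t) = learning_rate * (1 - min(t, anneal_steps) / anneal_steps)

so behavioral cloning fades out and contributes nothing once t reaches anneal_steps (and BCModule below also skips its update entirely once the rate reaches 0).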

172
ml-agents/mlagents/trainers/components/bc/module.py


from typing import Dict, Any
import numpy as np
from mlagents.trainers.tf_policy import TFPolicy
from .model import BCModel
from mlagents.trainers.demo_loader import demo_to_buffer
from mlagents.trainers.trainer import UnityTrainerException
class BCModule:
def __init__(
self,
policy: TFPolicy,
policy_learning_rate: float,
default_batch_size: int,
default_num_epoch: int,
strength: float,
demo_path: str,
steps: int,
batch_size: int = None,
num_epoch: int = None,
samples_per_update: int = 0,
):
"""
A BC trainer that can be used inline with RL, especially for pretraining.
:param policy: The policy of the learning model
:param policy_learning_rate: The initial learning rate of the policy. Used to set an appropriate learning rate for the pretrainer.
:param default_batch_size: The default batch size to use if batch_size isn't provided.
:param default_num_epoch: The default num_epoch to use if num_epoch isn't provided.
:param strength: The proportion of learning rate used to update through BC.
:param steps: The number of steps to anneal BC training over. 0 for continuous training.
:param demo_path: The path to the demonstration file.
:param batch_size: The batch size to use during BC training.
:param num_epoch: Number of epochs to train for during each update.
:param samples_per_update: Maximum number of samples to train on during each pretraining update.
"""
self.policy = policy
self.current_lr = policy_learning_rate * strength
self.model = BCModel(policy.model, self.current_lr, steps)
_, self.demonstration_buffer = demo_to_buffer(demo_path, policy.sequence_length)
self.batch_size = batch_size if batch_size else default_batch_size
self.num_epoch = num_epoch if num_epoch else default_num_epoch
self.n_sequences = max(
min(
self.batch_size, len(self.demonstration_buffer.update_buffer["actions"])
)
// policy.sequence_length,
1,
)
self.has_updated = False
self.use_recurrent = self.policy.use_recurrent
self.samples_per_update = samples_per_update
self.out_dict = {
"loss": self.model.loss,
"update": self.model.update_batch,
"learning_rate": self.model.annealed_learning_rate,
}
@staticmethod
def check_config(config_dict: Dict[str, Any]) -> None:
"""
Check the pretraining config for the required keys.
:param config_dict: Pretraining section of trainer_config
"""
param_keys = ["strength", "demo_path", "steps"]
for k in param_keys:
if k not in config_dict:
raise UnityTrainerException(
"The required pre-training hyper-parameter {0} was not defined. Please check your \
trainer YAML file.".format(
k
)
)
def update(self) -> Dict[str, Any]:
"""
Updates the policy towards the expert actions, using minibatches from the demonstration buffer.
:return: The loss of the update.
"""
# Don't continue training if the learning rate has reached 0, to reduce training time.
if self.current_lr <= 0:
return {"Losses/Pretraining Loss": 0}
batch_losses = []
possible_demo_batches = (
len(self.demonstration_buffer.update_buffer["actions"]) // self.n_sequences
)
possible_batches = possible_demo_batches
max_batches = self.samples_per_update // self.n_sequences
n_epoch = self.num_epoch
for _ in range(n_epoch):
self.demonstration_buffer.update_buffer.shuffle()
if max_batches == 0:
num_batches = possible_batches
else:
num_batches = min(possible_batches, max_batches)
for i in range(num_batches):
demo_update_buffer = self.demonstration_buffer.update_buffer
start = i * self.n_sequences
end = (i + 1) * self.n_sequences
mini_batch_demo = demo_update_buffer.make_mini_batch(start, end)
run_out = self._update_batch(mini_batch_demo, self.n_sequences)
loss = run_out["loss"]
self.current_lr = run_out["learning_rate"]
batch_losses.append(loss)
self.has_updated = True
update_stats = {"Losses/Pretraining Loss": np.mean(batch_losses)}
return update_stats
def _update_batch(
self, mini_batch_demo: Dict[str, Any], n_sequences: int
) -> Dict[str, Any]:
"""
Helper function for update_batch.
"""
feed_dict = {
self.policy.model.batch_size: n_sequences,
self.policy.model.sequence_length: self.policy.sequence_length,
}
if self.policy.model.brain.vector_action_space_type == "continuous":
feed_dict[self.model.action_in_expert] = mini_batch_demo["actions"].reshape(
[-1, self.policy.model.brain.vector_action_space_size[0]]
)
feed_dict[self.policy.model.epsilon] = np.random.normal(
size=(1, self.policy.model.act_size[0])
)
else:
feed_dict[self.model.action_in_expert] = mini_batch_demo["actions"].reshape(
[-1, len(self.policy.model.brain.vector_action_space_size)]
)
feed_dict[self.policy.model.action_masks] = np.ones(
(
self.n_sequences,
sum(self.policy.model.brain.vector_action_space_size),
)
)
if self.policy.model.brain.vector_observation_space_size > 0:
apparent_obs_size = (
self.policy.model.brain.vector_observation_space_size
* self.policy.model.brain.num_stacked_vector_observations
)
feed_dict[self.policy.model.vector_in] = mini_batch_demo[
"vector_obs"
].reshape([-1, apparent_obs_size])
for i, _ in enumerate(self.policy.model.visual_in):
visual_obs = mini_batch_demo["visual_obs%d" % i]
if self.policy.sequence_length > 1 and self.policy.use_recurrent:
(_batch, _seq, _w, _h, _c) = visual_obs.shape
feed_dict[self.policy.model.visual_in[i]] = visual_obs.reshape(
[-1, _w, _h, _c]
)
else:
feed_dict[self.policy.model.visual_in[i]] = visual_obs
if self.use_recurrent:
feed_dict[self.policy.model.memory_in] = np.zeros(
[self.n_sequences, self.policy.m_size]
)
if not self.policy.model.brain.vector_action_space_type == "continuous":
feed_dict[self.policy.model.prev_action] = mini_batch_demo[
"prev_action"
].reshape([-1, len(self.policy.model.act_size)])
network_out = self.policy.sess.run(
list(self.out_dict.values()), feed_dict=feed_dict
)
run_out = dict(zip(list(self.out_dict.keys()), network_out))
return run_out
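To show how this module is intended to be driven from an RL trainer, here is a minimal usage sketch; the policy object and all hyperparameter values are assumptions for illustration, and only the BCModule API itself comes from this file.

from mlagents.trainers.components.bc import BCModule

# policy: an already-constructed TFPolicy for the brain being trained (assumed to exist).
bc_module = BCModule(
    policy,
    policy_learning_rate=3.0e-4,   # the RL trainer's own learning rate
    default_batch_size=64,         # trainer defaults used when the config omits batch_size/num_epoch
    default_num_epoch=3,
    strength=0.5,                  # BC learning rate = 0.5 * 3e-4
    demo_path="demos/Expert3DBall.demo",
    steps=10000,                   # anneal BC to zero over 10k policy steps
)

# Called alongside each PPO update; returns a stats dict the trainer can log.
stats = bc_module.update()         # {"Losses/Pretraining Loss": <mean BC loss>}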

1
ml-agents/mlagents/trainers/components/reward_signals/gail/__init__.py


from .signal import GAILRewardSignal

265
ml-agents/mlagents/trainers/components/reward_signals/gail/model.py


from typing import Tuple, List
import tensorflow as tf
from mlagents.trainers.models import LearningModel
class GAILModel(object):
def __init__(
self,
policy_model: LearningModel,
h_size: int = 128,
learning_rate: float = 3e-4,
encoding_size: int = 64,
use_actions: bool = False,
use_vail: bool = False,
):
"""
The initializer for the GAIL reward generator.
https://arxiv.org/abs/1606.03476
:param policy_model: The policy of the learning algorithm
:param h_size: Size of the hidden layer for the discriminator
:param learning_rate: The learning rate for the discriminator
:param encoding_size: The encoding size for the encoder
:param use_actions: Whether or not to use actions to discriminate
:param use_vail: Whether or not to use a variational bottleneck for the
discriminator. See https://arxiv.org/abs/1810.00821.
"""
self.h_size = h_size
self.z_size = 128
self.alpha = 0.0005
self.mutual_information = 0.5
self.policy_model = policy_model
self.encoding_size = encoding_size
self.use_vail = use_vail
self.use_actions = use_actions
self.make_beta()
self.make_inputs()
self.create_network()
self.create_loss(learning_rate)
def make_beta(self) -> None:
"""
Creates the beta parameter and its updater for GAIL
"""
self.beta = tf.get_variable(
"gail_beta",
[],
trainable=False,
dtype=tf.float32,
initializer=tf.ones_initializer(),
)
self.kl_div_input = tf.placeholder(shape=[], dtype=tf.float32)
new_beta = tf.maximum(
self.beta + self.alpha * (self.kl_div_input - self.mutual_information), 1e-7
)
self.update_beta = tf.assign(self.beta, new_beta)
def make_inputs(self) -> None:
"""
Creates the input layers for the discriminator
"""
self.done_expert = tf.placeholder(shape=[None, 1], dtype=tf.float32)
self.done_policy = tf.placeholder(shape=[None, 1], dtype=tf.float32)
if self.policy_model.brain.vector_action_space_type == "continuous":
action_length = self.policy_model.act_size[0]
self.action_in_expert = tf.placeholder(
shape=[None, action_length], dtype=tf.float32
)
self.expert_action = tf.identity(self.action_in_expert)
else:
action_length = len(self.policy_model.act_size)
self.action_in_expert = tf.placeholder(
shape=[None, action_length], dtype=tf.int32
)
self.expert_action = tf.concat(
[
tf.one_hot(
self.action_in_expert[:, i], self.policy_model.act_size[i]
)
for i in range(len(self.policy_model.act_size))
],
axis=1,
)
encoded_policy_list = []
encoded_expert_list = []
if self.policy_model.vec_obs_size > 0:
self.obs_in_expert = tf.placeholder(
shape=[None, self.policy_model.vec_obs_size], dtype=tf.float32
)
if self.policy_model.normalize:
encoded_expert_list.append(
self.policy_model.normalize_vector_obs(self.obs_in_expert)
)
encoded_policy_list.append(
self.policy_model.normalize_vector_obs(self.policy_model.vector_in)
)
else:
encoded_expert_list.append(self.obs_in_expert)
encoded_policy_list.append(self.policy_model.vector_in)
if self.policy_model.vis_obs_size > 0:
self.expert_visual_in: List[tf.Tensor] = []
visual_policy_encoders = []
visual_expert_encoders = []
for i in range(self.policy_model.vis_obs_size):
# Create input ops for the expert's visual observations.
visual_input = self.policy_model.create_visual_input(
self.policy_model.brain.camera_resolutions[i],
name="visual_observation_" + str(i),
)
self.expert_visual_in.append(visual_input)
encoded_policy_visual = self.policy_model.create_visual_observation_encoder(
self.policy_model.visual_in[i],
self.encoding_size,
LearningModel.swish,
1,
"stream_{}_visual_obs_encoder".format(i),
False,
)
encoded_expert_visual = self.policy_model.create_visual_observation_encoder(
self.expert_visual_in[i],
self.encoding_size,
LearningModel.swish,
1,
"stream_{}_visual_obs_encoder".format(i),
True,
)
visual_policy_encoders.append(encoded_policy_visual)
visual_expert_encoders.append(encoded_expert_visual)
hidden_policy_visual = tf.concat(visual_policy_encoders, axis=1)
hidden_expert_visual = tf.concat(visual_expert_encoders, axis=1)
encoded_policy_list.append(hidden_policy_visual)
encoded_expert_list.append(hidden_expert_visual)
self.encoded_expert = tf.concat(encoded_expert_list, axis=1)
self.encoded_policy = tf.concat(encoded_policy_list, axis=1)
def create_encoder(
self, state_in: tf.Tensor, action_in: tf.Tensor, done_in: tf.Tensor, reuse: bool
) -> Tuple[tf.Tensor, tf.Tensor]:
"""
Creates the encoder for the discriminator
:param state_in: The encoded observation input
:param action_in: The action input
:param done_in: The done flags input
:param reuse: If true, the weights will be shared with the previous encoder created
"""
with tf.variable_scope("GAIL_model"):
if self.use_actions:
concat_input = tf.concat([state_in, action_in, done_in], axis=1)
else:
concat_input = state_in
hidden_1 = tf.layers.dense(
concat_input,
self.h_size,
activation=LearningModel.swish,
name="d_hidden_1",
reuse=reuse,
)
hidden_2 = tf.layers.dense(
hidden_1,
self.h_size,
activation=LearningModel.swish,
name="d_hidden_2",
reuse=reuse,
)
z_mean = None
if self.use_vail:
# Latent representation
z_mean = tf.layers.dense(
hidden_2,
self.z_size,
reuse=reuse,
name="z_mean",
kernel_initializer=LearningModel.scaled_init(0.01),
)
self.noise = tf.random_normal(tf.shape(z_mean), dtype=tf.float32)
# Sampled latent code
self.z = z_mean + self.z_sigma * self.noise * self.use_noise
estimate_input = self.z
else:
estimate_input = hidden_2
estimate = tf.layers.dense(
estimate_input,
1,
activation=tf.nn.sigmoid,
name="d_estimate",
reuse=reuse,
)
return estimate, z_mean
def create_network(self) -> None:
"""
Helper for creating the intrinsic reward nodes
"""
if self.use_vail:
self.z_sigma = tf.get_variable(
"sigma_vail",
self.z_size,
dtype=tf.float32,
initializer=tf.ones_initializer(),
)
self.z_sigma_sq = self.z_sigma * self.z_sigma
self.z_log_sigma_sq = tf.log(self.z_sigma_sq + 1e-7)
self.use_noise = tf.placeholder(
shape=[1], dtype=tf.float32, name="NoiseLevel"
)
self.expert_estimate, self.z_mean_expert = self.create_encoder(
self.encoded_expert, self.expert_action, self.done_expert, reuse=False
)
self.policy_estimate, self.z_mean_policy = self.create_encoder(
self.encoded_policy,
self.policy_model.selected_actions,
self.done_policy,
reuse=True,
)
self.discriminator_score = tf.reshape(
self.policy_estimate, [-1], name="GAIL_reward"
)
self.intrinsic_reward = -tf.log(1.0 - self.discriminator_score + 1e-7)
def create_loss(self, learning_rate: float) -> None:
"""
Creates the loss and update nodes for the GAIL reward generator
:param learning_rate: The learning rate for the optimizer
"""
self.mean_expert_estimate = tf.reduce_mean(self.expert_estimate)
self.mean_policy_estimate = tf.reduce_mean(self.policy_estimate)
self.discriminator_loss = -tf.reduce_mean(
tf.log(self.expert_estimate + 1e-7)
+ tf.log(1.0 - self.policy_estimate + 1e-7)
)
if self.use_vail:
# KL divergence loss (encourage latent representation to be normal)
self.kl_loss = tf.reduce_mean(
-tf.reduce_sum(
1
+ self.z_log_sigma_sq
- 0.5 * tf.square(self.z_mean_expert)
- 0.5 * tf.square(self.z_mean_policy)
- tf.exp(self.z_log_sigma_sq),
1,
)
)
self.loss = (
self.beta * (self.kl_loss - self.mutual_information)
+ self.discriminator_loss
)
else:
self.loss = self.discriminator_loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
self.update_batch = optimizer.minimize(self.loss)
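In equation form, the graph above implements the standard GAIL objective: the discriminator D is trained to minimize

    L_D = -E_expert[ log D(s, a) ] - E_policy[ log(1 - D(s, a)) ]

while the policy receives the intrinsic reward

    r(s, a) = -log(1 - D(s, a) + 1e-7),

which grows as the discriminator mistakes policy transitions for expert ones. When use_vail is enabled, the KL term above is added to the loss, weighted by the adaptive beta from make_beta, to keep the discriminator's latent code close to a unit Gaussian as in the variational discriminator bottleneck paper (https://arxiv.org/abs/1810.00821).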

270
ml-agents/mlagents/trainers/components/reward_signals/gail/signal.py


from typing import Any, Dict, List
import logging
import numpy as np
import tensorflow as tf
from mlagents.envs.brain import BrainInfo
from mlagents.trainers.buffer import Buffer
from mlagents.trainers.components.reward_signals import RewardSignal, RewardSignalResult
from mlagents.trainers.tf_policy import TFPolicy
from .model import GAILModel
from mlagents.trainers.demo_loader import demo_to_buffer
LOGGER = logging.getLogger("mlagents.trainers")
class GAILRewardSignal(RewardSignal):
def __init__(
self,
policy: TFPolicy,
strength: float,
gamma: float,
demo_path: str,
num_epoch: int = 3,
encoding_size: int = 64,
learning_rate: float = 3e-4,
samples_per_update: int = 0,
use_actions: bool = False,
use_vail: bool = False,
):
"""
The GAIL Reward signal generator. https://arxiv.org/abs/1606.03476
:param policy: The policy of the learning model
:param strength: The scaling parameter for the reward. The scaled reward will be the unscaled
reward multiplied by the strength parameter
:param gamma: The time discounting factor used for this reward.
:param demo_path: The path to the demonstration file
:param encoding_size: The size of the hidden layers of the discriminator
:param learning_rate: The learning rate used during GAIL updates.
:param samples_per_update: The maximum number of samples to update during GAIL updates.
:param use_actions: Whether or not to use the actions for the discriminator.
:param use_vail: Whether or not to use a variational bottleneck for the discriminator.
See https://arxiv.org/abs/1810.00821.
"""
super().__init__(policy, strength, gamma)
self.num_epoch = num_epoch
self.samples_per_update = samples_per_update
self.model = GAILModel(
policy.model, 128, learning_rate, encoding_size, use_actions, use_vail
)
_, self.demonstration_buffer = demo_to_buffer(demo_path, policy.sequence_length)
self.has_updated = False
def evaluate(
self, current_info: BrainInfo, next_info: BrainInfo
) -> RewardSignalResult:
if len(current_info.agents) == 0:
return []
feed_dict: Dict[tf.Tensor, Any] = {
self.policy.model.batch_size: len(next_info.vector_observations),
self.policy.model.sequence_length: 1,
}
if self.model.use_vail:
feed_dict[self.model.use_noise] = [0]
feed_dict = self.policy.fill_eval_dict(feed_dict, brain_info=current_info)
feed_dict[self.model.done_policy] = np.reshape(next_info.local_done, [-1, 1])
if self.policy.use_continuous_act:
feed_dict[
self.policy.model.selected_actions
] = next_info.previous_vector_actions
else:
feed_dict[
self.policy.model.action_holder
] = next_info.previous_vector_actions
if self.policy.use_recurrent:
if current_info.memories.shape[1] == 0:
current_info.memories = self.policy.make_empty_memory(
len(current_info.agents)
)
feed_dict[self.policy.model.memory_in] = current_info.memories
unscaled_reward = self.policy.sess.run(
self.model.intrinsic_reward, feed_dict=feed_dict
)
scaled_reward = unscaled_reward * float(self.has_updated) * self.strength
return RewardSignalResult(scaled_reward, unscaled_reward)
@classmethod
def check_config(
cls, config_dict: Dict[str, Any], param_keys: List[str] = None
) -> None:
"""
Checks the config and throws an exception if a hyperparameter is missing. GAIL requires strength,
gamma, and demo_path at minimum.
"""
param_keys = ["strength", "gamma", "demo_path"]
super().check_config(config_dict, param_keys)
def update(self, update_buffer: Buffer, n_sequences: int) -> Dict[str, float]:
"""
Updates model using buffer.
:param update_buffer: The policy buffer containing the trajectories for the current policy.
:param n_sequences: The number of sequences from demo and policy used in each mini batch.
:return: The loss of the update.
"""
batch_losses = []
# Divide by 2 since we sample from two buffers (demo and policy), keeping the overall batch size roughly the same
n_sequences = max(n_sequences // 2, 1)
possible_demo_batches = (
len(self.demonstration_buffer.update_buffer["actions"]) // n_sequences
)
possible_policy_batches = len(update_buffer["actions"]) // n_sequences
possible_batches = min(possible_policy_batches, possible_demo_batches)
max_batches = self.samples_per_update // n_sequences
kl_loss = []
policy_estimate = []
expert_estimate = []
z_log_sigma_sq = []
z_mean_expert = []
z_mean_policy = []
n_epoch = self.num_epoch
for _epoch in range(n_epoch):
self.demonstration_buffer.update_buffer.shuffle()
update_buffer.shuffle()
if max_batches == 0:
num_batches = possible_batches
else:
num_batches = min(possible_batches, max_batches)
for i in range(num_batches):
demo_update_buffer = self.demonstration_buffer.update_buffer
policy_update_buffer = update_buffer
start = i * n_sequences
end = (i + 1) * n_sequences
mini_batch_demo = demo_update_buffer.make_mini_batch(start, end)
mini_batch_policy = policy_update_buffer.make_mini_batch(start, end)
run_out = self._update_batch(mini_batch_demo, mini_batch_policy)
loss = run_out["gail_loss"]
policy_estimate.append(run_out["policy_estimate"])
expert_estimate.append(run_out["expert_estimate"])
if self.model.use_vail:
kl_loss.append(run_out["kl_loss"])
z_log_sigma_sq.append(run_out["z_log_sigma_sq"])
z_mean_policy.append(run_out["z_mean_policy"])
z_mean_expert.append(run_out["z_mean_expert"])
batch_losses.append(loss)
self.has_updated = True
print_list = ["n_epoch", "beta", "policy_estimate", "expert_estimate"]
print_vals = [
n_epoch,
self.policy.sess.run(self.model.beta),
np.mean(policy_estimate),
np.mean(expert_estimate),
]
if self.model.use_vail:
print_list += [
"kl_loss",
"z_mean_expert",
"z_mean_policy",
"z_log_sigma_sq",
]
print_vals += [
np.mean(kl_loss),
np.mean(z_mean_expert),
np.mean(z_mean_policy),
np.mean(z_log_sigma_sq),
]
LOGGER.debug(
"GAIL Debug:\n\t\t"
+ "\n\t\t".join(
"{0}: {1}".format(_name, _val)
for _name, _val in zip(print_list, print_vals)
)
)
update_stats = {"Losses/GAIL Loss": np.mean(batch_losses)}
return update_stats
def _update_batch(
self,
mini_batch_demo: Dict[str, np.ndarray],
mini_batch_policy: Dict[str, np.ndarray],
) -> Dict[str, float]:
"""
Helper method for update.
:param mini_batch_demo: A mini batch of expert trajectories
:param mini_batch_policy: A mini batch of trajectories sampled from the current policy
:return: Output from update process.
"""
feed_dict: Dict[tf.Tensor, Any] = {
self.model.done_expert: mini_batch_demo["done"].reshape([-1, 1]),
self.model.done_policy: mini_batch_policy["done"].reshape([-1, 1]),
}
if self.model.use_vail:
feed_dict[self.model.use_noise] = [1]
if self.policy.use_continuous_act:
feed_dict[self.policy.model.selected_actions] = mini_batch_policy[
"actions"
].reshape([-1, self.policy.model.act_size[0]])
feed_dict[self.model.action_in_expert] = mini_batch_demo["actions"].reshape(
[-1, self.policy.model.act_size[0]]
)
else:
feed_dict[self.policy.model.action_holder] = mini_batch_policy[
"actions"
].reshape([-1, len(self.policy.model.act_size)])
feed_dict[self.model.action_in_expert] = mini_batch_demo["actions"].reshape(
[-1, len(self.policy.model.act_size)]
)
if self.policy.use_vis_obs > 0:
for i in range(len(self.policy.model.visual_in)):
policy_obs = mini_batch_policy["visual_obs%d" % i]
if self.policy.sequence_length > 1 and self.policy.use_recurrent:
(_batch, _seq, _w, _h, _c) = policy_obs.shape
feed_dict[self.policy.model.visual_in[i]] = policy_obs.reshape(
[-1, _w, _h, _c]
)
else:
feed_dict[self.policy.model.visual_in[i]] = policy_obs
demo_obs = mini_batch_demo["visual_obs%d" % i]
if self.policy.sequence_length > 1 and self.policy.use_recurrent:
(_batch, _seq, _w, _h, _c) = demo_obs.shape
feed_dict[self.model.expert_visual_in[i]] = demo_obs.reshape(
[-1, _w, _h, _c]
)
else:
feed_dict[self.model.expert_visual_in[i]] = demo_obs
if self.policy.use_vec_obs:
feed_dict[self.policy.model.vector_in] = mini_batch_policy[
"vector_obs"
].reshape([-1, self.policy.vec_obs_size])
feed_dict[self.model.obs_in_expert] = mini_batch_demo["vector_obs"].reshape(
[-1, self.policy.vec_obs_size]
)
out_dict = {
"gail_loss": self.model.loss,
"update_batch": self.model.update_batch,
"policy_estimate": self.model.policy_estimate,
"expert_estimate": self.model.expert_estimate,
}
if self.model.use_vail:
out_dict["kl_loss"] = self.model.kl_loss
out_dict["z_log_sigma_sq"] = self.model.z_log_sigma_sq
out_dict["z_mean_expert"] = self.model.z_mean_expert
out_dict["z_mean_policy"] = self.model.z_mean_policy
run_out = self.policy.sess.run(out_dict, feed_dict=feed_dict)
if self.model.use_vail:
self.update_beta(run_out["kl_loss"])
return run_out
def update_beta(self, kl_div: float) -> None:
"""
Updates the Beta parameter with the latest kl_divergence value.
The larger Beta, the stronger the importance of the kl divergence in the loss function.
:param kl_div: The KL divergence
"""
self.policy.sess.run(
self.model.update_beta, feed_dict={self.model.kl_div_input: kl_div}
)
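For completeness, a minimal sketch of the configuration this signal expects, again as a Python dict; the required keys come from check_config above, the optional keys from the constructor, and the concrete values and demo path are illustrative assumptions.

from mlagents.trainers.components.reward_signals.gail import GAILRewardSignal

# Hypothetical "gail" entry under the trainer's reward_signals config.
gail_config = {
    "strength": 0.01,                           # scale applied to the discriminator-based reward
    "gamma": 0.99,                              # discount factor for this reward stream
    "demo_path": "demos/ExpertPyramid.demo",    # illustrative path
    "use_actions": False,                       # optional: discriminate on observations only
    "use_vail": False,                          # optional: enable the variational bottleneck
}

# Raises if strength, gamma or demo_path is missing.
GAILRewardSignal.check_config(gail_config)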

60
ml-agents/mlagents/trainers/tests/test_demo_dir/test.demo
Binary file; contents not shown.

60
ml-agents/mlagents/trainers/tests/test_demo_dir/test2.demo
Binary file; contents not shown.

60
ml-agents/mlagents/trainers/tests/test_demo_dir/test3.demo
Binary file; contents not shown.

Some files were not shown because too many files changed in this diff.
