浏览代码

[docs] Link to Imitation Learning docs in Readme, cleanup IL docs (#3582)

/bug-failed-api-check
GitHub 5 年前
当前提交
3e9c8cc7
共有 2 个文件被更改,包括 28 次插入11 次删除
  1. 2
      README.md
  2. 37
      docs/Training-Imitation-Learning.md

2
README.md


* Self-play mechanism for training agents in adversarial scenarios
* Train memory-enhanced agents using deep reinforcement learning
* Easily definable Curriculum Learning and Generalization scenarios
* Built-in support for Imitation Learning
* Built-in support for [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
* Flexible agent control with On Demand Decision Making
* Visualizing network outputs within the environment
* Wrap learning environments as a gym

37
docs/Training-Imitation-Learning.md


of a reward function, we can give the medic real world examples of observations
from the game and actions from a game controller to guide the medic's behavior.
Imitation Learning uses pairs of observations and actions from
a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
a demonstration to learn a policy.
Imitation learning can also be used to help reinforcement learning. Especially in
environments with sparse (i.e., infrequent or rare) rewards, the agent may never see

</p>
The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
In most scenarios, you should combine these two features
In most scenarios, you can combine these two features.
* GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the

* Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
shown in a set of demonstrations.
[The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
can be enabled on the PPO or SAC trainer. BC tends to work best when
there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
The BC feature can be enabled on the [PPO](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
or [SAC](Training-SAC.md#optional-behavioral-cloning-using-demonstrations) trainer. As BC cannot generalize
past the examples shown in the demonstrations, BC tends to work best when there exists demonstrations
for nearly all of the states that the agent can experience, or in conjunction with GAIL and/or an extrinsic reward.
### How to Choose
### What to Use
If you want to help your agents learn (especially with environments that have sparse rewards)
using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning

## Recording Demonstrations
It is possible to record demonstrations of agent behavior from the Unity Editor,
and save them as assets. These demonstrations contain information on the
Demonstrations of agent behavior can be recorded from the Unity Editor,
and saved as assets. These demonstrations contain information on the
They can be managed from the Editor, as well as used for training with BC and GAIL.
They can be managed in the Editor, as well as used for training with BC and GAIL.
In order to record demonstrations from an agent, add the `Demonstration Recorder`
component to a GameObject in the scene which contains an `Agent` component.

is played from the Editor. Depending on the complexity of the task, anywhere
from a few minutes or a few hours of demonstration data may be necessary to
be useful for imitation learning. When you have recorded enough data, end
the Editor play session, and a `.demo` file will be created in the
the Editor play session. A `.demo` file will be created in the
`Assets/Demonstrations` folder (by default). This file contains the demonstrations.
Clicking on the file will provide metadata about the demonstration in the
inspector.

alt="BC Teacher Helper"
width="375" border="10" />
</p>
You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
when using BC or GAIL. For instance, for BC:
```
behavioral_cloning:
demo_path: <path_to_your_demo_file>
...
```
And for GAIL:
```
reward_signals:
gail:
demo_path: <path_to_your_demo_file>
...
```
正在加载...
取消
保存