
Fixing tables in documentation and other markdown errors. (#1199)

Branch: develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Commit: 020d6e8b
16 files changed, 89 insertions and 90 deletions
1. docs/Basic-Guide.md (2 changed lines)
2. docs/FAQ.md (2 changed lines)
3. docs/Feature-Memory.md (2 changed lines)
4. docs/Getting-Started-with-Balance-Ball.md (2 changed lines)
5. docs/Installation.md (8 changed lines)
6. docs/Learning-Environment-Create-New.md (10 changed lines)
7. docs/Learning-Environment-Design-Brains.md (10 changed lines)
8. docs/Learning-Environment-Design-External-Internal-Brains.md (2 changed lines)
9. docs/Learning-Environment-Design-Player-Brains.md (40 changed lines)
10. docs/Learning-Environment-Design.md (8 changed lines)
11. docs/Learning-Environment-Examples.md (20 changed lines)
12. docs/Migrating.md (2 changed lines)
13. docs/Training-ML-Agents.md (50 changed lines)
14. docs/Training-on-Amazon-Web-Service.md (1 changed line)
15. docs/Using-TensorFlow-Sharp-in-Unity.md (2 changed lines)
16. ml-agents/README.md (18 changed lines)

docs/Basic-Guide.md (2 changed lines)


- `--train` tells `mlagents-learn` to run a training session (rather
than inference)
4. If you cloned the ML-Agents repo, then you can simply run
5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.
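As a rough illustration of the `--train` flag described above (not part of the guide itself), the same invocation can be launched from a Python script. The config path and run ID below are placeholder values:

```python
# Hypothetical wrapper: start a training session with mlagents-learn from Python.
# Assumes the mlagents package is installed and a trainer config file exists at the given path.
import subprocess

subprocess.run(
    [
        "mlagents-learn",
        "config/trainer_config.yaml",  # trainer hyperparameters (placeholder path)
        "--run-id=first-run",          # arbitrary label for this training session
        "--train",                     # train rather than run inference
    ],
    check=True,
)
```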

docs/FAQ.md (2 changed lines)


If you try to use ML-Agents in Unity versions 2017.1 - 2017.3, you might
encounter an error that looks like this:
```console
Instance of CoreBrainInternal couldn't be created. The script
class needs to derive from ScriptableObject.
UnityEngine.ScriptableObject:CreateInstance(String)
```

docs/Feature-Memory.md (2 changed lines)


## What are memories used for?
Have you ever entered a room to get something and immediately forgot what you
were looking for? Don't let that happen to your agents.
It is now possible to give memories to your agents. When training, the agents
will be able to store a vector of floats to be used next time they need to make

docs/Getting-Started-with-Balance-Ball.md (2 changed lines)


An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
entities that an agent interacts with.
![Unity Editor](images/mlagents-3DBallHierarchy.png)

docs/Installation.md (8 changed lines)


Once installed, you will want to clone the ML-Agents Toolkit GitHub repository.
```sh
git clone https://github.com/Unity-Technologies/ml-agents.git
```
The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
It also contains many [example environments](Learning-Environment-Examples.md)

To install the dependencies and `mlagents` Python package, enter the
`ml-agents/` subdirectory and run from the command line:
```sh
pip3 install .
```
If you installed this correctly, you should be able to run
`mlagents-learn --help`
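Beyond `mlagents-learn --help`, a quick way to confirm the install is to import the package from Python (a minimal sketch, assuming the `pip3 install .` step above succeeded):

```python
# Minimal import check for the freshly installed mlagents package.
from mlagents.envs import UnityEnvironment

print("mlagents.envs is importable:", UnityEnvironment.__name__)
```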

docs/Learning-Environment-Create-New.md (10 changed lines)


in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.
6. If training, set the Brain type to External and
[run the training process](Training-ML-Agents.md).
**Note:** If you are unfamiliar with Unity, refer to
[Learning the interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)

public override void AgentReset()
{
    if (this.transform.position.y < -1.0)
    {
        // The Agent fell
        this.transform.position = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;

There are three kinds of game objects you need to include in your scene in order
to use Unity ML-Agents:
* Academy
* Brain
* Agents
Keep in mind:

docs/Learning-Environment-Design-Brains.md (10 changed lines)


children of the Academy in the Unity scene hierarchy. Every Agent must be
assigned a Brain, but you can use the same Brain with more than one Agent. You
can also create several Brains and attach each Brain to one or more Agents.
Use the Brain class directly, rather than a subclass. Brain behavior is
determined by the **Brain Type**. The ML-Agents toolkit defines four Brain

The Player, Heuristic and Internal Brains have been updated to support
broadcast. The broadcast feature allows you to collect data from your Agents
using a Python program without controlling them.
### How to use: Unity

the Agents connected to non-External Brains are doing. When calling `step` or
`reset` on your environment, you retrieve a dictionary mapping Brain names to
`BrainInfo` objects. The dictionary contains a `BrainInfo` object for each
non-External Brain set to broadcast as well as for any External Brains.
were taken by the Agents at the previous step, not the current one.
call `step()` with no arguments.
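A minimal sketch of what that loop looks like from Python; the environment path, the Brain name, and the step count are placeholders, and the `BrainInfo` field names follow the broadcast description above:

```python
# Sketch: reading data broadcast by a non-External Brain (e.g. a Player Brain).
# "./my_env" and "PlayerBrain" are placeholder names for this illustration.
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="./my_env")
brain_infos = env.reset(train_mode=False)   # dict: Brain name -> BrainInfo

for _ in range(100):
    # With no External Brains in the scene, step() takes no arguments.
    brain_infos = env.step()
    info = brain_infos["PlayerBrain"]
    # previous_vector_actions holds the actions taken at the previous step, not the current one.
    print(info.vector_observations, info.previous_vector_actions, info.rewards)

env.close()
```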
You can use the broadcast feature to collect data generated by Player,
Heuristics or Internal Brains game sessions. You can then use this data to train

docs/Learning-Environment-Design-External-Internal-Brains.md (2 changed lines)


a Brain component.)
2. Set the **Brain Type** to **Internal**.
**Note:** In order to see the **Internal** Brain Type option, you must
[enable TensorFlowSharp](Using-TensorFlow-Sharp-in-Unity.md).
3. Import the `environment_run-id.bytes` file produced by the PPO training
program. (Where `environment_run-id` is the name of the model file, which is
constructed from the name of your Unity environment executable and the run-id

docs/Learning-Environment-Design-Player-Brains.md (40 changed lines)


can send any number of floating point values (up to the **Vector Action Space
Size** setting).
| **Property** | | **Description** |
| :-- | :-- | :-- |
| **Continuous Player Actions** | | The mapping for the continuous vector action space. Shown when the action space is **Continuous**. |
| | **Size** | The number of key commands defined. You can assign more than one command to the same action index in order to send different values for that action. (If you press both keys at the same time, deterministic results are not guaranteed.) |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Index** | The element of the Agent's action vector to set when this key is pressed. The index value cannot exceed the size of the Action Space (minus 1, since it is an array index). |
| | **Value** | The value to send to the Agent as its action for the specified index when the mapped key is pressed. All other members of the action vector are set to 0. |
| **Discrete Player Actions** | | The mapping for the discrete vector action space. Shown when the action space is **Discrete**. |
| | **Size** | The number of key commands defined. |
| | **Element 0–N** | The mapping of keys to action values. |
| | **Key** | The key on the keyboard. |
| | **Branch Index** | The element of the Agent's action vector to set when this key is pressed. The index value cannot exceed the size of the Action Space (minus 1, since it is an array index). |
| | **Value** | The value to send to the Agent as its action when the mapped key is pressed. Cannot exceed the max value for the associated branch (minus 1, since it is an array index). |
For more information about the Unity input system, see
[Input](https://docs.unity3d.com/ScriptReference/Input.html).

docs/Learning-Environment-Design.md (8 changed lines)


implement the above methods. The `Agent.CollectObservations()` and
`Agent.AgentAction()` functions are required; the other methods are optional —
whether you need to implement them or not depends on your specific scenario.
**Note:** The API used by the Python PPO training process to communicate with
and control the Academy during training can be used for other purposes as well.
For example, you could use the API to use Unity as the simulation engine for

properties is `Max Steps`, which determines how long each training episode
lasts. Once the Academy's step counter reaches this value, it calls the
`AcademyReset()` function to start the next episode.
the Academy properties and their uses.
### Brain

carries out actions. The Agent class is typically attached to the GameObject in
the scene that otherwise represents the actor — for example, to a player object
in a football game or a car object in a vehicle simulation. Every Agent must be
assigned a Brain.
To create an Agent, extend the Agent class and implement the essential
`CollectObservations()` and `AgentAction()` methods:

docs/Learning-Environment-Examples.md (20 changed lines)


* Brains: One Brain with the following observation/action space.
* Vector Observation space: None
* Vector Action space: (Discrete) Size of 4, corresponding to movement in
cardinal directions. Note that for this environment,
[action masking](Learning-Environment-Design-Agents.md#masking-discrete-actions)
is turned on by default (this option can be toggled
using the `Mask Actions` checkbox within the `trueAgent` GameObject).

* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent linked to a single brain.
* Agent Reward Function:
* -0.0025 for every step.
* +1.0 if the block touches the goal.
* Vector Observation space: (Continuous) 70 variables corresponding to 14
  ray-casts each detecting one of three possible objects (wall, goal, or
  block).
* Vector Action space: (Discrete) Size of 6, corresponding to turn clockwise
  and counterclockwise and move along four different face directions (see the
  sketch after this list).
* Visual Observations (Optional): One first-person camera. Use the
  `VisualPushBlock` scene.
* Reset Parameters: None.
* Benchmark Mean Reward: 4.5
* Optional Imitation Learning scene: `PushBlockIL`.
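To make the discrete action space above concrete, here is a sketch that drives the PushBlock Brain from Python with random actions; the executable name is a placeholder and the Brain is assumed to be set to External:

```python
# Sketch: sampling the PushBlock discrete action space (size 6) from Python.
# "PushBlock" is a placeholder executable name.
import random
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="PushBlock")
brain_name = env.brain_names[0]                 # assumes a single (External) Brain
info = env.reset(train_mode=False)[brain_name]

for _ in range(50):
    # One integer in [0, 5] per agent: rotate or move along one of the four faces.
    actions = [random.randint(0, 5) for _ in info.agents]
    info = env.step({brain_name: actions})[brain_name]

env.close()
```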

docs/Migrating.md (2 changed lines)


or later`. 2017.4 is an LTS (Long Term Support) version that helps us
maintain good quality and support. Earlier versions of Unity might still work,
but you may encounter an
[error](FAQ.md#instance-of-corebraininternal-couldnt-be-created) listed here.
### Unity API

docs/Training-ML-Agents.md (50 changed lines)


settings. (This GameObject will be a child of the Academy in your scene.)
Sections for the example environments are included in the provided config file.
| **Setting** | **Description** | **Applies To Trainer\*** |
| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
| beta | The strength of entropy regularization. | PPO |
| brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
| curiosity_strength | Magnitude of the intrinsic reward generated by the Intrinsic Curiosity Module. | PPO |
| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
| gamma | The reward discount rate for the Generalized Advantage Estimator (GAE). | PPO |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, BC |
| lambd | The regularization parameter. | PPO |
| learning_rate | The initial learning rate for gradient descent. | PPO, BC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, BC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| normalize | Whether to automatically normalize observations. | PPO |
| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, BC |
| trainer | The type of training to perform: "ppo" or "imitation". | PPO, BC |
| use_curiosity | Train using an additional intrinsic reward signal generated from the Intrinsic Curiosity Module. | PPO |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
\*PPO = Proximal Policy Optimization, BC = Behavioral Cloning (Imitation)
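For reference, a sketch of how a few of these settings end up in a trainer configuration YAML file; the Brain section name and the numeric values are placeholders rather than recommendations, and PyYAML is assumed to be available:

```python
# Sketch: writing a minimal trainer configuration using a few of the settings above.
import yaml

config = {
    "default": {
        "trainer": "ppo",
        "batch_size": 1024,
        "buffer_size": 10240,
        "gamma": 0.99,
        "hidden_units": 128,
        "num_layers": 2,
        "learning_rate": 3.0e-4,
        "max_steps": 5.0e4,
        "summary_freq": 1000,
        "use_recurrent": False,
    },
    # Per-Brain sections override the defaults.
    "PushBlockBrain": {
        "time_horizon": 64,
        "normalize": False,
    },
}

with open("my_trainer_config.yaml", "w") as f:
    yaml.safe_dump(config, f)
```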

docs/Training-on-Amazon-Web-Service.md (1 changed line)


```python
env = UnityEnvironment(<your_env>)
```
Where `<your_env>` corresponds to the path to your environment executable.
You should receive a message confirming that the environment was loaded successfully.

docs/Using-TensorFlow-Sharp-in-Unity.md (2 changed lines)


## iOS additional instructions for building
* Before building your game for the iOS platform, make sure you've set the
  flag `ENABLE_TENSORFLOW` for it.
* Once you build the project for iOS in the editor, open the .xcodeproj file
within the project folder using Xcode.
* Set up your iOS account following the

ml-agents/README.md (18 changed lines)


game engine as well as a collection of trainers and algorithms to train agents
in Unity environments.
The `mlagents` Python package contains two subpackages:

* `mlagents.envs`: A low-level API which allows you to interact directly with a
  Unity Environment. See
  [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Python-API.md)
  for more information on using this package.
* `mlagents.trainers`: A set of Reinforcement Learning algorithms designed to be
  used with Unity environments. Access them using the `mlagents-learn` access
  point. See
  [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-ML-Agents.md)
  for more information on using this package.
## Installation
