Release v0.8 docs (#1924)
* update title caps
* Rename Custom-Protos.md to Creating-Custom-Protobuf-Messages.md
* Updated with custom protobuf messages
* Cleanup against to our doc guidelines
* Minor text revision
* Create Training-Concurrent-Unity-Instances
* Rename Training-Concurrent-Unity-Instances to Training-Concurrent-Unity-Instances.md
* update to right format for --num-envs
* added link to concurrent unity instances
* Update and rename Training-Concurrent-Unity-Instances.md to Training-Using-Concurrent-Unity-Instances.md
* Added considerations section
* Update Training-Using-Concurrent-Unity-Instances.md
* cleaned up language to match doc
* minor updates
* retroactive migration from 0.6 to 0.7
* Updated from 0.7 to 0.8 migration
* Minor typo
* minor fix
* accidentally duplicated step
* updated with new features list
/develop-generalizationTraining-TrainerController
GitHub
6 years ago
Current commit
2ca5cd21
8 files changed, 217 insertions and 175 deletions
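- README.md (2 lines changed)
- docs/Installation.md (4 lines changed)
- docs/Migrating.md (17 lines changed)
- docs/Readme.md (2 lines changed)
- docs/Training-ML-Agents.md (7 lines changed)
- docs/Creating-Custom-Protobuf-Messages.md (168 lines changed)
- docs/Training-Using-Concurrent-Unity-Instances.md (25 lines changed)
- docs/Custom-Protos.md (167 lines changed)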

# Creating Custom Protobuf Messages

Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.

## Implementing a Custom Message

Assume the ml-agents repository is checked out to a folder named `$MLAGENTS_ROOT`. Whenever you change the fields of a custom message, you must run `$MLAGENTS_ROOT/protobuf-definitions/make.bat` to create C# and Python files corresponding to the new message. Follow the directions in [this file](../protobuf-definitions/README.md) for guidance. After running `$MLAGENTS_ROOT/protobuf-definitions/make.bat`, reinstall the Python package by running `pip install $MLAGENTS_ROOT/ml-agents` and make sure your Unity project is using the newly generated version of `$MLAGENTS_ROOT/UnitySDK`.

## Custom Message Types

Three custom message types are currently supported: Custom Actions, Custom Reset Parameters, and Custom Observations. In each case, `env` is an instance of a `UnityEnvironment` in Python.

### Custom Actions

By default, the Python API sends actions to Unity in the form of a floating point list and an optional string-valued text action for each agent.

You can define a custom action type, to either replace or augment the default, by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.

Instances of custom actions are set via the `custom_action` parameter of `env.step`. An agent receives a custom action by defining a method with the signature:

```csharp
public virtual void AgentAction(float[] vectorAction, string textAction, CommunicatorObjects.CustomAction customAction)
```

Below is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.

The `custom_action.proto` file looks like:

```protobuf
syntax = "proto3";

option csharp_namespace = "MLAgents.CommunicatorObjects";
package communicator_objects;

message CustomAction {
    enum Direction {
        NORTH=0;
        SOUTH=1;
        EAST=2;
        WEST=3;
    }
    float walkAmount = 1;
    Direction direction = 2;
}
```

The Python instance of the custom action looks like:

```python
from mlagents.envs import UnityEnvironment
from mlagents.envs.communicator_objects import CustomAction

env = UnityEnvironment(...)
...
# Enum values defined inside the message are exposed on the message class.
action = CustomAction(direction=CustomAction.NORTH, walkAmount=2.0)
env.step(custom_action=action)
```

And the agent code looks like:

```csharp
...
using MLAgents;
using MLAgents.CommunicatorObjects;

class MyAgent : Agent {
    ...
    override public void AgentAction(float[] vectorAction, string textAction, CustomAction customAction) {
        switch(customAction.Direction) {
            case CustomAction.Types.Direction.North:
                transform.Translate(0, 0, customAction.WalkAmount);
                break;
            ...
        }
    }
}
```

Keep in mind that the protobuf compiler automatically adjusts the capitalization of the C# versions of the custom field names you defined in the `CustomAction` message to match C# conventions: "NORTH" becomes "North", "walkAmount" becomes "WalkAmount", etc.
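
To anticipate which C# names to expect after regenerating the files, the renaming described above can be approximated in a few lines of Python. This is only a rough sketch: `to_csharp_name` is a hypothetical helper written for this illustration, it is not part of ML-Agents or protobuf, and it does not reproduce every rule the protobuf compiler applies.

```python
def to_csharp_name(proto_name: str) -> str:
    """Approximate the PascalCase name used in generated C# code.

    Simplified illustration only; this is NOT the protobuf compiler's
    actual algorithm and ignores its special cases.
    """
    parts = [p for p in proto_name.split("_") if p]
    converted = []
    for part in parts:
        if part.isupper():
            # All-caps tokens such as enum values: NORTH -> North
            converted.append(part.capitalize())
        else:
            # Mixed-case tokens keep inner capitals: walkAmount -> WalkAmount
            converted.append(part[0].upper() + part[1:])
    return "".join(converted)


if __name__ == "__main__":
    for name in ("NORTH", "walkAmount", "direction"):
        print(name, "->", to_csharp_name(name))
```

When in doubt, check the generated C# files themselves, since they are the authoritative source for the final names.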

### Custom Reset Parameters

By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.

You can also configure the environment reset using a custom protobuf message. To do this, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.

In Unity, you can then read the `customResetParameters` field of your academy to access the values set in your Python script.

In this example, the academy is setting the initial position of a box based on custom reset parameters. The `custom_reset_parameters.proto` would look like:

```protobuf
message CustomResetParameters {
    message Position {
        float x = 1;
        float y = 2;
        float z = 3;
    }
    message Color {
        float r = 1;
        float g = 2;
        float b = 3;
    }
    Position initialPos = 1;
    Color color = 2;
}
```

The Python instance of the custom reset parameter looks like:

```python
from mlagents.envs.communicator_objects import CustomResetParameters

env = ...
pos = CustomResetParameters.Position(x=1, y=1, z=2)
color = CustomResetParameters.Color(r=.5, g=.1, b=1.0)
params = CustomResetParameters(initialPos=pos, color=color)
env.reset(custom_reset_parameters=params)
```

The academy looks like:

```csharp
public class MyAcademy : Academy
{
    public GameObject box;  // This would be connected to a game object in your scene in the Unity editor.

    override public void AcademyReset()
    {
        var boxParams = customResetParameters;
        if (boxParams != null)
        {
            var pos = boxParams.InitialPos;
            var color = boxParams.Color;
            box.transform.position = new Vector3(pos.X, pos.Y, pos.Z);
            box.GetComponent<Renderer>().material.color = new Color(color.R, color.G, color.B);
        }
    }
}
```

### Custom Observations

By default, Unity returns observations to Python in the form of a floating-point vector.

You can define a custom observation message to supplement that. To do so, add fields to the `CustomObservation` protobuf message in `custom_observation.proto`.

Then in your agent, create an instance of a custom observation via `new CommunicatorObjects.CustomObservation`. Then in `CollectObservations`, call `SetCustomObservation` with the custom observation instance as the parameter.

In Python, the custom observation can be accessed by calling `env.step` or `env.reset` and accessing the `custom_observations` property of the return value. It will contain a list with one `CustomObservation` instance per agent.

For example, if you have added a field called `customField` to the `CustomObservation` message, the agent code looks like:

```csharp
using MLAgents;
using MLAgents.CommunicatorObjects;

class MyAgent : Agent {
    override public void CollectObservations() {
        var obs = new CustomObservation();
        // Assumes customField is declared as a double; use 1.0f if it is a float.
        obs.CustomField = 1.0;
        SetCustomObservation(obs);
    }
}
```

In Python, the custom field would be accessed like:

```python
...
result = env.step(...)
result[brain_name].custom_observations[0].customField
```

where `brain_name` is the name of the brain attached to the agent.

# Training Using Concurrent Unity Instances

As part of release v0.8, we enabled developers to run concurrent, parallel instances of the Unity executable during training. For certain scenarios, this should speed up training.

## How to Run Concurrent Unity Instances During Training

Please refer to the general instructions on [Training ML-Agents](Training-ML-Agents.md). In order to run concurrent Unity instances during training, set the number of environment instances using the command line option `--num-envs=<n>` when you invoke `mlagents-learn`. Optionally, you can also set the `--base-port`, which is the starting port used for the concurrent Unity instances.

## Considerations

### Buffer Size

If you are having trouble getting an agent to train, even with multiple concurrent Unity instances, you could increase `buffer_size` in the `config/trainer_config.yaml` file. A common practice is to multiply `buffer_size` by `num-envs`.
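
As a small worked example of that rule of thumb, the arithmetic can be sketched as follows. `scaled_buffer_size` is a hypothetical helper for illustration, not part of `mlagents-learn`, and the value 10240 is only a sample starting point; use whatever `buffer_size` your own configuration currently specifies.

```python
def scaled_buffer_size(base_buffer_size: int, num_envs: int) -> int:
    """Multiply a single-environment buffer_size by the number of environments.

    Hypothetical helper for illustration; the result still has to be
    written into config/trainer_config.yaml by hand.
    """
    if base_buffer_size < 1 or num_envs < 1:
        raise ValueError("base_buffer_size and num_envs must be positive")
    return base_buffer_size * num_envs


if __name__ == "__main__":
    # Sample values only: a buffer_size of 10240 and --num-envs=4.
    print(scaled_buffer_size(10240, 4))
```

Treat the result as a starting point for tuning and not as a guaranteed fix, since the source describes this only as a common practice.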

### Resource Constraints

Invoking concurrent Unity instances is constrained by the resources on the machine. Please use discretion when setting `--num-envs=<n>`.

### Using num-runs and num-envs

If you set `--num-runs=<n>` greater than 1 and are also invoking concurrent Unity instances using `--num-envs=<n>`, then the number of concurrent Unity instances is equal to `num-runs` times `num-envs`.
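
Because the two options multiply, it can help to check the total before launching a run on a resource-constrained machine. The sketch below only restates that multiplication; `total_unity_instances` is a hypothetical helper and does not query or launch anything.

```python
def total_unity_instances(num_runs: int, num_envs: int) -> int:
    """Return how many Unity instances run at once: num-runs times num-envs."""
    if num_runs < 1 or num_envs < 1:
        raise ValueError("num_runs and num_envs must be positive")
    return num_runs * num_envs


if __name__ == "__main__":
    # Sample values only: --num-runs=3 together with --num-envs=4.
    print(total_unity_instances(3, 4))
```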

### Result Variation Using Concurrent Unity Instances

If you keep all the hyperparameters the same but change `--num-envs=<n>`, the results and the trained model will likely change.

# Creating custom protobuf messages

Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.

Assume the ml-agents repository is checked out to a folder named `$MLAGENTS_ROOT`. Whenever you change the fields of a custom message, you must run `$MLAGENTS_ROOT/protobuf-definitions/make.bat` to create C# and Python files corresponding to the new message. Follow the directions in [this file](../protobuf-definitions/README.md) for guidance. After running it, reinstall the Python package by running `pip install $MLAGENTS_ROOT/ml-agents` and make sure your Unity project is using the newly generated version of `$MLAGENTS_ROOT/UnitySDK`.

## Custom message types

There are three custom message types currently supported, described below. In each case, `env` is an instance of a `UnityEnvironment` in Python. `CustomAction` is described most thoroughly; usage of the other custom messages follows a similar template.

### Custom actions

By default, the Python API sends actions to Unity in the form of a floating-point list per agent and an optional string-valued text action.

You can define a custom action type to replace or augment this by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.

Instances of custom actions are set via the `custom_action` parameter of `env.step`. An agent receives a custom action by defining a method with the signature:

```csharp
public virtual void AgentAction(float[] vectorAction, string textAction, CommunicatorObjects.CustomAction customAction)
```

Here is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.

`custom_action.proto` will look like:

```protobuf
syntax = "proto3";

option csharp_namespace = "MLAgents.CommunicatorObjects";
package communicator_objects;

message CustomAction {
    enum Direction {
        NORTH=0;
        SOUTH=1;
        EAST=2;
        WEST=3;
    }
    float walkAmount = 1;
    Direction direction = 2;
}
```

In your Python file, create an instance of a custom action:

```python
from mlagents.envs import UnityEnvironment
from mlagents.envs.communicator_objects import CustomAction

env = UnityEnvironment(...)
...
action = CustomAction(direction=CustomAction.NORTH, walkAmount=2.0)
env.step(custom_action=action)
```

Then in your agent:

```csharp
...
using MLAgents;
using MLAgents.CommunicatorObjects;

class MyAgent : Agent {
    ...
    override public void AgentAction(float[] vectorAction, string textAction, CustomAction customAction) {
        switch(customAction.Direction) {
            case CustomAction.Types.Direction.North:
                transform.Translate(0, 0, customAction.WalkAmount);
                break;
            ...
        }
    }
}
```

Note that the protobuf compiler automatically adjusts the capitalization of the C# versions of the custom field names you defined in the `CustomAction` message to match C# conventions: "NORTH" becomes "North", "walkAmount" becomes "WalkAmount", etc.

### Custom reset parameters

By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.

You can also configure an environment using a custom protobuf message. To do so, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.

In Unity, you can then read the `customResetParameters` field of your academy to access the values set in your Python script.

In this example, an academy is setting the initial position of a box based on custom reset parameters that look like:

```protobuf
message CustomResetParameters {
    message Position {
        float x = 1;
        float y = 2;
        float z = 3;
    }
    message Color {
        float r = 1;
        float g = 2;
        float b = 3;
    }
    Position initialPos = 1;
    Color color = 2;
}
```

In your academy, you'd have something like:

```csharp
public class MyAcademy : Academy
{
    public GameObject box;  // This would be connected to a game object in your scene in the Unity editor.

    override public void AcademyReset()
    {
        var boxParams = customResetParameters;
        if (boxParams != null)
        {
            var pos = boxParams.InitialPos;
            var color = boxParams.Color;
            box.transform.position = new Vector3(pos.X, pos.Y, pos.Z);
            box.GetComponent<Renderer>().material.color = new Color(color.R, color.G, color.B);
        }
    }
}
```

Then in Python, when setting up your scene, you might write:

```python
from mlagents.envs.communicator_objects import CustomResetParameters

env = ...
pos = CustomResetParameters.Position(x=1, y=1, z=2)
color = CustomResetParameters.Color(r=.5, g=.1, b=1.0)
params = CustomResetParameters(initialPos=pos, color=color)
env.reset(custom_reset_parameters=params)
```

### Custom observations

By default, Unity returns observations to Python in the form of a floating-point vector.

You can define a custom observation message to supplement that. To do so, add fields to the `CustomObservation` protobuf message in `custom_observation.proto`.

Then in your agent, create an instance of a custom observation via `new CommunicatorObjects.CustomObservation`. Then in `CollectObservations`, call `SetCustomObservation` with the custom observation instance as the parameter.

In Python, the custom observation can be accessed by calling `env.step` or `env.reset` and accessing the `custom_observations` property of the return value. It will contain a list with one `CustomObservation` instance per agent.

For example, if you have added a field called `customField` to the `CustomObservation` message, you would program your agent like this:

```csharp
class MyAgent : Agent {
    override public void CollectObservations() {
        var obs = new CustomObservation();
        obs.CustomField = 1.0;
        SetCustomObservation(obs);
    }
}
```

Then in Python, the custom field would be accessed like:

```python
...
result = env.step(...)
result[brain_name].custom_observations[0].customField
```

where `brain_name` is the name of the brain attached to the agent.