
Merge branch 'development-0.3' into hotfix/issue#333

/develop-generalizationTraining-TrainerController
Vincent Gao, 7 years ago
Current commit
1bc43933
27 files changed, with 1577 insertions and 380 deletions
  1. docs/Getting-Started-with-Balance-Ball.md (47 changes)
  2. docs/Installation.md (13 changes)
  3. docs/Python-API.md (7 changes)
  4. docs/Training-PPO.md (72 changes)
  5. docs/Using-Docker.md (77 changes)
  6. docs/images/docker_build_settings.png (355 changes)
  7. python/learn.py (4 changes)
  8. python/unityagents/environment.py (2 changes)
  9. python/unitytrainers/trainer_controller.py (1 change)
  10. unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs (205 changes)
  11. unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs (7 changes)
  12. unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs (6 changes)
  13. unity-environment/Assets/ML-Agents/Examples/Area/Scripts/AreaAgent.cs (6 changes)
  14. unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Push/PushAgent.cs (6 changes)
  15. unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Wall/WallAgent.cs (6 changes)
  16. unity-environment/Assets/ML-Agents/Examples/Banana/Scripts/BananaAgent.cs (4 changes)
  17. unity-environment/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs (12 changes)
  18. unity-environment/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs (8 changes)
  19. unity-environment/Assets/ML-Agents/Examples/Crawler/Scripts/CrawlerAgentConfigurable.cs (45 changes)
  20. unity-environment/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs (4 changes)
  21. unity-environment/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs (4 changes)
  22. unity-environment/Assets/ML-Agents/Examples/Reacher/Scripts/ReacherAgent.cs (12 changes)
  23. unity-environment/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs (6 changes)
  24. unity-environment/Assets/ML-Agents/Scripts/Agent.cs (39 changes)
  25. unity-environment/Assets/ML-Agents/Scripts/ExternalCommunicator.cs (4 changes)
  26. unity-environment/Assets/ML-Agents/Template/Scripts/TemplateAgent.cs (4 changes)
  27. docs/images/unity_linux_build_support.png (1001 changes)

docs/Getting-Started-with-Balance-Ball.md (47 changes)


![Balance Ball](images/balance.png)
This walkthrough uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball contains
a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is

training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
Rigidbody in the agent. The **Discrete** action vector space defines its
`Rigidbody` in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.
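As a rough illustration of the two spaces (the values and table entries below are made up for this sketch, not taken from any environment):
```
# Illustrative only: the shape of each action vector space described above.
# A Continuous brain emits an array of floats (here, two values that could be
# torques applied to a platform); a Discrete brain emits an index into a table.
continuous_action = [0.8, -0.3]                           # two float-valued controls
action_table = ["forward", "backward", "left", "right"]   # hypothetical action table
discrete_action = 2                                       # index into the table
print(action_table[discrete_action])                      # -> "left"
```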

OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
In order to train the agents within the Ball Balance environment:
1. Open `python/PPO.ipynb` notebook from Jupyter.
2. Set `env_name` to the name of your environment file earlier.
3. (optional) In order to get the best results quickly, set `max_steps` to
50000, set `buffer_size` to 5000, and set `batch_size` to 512. For this
exercise, this will train the model in approximately ~5-10 minutes.
4. (optional) Set `run_path` directory to your choice. When using TensorBoard
to observe the training statistics, it helps to set this to a sequential value
To train the agents within the Ball Balance environment, we will be using the python
package. We have provided a convenient python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
We will pass to this script the path of the environment executable that we just built. (Optionally) We can
use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When
using TensorBoard to observe the training statistics, it helps to set this to a sequential value
5. Run all cells of notebook with the exception of the last one under "Export
the trained Tensorflow graph."
To summarize, go to your command line, enter the `ml-agents` directory and type:
```
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
```
The `--train` flag tells ML-Agents to run in training mode. `env_file_path` should be the path to the Unity executable that was just created.
In order to observe the training process in more detail, you can use
TensorBoard. In your command line, enter into `python` directory and then run :
Once you start training using `learn.py` in the way described in the previous section, the `ml-agents` folder will
contain a `summaries` directory. In order to observe the training process
in more detail, you can use TensorBoard. From the command line, run:
`tensorboard --logdir=summaries`

### Embedding the trained model into Unity
1. Run the final cell of the notebook under "Export the trained TensorFlow
graph" to produce an `<env_name >.bytes` file.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be a `<env_name>.bytes` file in that location where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
3. Open the Unity Editor, and select the `3DBall` scene as described above.
4. Select the `Ball3DBrain` object from the Scene hierarchy.

docs/Installation.md (13 changes)


## Install **Unity 2017.1** or Later
[Download](https://store.unity.com/download) and install Unity.
[Download](https://store.unity.com/download) and install Unity. If you would
like to use our Docker set-up (introduced later), make sure to select the
_Linux Build Support_ component when installing Unity.
<p align="center">
<img src="images/unity_linux_build_support.png"
alt="Linux Build Support"
width="500" border="10" />
</p>
## Clone the ml-agents Repository

pip3 install .
## Docker-based Installation _[Experimental]_
## Docker-based Installation (Experimental)
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).

[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to cite relevant information on OS, Python version, and exact error
message (whenever possible).

docs/Python-API.md (7 changes)


## Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in `python/unityagents`. To load a Unity environment from a built binary file, put the file in the same directory as `unityagents`. In python, run:
Python-side communication happens through `UnityEnvironment` which is located in `python/unityagents`. To load a Unity environment from a built binary file, put the file in the same directory as `unityagents`. If your filename is 3DBall.app, in python, run:
env = UnityEnvironment(file_name=filename, worker_id=0)
env = UnityEnvironment(file_name="3DBall", worker_id=0)
* `file_name` is the name of the environment binary (located in the root directory of the python project).
* `worker_id` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
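As a minimal sketch of driving such an environment from Python, assuming the `unityagents` package described above (the reset/step loop and the `brain_names`/`agents` attribute names are illustrative assumptions based on the 0.3-era API, not an excerpt from these docs):
```
from unityagents import UnityEnvironment

# Load a built 3DBall binary placed next to the `unityagents` package.
env = UnityEnvironment(file_name="3DBall", worker_id=0)

brain_name = env.brain_names[0]       # e.g. "Ball3DBrain"
info = env.reset(train_mode=True)     # dict of brain name -> per-brain info

for _ in range(100):
    # 3DBall platforms take a continuous action vector of size 2;
    # here we simply send zeros to every agent controlled by this brain.
    num_agents = len(info[brain_name].agents)
    actions = [[0.0, 0.0] for _ in range(num_agents)]
    info = env.step(actions)

env.close()
```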
## Interacting with a Unity Environment

docs/Training-PPO.md (72 changes)


# Training with Proximal Policy Optimization
This document is still to be written. Refer to [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md) for a walk-through of the PPO training process.
This section is still to be written. Refer to [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md) for a walk-through of the PPO training process.
## Best Practices when training with PPO

### Hyperparameters
#### Batch Size
`batch_size` corresponds to how many experiences are used for each gradient descent update. This should always be a fraction
of the `buffer_size`. If you are using a continuous action space, this value should be large (in 1000s). If you are using a discrete action space, this value should be smaller (in 10s).
Typical Range (Continuous): `512` - `5120`
Typical Range (Discrete): `32` - `512`
#### Buffer Size
`buffer_size` corresponds to how many experiences (agent observations, actions and rewards obtained) should be collected before we do any
learning or updating of the model. **This should be a multiple of `batch_size`**. Typically larger `buffer_size` correspond to more stable training updates.
#### Beta (Used only in Discrete Control)
`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that discrete action space agents properly explore during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical Range: `1e-4` - `1e-2`
Typical Range: `2048` - `409600`
#### Buffer Size
#### Batch Size
`buffer_size` corresponds to how many experiences should be collected before gradient descent is performed on them all.
This should be a multiple of `batch_size`. Typically larger buffer sizes correspond to more stable training updates.
`batch_size` is the number of experiences used for one iteration of a gradient descent update. **This should always be a fraction of the
`buffer_size`**. If you are using a continuous action space, this value should be large (in the order of 1000s). If you are using a discrete action space, this value
should be smaller (in order of 10s).
Typical Range: `2048` - `409600`
Typical Range (Continuous): `512` - `5120`
#### Epsilon
Typical Range (Discrete): `32` - `512`
`epsilon` corresponds to the acceptable threshold of divergence between the old and new policies during gradient descent updating. Setting this value small will result in more stable updates, but will also slow the training process.
Typical Range: `0.1` - `0.3`
#### Number of Epochs
#### Hidden Units
`num_epoch` is the number of passes through the experience buffer during gradient descent. The larger the `batch_size`, the
larger it is acceptable to make this. Decreasing this will ensure more stable updates, at the cost of slower learning.
`hidden_units` correspond to how many units are in each fully connected layer of the neural network. For simple problems
where the correct action is a straightforward combination of the observation inputs, this should be small. For problems where
the action is a very complex interaction between the observation variables, this should be larger.
Typical Range: `3` - `10`
Typical Range: `32` - `512`
#### Learning Rate

Typical Range: `1e-5` - `1e-3`
#### Number of Epochs
`num_epoch` is the number of passes through the experience buffer during gradient descent. The larger the batch size, the
larger it is acceptable to make this. Decreasing this will ensure more stable updates, at the cost of slower learning.
Typical Range: `3` - `10`
#### Time Horizon

#### Max Steps
`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run durring the training process. This value should be increased for more complex problems.
`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run during the training process. This value should be increased for more complex problems.
Typical Range: `5e5` - `1e7`
Typical Range: `5e5 - 1e7`
#### Beta
`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that agents properly explore the action space during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical Range: `1e-4` - `1e-2`
#### Epsilon
`epsilon` corresponds to the acceptable threshold of divergence between the old and new policies during gradient descent updating. Setting this value small will result in more stable updates, but will also slow the training process.
Typical Range: `0.1` - `0.3`
#### Normalize

fewer layers are likely to train faster and more efficiently. More layers may be necessary for more complex control problems.
Typical range: `1` - `3`
#### Hidden Units
`hidden_units` correspond to how many units are in each fully connected layer of the neural network. For simple problems
where the correct action is a straightforward combination of the observation inputs, this should be small. For problems where
the action is a very complex interaction between the observation variables, this should be larger.
Typical Range: `32` - `512`
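Putting the ranges above together, here is a sketch of one plausible set of values for a continuous-control task, collected in a plain Python dict (illustrative choices only, not recommended defaults and not the on-disk configuration format):
```
# Illustrative hyperparameter choices drawn from the typical ranges above
# for a continuous action space. Not recommended defaults.
ppo_hyperparameters = {
    "buffer_size": 10240,    # experiences collected before an update; a multiple of batch_size
    "batch_size": 1024,      # experiences per gradient descent update; a fraction of buffer_size
    "beta": 5e-3,            # entropy regularization strength
    "epsilon": 0.2,          # allowed divergence between old and new policies
    "num_epoch": 5,          # passes through the experience buffer per update
    "learning_rate": 3e-4,   # within the 1e-5 to 1e-3 range
    "max_steps": 5e5,        # simulation steps (times frame-skip) to train for
    "num_layers": 2,         # fewer layers train faster; more for complex control
    "hidden_units": 128,     # width of each fully connected layer
}
```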
### Training Statistics

docs/Using-Docker.md (77 changes)


# Using Docker For ML Agents (Experimental)
# Using Docker For ML-Agents (Experimental)
We are currently offering an experimental solution for Windows and Mac users who would like to do training or inference using Docker. This option may be appealing to users who would like to avoid dealing with Python and TensorFlow installation on their host machines. This setup currently forces both TensorFlow and Unity to rely on _only_ the CPU for computation purposes. As such, we currently only support training using environments that only contain agents which use vector observations, rather than camera-based visual observations. For example, the [GridWorld](Example-Environments.md#gridworld) environment which use visual observations for training is not supported.
We currently offer an experimental solution for Windows and Mac users who would like to do training or inference using Docker. This option may be appealing to those who would like to avoid installing Python and TensorFlow themselves. The current setup forces both TensorFlow and Unity to _only_ rely on the CPU for computations. Consequently, our Docker support is limited to environments whose agents **do not** use camera-based visual observations. For example, the [GridWorld](Learning-Environment-Examples.md#gridworld) environment is **not** supported.
- Unity Linux Standalone Player ([Link](https://unity3d.com/get-unity/download?ref=professional&_ga=2.161111422.259506921.1519336396-1357272041.1488299149))
- Docker ([Link](https://www.docker.com/community-edition#/download))
- Unity _Linux Build Support_ Component
- [Docker](https://www.docker.com)
- Install Docker (see link above) if you don't have it setup on your machine.
- [Download](https://unity3d.com/get-unity/download) the Unity Installer and
add the _Linux Build Support_ Component
- Since Docker runs a container in an environment that is isolated from the host machine, we will be using a mounted directory, e.g. `unity-volume` in your host machine in order to share data, e.g. the Unity executable, curriculum files and tensorflow graph.
- [Download](https://www.docker.com/community-edition#/download) and
install Docker if you don't have it setup on your machine.
- Since Docker runs a container in an environment that is isolated from the host machine, a mounted directory in your host machine is used to share data, e.g. the Unity executable, curriculum files and tensorflow graph. For convenience, we created an empty `unity-volume` directory at the root of the repository for this purpose, but feel free to use any other directory. The remainder of this guide assumes that the `unity-volume` directory is the one used.
- Docker typically runs a container sharing a (linux) kernel with the host machine, this means that the
Unity environment **has** to be built for the **linux platform**. From the Build Settings Window, please select the architecture to be `x86_64` and choose the build to be `headless` (_This is important because we are running it in a container that does not have graphics drivers installed_).
Save the generated environment in the directory to be mounted (e.g. we have conveniently created an empty directory called at the top level `unity-volume`).
Using Docker for ML-Agents involves three steps: building the Unity environment with specific flags, building a Docker container and, finally, running the container. If you are not familiar with building a Unity environment for ML-Agents, please read through our [Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md) guide first.
### Build the Environment
Since Docker typically runs a container sharing a (linux) kernel with the host machine, the
Unity environment **has** to be built for the **linux platform**. When building a Unity environment, please select the following options from the Build Settings window:
- Set the _Target Platform_ to `Linux`
- Set the _Architecture_ to `x86_64`
- `Uncheck` the _Development Build_ option
- `Check` the _Headless Mode_ option. (_This is required because the Unity binary will run in a container that does not have graphics drivers installed_.)
- Ensure that `unity-volume/<environment-name>.x86_64` and `unity-volume/environment-name_Data`. So for example, `<environment_name>` might be `3Dball` and you might want to ensure that `unity-volume/3Dball.x86_64` and `unity-volume/3Dball_Data` are both present in the directory `unity-volume`.
Then click `Build`, pick an environment name (e.g. `3DBall`) and set the output directory to `unity-volume`. After building, ensure that the file `<environment-name>.x86_64` and subdirectory `<environment-name>_Data/` are created under `unity-volume`.
### Build the Docker Container
- Make sure the docker engine is running on your machine, then build the docker container by running `docker build -t <image_name> .` . in the top level of the source directory. Replace `<image_name>` by the name of the image that you want to use, e.g. `balance.ball.v0.1`.
First, make sure the Docker engine is running on your machine. Then build the Docker container by calling the following command at the top-level of the repository:
```
docker build -t <image-name> .
```
Replace `<image-name>` with a name for the Docker image, e.g. `balance.ball.v0.1`.
- Run the container:
### Run the Docker Container
Run the Docker container by calling the following command at the top-level of the repository:
<image-name>:latest <environment-name> \
--docker-target-name=unity-volume \
--train --run-id=<run-id>
For the `3DBall` environment, for example this would be:
Notes on argument values:
- `<image-name>` and `<environment-name>`: References the image and environment names, respectively.
- `source`: Reference to the path in your host OS where you will store the Unity executable.
- `target`: Tells Docker to mount the `source` path as a disk with this name.
- `docker-target-name`: Tells the ML-Agents Python package the name of the disk where it can read the Unity executable and store the graph. **This should therefore be identical to `target`.**
- `train`, `run-id`: ML-Agents arguments passed to `learn.py`. `train` trains the algorithm, `run-id` is used to tag each experiment with a unique identifier.
- Run the container:
For the `3DBall` environment, for example this would be:
balance.ball.v0.1:latest 3Dball \
--docker-target-name=unity-volume \
--train --run-id=<run-id>
balance.ball.v0.1:latest 3Dball \
--docker-target-name=unity-volume \
--train --run-id=3dball_first_trial
**Notes on argument values**
- `source` : Reference to the path in your host OS where you will store the Unity executable.
- `target`: Tells docker to mount the `source` path as a disk with this name.
- `docker-target-name`: Tells the ML-Agents python package what the name of the disk where it can read the Unity executable and store the graph.**This should therefore be identical to the `target`.**
- `train`, `run-id`: ML-Agents arguments passed to `learn.py`. `train` trains the algorithm, `run-id` is used to tag each experiment with a unique id.
For more details on docker mounts, look at [these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.
For more detail on Docker mounts, check out [these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.

docs/images/docker_build_settings.png (355 changes)

Width: 631 | Height: 597 | Size: 67 KiB

python/learn.py (4 changes)


logger = logging.getLogger("unityagents")
_USAGE = '''
Usage:
learn (<env>) [options]
learn (<env>) [options]
learn --help
--help Show this message.
--curriculum=<file> Curriculum json file for environment [default: None].
--keep-checkpoints=<n> How many model checkpoints to keep [default: 5].
--lesson=<n> Start learning from this lesson [default: 0].
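The usage string above follows the docopt convention; a minimal sketch of how options declared this way are typically parsed (assuming the `docopt` package, which this usage format suggests; the snippet is illustrative rather than an excerpt from `learn.py`):
```
from docopt import docopt

_USAGE = '''
Usage:
  learn (<env>) [options]
  learn --help

Options:
  --curriculum=<file>      Curriculum json file for environment [default: None].
  --keep-checkpoints=<n>   How many model checkpoints to keep [default: 5].
  --lesson=<n>             Start learning from this lesson [default: 0].
'''

options = docopt(_USAGE)           # parses sys.argv against the usage pattern
env_path = options['<env>']        # positional environment argument
lesson = int(options['--lesson'])  # docopt returns strings; cast where needed
```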

python/unityagents/environment.py (2 changes)


for i in range(self._num_brains):
self._brains[self._brain_names[i]] = BrainParameters(self._brain_names[i], p["brainParameters"][i])
self._loaded = True
logger.info("\n'{}' started successfully!".format(self._academy_name))
logger.info("\n'{0}' started successfully!\n{1}".format(self._academy_name, str(self)))
if self._num_external_brains == 0:
logger.warning(" No External Brains found in the Unity Environment. "
"You will not be able to pass actions to your agent(s).")

python/unitytrainers/trainer_controller.py (1 change)


tf.set_random_seed(self.seed)
self.env = UnityEnvironment(file_name=env_path, worker_id=self.worker_id,
curriculum=self.curriculum_file, seed=self.seed)
self.logger.info(str(self.env))
self.env_name = os.path.basename(os.path.normpath(env_path)) # Extract out name of environment
def _get_progress(self):
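The `os.path` expression shown above derives the environment name from the path passed on the command line; a quick illustration of what that combination of calls does (the path shown is hypothetical, POSIX-style):
```
import os

env_path = "unity-volume/3DBall.x86_64/"             # hypothetical CLI argument
print(os.path.normpath(env_path))                    # unity-volume/3DBall.x86_64
print(os.path.basename(os.path.normpath(env_path)))  # 3DBall.x86_64
```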

unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs (205 changes)


collectObservationsCalls += 1;
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
{
agentActionCalls += 1;
AddReward(0.1f);

}
}
// This is an empty class for testing the behavior of agents and academy
// It is left empty because we are not testing any brain behavior
// TODO : Mock a brain
}

BindingFlags.Instance | BindingFlags.NonPublic);
AcademyInitializeMethod.Invoke(aca, new object[] { });
Assert.AreEqual(1, aca.initializeAcademyCalls);
Assert.AreEqual(1, aca.episodeCount);
Assert.AreEqual(0, aca.episodeCount);
Assert.AreEqual(1, aca.academyResetCalls);
Assert.AreEqual(0, aca.academyResetCalls);
Assert.AreEqual(0, aca.AcademyStepCalls);
}

Assert.AreEqual(false, agent1.IsDone());
Assert.AreEqual(false, agent2.IsDone());
// agent1 was not enabled when the academy started
// The agents have been initialized
Assert.AreEqual(1, agent2.agentResetCalls);
Assert.AreEqual(0, agent2.agentResetCalls);
Assert.AreEqual(1, agent1.initializeAgentCalls);
Assert.AreEqual(1, agent2.initializeAgentCalls);
Assert.AreEqual(0, agent1.agentActionCalls);

MethodInfo AcademyStepMethod = typeof(Academy).GetMethod("_AcademyStep",
BindingFlags.Instance | BindingFlags.NonPublic);
int numberReset = 0;
Assert.AreEqual(1, aca.episodeCount);
Assert.AreEqual(numberReset, aca.episodeCount);
Assert.AreEqual(1, aca.academyResetCalls);
Assert.AreEqual(numberReset, aca.academyResetCalls);
// The reset happens at the begining of the first step
if (i == 0)
{
numberReset += 1;
}
}
}

AgentEnableMethod.Invoke(agent1, new object[] { aca });
AcademyInitializeMethod.Invoke(aca, new object[] { });
AgentEnableMethod.Invoke(agent2, new object[] { aca });
int numberAgent1Reset = 0;
int numberAgent2Initialization = 0;
Assert.AreEqual(1, agent1.agentResetCalls);
Assert.AreEqual(0, agent2.agentResetCalls);
Assert.AreEqual(numberAgent1Reset, agent1.agentResetCalls);
// Agent2 is never reset since intialized after academy
Assert.AreEqual(0, agent2.agentResetCalls);
Assert.AreEqual(1, agent2.initializeAgentCalls);
Assert.AreEqual(numberAgent2Initialization, agent2.initializeAgentCalls);
if (i % 3 == 0)
// Agent 1 resets at the first step
if (i == 0)
numberAgent1Reset += 1;
}
//Agent 2 is only initialized at step 2
if (i == 2)
{
AgentEnableMethod.Invoke(agent2, new object[] { aca });
numberAgent2Initialization += 1;
}
// We are testing request decision and request actions when called
// at different intervals
if ((i % 3 == 0) && (i > 2))
{
//Every 3 steps after agent 2 is initialized, request decision
else if (i % 5 == 0)
else if ((i % 5 == 0) && (i > 2))
// Every 5 steps after agent 2 is initialized, request action
requestAction += 1;
agent2.RequestAction();
}

MethodInfo AcademyStepMethod = typeof(Academy).GetMethod("_AcademyStep",
BindingFlags.Instance | BindingFlags.NonPublic);
int numberReset = 1;
int numberReset = 0;
int stepsSinceReset = 0;
for (int i = 0; i < 50; i++)
{

Assert.AreEqual(false, aca.IsDone());
Assert.AreEqual(numberReset, aca.academyResetCalls);
Assert.AreEqual(i, aca.AcademyStepCalls);
// Academy resets at the first step
if (i == 0)
{
numberReset += 1;
}
if (i % 5 == 3)
// Regularly set the academy to done to check behavior
if (i % 5 == 3)
{
aca.Done();
numberReset += 1;

AgentEnableMethod.Invoke(agent2, new object[] { aca });
AcademyInitializeMethod.Invoke(aca, new object[] { });
AgentEnableMethod.Invoke(agent1, new object[] { aca });
int numberAgent1Reset = 0; // Agent1 was not enabled at Academy start
int numberAgent2Reset = 1;
int numberAcaReset = 1;
int numberAgent1Reset = 0;
int numberAgent2Reset = 0;
int numberAcaReset = 0;
int acaStepsSinceReset = 0;
int agent1StepSinceReset =0;
int agent2StepSinceReset=0;

Assert.AreEqual(numberAgent1Reset, agent1.agentResetCalls);
Assert.AreEqual(numberAgent2Reset, agent2.agentResetCalls);
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
// Agent 2 and academy reset at the first step
if (i == 0)
{
numberAcaReset += 1;
numberAgent2Reset += 1;
}
//Agent 1 is only initialized at step 2
if (i == 2)
{
AgentEnableMethod.Invoke(agent1, new object[] { aca });
if (i % 100 == 3)
}
// Reset Academy every 100 steps
if (i % 100 == 3)
acaStepsSinceReset = 1;
acaStepsSinceReset = 0;
if (i % 11 == 5)
// Set agent 1 to done every 11 steps to test behavior
if (i % 11 == 5)
if (i % 13 == 3)
// Reseting agent 2 regularly
if (i % 13 == 3)
{
if (!(agent2.IsDone()||aca.IsDone()))
{

numberAgent2Reset += 1;
agent2StepSinceReset = 1;
agent2StepSinceReset = 0;
if (i % 3 == 2)
// Request a decision for agent 2 regularly
if (i % 3 == 2)
else if (i % 5 == 1)
else if (i % 5 == 1)
// Request an action without decision regularly
if (agent1.IsDone() && (((acaStepsSinceReset+1) % agent1.agentParameters.numberOfActionsBetweenDecisions==0)) || aca.IsDone())
if (agent1.IsDone() && (((acaStepsSinceReset) % agent1.agentParameters.numberOfActionsBetweenDecisions==0)) || aca.IsDone())
agent1StepSinceReset = 1;
agent1StepSinceReset = 0;
agent2StepSinceReset = 1;
agent2StepSinceReset = 0;
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
//Agent 1 is only initialized at step 2
if (i < 2)
{
agent1StepSinceReset = 0;
}
AcademyStepMethod.Invoke((object)aca, new object[] { });

FieldInfo maxStep = typeof(Academy).GetField("maxSteps", BindingFlags.Instance | BindingFlags.NonPublic);
maxStep.SetValue((object)aca, 20);
int numberReset = 1;
int numberReset = 0;
Assert.AreEqual(false, aca.IsDone());
Assert.AreEqual(i, aca.AcademyStepCalls);
Assert.AreEqual(false, aca.IsDone());
Assert.AreEqual(i, aca.AcademyStepCalls);
if ((i % 20 == 0) && (i>0))
// Make sure max step is reached every 20 steps
if (i % 20 == 0)
{
numberReset += 1;
stepsSinceReset = 1;

AgentEnableMethod.Invoke(agent2, new object[] { aca });
AcademyInitializeMethod.Invoke(aca, new object[] { });
AgentEnableMethod.Invoke(agent1, new object[] { aca });
int numberAgent1Reset = 0; // Agent1 was not enabled at Academy start
int numberAgent2Reset = 1;
int numberAcaReset = 1;
int numberAgent1Reset = 0;
int numberAgent2Reset = 0;
int numberAcaReset = 0;
int acaStepsSinceReset = 0;
int agent1StepSinceReset = 0;
int agent2StepSinceReset = 0;

Assert.AreEqual(acaStepsSinceReset, aca.stepsSinceReset);
Assert.AreEqual(1, aca.initializeAcademyCalls);
Assert.AreEqual(numberAcaReset, aca.episodeCount);
Assert.AreEqual(numberAcaReset, aca.academyResetCalls);
Assert.AreEqual(numberAcaReset, aca.episodeCount);
Assert.AreEqual(numberAcaReset, aca.academyResetCalls);
agent2.RequestDecision(); // we request a decision at each step
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
//At the first step, Academy and agent 2 reset
if (i == 0)
{
numberAcaReset += 1;
numberAgent2Reset += 1;
}
//Agent 1 is only initialized at step 2
if (i == 2)
{
AgentEnableMethod.Invoke(agent1, new object[] { aca });
}
// we request a decision at each step
agent2.RequestDecision();
if (i % 100 == 0)
// Make sure the academy max steps at 100
if (i % 100 == 0)
acaStepsSinceReset = 1;
agent1StepSinceReset = 1;
agent2StepSinceReset = 1;
acaStepsSinceReset = 0;
agent1StepSinceReset = 0;
agent2StepSinceReset = 0;
numberAcaReset += 1;
numberAgent1Reset += 1;
numberAgent2Reset += 1;

if ((i % 100) % 21 == 0)
//Make sure the agents reset when their max steps is reached
if (agent1StepSinceReset % 21 == 0)
agent1StepSinceReset = 1;
agent1StepSinceReset = 0;
if ((i % 100) % 31 == 0)
if (agent2StepSinceReset % 31 == 0)
agent2StepSinceReset = 1;
agent2StepSinceReset = 0;
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
//Agent 1 is only initialized at step 2
if (i < 2)
{
agent1StepSinceReset = 0;
}
}
}

brain.brainParameters = new BrainParameters();
// We use event based so the agent will now try to send anything to the brain
agent1.agentParameters.onDemandDecision = false;
// agent1 will take an action at every step and request a decision every steps
// agent1 will take an action at every step and request a decision every 2 steps
agent2.agentParameters.onDemandDecision = true;
agent2.agentParameters.onDemandDecision = true;
//Here we specify that the agent does not reset when done
agent2.agentParameters.resetOnDone = false; // Here we specify that the agent does not reset when done
agent2.agentParameters.resetOnDone = false;
brain.brainParameters.vectorObservationSize = 0;
brain.brainParameters.cameraResolutions = new resolution[0];
agent1.GiveBrain(brain);

Assert.AreEqual(agent1ResetOnDone, agent1.agentOnDoneCalls);
Assert.AreEqual(agent2ResetOnDone, agent2.agentOnDoneCalls);
agent2.RequestDecision(); // we request a decision at each step
// we request a decision at each step
agent2.RequestDecision();
acaStepsSinceReset += 1;
if (agent1ResetOnDone ==0)
agent1StepSinceReset += 1;

unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs (7 changes)


}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
float action_z = 2f * Mathf.Clamp(act[0], -1f, 1f);
float action_z = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
float action_x = 2f * Mathf.Clamp(act[1], -1f, 1f);
float action_x = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
if ((gameObject.transform.rotation.x < 0.25f && action_x > 0f) ||
(gameObject.transform.rotation.x > -0.25f && action_x < 0f))
{

Done();
SetReward(-1f);
}
}

unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs (6 changes)


AddVectorObs((ball.transform.position.z - gameObject.transform.position.z));
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
float action_z = 2f * Mathf.Clamp(act[0], -1f, 1f);
float action_z = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
float action_x = 2f * Mathf.Clamp(act[1], -1f, 1f);
float action_x = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
if ((gameObject.transform.rotation.x < 0.25f && action_x > 0f) ||
(gameObject.transform.rotation.x > -0.25f && action_x < 0f))
{

unity-environment/Assets/ML-Agents/Examples/Area/Scripts/AreaAgent.cs (6 changes)


}
}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(act);
MoveAgent(vectorAction);
if (gameObject.transform.position.y < 0.0f || Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||
Mathf.Abs(gameObject.transform.position.z + 5 - area.transform.position.z) > 8)

unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Push/PushAgent.cs (6 changes)


}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(act);
MoveAgent(vectorAction);
if (gameObject.transform.position.y < 0.0f || Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||
Mathf.Abs(gameObject.transform.position.z + 5 - area.transform.position.z) > 8)

unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Wall/WallAgent.cs (6 changes)


AddVectorObs(blockVelocity.z);
}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(act);
MoveAgent(vectorAction);
if (gameObject.transform.position.y < 0.0f ||
Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||

unity-environment/Assets/ML-Agents/Examples/Banana/Scripts/BananaAgent.cs (4 changes)


public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
MoveAgent(act);
MoveAgent(vectorAction);
}
public override void AgentReset()

unity-environment/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs (12 changes)


AddVectorObs(position);
}
public override void AgentAction(float[] act)
{
float movement = act[0];
int direction = 0;
if (movement == 0) { direction = -1; }
if (movement == 1) { direction = 1; }
public override void AgentAction(float[] vectorAction, string textAction)
{
float movement = vectorAction[0];
int direction = 0;
if (movement == 0) { direction = -1; }
if (movement == 1) { direction = 1; }
position += direction;
if (position < minPosition) { position = minPosition; }

unity-environment/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs (8 changes)


AddVectorObs(banana.transform.position.z / 25f);
}
public override void AgentAction(float[] act)
{
float x = Mathf.Clamp(act[0], -1, 1);
float z = Mathf.Clamp(act[1], -1, 1);
public override void AgentAction(float[] vectorAction, string textAction)
{
float x = Mathf.Clamp(vectorAction[0], -1, 1);
float z = Mathf.Clamp(vectorAction[1], -1, 1);
rb.velocity = new Vector3(x, 0, z) ;
if (rb.velocity.magnitude < 0.01f){
AddReward(-1);

unity-environment/Assets/ML-Agents/Examples/Crawler/Scripts/CrawlerAgentConfigurable.cs (45 changes)


}
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
for (int k = 0; k < act.Length; k++)
for (int k = 0; k < vectorAction.Length; k++)
act[k] = Mathf.Clamp(act[k], -1f, 1f);
vectorAction[k] = Mathf.Clamp(vectorAction[k], -1f, 1f);
limbRBs[0].AddTorque(-limbs[0].transform.right * strength * act[0]);
limbRBs[1].AddTorque(-limbs[1].transform.right * strength * act[1]);
limbRBs[2].AddTorque(-limbs[2].transform.right * strength * act[2]);
limbRBs[3].AddTorque(-limbs[3].transform.right * strength * act[3]);
limbRBs[0].AddTorque(-body.transform.up * strength * act[4]);
limbRBs[1].AddTorque(-body.transform.up * strength * act[5]);
limbRBs[2].AddTorque(-body.transform.up * strength * act[6]);
limbRBs[3].AddTorque(-body.transform.up * strength * act[7]);
limbRBs[4].AddTorque(-limbs[4].transform.right * strength * act[8]);
limbRBs[5].AddTorque(-limbs[5].transform.right * strength * act[9]);
limbRBs[6].AddTorque(-limbs[6].transform.right * strength * act[10]);
limbRBs[7].AddTorque(-limbs[7].transform.right * strength * act[11]);
limbRBs[0].AddTorque(-limbs[0].transform.right * strength * vectorAction[0]);
limbRBs[1].AddTorque(-limbs[1].transform.right * strength * vectorAction[1]);
limbRBs[2].AddTorque(-limbs[2].transform.right * strength * vectorAction[2]);
limbRBs[3].AddTorque(-limbs[3].transform.right * strength * vectorAction[3]);
limbRBs[0].AddTorque(-body.transform.up * strength * vectorAction[4]);
limbRBs[1].AddTorque(-body.transform.up * strength * vectorAction[5]);
limbRBs[2].AddTorque(-body.transform.up * strength * vectorAction[6]);
limbRBs[3].AddTorque(-body.transform.up * strength * vectorAction[7]);
limbRBs[4].AddTorque(-limbs[4].transform.right * strength * vectorAction[8]);
limbRBs[5].AddTorque(-limbs[5].transform.right * strength * vectorAction[9]);
limbRBs[6].AddTorque(-limbs[6].transform.right * strength * vectorAction[10]);
limbRBs[7].AddTorque(-limbs[7].transform.right * strength * vectorAction[11]);
float torque_penalty = act[0] * act[0] + act[1] * act[1] + act[2] * act[2] + act[3] * act[3]
+ act[4] * act[4] + act[5] * act[5] + act[6] * act[6] + act[7] * act[7]
+ act[8] * act[8] + act[9] * act[9] + act[10] * act[10] + act[11] * act[11];
float torque_penalty = vectorAction[0] * vectorAction[0] +
vectorAction[1] * vectorAction[1] +
vectorAction[2] * vectorAction[2] +
vectorAction[3] * vectorAction[3] +
vectorAction[4] * vectorAction[4] +
vectorAction[5] * vectorAction[5] +
vectorAction[6] * vectorAction[6] +
vectorAction[7] * vectorAction[7] +
vectorAction[8] * vectorAction[8] +
vectorAction[9] * vectorAction[9] +
vectorAction[10] * vectorAction[10] +
vectorAction[11] * vectorAction[11];
if (!IsDone())
{

unity-environment/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs (4 changes)


}
// to be implemented by the developer
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
int action = Mathf.FloorToInt(act[0]);
int action = Mathf.FloorToInt(vectorAction[0]);
// 0 - Forward, 1 - Backward, 2 - Left, 3 - Right
Vector3 targetPos = transform.position;

unity-environment/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs (4 changes)


agentRB.AddForce(dirToGo * academy.agentRunSpeed, ForceMode.VelocityChange); // GO
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
MoveAgent(act); // perform agent actions
MoveAgent(vectorAction); //perform agent actions
bool fail = false; // did the agent or block get pushed off the edge?
if (!Physics.Raycast(agentRB.position, Vector3.down, 20)) // if the agent has gone over the edge, we done.

unity-environment/Assets/ML-Agents/Examples/Reacher/Scripts/ReacherAgent.cs (12 changes)


/// <summary>
/// The agent's four actions correspond to torques on each of the two joints.
/// </summary>
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
float torque_x = Mathf.Clamp(act[0], -1, 1) * 100f;
float torque_z = Mathf.Clamp(act[1], -1, 1) * 100f;
float torque_x = Mathf.Clamp(vectorAction[0], -1, 1) * 100f;
float torque_z = Mathf.Clamp(vectorAction[1], -1, 1) * 100f;
torque_x = Mathf.Clamp(act[2], -1, 1) * 100f;
torque_z = Mathf.Clamp(act[3], -1, 1) * 100f;
torque_x = Mathf.Clamp(vectorAction[2], -1, 1) * 100f;
torque_z = Mathf.Clamp(vectorAction[3], -1, 1) * 100f;
rbB.AddTorque(new Vector3(torque_x, 0f, torque_z));
}

unity-environment/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs (6 changes)


}
// to be implemented by the developer
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
moveX = 0.25f * Mathf.Clamp(act[0], -1f, 1f) * invertMult;
if (Mathf.Clamp(act[1], -1f, 1f) > 0f && gameObject.transform.position.y - transform.parent.transform.position.y < -1.5f)
moveX = 0.25f * Mathf.Clamp(vectorAction[0], -1f, 1f) * invertMult;
if (Mathf.Clamp(vectorAction[1], -1f, 1f) > 0f && gameObject.transform.position.y - transform.parent.transform.position.y < -1.5f)
{
moveY = 0.5f;
gameObject.GetComponent<Rigidbody>().velocity = new Vector3(GetComponent<Rigidbody>().velocity.x, moveY * 12f, 0f);

unity-environment/Assets/ML-Agents/Scripts/Agent.cs (39 changes)


}
/// <summary>
/// Adds a vector observation.
/// Note that the number of vector observation to add
/// Appends float values to the vector observation.
/// Note that the total number of vector observation added
/// <param name="observation">The float value to add to
/// <param name="observation">The value to add to
internal void AddVectorObs(int observation)
{
_info.vectorObservation.Add((float)observation);
}
internal void AddVectorObs(Vector3 observation)
{
_info.vectorObservation.Add(observation.x);
_info.vectorObservation.Add(observation.y);
_info.vectorObservation.Add(observation.z);
}
internal void AddVectorObs(Vector2 observation)
{
_info.vectorObservation.Add(observation.x);
_info.vectorObservation.Add(observation.y);
}
internal void AddVectorObs(float[] observation)
{
_info.vectorObservation.AddRange(observation);
}
internal void AddVectorObs(List<float> observation)
{
_info.vectorObservation.AddRange(observation);
}
/// <summary>
/// Sets the text observation.
/// </summary>
/// <param name="s">The string the text observation must be set to.</param>
internal void SetTextObs(object s)
{
_info.textObservation = s.ToString();

/// </summary>
/// <param name="action">The action the agent receives
/// from the brain.</param>
public virtual void AgentAction(float[] action)
public virtual void AgentAction(float[] vectorAction, string textAction)
{
}

if ((requestAction) && (brain != null))
{
requestAction = false;
AgentAction(_action.vectorActions);
AgentAction(_action.vectorActions, _action.textActions);
}
if ((stepCounter >= agentParameters.maxStep)

unity-environment/Assets/ML-Agents/Scripts/ExternalCommunicator.cs (4 changes)


{
var brainName = brain.gameObject.name;
if (current_agents[brainName].Count() == 0)
{
continue;
}
var memorySize = rMessage.memory[brainName].Count() / current_agents[brainName].Count();
for (int i = 0; i < current_agents[brainName].Count(); i++)

unity-environment/Assets/ML-Agents/Template/Scripts/TemplateAgent.cs (4 changes)


}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
}

docs/images/unity_linux_build_support.png (1001 changes)
The file diff is too large to display.
