
fix trailing whitespace in markdown (#2786)

/develop-gpu-test
GitHub 5 years ago
Current commit d009511a
48 files changed, with 483 additions and 480 deletions
  1. .pre-commit-config.yaml (3 lines changed)
  2. SURVEY.md (4 lines changed)
  3. UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/Barracuda.md (2 lines changed)
  4. UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/LICENSE.md (2 lines changed)
  5. UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/ReleaseNotes.md (10 lines changed)
  6. UnitySDK/README.md (4 lines changed)
  7. docs/Basic-Guide.md (34 lines changed)
  8. docs/Creating-Custom-Protobuf-Messages.md (16 lines changed)
  9. docs/FAQ.md (2 lines changed)
  10. docs/Getting-Started-with-Balance-Ball.md (8 lines changed)
  11. docs/Glossary.md (2 lines changed)
  12. docs/Installation.md (16 lines changed)
  13. docs/Learning-Environment-Best-Practices.md (2 lines changed)
  14. docs/Learning-Environment-Create-New.md (112 lines changed)
  15. docs/Learning-Environment-Design-Agents.md (80 lines changed)
  16. docs/Learning-Environment-Design.md (16 lines changed)
  17. docs/Learning-Environment-Examples.md (8 lines changed)
  18. docs/ML-Agents-Overview.md (10 lines changed)
  19. docs/Migrating.md (2 lines changed)
  20. docs/Readme.md (2 lines changed)
  21. docs/Training-Behavioral-Cloning.md (34 lines changed)
  22. docs/Training-Curriculum-Learning.md (4 lines changed)
  23. docs/Training-Generalized-Reinforcement-Learning-Agents.md (70 lines changed)
  24. docs/Training-ML-Agents.md (20 lines changed)
  25. docs/Training-Using-Concurrent-Unity-Instances.md (2 lines changed)
  26. docs/Training-on-Amazon-Web-Service.md (2 lines changed)
  27. docs/Training-on-Microsoft-Azure-Custom-Instance.md (2 lines changed)
  28. docs/Training-on-Microsoft-Azure.md (2 lines changed)
  29. docs/Unity-Inference-Engine.md (22 lines changed)
  30. docs/Using-Tensorboard.md (8 lines changed)
  31. docs/Using-Virtual-Environment.md (30 lines changed)
  32. docs/localized/KR/README.md (8 lines changed)
  33. docs/localized/KR/docs/Installation-Windows.md (40 lines changed)
  34. docs/localized/KR/docs/Installation.md (12 lines changed)
  35. docs/localized/KR/docs/Training-Imitation-Learning.md (34 lines changed)
  36. docs/localized/KR/docs/Training-PPO.md (48 lines changed)
  37. docs/localized/KR/docs/Using-Docker.md (10 lines changed)
  38. docs/localized/zh-CN/README.md (4 lines changed)
  39. docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md (66 lines changed)
  40. docs/localized/zh-CN/docs/Installation.md (8 lines changed)
  41. docs/localized/zh-CN/docs/Learning-Environment-Create-New.md (24 lines changed)
  42. docs/localized/zh-CN/docs/Learning-Environment-Design.md (8 lines changed)
  43. docs/localized/zh-CN/docs/Learning-Environment-Examples.md (2 lines changed)
  44. docs/localized/zh-CN/docs/ML-Agents-Overview.md (60 lines changed)
  45. docs/localized/zh-CN/docs/Readme.md (4 lines changed)
  46. gym-unity/README.md (92 lines changed)
  47. ml-agents-envs/README.md (6 lines changed)
  48. protobuf-definitions/README.md (6 lines changed)

.pre-commit-config.yaml (3 lines changed)


.*_pb2_grpc.py
)$
additional_dependencies: [flake8-comprehensions]
- id: trailing-whitespace
name: trailing-whitespace-markdown
types: [markdown]
- repo: https://github.com/pre-commit/pygrep-hooks
rev: v1.4.1 # Use the ref you want to point at
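The `.pre-commit-config.yaml` hunk above adds a `trailing-whitespace-markdown` hook scoped to markdown files; the rest of this commit is the corresponding clean-up. A minimal Python sketch of what that clean-up amounts to (the script and its paths are illustrative, not part of the hook configuration):

```python
from pathlib import Path

def strip_trailing_whitespace(path: Path) -> bool:
    """Remove trailing spaces and tabs from each line; return True if the file changed."""
    original = path.read_text(encoding="utf-8")
    cleaned = "\n".join(line.rstrip() for line in original.splitlines())
    if original.endswith("\n"):
        cleaned += "\n"  # keep the final newline intact
    if cleaned != original:
        path.write_text(cleaned, encoding="utf-8")
        return True
    return False

if __name__ == "__main__":
    # Walk the repo's markdown files (docs/, UnitySDK/, gym-unity/, ...) and clean them.
    changed = [p for p in Path(".").rglob("*.md") if strip_trailing_whitespace(p)]
    print(f"cleaned {len(changed)} markdown files")
```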

SURVEY.md (4 lines changed)


# Unity ML-Agents Toolkit Survey
Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to let us know about it.
[Fill out the survey](https://goo.gl/forms/qFMYSYr5TlINvG6f1)

UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/Barracuda.md (2 lines changed)


Tanh
```
P.S. some of these operations are under limited support and not all configurations are properly supported
P.P.S. Python 3.5 or 3.6 is recommended

UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/LICENSE.md (2 lines changed)


Barracuda cross-platform Neural Net engine copyright © 2018 Unity Technologies ApS
Licensed under the Unity Companion License for Unity-dependent projects--see [Unity Companion License](http://www.unity3d.com/legal/licenses/Unity_Companion_License).
Unless expressly provided otherwise, the Software under this license is made available strictly on an “AS IS” BASIS WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. Please review the license for details on these and other terms and conditions.

UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/ReleaseNotes.md (10 lines changed)


- TF importer: made detection of actual output node from LSTM/GRU pattern more bullet proof by skipping Const nodes.
- TF importer: improved InstanceNormalization handling.
- TF importer: fixed SquareDifference pattern.
- TF importer: fixed Conv2DBackpropInput (transpose convolution) import.
- Fixed Conv2D performance regression on some GPUs.
- Fixed TextureAsTensorData.Download() to work properly with InterpretDepthAs.Channels.
- Fixed bug when identity/nop layers would reuse input as an output and later causing premature release of that tensor as part of intermediate data cleanup.

## 0.2.0
- Version bumped to 0.2.0 as it brings breaking API changes, for details look below.
- Significantly reduced temporary memory allocations by introducing internal allocator support. Now memory is re-used between layer execution as much as possible.
- Improved small workload performance on CSharp backend
- Added parallel implementation for multiple activation functions on CSharp backend

- Added `Summary()` method to `Worker`. Currently returns allocator information.
- Tabs to spaces! Aiming at higher salary (https://stackoverflow.blog/2017/06/15/developers-use-spaces-make-money-use-tabs/).
- Renamed worker type enum members: `CSharp` -> `CSharpRef`, `CSharpFast` -> `CSharp`, `Compute` -> `ComputeRef`, `ComputeFast` -> `Compute`.
- Implemented new optimized `ComputePrecompiled` worker. This worker caches Compute kernels and state beforehand to reduce CPU overhead.
- Added `ExecuteAsync()` to `IWorker` interface, it returns `IEnumerator`, which enables you to control how many layers to schedule per frame (one iteration == one layer).
- Added `Log` op support on Compute workers.
- Optimized activation functions and ScaleBias by accessing tensor as continuous array. Gained ~2.0ms on 4 batch MobileNet (MBP2016).

- Fixed compilation issues on Xbox One.
- TexConv2D support was temporary disabled.
- Barracuda logging can now be configured via static fields of the ``Barracuda.D`` class; it allows disabling specific logging levels or just disabling stack trace collection (helps with performance when profiling).
- Compute Concat implementation now will fall back to C# implementation instead of throwing exception when unsupported configuration is encountered.
- Fixed several ``ComputeBuffer`` release issues.
- Added constructor for ``Tensor`` that allows passing in a data array.
- Improved Flatten handling in TensorFlow models.
- Added helper func ``ModelLoader.LoadFromStreamingAssets``.

UnitySDK/README.md (4 lines changed)


# Unity ML-Agents SDK
Contains the ML-Agents Unity Project, including
both the core plugin (in `Scripts`), as well as a set
of example environments (in `Examples`).

docs/Basic-Guide.md (34 lines changed)


## Setting up the ML-Agents Toolkit within Unity
In order to use the ML-Agents toolkit within Unity, you first need to change a few
Unity settings.
1. Launch Unity
2. On the Projects dialog, choose the **Open** option at the top of the window.

## Running a Pre-trained Model
We include pre-trained models for our agents (`.nn` files) and we use the
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models
inside Unity. In this section, we will use the pre-trained model for the
2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
3. In the **Project** window, drag the **3DBallLearning** Model located in
4. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** window now contains **3DBallLearning** as `Model`. __Note__: You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
8. Select the **InferenceDevice** to use for this model (CPU or GPU) on the Agent.
_Note: CPU is faster for the majority of ML-Agents toolkit generated models_
9. Click the **Play** button and you will see the platforms balance the balls
using the pre-trained model.

### Setting up the environment for training
In order to set up the Agents for Training, you will need to edit the
same `Behavior Parameters`. You can make sure all your agents have the same
The `Behavior Name` corresponds to the name of the model that will be
generated by the training process and is used to select the hyperparameters
from the training configuration file.

docs/Creating-Custom-Protobuf-Messages.md (16 lines changed)


# Creating Custom Protobuf Messages
Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.
## Implementing a Custom Message

By default, the Python API sends actions to Unity in the form of a floating point list and an optional string-valued text action for each agent.
You can define a custom action type, to either replace or augment the default, by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.
Instances of custom actions are set via the `custom_action` parameter of the `env.step`. An agent receives a custom action by defining a method with the signature:

Below is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.
The `custom_action.proto` file looks like:

EAST=2;
WEST=3;
}
float walkAmount = 1;
Direction direction = 2;
}
```
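For context, here is a rough sketch of how a message like the one above would be sent from Python via the `custom_action` parameter of `env.step`. The import paths and the environment build name follow the 0.x `mlagents.envs` package layout referenced in this doc and are assumptions, not taken verbatim from the diff:

```python
from mlagents.envs import UnityEnvironment
from mlagents.envs.communicator_objects import CustomAction  # generated from custom_action.proto

# "RollerBall" is a hypothetical build name used only for illustration.
env = UnityEnvironment(file_name="RollerBall")
env.reset()

# Walk east for 2 units; the field names mirror the proto definition above.
action = CustomAction(direction=CustomAction.EAST, walkAmount=2.0)
env.step(custom_action=action)

env.close()
```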

### Custom Reset Parameters
By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.
You can also configure the environment reset using a custom protobuf message. To do this, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.
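A correspondingly hedged sketch of the Python side, assuming the same 0.x package layout; the message is left empty here because its fields are whatever you add to `custom_reset_parameters.proto`:

```python
from mlagents.envs import UnityEnvironment
from mlagents.envs.communicator_objects import CustomResetParameters

env = UnityEnvironment(file_name="RollerBall")  # hypothetical build name

# Populate whatever fields you defined in custom_reset_parameters.proto,
# e.g. params.my_field = 1.0 once such a field exists.
params = CustomResetParameters()
env.reset(custom_reset_parameters=params)
env.close()
```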

### Custom Observations
By default, Unity returns observations to Python in the form of a floating-point vector.
You can define a custom observation message to supplement that. To do so, add fields to the `CustomObservation` protobuf message in `custom_observation.proto`.
Then in your agent, create an instance of a custom observation via `new CommunicatorObjects.CustomObservation`. Then in `CollectObservations`, call `SetCustomObservation` with the custom observation instance as the parameter.

var obs = new CustomObservation();
obs.CustomField = 1.0f;
SetCustomObservation(obs);
}
}
}
```
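Reading the value back in Python is expected to look roughly like the sketch below; the `custom_observations` attribute on the returned BrainInfo and the `external_brain_names` property are assumptions based on how this doc version describes the 0.x API, so verify them against your installed package:

```python
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="RollerBall")  # hypothetical build name
all_brain_info = env.reset()

brain_name = env.external_brain_names[0]
# One CustomObservation per agent controlled by this brain (assumed layout).
for custom_obs in all_brain_info[brain_name].custom_observations:
    print(custom_obs.customField)

env.close()
```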

docs/FAQ.md (2 lines changed)


There may be a number of possible causes:
* _Cause_: There may be no agent in the scene
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of
exceptions on the firewall by following

docs/Getting-Started-with-Balance-Ball.md (8 lines changed)


"Agent" GameObjects. The base Agent object has a few properties that affect its
behavior:
* **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions. More on Behavior Parameters in
the next section.
* **Visual Observations** — Defines any Camera objects used by the Agent to

training generalizes to more than a specific starting position and agent cube
attitude.
* agent.CollectObservations() — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Behavior
Parameters of the Agent are set with vector observation
space with a state size of 8, the `CollectObservations()` must call
`AddVectorObs` such that vector size adds up to 8.

negative reward for dropping the ball. An Agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
* agent.Heuristic() - When the `Use Heuristic` checkbox is checked in the Behavior
keyboard inputs into actions.
#### Behavior Parameters : Vector Observation Space

docs/Glossary.md (2 lines changed)


logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given
group of Agents.

docs/Installation.md (16 lines changed)


## Environment Setup
We now support a single mechanism for installing ML-Agents on Mac/Windows/Linux using Virtual
Environments. For more information on Virtual Environments and installation instructions,
follow this [guide](Using-Virtual-Environment.md).
### Clone the ML-Agents Toolkit Repository

It also contains many [example environments](Learning-Environment-Examples.md)
to help you get started.
The `ml-agents` subdirectory contains a Python package which provides deep reinforcement
the `ml-agents` package depends on.
In order to use the ML-Agents toolkit, you need Python 3.6.1 or higher.
[Download](https://www.python.org/downloads/) and install the latest version of Python if you do not already have it.
If your Python environment doesn't include `pip3`, see these

pip3 install mlagents
```
Note that this will install `ml-agents` from PyPi, _not_ from the cloned repo.
parameters you can use with `mlagents-learn`.
By installing the `mlagents` package, the dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
Some of the primary dependencies include:

### Installing for Development
If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you should install
the packages from the cloned repo rather than from PyPi. To do this, you will need to install
`ml-agents` and `ml-agents-envs` separately. From the repo's root directory, run:

Running pip with the `-e` flag will let you make changes to the Python files directly and have those
reflected when you run `mlagents-learn`. It is important to install these packages in this order as the
`mlagents` package depends on `mlagents_envs`, and installing it in the other
order will download `mlagents_envs` from PyPi.
## Next Steps

docs/Learning-Environment-Best-Practices.md (2 lines changed)


lessons which progressively increase in difficulty are presented to the agent
([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by
using a heuristic to control the agent. To do so, check the `Use Heuristic`
checkbox on the Agent and implement the `Heuristic()` method on the Agent.
* It is often helpful to make many copies of the agent, and give them the same
`Behavior Name`. In this way the learning process can get more feedback

docs/Learning-Environment-Create-New.md (112 lines changed)


importing the ML-Agents assets into it:
1. Launch the Unity Editor and create a new project named "RollerBall".
2. Make sure that the Scripting Runtime Version for the project is set to use
**.NET 4.x Equivalent** (This is an experimental option in Unity 2017,
4. Drag the `ML-Agents` folder from `UnitySDK/Assets` to the Unity
Editor Project window.
Your Unity **Project** window should contain the following assets:

1. In the Unity Project window, double-click the `RollerAcademy` script to open
it in your code editor. (By default new scripts are placed directly in the
**Assets** folder.)
2. In the code editor, add the statement, `using MLAgents;`.
3. Change the base class from `MonoBehaviour` to `Academy`.
4. Delete the `Start()` and `Update()` methods that were added by default.

The default settings for the Academy properties are also fine for this
environment, so we don't need to change anything for the RollerAcademy component
in the Inspector window.
![The Academy properties](images/mlagents-NewTutAcademy.png)

1. In the Unity Project window, double-click the `RollerAgent` script to open it
in your code editor.
2. In the editor, add the `using MLAgents;` statement and then change the base
class from `MonoBehaviour` to `Agent`.
3. Delete the `Update()` method, but we will use the `Start()` function, so
leave it alone for now.

this reference, add a public field of type `Transform` to the RollerAgent class.
Public fields of a component in Unity get displayed in the Inspector window,
allowing you to choose which GameObject to use as the target in the Unity
Editor.
To reset the Agent's velocity (and later to apply force to move the
agent) we need a reference to the Rigidbody component. A

In our case, the information our Agent collects includes:
* Position of the target.
* Position of the Agent itself.
```csharp
AddVectorObs(this.transform.position);

### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
function. The learning algorithm uses the rewards assigned to the Agent during
assigned task. In this case, the Agent is given a reward of 1.0 for reaching the
reward of 1.0 and marks the agent as finished by calling the `Done()` method
on the Agent.
```csharp

## Testing the Environment
It is always a good idea to test your environment manually before embarking on
an extended training run. To do so, you will need to implement the `Heuristic()`
method on the RollerAgent class. This will allow you to control the Agent using
direct keyboard control.
The `Heuristic()` method will look like this :

## Training the Environment
The process is
the same as described in [Training ML-Agents](Training-ML-Agents.md). Note that the
pass to the `mlagents-learn` program. Using the default settings specified
RollerAgent takes about 300,000 steps to train. However, you can change the
Since this example creates a very simple training environment with only a few inputs
and outputs, using small batch and buffer sizes speeds up the training considerably.
However, if you add more complexity to the environment or change the reward or
observation functions, you might also find that training performs better with different
**Note:** In addition to setting these hyperparameter values, the Agent
in this simple environment, speeds up training.
To train in the editor, run the following Python command from a Terminal or Console
(where `config.yaml` is a copy of `trainer_config.yaml` that you have edited
**Note:** If you get a `command not found` error when running this command, make sure
that you have followed the *Install Python and mlagents Package* section of the
To monitor the statistics of Agent performance during training, use
[TensorBoard](Using-Tensorboard.md).
In particular, the *cumulative_reward* and *value_estimate* statistics show how
well the Agent is achieving the task. In this example, the maximum reward an
**Note:** If you use TensorBoard, always increment or change the `run-id`
you pass to the `mlagents-learn` command for each training run. If you use
the same id value, the statistics for multiple runs are combined and become
In many of the [example environments](Learning-Environment-Examples.md), many copies of
parallelize your RollerBall environment.
1. Right-click on your Project Hierarchy and create a new empty GameObject.
Name it TrainingArea.
2. Reset the TrainingArea’s Transform so that it is at (0,0,0) with Rotation (0,0,0)
and Scale (1,1,1).
3. Drag the Floor, Target, and RollerAgent GameObjects in the Hierarchy into the
TrainingArea GameObject.
4. Drag the TrainingArea GameObject, along with its attached GameObjects, into your
5. You can now instantiate copies of the TrainingArea prefab. Drag them into your scene,
positioning them so that they do not overlap.
### Editing the Scripts
You will notice that in the previous section, we wrote our scripts assuming that our
TrainingArea was at (0,0,0), performing checks such as `this.transform.position.y < 0`
to determine whether our agent has fallen off the platform. We will need to change
this if we are to use multiple TrainingAreas throughout the scene.
A quick way to adapt our current code is to use
localPosition rather than position, so that our position reference is in reference
to the prefab TrainingArea's location, and not global coordinates.
This is only one way to achieve this objective. Refer to the
[example environments](Learning-Environment-Examples.md) for other ways we can achieve relative positioning.
## Review: Scene Layout

There are two kinds of game objects you need to include in your scene in order
to use Unity ML-Agents: an Academy and one or more Agents.
Keep in mind:

docs/Learning-Environment-Design-Agents.md (80 lines changed)


The Policy class abstracts out the decision making logic from the Agent itself so
that you can use the same Policy in multiple Agents. How a Policy makes its
decisions depends on the kind of Policy it is. You can change the Policy of an
Agent by changing its `Behavior Parameters`. If you check `Use Heuristic`, the
Agent will use its `Heuristic()` method to make decisions which can allow you to
## Decisions
The observation-decision-action-reward cycle repeats after a configurable number

agent in a robotic simulator that must provide fine-control of joint torques
should make its decisions every step of the simulation. On the other hand, an
agent that only needs to make decisions when certain game or simulation events
occur, should use on-demand decision making.
To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents

When you turn on **On Demand Decisions** for an Agent, your agent code must call
the `Agent.RequestDecision()` function. This function call starts one iteration
of the observation-decision-action-reward cycle. The Agent's
`CollectObservations()` method is called, the Policy makes a decision and
returns it by calling the
`AgentAction()` method. The Policy waits for the Agent to request the next
decision before starting another iteration.

When you use vector observations for an Agent, implement the
`Agent.CollectObservations()` method to create the feature vector. When you use
**Visual Observations**, you only need to identify which Unity Camera objects
or RenderTextures will provide images and the base Agent class handles the rest.
You do not need to implement the `CollectObservations()` method when your Agent
uses visual observations (unless it also uses vector observations).
### Vector Observation Space: Feature Vectors

### Multiple Visual Observations
Visual observations use rendered textures directly or from one or more
cameras in a scene. The Policy vectorizes the textures into a 3D Tensor which
can be fed into a convolutional neural network (CNN). For more information on
CNNs, see [this guide](http://cs231n.github.io/convolutional-networks/). You
Agents using visual observations can capture state of arbitrary complexity and
are useful when the state is difficult to describe numerically. However, they
are also typically less efficient and slower to train, and sometimes don't
Visual observations can be derived from Cameras or RenderTextures within your scene.
To add a visual observation to an Agent, either click on the `Add Camera` or
`Add RenderTexture` button in the Agent inspector. Then drag the camera or
render texture you want to add to the `Camera` or `RenderTexture` field.
You can have more than one camera or render texture and even use a combination
of both attached to an Agent.
![Agent Camera](images/visual-observation.png)

specify the number of Resolutions the Agent is using for its visual observations.
For each visual observation, set the width and height of the image (in pixels)
and whether or not the observation is color or grayscale (when `Black And White`
is checked).
three **Visual Observations** have to be added to the **Behavior Parameters**.
During runtime, if a combination of `Cameras` and `RenderTextures` is used, all
order they appear in the editor.
RenderTexture observations will throw an `Exception` if the width/height doesn't
When using `RenderTexture` visual observations, a handy feature for debugging is
adding a `Canvas`, then adding a `Raw Image` with its texture set to the Agent's
`RenderTexture`. This will render the agent observation on the game screen.
The [GridWorld environment](Learning-Environment-Examples.md#gridworld)
is an example of how to use a RenderTexture for both debugging and observation. Note
that in this example, a Camera is rendered to a RenderTexture, which is then used for
observations and debugging. To update the RenderTexture, the Camera must be asked to
render every time a decision is requested within the game code. When using Cameras
as observations directly, this is done automatically by the Agent.
![Agent RenderTexture Debug](images/gridworld.png)

is an array of indices. The number of indices in the array is determined by the
number of branches defined in the `Branches Size` property. Each branch
corresponds to an action table, you can specify the size of each table by
modifying the `Branches` property.
Neither the Policy nor the training algorithm know anything about what the action
values themselves mean. The training algorithm simply tries different values for

with values ranging from zero to one.
Note that when you are programming actions for an agent, it is often helpful to
test your action logic using the `Heuristic()` method of the Agent,
which lets you map keyboard
commands to actions.

Perhaps the best advice is to start simple and only add complexity as needed. In
general, you should reward results rather than actions you think will lead to
the desired results. To help develop your rewards, you can use the Monitor class
to display the cumulative reward received by an Agent. You can even use the
Agent's Heuristic to control the Agent while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` method in the

platform.
Note that all of these environments make use of the `Done()` method, which manually
terminates an episode when a termination condition is reached. This can be
called independently of the `Max Step` property.
## Agent Properties

* `Branches` (Discrete) - An array of integers, defines multiple concurrent
discrete actions. The values in the `Branches` array correspond to the
number of possible discrete values for each action branch.
* `Model` - The neural network model used for inference (obtained after
* `Visual Observations` - A list of `Cameras` or `RenderTextures` which will
be used to generate observations.
* `Max Step` - The per-agent maximum number of steps. Once this number is
reached, the Agent will be reset if `Reset On Done` is checked.

docs/Learning-Environment-Design.md (16 lines changed)


To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy subclass and as many Agent subclasses
as you need.
Agent instances should be attached to the GameObject representing that Agent.
### Academy

The Agent class represents an actor in the scene that collects observations and
carries out actions. The Agent class is typically attached to the GameObject in
the scene that otherwise represents the actor — for example, to a player object
in a football game or a car object in a vehicle simulation. Every Agent must
have appropriate `Behavior Parameters`.
To create an Agent, extend the Agent class and implement the essential

You must also determine how an Agent finishes its task or times out. You can
manually set an Agent to done in your `AgentAction()` function when the Agent
has finished (or irrevocably failed) its task by calling the `Done()` function.
You can also set the Agent's `Max Steps` property to a positive value and the
Agent will consider itself done after it has taken that many steps. If you
set an Agent's `ResetOnDone` property to true, then the Agent can attempt its
task several times in one episode. (Use the `Agent.AgentReset()` function to
prepare the Agent to start again.)
See [Agents](Learning-Environment-Design-Agents.md) for detailed information

properties that can be set differently for a training scene versus a regular
scene. The Academy's **Configuration** properties control rendering and time
scale. You can set the **Training Configuration** to minimize the time Unity
spends rendering graphics in order to speed up training.
When you create a training environment in Unity, you must set up the scene so
that it can be controlled by the external training process. Considerations
include:

docs/Learning-Environment-Examples.md (8 lines changed)


* Default: 1
* Recommended Minimum: 0.2
* Recommended Maximum: 5
* gravity: Magnitude of gravity
* Default: 9.81
* Recommended Minimum: 4
* Recommended Maximum: 105

* Reset Parameters: Three
* angle: Angle of the racket from the vertical (Y) axis.
* Default: 55
* Recommended Minimum: 35
* Recommended Maximum: 65
* gravity: Magnitude of gravity
* Default: 9.81

* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.
* Agents: The environment contains one agent linked to two different
Models. The Policy the agent is linked to changes depending on the
height of the wall. The change of Policy is done in the WallJumpAgent class.
* Agent Reward Function:
* -0.0005 for every step.

docs/ML-Agents-Overview.md (10 lines changed)


an Agent, and each Agent has a Policy. The Policy receives observations
and rewards from the Agent and returns actions. The Academy ensures that all the
Agents are in sync in addition to controlling environment-wide
settings.
## Training Modes

learn the best policy for each medic. Once training concludes, the learned
policy for each medic can be exported. Given that all our implementations are
based on TensorFlow, the learned policy is just a TensorFlow model file. Then
during the inference phase, we use the
TensorFlow model generated from the training phase. Now during the inference
phase, the medics still continue to generate their observations, but instead of
being sent to the Python API, they will be fed into their (internal, embedded)

In the previous mode, the Agents were used for training to generate
a TensorFlow model that the Agents can later use. However,
any user of the ML-Agents toolkit can leverage their own algorithms for
training. In this case, the behaviors of all the Agents in the scene
will be controlled within Python.
You can even turn your environment into a [gym.](../gym-unity/README.md)

to perform, rather than attempting to have it learn via trial-and-error methods.
For example, instead of training the medic by setting up its reward function,
this mode allows providing real examples from a game controller on how the medic
should behave. More specifically, in this mode, the Agent must use its heuristic
to generate action, and all the actions performed with the controller (in addition
to the agent observations) will be recorded. The
imitation learning algorithm will then use these pairs of observations and

signals with same or different `Behavior Parameters`. In this
scenario, agents must compete with one another to either win a competition, or
obtain some limited set of resources. All team sports fall into this scenario.
- Ecosystem. Multiple interacting agents with independent reward signals with
same or different `Behavior Parameters`. This scenario can be thought
of as creating a small world in which animals with different goals all
interact, such as a savanna in which there might be zebras, elephants and

docs/Migrating.md (2 lines changed)


### Important Changes
* The definition of the gRPC service has changed.
* The online BC training feature has been removed.
* The BroadcastHub has been deprecated. If there is a training Python process, all LearningBrains in the scene will automatically be trained. If there is no Python process, inference will be used.
* The Brain ScriptableObjects have been deprecated. The Brain Parameters are now on the Agent and are referred to as Behavior Parameters. Make sure the Behavior Parameters is attached to the Agent GameObject.
* Several changes were made to the setup for visual observations (i.e. using Cameras or RenderTextures):

docs/Readme.md (2 lines changed)


* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
### Cloud Training (Deprecated)
Here are the cloud training set-up guides for Azure and AWS. We no longer use them ourselves and
so they may not work correctly. We've decided to keep them up just in case they are helpful to
you.

docs/Training-Behavioral-Cloning.md (34 lines changed)


# Training with Behavioral Cloning
There are a variety of possible imitation learning algorithms which can
be used; the simplest of them is Behavioral Cloning. It works by collecting
demonstrations from a teacher, and then simply uses them to directly learn a
policy, in the same way the supervised learning for image classification
With offline behavioral cloning, we can use demonstrations (`.demo` files)
1. Choose an agent you would like to learn to imitate some set of demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see [here](Training-Imitation-Learning.md)).
For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene (make sure the Agent is not using its heuristic).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
demonstration file recorded in step 2. In our case this is:
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
as the config parameter, and include the `--run-id` and `--train` as usual.
Provide your environment as the `--env` parameter if it has been compiled
This will use the demonstration file to train a neural network driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.

docs/Training-Curriculum-Learning.md (4 lines changed)


## How-To
Each group of Agents under the same `Behavior Name` in an environment can have
a corresponding curriculum. These
curriculums are held in what we call a metacurriculum. A metacurriculum allows
different groups of Agents to follow different curriculums within the same environment.

We will save this file into our metacurriculum folder with the name of its
corresponding `Behavior Name`. For example, in the Wall Jump environment, there are two
different `Behavior Names` set via script in `WallJumpAgent.cs`
---BigWallBrainLearning and SmallWallBrainLearning. If we want to define a curriculum for
the BigWallBrainLearning, we will save `BigWallBrainLearning.json` into
`config/curricula/wall-jump/`.

docs/Training-Generalized-Reinforcement-Learning-Agents.md (70 lines changed)


agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are randomly instantiated with varying objects or properties.
To make agents robust and generalizable to different environments, the agent
should be trained over multiple variations of the environment. Using this approach

## How to Enable Generalization Using Reset Parameters
We first need to provide a way to modify the environment by supplying a set of `Reset Parameters`
and vary them over time. This provision can be done either deterministically or randomly.
This is done by assigning each `Reset Parameter` a `sampler-type` (such as a uniform sampler),
`Reset Parameter`, the parameter maintains the default value throughout the
training procedure, remaining unchanged. The samplers for all the `Reset Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the reset parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Reset Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which reset parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml

Below is the explanation of the fields in the above example.
* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
* `Reset Parameter` - Name of the `Reset Parameter` like `mass`, `gravity` and `scale`. This should match the name
specified in the academy of the intended environment for which the agent is
being trained. If a parameter specified in the file doesn't exist in the
* `sampler-type` - Specify the sampler type to use for the `Reset Parameter`.
This is a string that should exist in the `Sampler Factory` (explained
* `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
In the example above, this would correspond to the `intervals`
under the `sampler-type` `"multirange_uniform"` for the `Reset Parameter` called `gravity`.
The key name should match the name of the corresponding argument in the sampler definition.
(See below)
The Sampler Manager allocates a sampler type for each `Reset Parameter` by using the *Sampler Factory*,

Below is a list of included `sampler-type` as part of the toolkit.
* `uniform` - Uniform sampler
* Uniformly samples a single float value between defined endpoints.
The sub-arguments for this sampler to specify the interval
endpoints are as below. The sampling is done in the range of
* `gaussian` - Gaussian sampler
the mean and standard deviation. The sub-arguments to specify the
* Uniformly samples a single float value between the specified intervals.
Samples by first performing a weight pick of an interval from the list
of intervals (weighted based on interval width) and samples uniformly
from the selected interval (half-closed interval, same as the uniform
sampler). This sampler can take an arbitrary number of intervals in a
list in the following format:
* **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents/envs/sampler_class.py`.
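To make the weighted-interval behaviour concrete, here is an illustrative stand-alone sketch of a multirange uniform sampler. It is not the toolkit's implementation (that lives in `sampler_class.py`, as noted above), and the interval values are made up:

```python
import random
from typing import List, Tuple

class MultiRangeUniformSketch:
    """Pick an interval with probability proportional to its width, then sample uniformly inside it."""

    def __init__(self, intervals: List[Tuple[float, float]]):
        self.intervals = intervals
        self.widths = [high - low for low, high in intervals]

    def sample(self) -> float:
        low, high = random.choices(self.intervals, weights=self.widths, k=1)[0]
        return random.uniform(low, high)

# Illustrative use: gravity drawn from either the 7-10 or the 15-20 range.
sampler = MultiRangeUniformSketch([(7.0, 10.0), (15.0, 20.0)])
print(sampler.sample())
```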

If you want to define your own sampler type, you must first inherit the *Sampler*
base class (included in the `sampler_class` file) and preserve the interface.
Once the class for the required method is specified, it must be registered in the Sampler Factory.
This can be done by subscribing to the *register_sampler* method of the SamplerFactory. The command
is as follows:
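The registration call itself is cut out of this hunk. As a hedged sketch of what it typically looks like (the import path, the `Sampler` base class usage, and the `register_sampler` signature are assumptions inferred from the surrounding prose; check `sampler_class.py` for the real interface):

```python
from mlagents.envs.sampler_class import Sampler, SamplerFactory

class CustomSampler(Sampler):
    """Placeholder sampler; implement whatever abstract methods sampler_class.py requires."""

    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    # ... sampling method(s) required by the Sampler interface would go here ...

# Register the new type under the string key that the sampler YAML file will
# reference as its `sampler-type`.
SamplerFactory.register_sampler("custom-sampler", CustomSampler)
```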

sampling setup, we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml
--run-id=3D-Ball-generalization --train
```

docs/Training-ML-Agents.md (20 lines changed)


And then opening the URL: [localhost:6006](http://localhost:6006).
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006, a new session can be launched on an open port using the --port
option.
When training is finished, you can find the saved model in the `models` folder

Default is set to 1. Set to higher values when benchmarking performance and
multiple training sessions is desired. Training sessions are independent, and
do not improve learning performance.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--run-id=<path>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and

All arguments after this flag will be passed to the executable. For example, setting
`mlagents-learn config/trainer_config.yaml --env-args --num-orcs 42` would result in
` --num-orcs 42` passed to the executable.
* `--base-port`: Specifies the starting port. Each concurrent Unity environment instance
will get assigned a port sequentially, starting from the `base-port`. Each instance
will use the port `(base_port + worker_id)`, where the `worker_id` is sequential IDs
given to each instance from 0 to `num_envs - 1`. Default is 5005. __Note:__ When
training using the Editor rather than an executable, the base port will be ignored.
* `--slow`: Specify this option to run the Unity environment at normal, game

The training config files `config/trainer_config.yaml`, `config/sac_trainer_config.yaml`,
`config/gail_config.yaml` and `config/offline_bc_config.yaml` specify the training method,
the hyperparameters, and a few additional values to use when training with Proximal Policy
Optimization (PPO), Soft Actor-Critic (SAC), GAIL (Generative Adversarial Imitation Learning)
with PPO, and online and offline Behavioral Cloning (BC)/Imitation. These files are divided
The **default** section defines the default values for all the available settings. You can
The **default** section defines the default values for all the available settings. You can
override sections after the appropriate `Behavior Name`. Sections for the
override sections after the appropriate `Behavior Name`. Sections for the
example environments are included in the provided config file.
| **Setting** | **Description** | **Applies To Trainer\*** |

2
docs/Training-Using-Concurrent-Unity-Instances.md


# Training Using Concurrent Unity Instances
As part of release v0.8, we enabled developers to run concurrent, parallel instances of the Unity executable during training. For certain scenarios, this should speed up the training.
## How to Run Concurrent Unity Instances During Training

2
docs/Training-on-Amazon-Web-Service.md


# Training on Amazon Web Service
Note: We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up an EC2 instance on Amazon Web

2
docs/Training-on-Microsoft-Azure-Custom-Instance.md


6. Navigate to [http://developer.nvidia.com](http://developer.nvidia.com) and
create an account and verify it.
7. Download (to your own computer) cuDNN from [this url](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v6/prod/8.0_20170307/Ubuntu16_04_x64/libcudnn6_6.0.20-1+cuda8.0_amd64-deb).
8. Copy the deb package to your VM:

2
docs/Training-on-Microsoft-Azure.md


# Training on Microsoft Azure (works with ML-Agents toolkit v0.3)
Note: We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up training on Microsoft Azure

22
docs/Unity-Inference-Engine.md


The ML-Agents toolkit allows you to use pre-trained neural network models
inside your Unity games. This support is possible thanks to the Unity Inference
Engine. The Unity Inference Engine uses
[compute shaders](https://docs.unity3d.com/Manual/class-ComputeShader.html)
to run the neural network within Unity.
Scripting Backends: The Unity Inference Engine is generally faster with
In the Editor, it is not possible to use the Unity Inference Engine with
a GPU device selected when Editor Graphics Emulation is set to __OpenGL(ES)
3.0 or 2.0 emulation__. Also, there might be non-fatal build-time errors
when the target platform includes a Graphics API that does not support
__Unity Compute Shaders__.
The Unity Inference Engine should work on any platform Unity supports,
but we have only tested it on the following platforms:

## Using the Unity Inference Engine
When using a model, drag the `.nn` file into the **Model** field
in the Inspector of the Agent.
You should use the GPU only if you use the
ResNet visual encoder or have a large number of agents with visual observations.

8
docs/Using-Tensorboard.md


4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006, a new session can be launched on an open port using the `--port`
option.
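For example, assuming the summaries are being written to the toolkit's default `summaries` directory, a second TensorBoard session could be started on a free port:

```sh
# Serve the same summaries on port 6007 instead of the default 6006
tensorboard --logdir=summaries --port=6007
```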
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the

* `Environment/Cumulative Reward` - The mean cumulative episode reward over all agents. Should
increase during a successful training session.
* `Environment/Episode Length` - The mean length of each episode in the environment for all agents.
### Policy Statistics

* `Policy/Learning Rate` (PPO; BC) - How large a step the training algorithm takes as it searches
for the optimal policy. Should decrease over time.
* `Policy/Value Estimate` (PPO) - The mean value estimate for all states visited by the agent. Should increase during a successful training session.
* `Policy/Curiosity Reward` (PPO+Curiosity) - This corresponds to the mean cumulative intrinsic reward generated per-episode.

* `Losses/Inverse Loss` (PPO+Curiosity) - The mean magnitude of the inverse model
loss function. Corresponds to how well the model is able to predict the action
taken between two observations.
* `Losses/Cloning Loss` (BC) - The mean magnitude of the behavioral cloning loss. Corresponds to how well the model imitates the demonstration data.

30
docs/Using-Virtual-Environment.md


# Using Virtual Environment
## What is a Virtual Environment?
A Virtual Environment is a self-contained directory tree that contains a Python installation
for a particular version of Python, plus a number of additional packages. To learn more about
A Virtual Environment keeps all dependencies for the Python project separate from dependencies
1. It enables using and testing of different library versions by quickly
different version.
Requirement - Python 3.6 must be installed on the machine you would like
to run ML-Agents on (either local laptop/desktop or remote server). Python 3.6 can be
installed from [here](https://www.python.org/downloads/).
## Installing Pip (Required)

1. Check pip version using `pip3 -V`
Note (for Ubuntu users): If the `ModuleNotFoundError: No module named 'distutils.util'` error is encountered, then
python3-distutils needs to be installed. Install python3-distutils using `sudo apt-get install python3-distutils`
1. To create a new environment named `sample-env` execute `$ python3 -m venv ~/python-envs/sample-env`
1. Verify pip version is the same as in the __Installing Pip__ section. In case it is not the latest, upgrade to
the latest pip version using `pip3 install --upgrade pip`
## Ubuntu Setup
1. Install the python3-venv package using `$ sudo apt-get install python3-venv`
1. Follow the steps in the Mac OS X installation.

1. Create a folder where the virtual environments will reside `$ md python-envs`
1. To create a new environment named `sample-env` execute `$ python3 -m venv python-envs\sample-env`
1. Verify pip version is the same as in the __Installing Pip__ section. In case it is not the latest, upgrade to
the latest pip version using `pip3 install --upgrade pip`
1. Install ML-Agents package using `$ pip3 install mlagents`
1. To deactivate the environment execute `$ deactivate`
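Putting the steps above together, a minimal Windows session might look like the following sketch (the `Scripts\activate` path is the standard activation script created by `venv` and is assumed here):

```console
md python-envs
python3 -m venv python-envs\sample-env
python-envs\sample-env\Scripts\activate
pip3 install --upgrade pip
pip3 install mlagents
deactivate
```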

8
docs/localized/KR/README.md


[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)
**Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin that enables games and simulations to
serve as environments for training intelligent agents. With an easy-to-use
controlling NPC behavior (multi-agent, adversarial agents, etc.), automated testing of game builds, and validating game design decisions before release, among other purposes.
The ML-Agents toolkit benefits both game developers and AI researchers by providing a central platform for AI agent development within Unity's rich environments, enabling broader research and game development.
## Features

## Community and Feedback
The ML-Agents toolkit is an open-source project and contributions are welcome. If you wish to contribute,
improve and grow. Please take just a few minutes to [let us know](https://github.com/Unity-Technologies/ml-agents/issues/1454).
For any other comments or feedback, please contact the ML-Agents team directly at ml-agents@unity3d.com.

40
docs/localized/KR/docs/Installation-Windows.md


# ML-Agents Toolkit Installation Guide for Windows Users
The ML-Agents toolkit supports Windows 10. It may also run on other versions of Windows,
but this has not been tested. Furthermore, the ML-Agents toolkit has not been tested on a Windows VM (such as Bootcamp or Parallels
This guide also covers how to set up GPU-based training (for advanced users).
GPU-based training is not currently required for the ML-Agents toolkit, but it may be needed by future versions or specific features.
## Step 1: Install Python via Anaconda

Since Python 2 is no longer supported, Python 3.5 or 3.6 is required. In this guide we
will use Anaconda 5.1 with Python 3.6.
([64-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe)
or [32-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86.exe)

you will see an error such as "conda is not recognized as internal or external command".
To fix this, you need to set the environment variables correctly.
Type `environment variables` into the search bar (you can reach it by pressing the Windows key or the Windows button at the bottom left) to
bring up the __Edit the system environment variables__ option.
<p align="center">

## Step 3: Install Required Python Packages
The ML-Agents toolkit depends on a number of Python packages. Use `pip` to install these Python dependencies.
If the ML-Agents Toolkit repository has not been cloned to your computer yet, do so now. You can [download](https://git-scm.com/download/win) and
install Git, then clone the repository by entering the following commands in an Anaconda Prompt. _(If you open a new prompt window later, type `activate ml-agents` to

The `ml-agents` subdirectory contains the Python package with the deep reinforcement learning trainers used with Unity environments.
The `ml-agents-envs` subdirectory contains the Python API for interfacing with Unity, which the `ml-agents` package depends on.
The `gym-unity` subdirectory contains a package for interfacing with OpenAI Gym.

`--no-cache-dir` tells pip to disable the cache.
### Installing for Development
To do this, you must install `ml-agents` and `ml-agents-envs` separately.
In this example the repository is located in `C:\Downloads`. After cloning or downloading the repository,
run the following from the subdirectory where the repository is located:
```console

pip install -e .
```
Running pip with the `-e` flag lets you change the Python files directly and have those changes reflected when you run `mlagents-learn`.
It is important to install the packages in this order.
## (Optional) Step 4: GPU Training using the ML-Agents Toolkit
A GPU is not required for the ML-Agents toolkit and will not significantly speed up the PPO algorithm during training (though GPUs may become useful in the future).
This guide is for advanced users who want to train using a GPU. You should also check whether your GPU is CUDA-compatible.

### Install the Nvidia CUDA toolkit
[Download](https://developer.nvidia.com/cuda-toolkit-archive) and install the CUDA toolkit 9.0 from the Nvidia archive.
The CUDA toolkit includes the GPU-accelerated libraries,
debugging and optimization tools, the C/C++ (Visual Studio 2017) compiler, and the runtime library needed by the ML-Agents toolkit.
In this guide we use version [9.0.176](https://developer.nvidia.com/compute/cuda/9.0/Prod/network_installers/cuda_9.0.176_win10_network-exe).

### Install the Nvidia cuDNN library
[Download](https://developer.nvidia.com/cudnn) and install the cuDNN library from Nvidia.
cuDNN is a GPU-accelerated library of primitives for deep neural networks. Before downloading, you will need to join the Nvidia Developer Program (free).
<p align="center">

Sign up, then go back to the cuDNN [download page](https://developer.nvidia.com/cudnn).
You may be asked to fill out a short survey. When you get to the list
of cuDNN releases, __make sure you download the version that matches the CUDA toolkit you installed in Step 1.__ In this guide,
After downloading the cuDNN files, you need to extract (copy) them into the CUDA toolkit directory.
Inside the cuDNN zip file there are three folders: `bin`, `include`, and `lib`.
<p align="center">

</p>
Copy these three folders into the CUDA toolkit directory.
The CUDA toolkit directory is located at `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`.
<p align="center">

You need to add one environment variable and two path variables.
To set the environment variables, type `environment variables` into the search bar (you can reach it by pressing the Windows key or the Windows button at the bottom left) to
bring up the __Edit the system environment variables__ option.
<p align="center">

12
docs/localized/KR/docs/Installation.md


</p>
## Windows Users
To set up your environment on Windows, we have created a [detailed guide](Installation-Windows.md) on how to do so.
For Mac and Linux, continue with the following guide.
## Mac or Unix Users

pip3 install mlagents
```
This command installs `ml-agents` from PyPi (not from the cloned repository).
If the command runs successfully, you should see the Unity logo and the command-line parameters available for `mlagents-learn`.
- If you are using Anaconda and have trouble with TensorFlow, check the following
[link](https://www.tensorflow.org/install/pip) for instructions on installing TensorFlow in an Anaconda environment.
### Installing for Development

Running pip with the `-e` flag lets you change the Python files directly and have those changes reflected when you run `mlagents-learn`.
It is important to install the packages in this order, because the `mlagents` package depends on `mlagents_envs`,
and installing them in the other order would install `mlagents_envs` from PyPi instead.
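As a sketch of that order (assuming you are at the root of the cloned repository, which contains the `ml-agents-envs` and `ml-agents` subdirectories):

```sh
# Install the Unity interface package first, then the trainers package
pip3 install -e ./ml-agents-envs
pip3 install -e ./ml-agents
```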
## Docker-based Installation

The [Basic Guide](Basic-Guide.md) page contains several short tutorials on setting up the ML-Agents toolkit within Unity, running a trained model,
building an environment, and training it.
## Help

34
docs/localized/KR/docs/Training-Imitation-Learning.md


Using the Unity Editor, you can record an agent's play and save it as an asset. This demonstration data contains the observations, actions, and rewards gathered while recording. It can be managed as data and used for offline training such as Behavioral Cloning (see below).
To record demonstration data from an agent, add a `Demonstration Recorder` component to the GameObject in the scene that holds the `Agent` component. Once it has been added, play data can be recorded from that agent.
<p align="center">
<img src="images/demo_component.png"

When `Record` is checked, data is generated as soon as the scene runs. Depending on the complexity of the environment, you may need to collect anywhere from a few minutes to a few hours of play data for imitation learning. Once enough data has been recorded, stop the game in Unity. A `.demo` file is then created inside the `Assets/Demonstrations` folder. This file contains the agent's play data. Clicking the file shows information about the demonstration in the Inspector, as shown below.
<p align="center">
<img src="images/demo_inspector.png"

## Training with Behavioral Cloning
There are various algorithms for imitation learning, and the simplest among them is Behavioral Cloning. It trains a policy to directly imitate data collected from expert play, much like supervised learning for image classification or other classical machine learning techniques.
In offline Behavioral Cloning, we use the `demo` file created with the `Demonstration Recorder` as the dataset for training the agent's behavior.
2. Record expert play using the `Demonstration Recorder` (see above).
For the rest of this explanation, we will refer to this recorded file as `AgentRecording.demo`.
4. Open the `config/offline_bc_config.yaml` file.
This approach trains a neural network to directly imitate the expert's actions from the demo file. The environment is still run during training and is used to evaluate the agent's performance.
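A minimal sketch of launching offline BC training with this config file (the run id below is a hypothetical name):

```sh
mlagents-learn config/offline_bc_config.yaml --run-id=BC-AgentRecording --train
```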
### Online Training

1. First, create two Brains, one that will be the "Teacher" and one that will be the "Student". In this example, name the two Brain assets "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**.
4. The parameters of the "Teacher" Brain and the "Student" Brain must be set identically, matching the Agent's configuration.