浏览代码

Merge branch 'master' into develop-action-spec

/develop/action-spec-gym
GitHub 4 年前
当前提交
23800f33
共有 45 个文件被更改,包括 452 次插入335 次删除
  1. 2
      .github/ISSUE_TEMPLATE/bug_report.md
  2. 11
      .yamato/test_versions.metafile
  3. 2
      .yamato/training-int-tests.yml
  4. 2
      README.md
  5. 7
      com.unity.ml-agents/CHANGELOG.md
  6. 2
      com.unity.ml-agents/package.json
  7. 2
      docs/Background-Machine-Learning.md
  8. 10
      docs/Getting-Started.md
  9. 24
      docs/Installation.md
  10. 4
      docs/Learning-Environment-Executable.md
  11. 8
      docs/ML-Agents-Overview.md
  12. 2
      docs/Readme.md
  13. 2
      docs/Training-Configuration-File.md
  14. 35
      docs/Training-ML-Agents.md
  15. 2
      docs/Training-on-Amazon-Web-Service.md
  16. 5
      docs/Unity-Inference-Engine.md
  17. 1
      ml-agents/mlagents/tf_utils/__init__.py
  18. 63
      ml-agents/mlagents/tf_utils/tf.py
  19. 1
      ml-agents/mlagents/torch_utils/__init__.py
  20. 66
      ml-agents/mlagents/torch_utils/torch.py
  21. 11
      ml-agents/mlagents/trainers/cli_utils.py
  22. 16
      ml-agents/mlagents/trainers/ghost/trainer.py
  23. 6
      ml-agents/mlagents/trainers/learn.py
  24. 15
      ml-agents/mlagents/trainers/policy/torch_policy.py
  25. 4
      ml-agents/mlagents/trainers/ppo/optimizer_torch.py
  26. 39
      ml-agents/mlagents/trainers/ppo/trainer.py
  27. 61
      ml-agents/mlagents/trainers/sac/trainer.py
  28. 2
      ml-agents/mlagents/trainers/settings.py
  29. 86
      ml-agents/mlagents/trainers/stats.py
  30. 44
      ml-agents/mlagents/trainers/tests/tensorflow/test_ghost.py
  31. 20
      ml-agents/mlagents/trainers/tests/test_stats.py
  32. 44
      ml-agents/mlagents/trainers/tests/torch/test_ghost.py
  33. 1
      ml-agents/mlagents/trainers/tests/torch/test_policy.py
  34. 3
      ml-agents/mlagents/trainers/torch/distributions.py
  35. 34
      ml-agents/mlagents/trainers/trainer/rl_trainer.py
  36. 18
      ml-agents/mlagents/trainers/trainer/trainer_factory.py
  37. 10
      ml-agents/mlagents/trainers/trainer_controller.py
  38. 12
      ml-agents/mlagents/trainers/training_status.py
  39. 9
      ml-agents/setup.py
  40. 21
      ml-agents/tests/yamato/training_int_tests.py
  41. 5
      ml-agents/tests/yamato/yamato_utils.py
  42. 1
      test_constraints_min_version.txt
  43. 4
      test_requirements.txt
  44. 35
      docs/Background-PyTorch.md
  45. 35
      docs/Background-TensorFlow.md

2
.github/ISSUE_TEMPLATE/bug_report.md


- Unity Version: [e.g. Unity 2020.1f1]
- OS + version: [e.g. Windows 10]
- _ML-Agents version_: (e.g. ML-Agents v0.8, or latest `develop` branch from source)
- _TensorFlow version_: (you can run `pip3 show tensorflow` to get this)
- _Torch version_: (you can run `pip3 show torch` to get this)
- _Environment_: (which example environment you used to reproduce the error)
**NOTE:** We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

11
.yamato/test_versions.metafile


# List of editor versions for standalone-build-test and its dependencies.
# csharp_backcompat_version is used in training-int-tests to determine the
# older package version to run the backwards compat tests against.
csharp_backcompat_version: 1.0.0
csharp_backcompat_version: 1.0.0
# Waiting on a barracuda fix, see https://jira.unity3d.com/browse/MLA-1464
# - version: 2020.2
csharp_backcompat_version: 1.0.0
- version: 2020.2
# 2020.2 moved the AssetImporters namespace
# but we didn't handle this until 1.2.0
csharp_backcompat_version: 1.2.0

2
.yamato/training-int-tests.yml


# If we make a breaking change to the communication protocol, these will need
# to be disabled until the next release.
- python -u -m ml-agents.tests.yamato.training_int_tests --python=0.16.0
- python -u -m ml-agents.tests.yamato.training_int_tests --csharp=1.0.0
- python -u -m ml-agents.tests.yamato.training_int_tests --csharp={{ editor.csharp_backcompat_version }}
dependencies:
- .yamato/standalone-build-test.yml#test_mac_standalone_{{ editor.version }}
triggers:

2
README.md


project that enables games and simulations to serve as environments for
training intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through a
simple-to-use Python API. We also provide implementations (based on TensorFlow)
simple-to-use Python API. We also provide implementations (based on PyTorch)
of state-of-the-art algorithms to enable game developers and hobbyists to easily
train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
used for multiple purposes, including controlling NPC behavior (in a variety of

7
com.unity.ml-agents/CHANGELOG.md


### Major Changes
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- PyTorch trainers are now the default. See the
[installation docs](https://github.com/Unity-Technologies/ml-agents/blob/mastere/docs/Installation.md) for
more information on installing PyTorch. For the time being, TensorFlow is still available;
you can use the TensorFlow backend by adding `--tensorflow` to the CLI, or
adding `framework: tensorflow` in the configuration YAML. (#4517)
- The Barracuda dependency was upgraded to 1.1.2 (#4571)
#### ml-agents / ml-agents-envs / gym-unity (Python)
### Bug Fixes

Previously, this would result in an infinite loop and cause the editor to hang. (#4573)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- Fixed an issue where runs could not be resumed when using TensorFlow and Ghost Training. (#4593)
## [1.5.0-preview] - 2020-10-14

2
com.unity.ml-agents/package.json


"unity": "2018.4",
"description": "Use state-of-the-art machine learning to create intelligent character behaviors in any Unity environment (games, robotics, film, etc.).",
"dependencies": {
"com.unity.barracuda": "1.1.1-preview",
"com.unity.barracuda": "1.1.2-preview",
"com.unity.modules.imageconversion": "1.0.0",
"com.unity.modules.jsonserialize": "1.0.0",
"com.unity.modules.physics": "1.0.0",

2
docs/Background-Machine-Learning.md


one where the number of observations an agent perceives and the number of
actions they can take are large). Many of the algorithms we provide in ML-Agents
use some form of deep learning, built on top of the open-source library,
[TensorFlow](Background-TensorFlow.md).
[PyTorch](Background-PyTorch.md).

10
docs/Getting-Started.md


## Running a pre-trained model
We include pre-trained models for our agents (`.nn` files) and we use the
We include pre-trained models for our agents (`.onnx` files) and we use the
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models inside
Unity. In this section, we will use the pre-trained model for the 3D Ball
example.

## Training a new model with Reinforcement Learning
While we provide pre-trained `.nn` files for the agents in this environment, any
While we provide pre-trained models for the agents in this environment, any
environment you make yourself will require training agents from scratch to
generate a new model file. In this section we will demonstrate how to use the
reinforcement learning algorithms that are part of the ML-Agents Python package

use it with compatible Agents (the Agents that generated the model). **Note:**
Do not just close the Unity Window once the `Saved Model` message appears.
Either wait for the training process to close the window or press `Ctrl+C` at
the command-line prompt. If you close the window manually, the `.nn` file
the command-line prompt. If you close the window manually, the `.onnx` file
containing the trained model is not exported into the ml-agents folder.
If you've quit the training early using `Ctrl+C` and want to resume training,

mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume
```
Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where
Your trained model will be at `results/<run-identifier>/<behavior_name>.onnx` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding
to the model. This file corresponds to your model's latest checkpoint. You can
now embed this trained model into your Agents by following the steps below,

`Project/Assets/ML-Agents/Examples/3DBall/TFModels/`.
1. Open the Unity Editor, and select the **3DBall** scene as described above.
1. Select the **3DBall** prefab Agent object.
1. Drag the `<behavior_name>.nn` file from the Project window of the Editor to
1. Drag the `<behavior_name>.onnx` file from the Project window of the Editor to
the **Model** placeholder in the **Ball3DAgent** inspector window.
1. Press the **Play** button at the top of the Editor.

24
docs/Installation.md


[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
on installing it.
Although we do not provide support for Anaconda installation on Windows, the
previous
[Windows Anaconda Installation (Deprecated) guide](Installation-Anaconda-Windows.md)
is still available.
### Clone the ML-Agents Toolkit Repository (Optional)
Now that you have installed Unity and Python, you can now install the Unity and

dependencies for each project and are supported on Mac / Windows / Linux. We
offer a dedicated [guide on Virtual Environments](Using-Virtual-Environment.md).
#### (Windows) Installing PyTorch
On Windows, you'll have to install the PyTorch package separately prior to
installing ML-Agents. Activate your virtual environment and run from the command line:
```sh
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html
```
Note that on Windows, you may also need Microsoft's
[Visual C++ Redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads)
if you don't have it already. See the [PyTorch installation guide](https://pytorch.org/get-started/locally/)
for more installation options and versions.
#### Installing `mlagents`
To install the `mlagents` Python package, activate your virtual environment and
run from the command line:

By installing the `mlagents` package, the dependencies listed in the
[setup.py file](../ml-agents/setup.py) are also installed. These include
[TensorFlow](Background-TensorFlow.md) (Requires a CPU w/ AVX support).
[PyTorch](Background-PyTorch.md) (Requires a CPU w/ AVX support).
#### Advanced: Local Installation for Development

the repository's root directory, run:
```sh
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html
pip3 install -e ./ml-agents-envs
pip3 install -e ./ml-agents
```

4
docs/Learning-Environment-Executable.md


```
You can press Ctrl+C to stop the training, and your trained model will be at
`results/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
`results/<run-identifier>/<behavior_name>.onnx`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you early terminate the training, it's
recommended to wait until Step has reached the max_steps parameter you set in

`Project/Assets/ML-Agents/Examples/3DBall/TFModels/`.
1. Open the Unity Editor, and select the **3DBall** scene as described above.
1. Select the **3DBall** prefab from the Project window and select **Agent**.
1. Drag the `<behavior_name>.nn` file from the Project window of the Editor to
1. Drag the `<behavior_name>.onnx` file from the Project window of the Editor to
the **Model** placeholder in the **Ball3DAgent** inspector window.
1. Press the **Play** button at the top of the Editor.

8
docs/ML-Agents-Overview.md


for training intelligent agents. Agents can be trained using reinforcement
learning, imitation learning, neuroevolution, or other machine learning methods
through a simple-to-use Python API. We also provide implementations (based on
TensorFlow) of state-of-the-art algorithms to enable game developers and
PyTorch) of state-of-the-art algorithms to enable game developers and
hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games. These
trained agents can be used for multiple purposes, including controlling NPC
behavior (in a variety of settings such as multi-agent and adversarial),

that include overviews and helpful resources on the
[Unity Engine](Background-Unity.md),
[machine learning](Background-Machine-Learning.md) and
[TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing the
[PyTorch](Background-PyTorch.md). We **strongly** recommend browsing the
machine learning concepts or have not previously heard of TensorFlow.
machine learning concepts or have not previously heard of PyTorch.
The remainder of this page contains a deep dive into ML-Agents, its key
components, different training modes and scenarios. By the end of it, you should

### Custom Training and Inference
In the previous mode, the Agents were used for training to generate a TensorFlow
In the previous mode, the Agents were used for training to generate a PyTorch
model that the Agents can later use. However, any user of the ML-Agents Toolkit
can leverage their own algorithms for training. In this case, the behaviors of
all the Agents in the scene will be controlled within Python. You can even turn

2
docs/Readme.md


- [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
- [Background: Unity](Background-Unity.md)
- [Background: Machine Learning](Background-Machine-Learning.md)
- [Background: TensorFlow](Background-TensorFlow.md)
- [Background: PyTorch](Background-PyTorch.md)
- [Example Environments](Learning-Environment-Examples.md)
## Creating Learning Environments

2
docs/Training-Configuration-File.md


| `time_horizon` | (default = `64`) How many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state. As such, this parameter trades off between a less biased, but higher variance estimate (long time horizon) and more biased, but less varied estimate (short time horizon). In cases where there are frequent rewards within an episode, or episodes are prohibitively large, a smaller number can be more ideal. This number should be large enough to capture all the important behavior within a sequence of an agent's actions. <br><br> Typical range: `32` - `2048` |
| `max_steps` | (default = `500000`) Total number of steps (i.e., observation collected and action taken) that must be taken in the environment (or across all environments if using multiple in parallel) before ending the training process. If you have multiple agents with the same behavior name within your environment, all steps taken by those agents will contribute to the same `max_steps` count. <br><br>Typical range: `5e5` - `1e7` |
| `keep_checkpoints` | (default = `5`) The maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the checkpoint_interval option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. |
| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.nn` (and `.onnx` if applicable) files in `results/` folder.|
| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.onnx` (and `.nn` if using TensorFlow) files in `results/` folder.|
| `init_path` | (default = None) Initialize trainer from a previously saved model. Note that the prior run should have used the same trainer configurations as the current run, and have been saved with the same version of ML-Agents. <br><br>You should provide the full path to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`. This option is provided in case you want to initialize different behaviors from different runs; in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize all models from the same run. |
| `threaded` | (default = `true`) By default, model updates can happen while the environment is being stepped. This violates the [on-policy](https://spinningup.openai.com/en/latest/user/algorithms.html#the-on-policy-algorithms) assumption of PPO slightly in exchange for a training speedup. To maintain the strict on-policyness of PPO, you can disable parallel updates by setting `threaded` to `false`. There is usually no reason to turn `threaded` off for SAC. |
| `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable, and the reward does not consistently increase. <br><br>Typical range: `1e-5` - `1e-3` |

35
docs/Training-ML-Agents.md


- [Curriculum Learning](#curriculum)
- [Training with a Curriculum](#training-with-a-curriculum)
- [Training Using Concurrent Unity Instances](#training-using-concurrent-unity-instances)
- [Using PyTorch (Experimental)](#using-pytorch-experimental)
For a broad overview of reinforcement learning, imitation learning and all the
training scenarios, methods and options within the ML-Agents Toolkit, see

values. See [Using TensorBoard](Using-Tensorboard.md) for more details on how
to visualize the training metrics.
1. Models: these contain the model checkpoints that
are updated throughout training and the final model file (`.nn`). This final
are updated throughout training and the final model file (`.onnx`). This final
model file is generated once either when training completes or is
interrupted.
1. Timers file (under `results/<run-identifier>/run_logs`): this contains aggregated

- **Result Variation Using Concurrent Unity Instances** - If you keep all the
hyperparameters the same, but change `--num-envs=<n>`, the results and model
would likely change.
### Using PyTorch (Experimental)
ML-Agents, by default, uses TensorFlow as its backend, but experimental support
for PyTorch has been added. To use PyTorch, the `torch` Python package must
be installed, and PyTorch must be enabled for your trainer.
#### Installing PyTorch
If you've already installed ML-Agents, follow the
[official PyTorch install instructions](https://pytorch.org/get-started/locally/) for
your platform and configuration. Note that on Windows, you may also need Microsoft's
[Visual C++ Redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) if you don't have it already.
If you're installing or upgrading ML-Agents on Linux or Mac, you can also run
`pip3 install mlagents[torch]` instead of `pip3 install mlagents`
during [installation](Installation.md). On Windows, install ML-Agents first and then
separately install PyTorch.
#### Enabling PyTorch
PyTorch can be enabled in one of two ways. First, by adding `--torch` to the
`mlagents-learn` command. This will make all behaviors train with PyTorch.
Second, by changing the `framework` option for your agent behavior in the
configuration YAML as below. This will use PyTorch just for that behavior.
```yaml
behaviors:
YourAgentBehavior:
framework: pytorch
```

2
docs/Training-on-Amazon-Web-Service.md


# Download and install the latest Nvidia driver for ubuntu
# Please refer to http://download.nvidia.com/XFree86/Linux-#x86_64/latest.txt
$ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.87/NVIDIA-Linux-x86_64-390.87.run
$ sudo /bin/bash ./NVIDIA-Linux-x86_64-390.67.run --accept-license --no-questions --ui=none
$ sudo /bin/bash ./NVIDIA-Linux-x86_64-390.87.run --accept-license --no-questions --ui=none
# Disable Nouveau as it will clash with the Nvidia driver
$ sudo echo 'blacklist nouveau' | sudo tee -a /etc/modprobe.d/blacklist.conf

5
docs/Unity-Inference-Engine.md


[industry-standard open format](https://onnx.ai/about.html) produced by the
[tf2onnx package](https://github.com/onnx/tensorflow-onnx).
Export to ONNX is currently considered beta. To enable it, make sure
`tf2onnx>=1.5.5` is installed in pip. tf2onnx does not currently support
tensorflow 2.0.0 or later, or earlier than 1.12.0.
Export to ONNX is used if using PyTorch (the default). To enable it
while using TensorFlow, make sure `tf2onnx>=1.6.1` is installed in pip.
## Using the Unity Inference Engine

1
ml-agents/mlagents/tf_utils/__init__.py


from mlagents.tf_utils.tf import tf as tf # noqa
from mlagents.tf_utils.tf import set_warnings_enabled # noqa
from mlagents.tf_utils.tf import generate_session_config # noqa
from mlagents.tf_utils.tf import is_available # noqa

63
ml-agents/mlagents/tf_utils/tf.py


# This should be the only place that we import tensorflow directly.
# Everywhere else is caught by the banned-modules setting for flake8
import tensorflow as tf # noqa I201
try:
import tensorflow as tf # noqa I201
# LooseVersion handles things "1.2.3a" or "4.5.6-rc7" fairly sensibly.
_is_tensorflow2 = LooseVersion(tf.__version__) >= LooseVersion("2.0.0")
# LooseVersion handles things "1.2.3a" or "4.5.6-rc7" fairly sensibly.
_is_tensorflow2 = LooseVersion(tf.__version__) >= LooseVersion("2.0.0")
if _is_tensorflow2:
import tensorflow.compat.v1 as tf
if _is_tensorflow2:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
tf_logging = tf.logging
else:
try:
# Newer versions of tf 1.x will complain that tf.logging is deprecated
tf_logging = tf.compat.v1.logging
except AttributeError:
# Fall back to the safe import, even if it might generate a warning or two.
tf.disable_v2_behavior()
else:
try:
# Newer versions of tf 1.x will complain that tf.logging is deprecated
tf_logging = tf.compat.v1.logging
except AttributeError:
# Fall back to the safe import, even if it might generate a warning or two.
tf_logging = tf.logging
except ImportError:
tf = None
def is_available():
"""
Returns whether Torch is available in this Python environment
"""
return tf is not None
def set_warnings_enabled(is_enabled: bool) -> None:

"""
level = tf_logging.WARN if is_enabled else tf_logging.ERROR
tf_logging.set_verbosity(level)
if is_available():
level = tf_logging.WARN if is_enabled else tf_logging.ERROR
tf_logging.set_verbosity(level)
def generate_session_config() -> tf.ConfigProto:
def generate_session_config() -> "tf.ConfigProto":
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# For multi-GPU training, set allow_soft_placement to True to allow
# placing the operation into an alternative device automatically
# to prevent from exceptions if the device doesn't suppport the operation
# or the device does not exist
config.allow_soft_placement = True
return config
if is_available():
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# For multi-GPU training, set allow_soft_placement to True to allow
# placing the operation into an alternative device automatically
# to prevent from exceptions if the device doesn't suppport the operation
# or the device does not exist
config.allow_soft_placement = True
return config
else:
return None

1
ml-agents/mlagents/torch_utils/__init__.py


from mlagents.torch_utils.torch import torch as torch # noqa
from mlagents.torch_utils.torch import nn # noqa
from mlagents.torch_utils.torch import is_available # noqa
from mlagents.torch_utils.torch import default_device # noqa

66
ml-agents/mlagents/torch_utils/torch.py


import os
from distutils.version import LooseVersion
import pkg_resources
# Detect availability of torch package here.
# NOTE: this try/except is temporary until torch is required for ML-Agents.
try:
# This should be the only place that we import torch directly.
# Everywhere else is caught by the banned-modules setting for flake8
import torch # noqa I201
torch.set_num_threads(cpu_utils.get_num_threads_to_use())
os.environ["KMP_BLOCKTIME"] = "0"
def assert_torch_installed():
# Check that torch version 1.6.0 or later has been installed. If not, refer
# user to the PyTorch webpage for install instructions.
torch_pkg = None
try:
torch_pkg = pkg_resources.get_distribution("torch")
except pkg_resources.DistributionNotFound:
pass
assert torch_pkg is not None and LooseVersion(torch_pkg.version) >= LooseVersion(
"1.6.0"
), (
"A compatible version of PyTorch was not installed. Please visit the PyTorch homepage "
+ "(https://pytorch.org/get-started/locally/) and follow the instructions to install. "
+ "Version 1.6.0 and later are supported."
)
# Known PyLint compatibility with PyTorch https://github.com/pytorch/pytorch/issues/701
# pylint: disable=E1101
if torch.cuda.is_available():
torch.set_default_tensor_type(torch.cuda.FloatTensor)
device = torch.device("cuda")
else:
torch.set_default_tensor_type(torch.FloatTensor)
device = torch.device("cpu")
nn = torch.nn
# pylint: disable=E1101
except ImportError:
torch = None
nn = None
device = None
assert_torch_installed()
# This should be the only place that we import torch directly.
# Everywhere else is caught by the banned-modules setting for flake8
import torch # noqa I201
torch.set_num_threads(cpu_utils.get_num_threads_to_use())
os.environ["KMP_BLOCKTIME"] = "0"
# Known PyLint compatibility with PyTorch https://github.com/pytorch/pytorch/issues/701
# pylint: disable=E1101
if torch.cuda.is_available():
torch.set_default_tensor_type(torch.cuda.FloatTensor)
device = torch.device("cuda")
else:
torch.set_default_tensor_type(torch.FloatTensor)
device = torch.device("cpu")
nn = torch.nn
def is_available():
"""
Returns whether Torch is available in this Python environment
"""
return torch is not None

11
ml-agents/mlagents/trainers/cli_utils.py


"--torch",
default=False,
action=DetectDefaultStoreTrue,
help="(Experimental) Use the PyTorch framework instead of TensorFlow. Install PyTorch "
"before using this option",
help="Use the PyTorch framework. Note that this option is not required anymore as PyTorch is the"
"default framework, and will be removed in the next release.",
)
argparser.add_argument(
"--tensorflow",
default=False,
action=DetectDefaultStoreTrue,
help="(Deprecated) Use the TensorFlow framework instead of PyTorch. Install TensorFlow "
"before using this option.",
)
eng_conf = argparser.add_argument_group(title="Engine Configuration")

16
ml-agents/mlagents/trainers/ghost/trainer.py


@property
def reward_buffer(self) -> Deque[float]:
"""
Returns the reward buffer. The reward buffer contains the cumulative
rewards of the most recent episodes completed by agents using this
trainer.
:return: the reward buffer.
"""
Returns the reward buffer. The reward buffer contains the cumulative
rewards of the most recent episodes completed by agents using this
trainer.
:return: the reward buffer.
"""
return self.trainer.reward_buffer
@property

policy = self.trainer.create_policy(
parsed_behavior_id, behavior_spec, create_graph=True
)
self.trainer.model_saver.initialize_or_load(policy)
team_id = parsed_behavior_id.team_id
self.controller.subscribe_team_id(team_id, self)

self._save_snapshot() # Need to save after trainer initializes policy
self._learning_team = self.controller.get_learning_team
self.wrapped_trainer_team = team_id
else:
# Load the weights of the ghost policy from the wrapped one
policy.load_weights(
self.trainer.get_policy(parsed_behavior_id).get_weights()
)
return policy
def add_policy(

6
ml-agents/mlagents/trainers/learn.py


# # Unity ML-Agents Toolkit
from mlagents import torch_utils
import yaml
import os

ml-agents: {mlagents.trainers.__version__},
ml-agents-envs: {mlagents_envs.__version__},
Communicator API: {UnityEnvironment.API_VERSION},
TensorFlow: {tf_utils.tf.__version__}"""
PyTorch: {torch_utils.torch.__version__}"""
def parse_command_line(argv: Optional[List[str]] = None) -> RunOptions:

init_path=maybe_init_path,
multi_gpu=False,
force_torch="torch" in DetectDefault.non_default_args,
force_tensorflow="tensorflow" in DetectDefault.non_default_args,
)
# Create controller and begin training.
tc = TrainerController(

add_timer_metadata("mlagents_version", mlagents.trainers.__version__)
add_timer_metadata("mlagents_envs_version", mlagents_envs.__version__)
add_timer_metadata("communication_protocol_version", UnityEnvironment.API_VERSION)
add_timer_metadata("tensorflow_version", tf_utils.tf.__version__)
add_timer_metadata("pytorch_version", torch_utils.torch.__version__)
add_timer_metadata("numpy_version", np.__version__)
if options.env_settings.seed == -1:

15
ml-agents/mlagents/trainers/policy/torch_policy.py


actions = actions[:, :, 0]
else:
actions = actions[:, 0, :]
return (actions, all_logs if all_log_probs else log_probs, entropies, memories)
# Use the sum of entropy across actions, not the mean
entropy_sum = torch.sum(entropies, dim=1)
return (
actions,
all_logs if all_log_probs else log_probs,
entropy_sum,
memories,
)
def evaluate_actions(
self,

)
action_list = [actions[..., i] for i in range(actions.shape[-1])]
log_probs, entropies, _ = ModelUtils.get_probs_and_entropy(action_list, dists)
return log_probs, entropies, value_heads
# Use the sum of entropy across actions, not the mean
entropy_sum = torch.sum(entropies, dim=1)
return log_probs, entropy_sum, value_heads
@timed
def evaluate(

4
ml-agents/mlagents/trainers/ppo/optimizer_torch.py


self.optimizer.step()
update_stats = {
"Losses/Policy Loss": policy_loss.item(),
# NOTE: abs() is not technically correct, but matches the behavior in TensorFlow.
# TODO: After PyTorch is default, change to something more correct.
"Losses/Policy Loss": torch.abs(policy_loss).item(),
"Losses/Value Loss": value_loss.item(),
"Policy/Learning Rate": decay_lr,
"Policy/Epsilon": decay_eps,

39
ml-agents/mlagents/trainers/ppo/trainer.py


from mlagents_envs.base_env import BehaviorSpec
from mlagents.trainers.trainer.rl_trainer import RLTrainer
from mlagents.trainers.policy import Policy
from mlagents.trainers.policy.tf_policy import TFPolicy
from mlagents.trainers.ppo.optimizer_tf import PPOOptimizer
from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.ppo.optimizer_torch import TorchPPOOptimizer
from mlagents.trainers.tf.components.reward_signals import RewardSignal
from mlagents import torch_utils
from mlagents.trainers.torch.components.reward_providers.base_reward_provider import (
BaseRewardProvider,
)
from mlagents import tf_utils
if torch_utils.is_available():
from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.ppo.optimizer_torch import TorchPPOOptimizer
if tf_utils.is_available():
from mlagents.trainers.policy.tf_policy import TFPolicy
from mlagents.trainers.ppo.optimizer_tf import PPOOptimizer
TorchPolicy = None # type: ignore
TorchPPOOptimizer = None # type: ignore
TFPolicy = None # type: ignore
PPOOptimizer = None # type: ignore
logger = get_logger(__name__)

for name, v in value_estimates.items():
agent_buffer_trajectory[f"{name}_value_estimates"].extend(v)
if isinstance(self.optimizer.reward_signals[name], RewardSignal):
if isinstance(self.optimizer.reward_signals[name], BaseRewardProvider):
self.optimizer.reward_signals[name].value_name, np.mean(v)
f"Policy/{self.optimizer.reward_signals[name].name.capitalize()} Value Estimate",
np.mean(v),
f"Policy/{self.optimizer.reward_signals[name].name.capitalize()} Value Estimate",
np.mean(v),
self.optimizer.reward_signals[name].value_name, np.mean(v)
)
# Evaluate all reward functions

for name, reward_signal in self.optimizer.reward_signals.items():
if isinstance(reward_signal, RewardSignal):
evaluate_result = reward_signal.evaluate_batch(
agent_buffer_trajectory
).scaled_reward
else:
# BaseRewardProvider is a PyTorch-based reward signal
if isinstance(reward_signal, BaseRewardProvider):
else: # reward_signal is a TensorFlow-based RewardSignal class
evaluate_result = reward_signal.evaluate_batch(
agent_buffer_trajectory
).scaled_reward
agent_buffer_trajectory[f"{name}_rewards"].extend(evaluate_result)
# Report the reward signals
self.collected_rewards[name][agent_id] += np.sum(evaluate_result)

61
ml-agents/mlagents/trainers/sac/trainer.py


from mlagents_envs.logging_util import get_logger
from mlagents_envs.timers import timed
from mlagents_envs.base_env import BehaviorSpec
from mlagents.trainers.policy.tf_policy import TFPolicy
from mlagents.trainers.sac.optimizer_tf import SACOptimizer
from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.sac.optimizer_torch import TorchSACOptimizer
from mlagents.trainers.tf.components.reward_signals import RewardSignal
from mlagents import torch_utils
from mlagents.trainers.torch.components.reward_providers import BaseRewardProvider
from mlagents import tf_utils
if torch_utils.is_available():
from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.sac.optimizer_torch import TorchSACOptimizer
if tf_utils.is_available():
from mlagents.trainers.policy.tf_policy import TFPolicy
from mlagents.trainers.sac.optimizer_tf import SACOptimizer
TorchPolicy = None # type: ignore
TorchSACOptimizer = None # type: ignore
TFPolicy = None # type: ignore
SACOptimizer = None # type: ignore
logger = get_logger(__name__)

self.seed = seed
self.policy: Policy = None # type: ignore
self.optimizer: SACOptimizer = None # type: ignore
self.optimizer: TorchSACOptimizer = None # type: ignore
self.hyperparameters: SACSettings = cast(
SACSettings, trainer_settings.hyperparameters
)

agent_buffer_trajectory["environment_rewards"]
)
for name, reward_signal in self.optimizer.reward_signals.items():
if isinstance(reward_signal, RewardSignal):
evaluate_result = reward_signal.evaluate_batch(
agent_buffer_trajectory
).scaled_reward
else:
# BaseRewardProvider is a PyTorch-based reward signal
if isinstance(reward_signal, BaseRewardProvider):
else: # reward_signal uses TensorFlow
evaluate_result = reward_signal.evaluate_batch(
agent_buffer_trajectory
).scaled_reward
# Report the reward signals
self.collected_rewards[name][agent_id] += np.sum(evaluate_result)

)
for name, v in value_estimates.items():
if isinstance(self.optimizer.reward_signals[name], RewardSignal):
self._stats_reporter.add_stat(
self.optimizer.reward_signals[name].value_name, np.mean(v)
)
else:
# BaseRewardProvider is a PyTorch-based reward signal
if isinstance(self.optimizer.reward_signals[name], BaseRewardProvider):
)
else: # TensorFlow reward signal
self._stats_reporter.add_stat(
self.optimizer.reward_signals[name].value_name, np.mean(v)
)
# Bootstrap using the last step rather than the bootstrap step if max step is reached.

)
# Get rewards for each reward
for name, signal in self.optimizer.reward_signals.items():
if isinstance(signal, RewardSignal):
# BaseRewardProvider is a PyTorch-based reward signal
if isinstance(signal, BaseRewardProvider):
sampled_minibatch[f"{name}_rewards"] = (
signal.evaluate(sampled_minibatch) * signal.strength
)
else: # reward_signal is a TensorFlow-based RewardSignal class
else:
sampled_minibatch[f"{name}_rewards"] = (
signal.evaluate(sampled_minibatch) * signal.strength
)
update_stats = self.optimizer.update(sampled_minibatch, n_sequences)
for stat_name, value in update_stats.items():

reward_signal_minibatches = {}
for name, signal in self.optimizer.reward_signals.items():
logger.debug(f"Updating {name} at step {self.step}")
if isinstance(signal, RewardSignal):
# BaseRewardProvider is a PyTorch-based reward signal
if not isinstance(signal, BaseRewardProvider):
# Some signals don't need a minibatch to be sampled - so we don't!
if signal.update_dict:
reward_signal_minibatches[name] = buffer.sample_mini_batch(

else:
else: # TensorFlow reward signal
if name != "extrinsic":
reward_signal_minibatches[name] = buffer.sample_mini_batch(
self.hyperparameters.batch_size,

for stat, stat_list in batch_update_stats.items():
self._stats_reporter.add_stat(stat, np.mean(stat_list))
def create_sac_optimizer(self) -> SACOptimizer:
def create_sac_optimizer(self) -> TorchSACOptimizer:
if self.framework == FrameworkType.PYTORCH:
return TorchSACOptimizer( # type: ignore
cast(TorchPolicy, self.policy), self.trainer_settings # type: ignore

2
ml-agents/mlagents/trainers/settings.py


threaded: bool = True
self_play: Optional[SelfPlaySettings] = None
behavioral_cloning: Optional[BehavioralCloningSettings] = None
framework: FrameworkType = FrameworkType.TENSORFLOW
framework: FrameworkType = FrameworkType.PYTORCH
cattr.register_structure_hook(
Dict[RewardSignalType, RewardSignalSettings], RewardSignalSettings.structure

86
ml-agents/mlagents/trainers/stats.py


from collections import defaultdict
from enum import Enum
from typing import List, Dict, NamedTuple, Any, Optional
from typing import List, Dict, NamedTuple, Any
import numpy as np
import abc
import os

from mlagents_envs.logging_util import get_logger
from mlagents_envs.timers import set_gauge
from mlagents.tf_utils import tf, generate_session_config
from torch.utils.tensorboard import SummaryWriter
def _dict_to_str(param_dict: Dict[str, Any], num_tabs: int) -> str:
"""
Takes a parameter dictionary and converts it to a human-readable string.
Recurses if there are multiple levels of dict. Used to print out hyperparameters.
param: param_dict: A Dictionary of key, value parameters.
return: A string version of this dictionary.
"""
if not isinstance(param_dict, dict):
return str(param_dict)
else:
append_newline = "\n" if num_tabs > 0 else ""
return append_newline + "\n".join(
[
"\t"
+ " " * num_tabs
+ "{}:\t{}".format(x, _dict_to_str(param_dict[x], num_tabs + 1))
for x in param_dict
]
)
class StatsSummary(NamedTuple):

if property_type == StatsPropertyType.HYPERPARAMETERS:
logger.info(
"""Hyperparameters for behavior name {}: \n{}""".format(
category, self._dict_to_str(value, 0)
category, _dict_to_str(value, 0)
)
)
elif property_type == StatsPropertyType.SELF_PLAY:

def _dict_to_str(self, param_dict: Dict[str, Any], num_tabs: int) -> str:
"""
Takes a parameter dictionary and converts it to a human-readable string.
Recurses if there are multiple levels of dict. Used to print out hyperparameters.
param: param_dict: A Dictionary of key, value parameters.
return: A string version of this dictionary.
"""
if not isinstance(param_dict, dict):
return str(param_dict)
else:
append_newline = "\n" if num_tabs > 0 else ""
return append_newline + "\n".join(
[
"\t"
+ " " * num_tabs
+ "{}:\t{}".format(
x, self._dict_to_str(param_dict[x], num_tabs + 1)
)
for x in param_dict
]
)
class TensorboardWriter(StatsWriter):
def __init__(self, base_dir: str, clear_past_data: bool = False):

:param clear_past_data: Whether or not to clean up existing Tensorboard files associated with the base_dir and
category.
"""
self.summary_writers: Dict[str, tf.summary.FileWriter] = {}
self.summary_writers: Dict[str, SummaryWriter] = {}
self.base_dir: str = base_dir
self._clear_past_data = clear_past_data

self._maybe_create_summary_writer(category)
for key, value in values.items():
summary = tf.Summary()
summary.value.add(tag=f"{key}", simple_value=value.mean)
self.summary_writers[category].add_summary(summary, step)
self.summary_writers[category].add_scalar(f"{key}", value.mean, step)
self.summary_writers[category].flush()
def _maybe_create_summary_writer(self, category: str) -> None:

os.makedirs(filewriter_dir, exist_ok=True)
if self._clear_past_data:
self._delete_all_events_files(filewriter_dir)
self.summary_writers[category] = tf.summary.FileWriter(filewriter_dir)
self.summary_writers[category] = SummaryWriter(filewriter_dir)
def _delete_all_events_files(self, directory_name: str) -> None:
for file_name in os.listdir(directory_name):

) -> None:
if property_type == StatsPropertyType.HYPERPARAMETERS:
assert isinstance(value, dict)
summary = self._dict_to_tensorboard("Hyperparameters", value)
summary = _dict_to_str(value, 0)
self.summary_writers[category].add_summary(summary, 0)
def _dict_to_tensorboard(
self, name: str, input_dict: Dict[str, Any]
) -> Optional[bytes]:
"""
Convert a dict to a Tensorboard-encoded string.
:param name: The name of the text.
:param input_dict: A dictionary that will be displayed in a table on Tensorboard.
"""
try:
with tf.Session(config=generate_session_config()) as sess:
s_op = tf.summary.text(
name,
tf.convert_to_tensor(
[[str(x), str(input_dict[x])] for x in input_dict]
),
)
s = sess.run(s_op)
return s
except Exception:
logger.warning(
f"Could not write {name} summary for Tensorboard: {input_dict}"
)
return None
self.summary_writers[category].add_text("Hyperparameters", summary)
self.summary_writers[category].flush()
class StatsReporter:

44
ml-agents/mlagents/trainers/tests/tensorflow/test_ghost.py


np.testing.assert_array_equal(w, lw)
def test_resume(dummy_config, tmp_path):
mock_specs = mb.setup_test_behavior_specs(
True, False, vector_action_space=[2], vector_obs_space=1
)
behavior_id_team0 = "test_brain?team=0"
behavior_id_team1 = "test_brain?team=1"
brain_name = BehaviorIdentifiers.from_name_behavior_id(behavior_id_team0).brain_name
tmp_path = tmp_path.as_posix()
ppo_trainer = PPOTrainer(brain_name, 0, dummy_config, True, False, 0, tmp_path)
controller = GhostController(100)
trainer = GhostTrainer(
ppo_trainer, brain_name, controller, 0, dummy_config, True, tmp_path
)
parsed_behavior_id0 = BehaviorIdentifiers.from_name_behavior_id(behavior_id_team0)
policy = trainer.create_policy(parsed_behavior_id0, mock_specs)
trainer.add_policy(parsed_behavior_id0, policy)
parsed_behavior_id1 = BehaviorIdentifiers.from_name_behavior_id(behavior_id_team1)
policy = trainer.create_policy(parsed_behavior_id1, mock_specs)
trainer.add_policy(parsed_behavior_id1, policy)
trainer.save_model()
# Make a new trainer, check that the policies are the same
ppo_trainer2 = PPOTrainer(brain_name, 0, dummy_config, True, True, 0, tmp_path)
trainer2 = GhostTrainer(
ppo_trainer2, brain_name, controller, 0, dummy_config, True, tmp_path
)
policy = trainer2.create_policy(parsed_behavior_id0, mock_specs)
trainer2.add_policy(parsed_behavior_id0, policy)
policy = trainer2.create_policy(parsed_behavior_id1, mock_specs)
trainer2.add_policy(parsed_behavior_id1, policy)
trainer1_policy = trainer.get_policy(parsed_behavior_id1.behavior_id)
trainer2_policy = trainer2.get_policy(parsed_behavior_id1.behavior_id)
weights = trainer1_policy.get_weights()
weights2 = trainer2_policy.get_weights()
for w, lw in zip(weights, weights2):
np.testing.assert_array_equal(w, lw)
def test_process_trajectory(dummy_config):
mock_specs = mb.setup_test_behavior_specs(
True, False, vector_action_space=[2], vector_obs_space=1

20
ml-agents/mlagents/trainers/tests/test_stats.py


)
@mock.patch("mlagents.tf_utils.tf.Summary")
@mock.patch("mlagents.tf_utils.tf.summary.FileWriter")
def test_tensorboard_writer(mock_filewriter, mock_summary):
@mock.patch("mlagents.trainers.stats.SummaryWriter")
def test_tensorboard_writer(mock_summary):
# Test write_stats
category = "category1"
with tempfile.TemporaryDirectory(prefix="unittest-") as base_dir:

basedir=base_dir, category=category
)
assert os.path.exists(filewriter_dir)
mock_filewriter.assert_called_once_with(filewriter_dir)
mock_summary.assert_called_once_with(filewriter_dir)
mock_summary.return_value.value.add.assert_called_once_with(
tag="key1", simple_value=1.0
)
mock_filewriter.return_value.add_summary.assert_called_once_with(
mock_summary.return_value, 10
)
mock_filewriter.return_value.flush.assert_called_once()
mock_summary.return_value.add_scalar.assert_called_once_with("key1", 1.0, 10)
mock_summary.return_value.flush.assert_called_once()
assert mock_filewriter.return_value.add_summary.call_count > 1
assert mock_summary.return_value.add_text.call_count >= 1
def test_tensorboard_writer_clear(tmp_path):

},
10,
)
# Test hyperparameter writing - no good way to parse the TB string though.
# Test hyperparameter writing
console_writer.add_property(
"category1", StatsPropertyType.HYPERPARAMETERS, {"example": 1.0}
)

44
ml-agents/mlagents/trainers/tests/torch/test_ghost.py


np.testing.assert_array_equal(w, lw)
def test_resume(dummy_config, tmp_path):
mock_specs = mb.setup_test_behavior_specs(
True, False, vector_action_space=[2], vector_obs_space=1
)
behavior_id_team0 = "test_brain?team=0"
behavior_id_team1 = "test_brain?team=1"
brain_name = BehaviorIdentifiers.from_name_behavior_id(behavior_id_team0).brain_name
tmp_path = tmp_path.as_posix()
ppo_trainer = PPOTrainer(brain_name, 0, dummy_config, True, False, 0, tmp_path)
controller = GhostController(100)
trainer = GhostTrainer(
ppo_trainer, brain_name, controller, 0, dummy_config, True, tmp_path
)
parsed_behavior_id0 = BehaviorIdentifiers.from_name_behavior_id(behavior_id_team0)
policy = trainer.create_policy(parsed_behavior_id0, mock_specs)
trainer.add_policy(parsed_behavior_id0, policy)
parsed_behavior_id1 = BehaviorIdentifiers.from_name_behavior_id(behavior_id_team1)
policy = trainer.create_policy(parsed_behavior_id1, mock_specs)
trainer.add_policy(parsed_behavior_id1, policy)
trainer.save_model()
# Make a new trainer, check that the policies are the same
ppo_trainer2 = PPOTrainer(brain_name, 0, dummy_config, True, True, 0, tmp_path)
trainer2 = GhostTrainer(
ppo_trainer2, brain_name, controller, 0, dummy_config, True, tmp_path
)
policy = trainer2.create_policy(parsed_behavior_id0, mock_specs)
trainer2.add_policy(parsed_behavior_id0, policy)
policy = trainer2.create_policy(parsed_behavior_id1, mock_specs)
trainer2.add_policy(parsed_behavior_id1, policy)
trainer1_policy = trainer.get_policy(parsed_behavior_id1.behavior_id)
trainer2_policy = trainer2.get_policy(parsed_behavior_id1.behavior_id)
weights = trainer1_policy.get_weights()
weights2 = trainer2_policy.get_weights()
for w, lw in zip(weights, weights2):
np.testing.assert_array_equal(w, lw)
def test_process_trajectory(dummy_config):
mock_specs = mb.setup_test_behavior_specs(
True, False, vector_action_space=[2], vector_obs_space=1

1
ml-agents/mlagents/trainers/tests/torch/test_policy.py


memories=memories,
seq_len=policy.sequence_length,
)
assert log_probs.shape == (64, policy.behavior_spec.action_spec.size)
assert entropy.shape == (64,)
for val in values.values():

3
ml-agents/mlagents/trainers/torch/distributions.py


if self.conditional_sigma:
log_sigma = torch.clamp(self.log_sigma(inputs), min=-20, max=2)
else:
log_sigma = self.log_sigma
# Expand so that entropy matches batch size
log_sigma = self.log_sigma.expand(inputs.shape[0], -1)
if self.tanh_squash:
return [TanhGaussianDistInstance(mu, torch.exp(log_sigma))]
else:

34
ml-agents/mlagents/trainers/trainer/rl_trainer.py


from mlagents.trainers.optimizer import Optimizer
from mlagents.trainers.buffer import AgentBuffer
from mlagents.trainers.trainer import Trainer
from mlagents.trainers.tf.components.reward_signals import (
RewardSignalResult,
RewardSignal,
from mlagents.trainers.torch.components.reward_providers.base_reward_provider import (
BaseRewardProvider,
from mlagents.trainers.policy.tf_policy import TFPolicy
from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.model_saver.torch_model_saver import TorchModelSaver
from mlagents.trainers.behavior_id_utils import BehaviorIdentifiers
from mlagents.trainers.agent_processor import AgentManagerQueue
from mlagents.trainers.trajectory import Trajectory

from mlagents.trainers.model_saver.tf_model_saver import TFModelSaver
from mlagents import torch_utils
from mlagents import tf_utils
if torch_utils.is_available():
from mlagents.trainers.policy.torch_policy import TorchPolicy
from mlagents.trainers.model_saver.torch_model_saver import TorchModelSaver
if tf_utils.is_available():
from mlagents.trainers.policy.tf_policy import TFPolicy
from mlagents.trainers.model_saver.tf_model_saver import TFModelSaver
TorchPolicy = None # type: ignore
TorchSaver = None # type: ignore
RewardSignalResults = Dict[str, RewardSignalResult]
TFPolicy = None # type: ignore
TFModelSaver = None # type: ignore
logger = get_logger(__name__)

StatsPropertyType.HYPERPARAMETERS, self.trainer_settings.as_dict()
)
self.framework = self.trainer_settings.framework
if self.framework == FrameworkType.PYTORCH and not torch_utils.is_available():
if self.framework == FrameworkType.TENSORFLOW and not tf_utils.is_available():
"To use the experimental PyTorch backend, install the PyTorch Python package first."
"To use the TensorFlow backend, install the TensorFlow Python package first."
)
logger.debug(f"Using framework {self.framework.value}")

self.reward_buffer.appendleft(rewards.get(agent_id, 0))
rewards[agent_id] = 0
else:
if isinstance(optimizer.reward_signals[name], RewardSignal):
if isinstance(optimizer.reward_signals[name], BaseRewardProvider):
optimizer.reward_signals[name].stat_name,
f"Policy/{optimizer.reward_signals[name].name.capitalize()} Reward",
f"Policy/{optimizer.reward_signals[name].name.capitalize()} Reward",
optimizer.reward_signals[name].stat_name,
rewards.get(agent_id, 0),
)
rewards[agent_id] = 0

18
ml-agents/mlagents/trainers/trainer/trainer_factory.py


init_path: str = None,
multi_gpu: bool = False,
force_torch: bool = False,
force_tensorflow: bool = False,
):
"""
The TrainerFactory generates the Trainers based on the configuration passed as

:param init_path: Path from which to load model.
:param multi_gpu: If True, multi-gpu will be used. (currently not available)
:param force_torch: If True, the Trainers will all use the PyTorch framework
instead of the TensorFlow framework.
instead of what is specified in the config YAML.
:param force_tensorflow: If True, thee Trainers will all use the TensorFlow
framework.
"""
self.trainer_config = trainer_config
self.output_path = output_path

self.multi_gpu = multi_gpu
self.ghost_controller = GhostController()
self._force_torch = force_torch
self._force_tf = force_tensorflow
def generate(self, behavior_name: str) -> Trainer:
if behavior_name not in self.trainer_config.keys():

trainer_settings = self.trainer_config[behavior_name]
if self._force_torch:
trainer_settings.framework = FrameworkType.PYTORCH
logger.warning(
"Note that specifying --torch is not required anymore as PyTorch is the default framework."
)
if self._force_tf:
trainer_settings.framework = FrameworkType.TENSORFLOW
logger.warning(
"Setting the framework to TensorFlow. TensorFlow trainers will be deprecated in the future."
)
if self._force_torch:
logger.warning(
"Both --torch and --tensorflow CLI options were specified. Using TensorFlow."
)
return TrainerFactory._initialize_trainer(
trainer_settings,
behavior_name,

10
ml-agents/mlagents/trainers/trainer_controller.py


import numpy as np
from mlagents.tf_utils import tf
from mlagents import tf_utils
from mlagents_envs.logging_util import get_logger
from mlagents.trainers.env_manager import EnvManager, EnvironmentStep

self.trainer_threads: List[threading.Thread] = []
self.kill_trainers = False
np.random.seed(training_seed)
tf.set_random_seed(training_seed)
if torch_utils.is_available():
torch_utils.torch.manual_seed(training_seed)
if tf_utils.is_available():
tf.set_random_seed(training_seed)
torch_utils.torch.manual_seed(training_seed)
self.rank = get_rank()
@timed

@timed
def start_learning(self, env_manager: EnvManager) -> None:
self._create_output_path(self.output_path)
tf.reset_default_graph()
if tf_utils.is_available():
tf.reset_default_graph()
try:
# Initial reset
self._reset_env(env_manager)

12
ml-agents/mlagents/trainers/training_status.py


import attr
import cattr
from mlagents.tf_utils import tf
from mlagents.torch_utils import torch
from mlagents.tf_utils import tf, is_available as tf_is_available
from mlagents_envs.logging_util import get_logger
from mlagents.trainers import __version__
from mlagents.trainers.exception import TrainerError

STATUS_FORMAT_VERSION = "0.1.0"
STATUS_FORMAT_VERSION = "0.2.0"
class StatusType(Enum):

class StatusMetaData:
stats_format_version: str = STATUS_FORMAT_VERSION
mlagents_version: str = __version__
tensorflow_version: str = tf.__version__
torch_version: str = torch.__version__
tensorflow_version: str = tf.__version__ if tf_is_available() else -1
def to_dict(self) -> Dict[str, str]:
return cattr.unstructure(self)

if self.tensorflow_version != other.tensorflow_version:
logger.warning(
"Tensorflow checkpoint was saved with a different version of Tensorflow. Model may not resume properly."
)
if self.torch_version != other.torch_version:
logger.warning(
"PyTorch checkpoint was saved with a different version of PyTorch. Model may not resume properly."
)

9
ml-agents/setup.py


"Pillow>=4.2.1",
"protobuf>=3.6",
"pyyaml>=3.1.0",
"tensorflow>=1.14,<3.0",
# Windows ver. of PyTorch doesn't work from PyPi
'torch>=1.6.0;platform_system!="Windows"',
"tensorboard>=1.15",
# We don't actually need six, but tensorflow does, and pip seems
# to get confused and install the wrong version.
"six>=1.12.0",
],
python_requires=">=3.6.1",
entry_points={

]
},
cmdclass={"verify": VerifyVersionCommand},
extras_require={"torch": ["torch>=1.5.0"]},
extras_require={"tensorflow": ["tensorflow>=1.14,<3.0", "six>=1.12.0"]},
)

21
ml-agents/tests/yamato/training_int_tests.py