
Merge branch 'develop-agentprocessor-teammanager' into develop-coma2-trainer

/develop/action-slice
Andrew Cohen, 3 years ago
Commit 9060da06
61 files changed, with 2479 insertions and 1570 deletions
  1. Project/Assets/ML-Agents/Examples/Match3/Prefabs/Match3VisualObs.prefab (2 lines changed)
  2. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VectorObs.onnx (1001 lines changed)
  3. README.md (11 lines changed)
  4. com.unity.ml-agents.extensions/Documentation~/Grid-Sensor.md (16 lines changed)
  5. com.unity.ml-agents.extensions/Documentation~/Match3.md (2 lines changed)
  6. com.unity.ml-agents.extensions/Documentation~/com.unity.ml-agents.extensions.md (9 lines changed)
  7. com.unity.ml-agents.extensions/Runtime/Input/InputActionActuator.cs (12 lines changed)
  8. com.unity.ml-agents.extensions/Runtime/Match3/Match3Actuator.cs (9 lines changed)
  9. com.unity.ml-agents.extensions/package.json (2 lines changed)
  10. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (4 lines changed)
  11. com.unity.ml-agents/Runtime/Academy.cs (4 lines changed)
  12. com.unity.ml-agents/Runtime/Actuators/ActionSpec.cs (1 line changed)
  13. com.unity.ml-agents/Runtime/Actuators/IActionReceiver.cs (2 lines changed)
  14. com.unity.ml-agents/Runtime/Actuators/IDiscreteActionMask.cs (2 lines changed)
  15. com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs (8 lines changed)
  16. com.unity.ml-agents/Runtime/Agent.cs (47 lines changed)
  17. com.unity.ml-agents/Runtime/Analytics/Events.cs (31 lines changed)
  18. com.unity.ml-agents/Runtime/Analytics/InferenceAnalytics.cs (18 lines changed)
  19. com.unity.ml-agents/Runtime/Analytics/TrainingAnalytics.cs (17 lines changed)
  20. com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs (9 lines changed)
  21. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (2 lines changed)
  22. com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (2 lines changed)
  23. com.unity.ml-agents/Runtime/Policies/BarracudaPolicy.cs (10 lines changed)
  24. com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs (6 lines changed)
  25. com.unity.ml-agents/Runtime/Policies/RemotePolicy.cs (10 lines changed)
  26. com.unity.ml-agents/Runtime/Sensors/IBuiltInSensor.cs (38 lines changed)
  27. com.unity.ml-agents/Runtime/Sensors/ObservationWriter.cs (6 lines changed)
  28. com.unity.ml-agents/Runtime/SideChannels/TrainingAnalyticsSideChannel.cs (5 lines changed)
  29. com.unity.ml-agents/Tests/Editor/Analytics/InferenceAnalyticsTests.cs (18 lines changed)
  30. com.unity.ml-agents/Tests/Editor/Analytics/TrainingAnalyticsTest.cs (38 lines changed)
  31. config/ppo/Match3.yaml (86 lines changed)
  32. config/ppo/PyramidsRND.yaml (1 line changed)
  33. docs/Installation-Anaconda-Windows.md (4 lines changed)
  34. docs/Installation.md (6 lines changed)
  35. docs/Learning-Environment-Examples.md (4 lines changed)
  36. docs/Training-on-Amazon-Web-Service.md (2 lines changed)
  37. docs/Unity-Inference-Engine.md (4 lines changed)
  38. ml-agents/mlagents/trainers/action_info.py (9 lines changed)
  39. ml-agents/mlagents/trainers/agent_processor.py (49 lines changed)
  40. ml-agents/mlagents/trainers/buffer.py (53 lines changed)
  41. ml-agents/mlagents/trainers/policy/policy.py (15 lines changed)
  42. ml-agents/mlagents/trainers/ppo/trainer.py (2 lines changed)
  43. ml-agents/mlagents/trainers/sac/optimizer_torch.py (62 lines changed)
  44. ml-agents/mlagents/trainers/sac/trainer.py (9 lines changed)
  45. ml-agents/mlagents/trainers/tests/test_agent_processor.py (6 lines changed)
  46. ml-agents/mlagents/trainers/tests/test_buffer.py (23 lines changed)
  47. ml-agents/mlagents/trainers/tests/test_trajectory.py (34 lines changed)
  48. ml-agents/mlagents/trainers/tests/torch/test_ppo.py (12 lines changed)
  49. ml-agents/mlagents/trainers/tests/torch/test_sac.py (2 lines changed)
  50. ml-agents/mlagents/trainers/tests/torch/test_simple_rl.py (4 lines changed)
  51. ml-agents/mlagents/trainers/torch/agent_action.py (79 lines changed)
  52. ml-agents/mlagents/trainers/trajectory.py (39 lines changed)
  53. utils/make_readme_table.py (1 line changed)
  54. utils/validate_release_links.py (69 lines changed)
  55. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx (1001 lines changed)
  56. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx.meta (15 lines changed)
  57. com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs (49 lines changed)
  58. com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs.meta (3 lines changed)
  59. ml-agents/mlagents/trainers/tests/torch/test_agent_action.py (52 lines changed)
  60. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn (1001 lines changed)
  61. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn.meta (11 lines changed)

2
Project/Assets/ML-Agents/Examples/Match3/Prefabs/Match3VisualObs.prefab


VectorActionDescriptions: []
VectorActionSpaceType: 0
hasUpgradedBrainParametersWithActionSpec: 1
m_Model: {fileID: 11400000, guid: 48d14da88fea74d0693c691c6e3f2e34, type: 3}
m_Model: {fileID: 11400000, guid: 28ccdfd7cb3d941ce8af0ab89e06130a, type: 3}
m_InferenceDevice: 2
m_BehaviorType: 0
m_BehaviorName: Match3VisualObs

1001
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VectorObs.onnx
File diff suppressed because it is too large to display

11
README.md


# Unity ML-Agents Toolkit
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/)
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/)
[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)

## Releases & Documentation
**Our latest, stable release is `Release 12`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Readme.md)
**Our latest, stable release is `Release 13`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Readme.md)
to get started with the latest release of ML-Agents.**
The table below lists all our releases, including our `master` branch which is

| **Version** | **Release Date** | **Source** | **Documentation** | **Download** | **Python Package** | **Unity Package** |
|:-------:|:------:|:-------------:|:-------:|:------------:|:------------:|:------------:|
| **master (unstable)** | -- | [source](https://github.com/Unity-Technologies/ml-agents/tree/master) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/master/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/master.zip) | -- | -- |
| **Release 12** | **December 22, 2020** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/release_12)** | **[docs](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/release_12.zip)** | **[0.23.0](https://pypi.org/project/mlagents/0.23.0/)** | **[1.7.2](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.7/manual/index.html)** |
| **Release 13** | **February 17, 2021** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/release_13)** | **[docs](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/release_13.zip)** | **[0.24.0](https://pypi.org/project/mlagents/0.24.0/)** | **[1.8.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.8/manual/index.html)** |
| **Release 12** | December 22, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_12) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_12.zip) | [0.23.0](https://pypi.org/project/mlagents/0.23.0/) | [1.7.2](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.7/manual/index.html) |
| **Release 11** | December 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_11) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_11_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_11.zip) | [0.23.0](https://pypi.org/project/mlagents/0.23.0/) | [1.7.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.7/manual/index.html) |
| **Release 10** | November 18, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_10) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_10_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_10.zip) | [0.22.0](https://pypi.org/project/mlagents/0.22.0/) | [1.6.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.6/manual/index.html) |
| **Verified Package 1.0.6** | **November 16, 2020** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/com.unity.ml-agents_1.0.6)** | **[docs](https://github.com/Unity-Technologies/ml-agents/blob/release_2_verified_docs/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/com.unity.ml-agents_1.0.6.zip)** | **[0.16.1](https://pypi.org/project/mlagents/0.16.1/)** | **[1.0.6](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.0/manual/index.html)** |

| **Release 7** | September 16, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_7) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_7_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_7.zip) | [0.20.0](https://pypi.org/project/mlagents/0.20.0/) | [1.4.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.4/manual/index.html) |
If you are a researcher interested in a discussion of Unity as an AI platform,
see a pre-print of our

16
com.unity.ml-agents.extensions/Documentation~/Grid-Sensor.md


# Contribution
An image can be thought of as a matrix of a predefined width (W) and a height (H) and each pixel can be thought of as simply an array of length 3 (in the case of RGB), `[Red, Green, Blue]` holding the different channel information of the color (channel) intensities at that pixel location. Thus an image is just a 3 dimensional matrix of size WxHx3. A Grid Observation can be thought of as a generalization of this setup where in place of a pixel there is a "cell" which is an array of length N representing different channel intensities at that cell position. From a Convolutional Neural Network point of view, the introduction of multiple channels in an "image" isn't a new concept. One such example is using an RGB-Depth image which is used in several robotics applications. The distinction of Grid Observations is what the data within the channels represents. Instead of limiting the channels to color intensities, the channels within a cell of a Grid Observation generalize to any data that can be represented by a single number (float or int).
Before jumping into the details of the Grid Sensor, it is worth noting that agents trained with it perform comparably to agents trained with raycasts but behave qualitatively differently. The Unity ML-Agents Toolkit comes with a suite of example environments. One in particular, the [Food Collector](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Examples.md#food-collector), has been the focus of the Grid Sensor development.
The Food Collector environment can be described as:
* Set-up: A multi-agent environment where agents compete to collect food.
* Goal: The agents must learn to collect as many green food spheres as possible while avoiding red spheres.
* Agents: The environment contains 5 agents with same Behavior Parameters.
When applying the Grid Sensor to this environment, in place of the Raycast Vector Sensor or the Camera Sensor, a Mean Reward of 40-50 is observed. This performance is on par with what is seen by agents trained with RayCasts, but a side-by-side comparison of trained agents shows a qualitative difference in behavior. A deeper study and interpretation of the qualitative differences between agents trained with Raycasts and Vector Sensors versus Grid Sensors is left to future studies.
<img src="images/gridobs-vs-vectorobs.gif" align="middle" width="3000"/>
## Overview
There are three main phases to the observation process of the Grid Sensor:

### Channel Based
The Channel Based Grid Observations represent observations in a normalized form, with values from 0 to 1. To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the `channelValues` array could take. If one sets ChannelDepth[i] to be 1, it is assumed that the value of `channelValues[i]` is already normalized. Otherwise, ChannelDepth[i] represents the total number of possible values that `channelValues[i]` can take and will be used for normalization.
The Channel Based Grid Observations is perhaps the simplest in terms of usability and similarity with other machine learning applications. Each grid is of size WxHxC where C is the number of channels. To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the `channelValues` array could take. If one sets ChannelDepth[i] to be 1, it is assumed that the value of `channelValues[i]` is already normalized. Else ChannelDepth[i] represents the total number of possible values that `channelValues[i]` can take.
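The normalization rule described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the Grid Sensor's C# implementation: the function name `normalize_channels` and its argument names are hypothetical, and the rule (pass the value through when the depth is 1, otherwise divide by the depth) follows the variant of the paragraph that mentions normalization and the `1/2 = 0.5` example in the same file.

```python
def normalize_channels(channel_values, channel_depth):
    """Normalize per-cell channel values (hypothetical helper, not the C# API).

    A depth of 1 marks a channel whose value is assumed to be normalized
    already; any other depth is treated as the number of possible values
    and is used as the divisor.
    """
    if len(channel_values) != len(channel_depth):
        raise ValueError("channel_values and channel_depth must have the same length")

    normalized = []
    for value, depth in zip(channel_values, channel_depth):
        if depth == 1:
            # Continuous channel: keep the value as supplied.
            normalized.append(float(value))
        else:
            # Categorical channel: scale the index by the number of categories.
            normalized.append(value / depth)
    return normalized


# A tag index of 1 with a depth of 2, plus an already-normalized channel.
print(normalize_channels([1, 0.25], [2, 1]))  # [0.5, 0.25]
```

The first entry reproduces the "weapon" arithmetic from the documentation, where a tag index of 1 and a depth of 2 give 0.5.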
As the "enemy" is in the second position of the observed tags, its value can be normalized by:
For ObjectType, "weapon", "enemy" will be represented respectively as:
```
weapon = DetectableObjects.IndexOfTag("weapon")/ChannelDepth[0] = 1/2 = 0.5;
```

2
com.unity.ml-agents.extensions/Documentation~/Match3.md


This implementation includes:
* C# implementation catered toward a Match-3 setup including concepts around encoding for moves based on [Human Like Playtesting with Deep Learning](https://www.researchgate.net/publication/328307928_Human-Like_Playtesting_with_Deep_Learning)
* An example Match-3 scene with ML-Agents implemented (located under /Project/Assets/ML-Agents/Examples/Match3). More information, on Match-3 example [here](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/docs/Learning-Environment-Examples.md#match-3).
* An example Match-3 scene with ML-Agents implemented (located under /Project/Assets/ML-Agents/Examples/Match3). More information, on Match-3 example [here](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/docs/Learning-Environment-Examples.md#match-3).
### Feedback
If you are a Match-3 developer and are trying to leverage ML-Agents for this scenario, [we want to hear from you](https://forms.gle/TBsB9jc8WshgzViU9). Additionally, we are also looking for interested Match-3 teams to speak with us for 45 minutes. If you are interested, please indicate that in the [form](https://forms.gle/TBsB9jc8WshgzViU9). If selected, we will provide gift cards as a token of appreciation.

9
com.unity.ml-agents.extensions/Documentation~/com.unity.ml-agents.extensions.md


recommended ways to install the package:
### Local Installation
[Clone the repository](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Installation.md#clone-the-ml-agents-toolkit-repository-optional) and follow the
[Local Installation for Development](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Installation.md#advanced-local-installation-for-development-1)
[Clone the repository](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Installation.md#clone-the-ml-agents-toolkit-repository-optional) and follow the
[Local Installation for Development](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Installation.md#advanced-local-installation-for-development-1)
![Package Manager git URL](https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/images/unity_package_manager_git_url.png)
![Package Manager git URL](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/images/unity_package_manager_git_url.png)
In the dialog that appears, enter
```
git+https://github.com/Unity-Technologies/ml-agents.git?path=com.unity.ml-agents.extensions
```

- No way to customize the action space of the `InputActuatorComponent`
## Need Help?
The main [README](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/README.md) contains links for contacting the team or getting support.
The main [README](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/README.md) contains links for contacting the team or getting support.

12
com.unity.ml-agents.extensions/Runtime/Input/InputActionActuator.cs


/// <see cref="Agent"/>'s <see cref="BehaviorParameters"/> indicate that the Agent is running in Heuristic Mode,
/// this Actuator will write actions from the <see cref="InputSystem"/> to the <see cref="ActionBuffers"/> object.
/// </summary>
public class InputActionActuator : IActuator, IHeuristicProvider
public class InputActionActuator : IActuator, IHeuristicProvider, IBuiltInActuator
{
readonly BehaviorParameters m_BehaviorParameters;
readonly InputAction m_Action;

/// <param name="adaptor">The <see cref="IRLActionInputAdaptor"/> that will convert data between ML-Agents
/// and the <see cref="InputSystem"/>.</param>
public InputActionActuator(InputDevice inputDevice, BehaviorParameters behaviorParameters,
InputAction action,
IRLActionInputAdaptor adaptor)
InputAction action,
IRLActionInputAdaptor adaptor)
{
m_BehaviorParameters = behaviorParameters;
Name = $"InputActionActuator-{action.name}";

Profiler.BeginSample("InputActionActuator.Heuristic");
m_InputAdaptor.WriteToHeuristic(m_Action, actionBuffersOut);
Profiler.EndSample();
}
/// <inheritdoc/>
public BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.InputActionActuator;
}
}
}

9
com.unity.ml-agents.extensions/Runtime/Match3/Match3Actuator.cs


/// Actuator for a Match3 game. It translates valid moves (defined by AbstractBoard.IsMoveValid())
/// in action masks, and applies the action to the board via AbstractBoard.MakeMove().
/// </summary>
public class Match3Actuator : IActuator, IHeuristicProvider
public class Match3Actuator : IActuator, IHeuristicProvider, IBuiltInActuator
{
protected AbstractBoard m_Board;
protected System.Random m_Random;

/// <inheritdoc/>
public void ResetData()
{
}
/// <inheritdoc/>
public BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.Match3Actuator;
}
IEnumerable<int> InvalidMoveIndices()

{
return 1;
}
}
}

2
com.unity.ml-agents.extensions/package.json


{
"name": "com.unity.ml-agents.extensions",
"displayName": "ML Agents Extensions",
"version": "0.0.1-preview",
"version": "0.1.0-preview",
"unity": "2018.4",
"description": "A source-only package for new features based on ML-Agents",
"dependencies": {

4
com.unity.ml-agents/Documentation~/com.unity.ml-agents.md


[unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[unity inference engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[package manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Installation.md
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Installation.md
[ML-Agents GitHub repo]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/com.unity.ml-agents.extensions
[ML-Agents GitHub repo]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/com.unity.ml-agents.extensions

4
com.unity.ml-agents/Runtime/Academy.cs


* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a
* Unity scene, please browse our documentation pages on GitHub:
* https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/
* https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/
*/
namespace Unity.MLAgents

/// fall back to inference or heuristic decisions. (You can also set agents to always use
/// inference or heuristics.)
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/" +
"docs/Learning-Environment-Design.md")]
public class Academy : IDisposable
{

1
com.unity.ml-agents/Runtime/Actuators/ActionSpec.cs


/// <param name="numContinuousActions">The number of continuous actions available.</param>
/// <param name="discreteBranchSizes">The array of branch sizes for the discrete actions. Each index
/// contains the number of actions available for that branch.</param>
/// <returns>An ActionSpec initialized with the specified action sizes.</returns>
public ActionSpec(int numContinuousActions = 0, int[] discreteBranchSizes = null)
{
m_NumContinuousActions = numContinuousActions;

2
com.unity.ml-agents/Runtime/Actuators/IActionReceiver.cs


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <seealso cref="IActionReceiver.OnActionReceived"/>
void WriteDiscreteActionMask(IDiscreteActionMask actionMask);

2
com.unity.ml-agents/Runtime/Actuators/IDiscreteActionMask.cs


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>

8
com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs


/// <summary>
/// IActuator implementation that forwards calls to an <see cref="IActionReceiver"/> and an <see cref="IHeuristicProvider"/>.
/// </summary>
internal class VectorActuator : IActuator, IHeuristicProvider
internal class VectorActuator : IActuator, IHeuristicProvider, IBuiltInActuator
{
IActionReceiver m_ActionReceiver;
IHeuristicProvider m_HeuristicProvider;

/// <inheritdoc />
public string Name { get; }
/// <inheritdoc />
public virtual BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.VectorActuator;
}
}
}

47
com.unity.ml-agents/Runtime/Agent.cs


}
/// <summary>
/// Simple wrapper around VectorActuator that overrides GetBuiltInActuatorType
/// so that it can be distinguished from a standard VectorActuator.
/// </summary>
internal class AgentVectorActuator : VectorActuator
{
public AgentVectorActuator(IActionReceiver actionReceiver,
IHeuristicProvider heuristicProvider,
ActionSpec actionSpec,
string name = "VectorActuator"
) : base(actionReceiver, heuristicProvider, actionSpec, name)
{ }
public override BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.AgentVectorActuator;
}
}
/// <summary>
/// An agent is an actor that can observe its environment, decide on the
/// best course of action using those observations, and execute those actions
/// within the environment.

/// [OnDisable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnDisable.html]
/// [OnBeforeSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnBeforeSerialize.html
/// [OnAfterSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnAfterSerialize.html
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design.md
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Readme.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Readme.md
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/" +
"docs/Learning-Environment-Design-Agents.md")]
[Serializable]
[RequireComponent(typeof(BehaviorParameters))]

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// </remarks>
/// <param name="reward">The new value of the reward.</param>
public void SetReward(float reward)

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
///</remarks>
/// <param name="increment">Incremental reward value.</param>
public void AddReward(float increment)

/// implementing a simple heuristic function can aid in debugging agent actions and interactions
/// with its environment.
///
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// <example>

// Support legacy OnActionReceived
// TODO don't set this up if the sizes are 0?
var param = m_PolicyFactory.BrainParameters;
m_VectorActuator = new VectorActuator(this, this, param.ActionSpec);
m_VectorActuator = new AgentVectorActuator(this, this, param.ActionSpec);
m_ActuatorManager = new ActuatorManager(attachedActuators.Length + 1);
m_LegacyActionCache = new float[m_VectorActuator.TotalNumberOfActions()];
m_LegacyHeuristicCache = new float[m_VectorActuator.TotalNumberOfActions()];

/// For more information about observations, see [Observations and Sensors].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// </remarks>
public virtual void CollectObservations(VectorSensor sensor)
{

///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <seealso cref="IActionReceiver.OnActionReceived"/>
public virtual void WriteDiscreteActionMask(IDiscreteActionMask actionMask)

///
/// For more information about implementing agent actions see [Agents - Actions].
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="actions">
/// Struct containing the buffers of actions to be executed at this step.

31
com.unity.ml-agents/Runtime/Analytics/Events.cs


public int InferenceDevice;
public List<EventObservationSpec> ObservationSpecs;
public EventActionSpec ActionSpec;
public List<EventActuatorInfo> ActuatorInfos;
public int MemorySize;
public long TotalWeightSizeBytes;
public string ModelHash;

NumContinuousActions = actionSpec.NumContinuousActions,
NumDiscreteActions = actionSpec.NumDiscreteActions,
BranchSizes = branchSizes,
};
}
}
/// <summary>
/// Information about an actuator.
/// </summary>
[Serializable]
internal struct EventActuatorInfo
{
public int BuiltInActuatorType;
public int NumContinuousActions;
public int NumDiscreteActions;
public static EventActuatorInfo FromActuator(IActuator actuator)
{
BuiltInActuatorType builtInActuatorType = Actuators.BuiltInActuatorType.Unknown;
if (actuator is IBuiltInActuator builtInActuator)
{
builtInActuatorType = builtInActuator.GetBuiltInActuatorType();
}
var actionSpec = actuator.ActionSpec;
return new EventActuatorInfo
{
BuiltInActuatorType = (int)builtInActuatorType,
NumContinuousActions = actionSpec.NumContinuousActions,
NumDiscreteActions = actionSpec.NumDiscreteActions
};
}
}
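
The lookup in `FromActuator` can be read as a small pattern: ask the actuator for its built-in type if it offers one, fall back to `Unknown` otherwise, and let a subclass override the reported type. The Python sketch below mirrors that pattern for illustration only. The enum member names come from this diff, but their integer values are placeholders (the contents of `IBuiltInActuator.cs` are not shown here), and `CustomActuator`, `actuator_info`, and the snake_case method names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum, auto


class BuiltInActuatorType(IntEnum):
    # Names taken from the diff; the numeric values are assumed placeholders.
    UNKNOWN = 0
    AGENT_VECTOR_ACTUATOR = auto()
    VECTOR_ACTUATOR = auto()
    MATCH3_ACTUATOR = auto()
    INPUT_ACTION_ACTUATOR = auto()


@dataclass
class ActionSpec:
    num_continuous_actions: int = 0
    num_discrete_actions: int = 0


class VectorActuator:
    def __init__(self, action_spec):
        self.action_spec = action_spec

    def get_built_in_actuator_type(self):
        return BuiltInActuatorType.VECTOR_ACTUATOR


class AgentVectorActuator(VectorActuator):
    # Overrides the reported type so it can be told apart from its base class.
    def get_built_in_actuator_type(self):
        return BuiltInActuatorType.AGENT_VECTOR_ACTUATOR


class CustomActuator:
    """Hypothetical user-defined actuator that reports no built-in type."""

    def __init__(self, action_spec):
        self.action_spec = action_spec


def actuator_info(actuator):
    # Stand-in for the C# `is IBuiltInActuator` check: use the getter if present.
    getter = getattr(actuator, "get_built_in_actuator_type", None)
    kind = getter() if callable(getter) else BuiltInActuatorType.UNKNOWN
    spec = actuator.action_spec
    return {
        "BuiltInActuatorType": int(kind),
        "NumContinuousActions": spec.num_continuous_actions,
        "NumDiscreteActions": spec.num_discrete_actions,
    }
```

Storing the type as a plain integer matches the `(int)builtInActuatorType` cast in the struct above, which keeps the event payload serializable without exposing the enum itself.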

public string BehaviorName;
public List<EventObservationSpec> ObservationSpecs;
public EventActionSpec ActionSpec;
public List<EventActuatorInfo> ActuatorInfos;
/// <summary>
/// This will be the same as TrainingEnvironmentInitializedEvent if available, but

18
com.unity.ml-agents/Runtime/Analytics/InferenceAnalytics.cs


/// <param name="inferenceDevice">Whether inference is being performed on the CPU or GPU</param>
/// <param name="sensors">List of ISensors for the Agent. Used to generate information about the observation space.</param>
/// <param name="actionSpec">ActionSpec for the Agent. Used to generate information about the action space.</param>
/// <param name="actuators">List of IActuators for the Agent. Used to generate information about the action space.</param>
/// <returns></returns>
public static void InferenceModelSet(
NNModel nnModel,

ActionSpec actionSpec
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
// The event shouldn't be able to report if this is disabled but if we know we're not going to report

return;
}
var data = GetEventForModel(nnModel, behaviorName, inferenceDevice, sensors, actionSpec);
var data = GetEventForModel(nnModel, behaviorName, inferenceDevice, sensors, actionSpec, actuators);
//Debug.Log(JsonUtility.ToJson(data, true));
// Debug.Log(JsonUtility.ToJson(data, true));
#if UNITY_EDITOR
if (AnalyticsUtils.s_SendEditorAnalytics)
{

/// <param name="inferenceDevice"></param>
/// <param name="sensors"></param>
/// <param name="actionSpec"></param>
/// <param name="actuators"></param>
/// <returns></returns>
internal static InferenceEvent GetEventForModel(
NNModel nnModel,

ActionSpec actionSpec
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
var barracudaModel = ModelLoader.Load(nnModel);

foreach (var sensor in sensors)
{
inferenceEvent.ObservationSpecs.Add(EventObservationSpec.FromSensor(sensor));
}
inferenceEvent.ActuatorInfos = new List<EventActuatorInfo>(actuators.Count);
foreach (var actuator in actuators)
{
inferenceEvent.ActuatorInfos.Add(EventActuatorInfo.FromActuator(actuator));
}
inferenceEvent.TotalWeightSizeBytes = GetModelWeightSize(barracudaModel);

17
com.unity.ml-agents/Runtime/Analytics/TrainingAnalytics.cs


public static void RemotePolicyInitialized(
string fullyQualifiedBehaviorName,
IList<ISensor> sensors,
ActionSpec actionSpec
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
if (!IsAnalyticsEnabled())

return;
}
var data = GetEventForRemotePolicy(behaviorName, sensors, actionSpec);
var data = GetEventForRemotePolicy(behaviorName, sensors, actionSpec, actuators);
// Note - to debug, use JsonUtility.ToJson on the event.
// Debug.Log(
// $"Would send event {k_RemotePolicyInitializedEventName} with body {JsonUtility.ToJson(data, true)}"

#endif
}
static RemotePolicyInitializedEvent GetEventForRemotePolicy(
internal static RemotePolicyInitializedEvent GetEventForRemotePolicy(
ActionSpec actionSpec)
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
var remotePolicyEvent = new RemotePolicyInitializedEvent();

foreach (var sensor in sensors)
{
remotePolicyEvent.ObservationSpecs.Add(EventObservationSpec.FromSensor(sensor));
}
remotePolicyEvent.ActuatorInfos = new List<EventActuatorInfo>(actuators.Count);
foreach (var actuator in actuators)
{
remotePolicyEvent.ActuatorInfos.Add(EventActuatorInfo.FromActuator(actuator));
}
remotePolicyEvent.MLAgentsEnvsVersion = s_TrainerPackageVersion;

9
com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs


#if UNITY_EDITOR || UNITY_STANDALONE_WIN || UNITY_STANDALONE_OSX || UNITY_STANDALONE_LINUX
#define MLA_SUPPORTED_TRAINING_PLATFORM
#endif
# if MLA_SUPPORTED_TRAINING_PLATFORM
using Grpc.Core;
#if UNITY_EDITOR
using UnityEditor;

/// <param name="initParametersOut">The External Initialization Parameters received.</param>
public bool Initialize(CommunicatorInitParameters initParameters, out UnityRLInitParameters initParametersOut)
{
#if MLA_SUPPORTED_TRAINING_PLATFORM
var academyParameters = new UnityRLInitializationOutputProto
{
Name = initParameters.name,

UpdateEnvironmentWithInput(input.RlInput);
initParametersOut = initializationInput.RlInitializationInput.ToUnityRLInitParameters();
return true;
#else
initParametersOut = new UnityRLInitParameters();
return false;
#endif
}
/// <summary>

2
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs


/// See [Imitation Learning - Recording Demonstrations] for more information.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// </remarks>
[RequireComponent(typeof(Agent))]
[AddComponentMenu("ML Agents/Demonstration Recorder", (int)MenuGroup.Default)]

2
com.unity.ml-agents/Runtime/DiscreteActionMasker.cs


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>

10
com.unity.ml-agents/Runtime/Policies/BarracudaPolicy.cs


private string m_BehaviorName;
/// <summary>
/// List of actuators, only used for analytics
/// </summary>
private IList<IActuator> m_Actuators;
/// <summary>
/// Whether or not we've tried to send analytics for this model. We only ever try to send once per policy,
/// and do additional deduplication in the analytics code.
/// </summary>

public BarracudaPolicy(
ActionSpec actionSpec,
IList<IActuator> actuators,
NNModel model,
InferenceDevice inferenceDevice,
string behaviorName

m_ModelRunner = modelRunner;
m_BehaviorName = behaviorName;
m_ActionSpec = actionSpec;
m_Actuators = actuators;
}
/// <inheritdoc />

m_BehaviorName,
m_ModelRunner.InferenceDevice,
sensors,
m_ActionSpec
m_ActionSpec,
m_Actuators
);
}
m_AgentId = info.episodeId;

6
com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs


"Either assign a model, or change to a different Behavior Type."
);
}
return new BarracudaPolicy(actionSpec, m_Model, m_InferenceDevice, m_BehaviorName);
return new BarracudaPolicy(actionSpec, actuatorManager, m_Model, m_InferenceDevice, m_BehaviorName);
return new RemotePolicy(actionSpec, FullyQualifiedBehaviorName);
return new RemotePolicy(actionSpec, actuatorManager, FullyQualifiedBehaviorName);
return new BarracudaPolicy(actionSpec, m_Model, m_InferenceDevice, m_BehaviorName);
return new BarracudaPolicy(actionSpec, actuatorManager, m_Model, m_InferenceDevice, m_BehaviorName);
}
else
{

10
com.unity.ml-agents/Runtime/Policies/RemotePolicy.cs


internal ICommunicator m_Communicator;
/// <summary>
/// List of actuators, only used for analytics
/// </summary>
private IList<IActuator> m_Actuators;
IList<IActuator> actuators,
string fullyQualifiedBehaviorName)
{
m_FullyQualifiedBehaviorName = fullyQualifiedBehaviorName;

m_Actuators = actuators;
}
/// <inheritdoc />

TrainingAnalytics.RemotePolicyInitialized(
m_FullyQualifiedBehaviorName,
sensors,
m_ActionSpec
m_ActionSpec,
m_Actuators
);
}
m_AgentId = info.episodeId;

38
com.unity.ml-agents/Runtime/Sensors/IBuiltInSensor.cs


/// </summary>
public enum BuiltInSensorType
{
/// <summary>
/// Default Sensor type if it cannot be determined.
/// </summary>
/// <summary>
/// The Vector sensor used by the agent.
/// </summary>
// Note that StackingSensor actually returns the wrapped sensor's type
/// <summary>
/// The Stacking Sensor type. NOTE: StackingSensor actually returns the wrapped sensor's type.
/// </summary>
/// <summary>
/// The RayPerception Sensor types, both 3D and 2D.
/// </summary>
/// <summary>
/// The observable attribute sensor type.
/// </summary>
/// <summary>
/// Sensors that use the Camera for observations.
/// </summary>
/// <summary>
/// Sensors that use RenderTextures for observations.
/// </summary>
/// <summary>
/// Sensors that use buffers or tensors for observations.
/// </summary>
/// <summary>
/// The sensors that observe properties of rigid bodies.
/// </summary>
/// <summary>
/// The sensors that observe Match 3 boards.
/// </summary>
/// <summary>
/// Sensors that break down the world into a grid of colliders to observe an area at a pre-defined granularity.
/// </summary>
GridSensor = 10
}

/// </summary>
public interface IBuiltInSensor
internal interface IBuiltInSensor
{
/// <summary>
/// Return the corresponding BuiltInSensorType for the sensor.

}
}

6
com.unity.ml-agents/Runtime/Sensors/ObservationWriter.cs


}
}
/// <summary>
/// Write the list of floats.
/// </summary>
/// <param name="data">The actual list of floats to write.</param>
/// <param name="writeOffset">Optional write offset to start writing from.</param>
public void AddList(IList<float> data, int writeOffset = 0)
{
if (m_Data != null)

var val = data[index];
m_Data[index + m_Offset + writeOffset] = val;
}
}
else

5
com.unity.ml-agents/Runtime/SideChannels/TrainingAnalyticsSideChannel.cs


namespace Unity.MLAgents.SideChannels
{
public class TrainingAnalyticsSideChannel : SideChannel
/// <summary>
/// Side Channel implementation for recording which training features are being used.
/// </summary>
internal class TrainingAnalyticsSideChannel : SideChannel
{
const string k_TrainingAnalyticsConfigId = "b664a4a9-d86f-5a5f-95cb-e8353a7e8356";

18
com.unity.ml-agents/Tests/Editor/Analytics/InferenceAnalyticsTests.cs


using System;
using System.Collections.Generic;
using NUnit.Framework;
using Unity.MLAgents.Sensors;

{
var sensors = new List<ISensor> { sensor_21_20_3.Sensor, sensor_20_22_3.Sensor };
var behaviorName = "continuousModel";
var actionSpec = GetContinuous2vis8vec2actionActionSpec();
var vectorActuator = new VectorActuator(null, actionSpec, "test'");
var actuators = new IActuator[] { vectorActuator };
InferenceDevice.CPU, sensors, GetContinuous2vis8vec2actionActionSpec()
InferenceDevice.CPU, sensors, actionSpec,
actuators
);
// The behavior name should be hashed, not pass-through.

Assert.AreEqual((int)DimensionProperty.None, continuousEvent.ObservationSpecs[0].DimensionInfos[2].Flags);
Assert.AreEqual("None", continuousEvent.ObservationSpecs[0].CompressionType);
Assert.AreEqual(Test3DSensor.k_BuiltInSensorType, continuousEvent.ObservationSpecs[0].BuiltInSensorType);
Assert.AreEqual((int)BuiltInActuatorType.VectorActuator, continuousEvent.ActuatorInfos[0].BuiltInActuatorType);
Assert.AreNotEqual(null, continuousEvent.ModelHash);
// Make sure nested fields get serialized

Assert.IsTrue(jsonString.Contains("NumDiscreteActions"));
Assert.IsTrue(jsonString.Contains("SensorName"));
Assert.IsTrue(jsonString.Contains("Flags"));
Assert.IsTrue(jsonString.Contains("ActuatorInfos"));
}
[Test]

using (new AnalyticsUtils.DisableAnalyticsSending())
{
var sensors = new List<ISensor> { sensor_21_20_3.Sensor, sensor_20_22_3.Sensor };
var policy = new BarracudaPolicy(GetContinuous2vis8vec2actionActionSpec(), continuousONNXModel, InferenceDevice.CPU, "testBehavior");
var policy = new BarracudaPolicy(
GetContinuous2vis8vec2actionActionSpec(),
Array.Empty<IActuator>(),
continuousONNXModel,
InferenceDevice.CPU,
"testBehavior"
);
policy.RequestDecision(new AgentInfo(), sensors);
}
Academy.Instance.Dispose();

38
com.unity.ml-agents/Tests/Editor/Analytics/TrainingAnalyticsTest.cs


using System;
using UnityEngine;
using Unity.Barracuda;
using UnityEditor;
namespace Unity.MLAgents.Tests.Analytics
{

}
[Test]
public void TestRemotePolicyEvent()
{
var behaviorName = "testBehavior";
var sensor1 = new Test3DSensor("SensorA", 21, 20, 3);
var sensor2 = new Test3DSensor("SensorB", 20, 22, 3);
var sensors = new List<ISensor> { sensor1, sensor2 };
var actionSpec = ActionSpec.MakeContinuous(2);
var vectorActuator = new VectorActuator(null, actionSpec, "test'");
var actuators = new IActuator[] { vectorActuator };
var remotePolicyEvent = TrainingAnalytics.GetEventForRemotePolicy(behaviorName, sensors, actionSpec, actuators);
// The behavior name should be hashed, not pass-through.
Assert.AreNotEqual(behaviorName, remotePolicyEvent.BehaviorName);
Assert.AreEqual(2, remotePolicyEvent.ObservationSpecs.Count);
Assert.AreEqual(3, remotePolicyEvent.ObservationSpecs[0].DimensionInfos.Length);
Assert.AreEqual(20, remotePolicyEvent.ObservationSpecs[0].DimensionInfos[0].Size);
Assert.AreEqual("None", remotePolicyEvent.ObservationSpecs[0].CompressionType);
Assert.AreEqual(Test3DSensor.k_BuiltInSensorType, remotePolicyEvent.ObservationSpecs[0].BuiltInSensorType);
Assert.AreEqual(2, remotePolicyEvent.ActionSpec.NumContinuousActions);
Assert.AreEqual(0, remotePolicyEvent.ActionSpec.NumDiscreteActions);
Assert.AreEqual(2, remotePolicyEvent.ActuatorInfos[0].NumContinuousActions);
Assert.AreEqual(0, remotePolicyEvent.ActuatorInfos[0].NumDiscreteActions);
}
[Test]
public void TestRemotePolicy()
{
if (Academy.IsInitialized)

using (new AnalyticsUtils.DisableAnalyticsSending())
{
var actionSpec = ActionSpec.MakeContinuous(3);
var policy = new RemotePolicy(actionSpec, "TestBehavior?team=42");
var policy = new RemotePolicy(actionSpec, Array.Empty<IActuator>(), "TestBehavior?team=42");
policy.RequestDecision(new AgentInfo(), new List<ISensor>());
}

86
config/ppo/Match3.yaml


default_settings:
trainer_type: ppo
hyperparameters:
batch_size: 16
buffer_size: 120
learning_rate: 0.0003
beta: 0.005
epsilon: 0.2
lambd: 0.99
num_epoch: 3
learning_rate_schedule: constant
network_settings:
normalize: true
hidden_units: 256
num_layers: 4
vis_encode_type: match3
reward_signals:
extrinsic:
gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 5000000
time_horizon: 128
summary_freq: 10000
threaded: true
Match3VectorObs:
trainer_type: ppo
hyperparameters:
batch_size: 64
buffer_size: 12000
learning_rate: 0.0003
beta: 0.001
epsilon: 0.2
lambd: 0.99
num_epoch: 3
learning_rate_schedule: constant
network_settings:
normalize: true
hidden_units: 128
num_layers: 2
vis_encode_type: match3
reward_signals:
extrinsic:
gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 5000000
time_horizon: 1000
summary_freq: 10000
threaded: true
Match3VisualObs:
trainer_type: ppo
hyperparameters:
batch_size: 64
buffer_size: 12000
learning_rate: 0.0003
beta: 0.001
epsilon: 0.2
lambd: 0.99
num_epoch: 3
learning_rate_schedule: constant
network_settings:
normalize: true
hidden_units: 128
num_layers: 2
vis_encode_type: match3
reward_signals:
extrinsic:
gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 5000000
time_horizon: 1000
summary_freq: 10000
threaded: true
batch_size: 64
buffer_size: 128
batch_size: 16
buffer_size: 120
network_settings:
hidden_units: 4
num_layers: 1

Match3GreedyHeuristic:
Match3SmartHeuristic:
batch_size: 64
buffer_size: 128
batch_size: 16
buffer_size: 120
network_settings:
hidden_units: 4
num_layers: 1

1
config/ppo/PyramidsRND.yaml


strength: 0.01
network_settings:
hidden_units: 64
num_layers: 3
learning_rate: 0.0001
keep_checkpoints: 5
max_steps: 3000000

4
docs/Installation-Anaconda-Windows.md


the ml-agents Conda environment by typing `activate ml-agents`)_:
```sh
git clone --branch release_12 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_13 https://github.com/Unity-Technologies/ml-agents.git
The `--branch release_12` option will switch to the tag of the latest stable
The `--branch release_13` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

6
docs/Installation.md


of our tutorials / guides assume you have access to our example environments).
```sh
git clone --branch release_12 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_13 https://github.com/Unity-Technologies/ml-agents.git
The `--branch release_12` option will switch to the tag of the latest stable
The `--branch release_13` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

ML-Agents Toolkit for your purposes. If you plan to contribute those changes
back, make sure to clone the `master` branch (by omitting `--branch release_12`
back, make sure to clone the `master` branch (by omitting `--branch release_13`
from the command above). See our
[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more
information on contributing to the ML-Agents Toolkit.

4
docs/Learning-Environment-Examples.md


- Observations and actions are defined with a sensor and actuator respectively.
- Float Properties: None
- Benchmark Mean Reward:
- 37.2 for visual observations
- 37.6 for vector observations
- 39.5 for visual observations
- 38.5 for vector observations
- 34.2 for simple heuristic (pick a random valid move)
- 37.0 for greedy heuristic (pick the highest-scoring valid move)

2
docs/Training-on-Amazon-Web-Service.md


2. Clone the ML-Agents repo and install the required Python packages
```sh
git clone --branch release_12 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_13 https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents/
pip3 install -e .
```

4
docs/Unity-Inference-Engine.md


loading expects certain conventions for constants and tensor names. While it is
possible to construct a model that follows these conventions, we don't provide
any additional help for this. More details can be found in
[TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/com.unity.ml-agents/Runtime/Inference/TensorNames.cs)
[TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/com.unity.ml-agents/Runtime/Inference/TensorNames.cs)
[BarracudaModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs).
[BarracudaModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs).
If you wish to run inference on an externally trained model, you should use
Barracuda directly, instead of trying to run it through ML-Agents.

9
ml-agents/mlagents/trainers/action_info.py


class ActionInfo(NamedTuple):
"""
A NamedTuple containing actions and related quantities to the policy forward
pass. Additionally contains the agent ids in the corresponding DecisionStep
:param action: The action output of the policy
:param env_action: The possibly clipped action to be executed in the environment
:param outputs: Dict of all quantities associated with the policy forward pass
:param agent_ids: List of int agent ids in DecisionStep
"""
action: Any
env_action: Any
outputs: ActionInfoOutputs

49
ml-agents/mlagents/trainers/agent_processor.py


"""
self.experience_buffers: Dict[str, List[AgentExperience]] = defaultdict(list)
self.last_step_result: Dict[str, Tuple[DecisionStep, int]] = {}
# current_group_obs is used to collect the last seen obs of all the agents in the same group,
# and assemble the group obs.
# current_group_obs is used to collect the current, most recently seen
# obs of all the agents in the same group, and assemble the group obs.
# last_group_obs is used to collect the last seen obs of all the agents in the same group,
# and assemble the group obs.
# group_status is used to collect the current, most recently seen
# group status of all the agents in the same group, and assemble the group obs.
self.group_status: Dict[str, Dict[str, GroupmateStatus]] = defaultdict(
lambda: defaultdict(None)
)

if global_id in self.last_step_result: # Don't store if agent just reset
self.last_take_action_outputs[global_id] = take_action_outputs
# Iterate over all the terminal steps, first gather all the teammate obs
# and then create the AgentExperiences/Trajectories
# Iterate over all the terminal steps, first gather all the group obs
# and then create the AgentExperiences/Trajectories. _add_to_group_status
# stores Group statuses in a common data structure self.group_status
self._gather_group_obs(terminal_step, worker_id)
self._add_to_group_status(terminal_step, worker_id)
for terminal_step in terminal_steps.values():
local_id = terminal_step.agent_id
global_id = get_global_agent_id(worker_id, local_id)

# Clear the last seen group obs when agents die.
self._clear_group_obs(global_id)
# Clean the last experience dictionary for terminal steps
for terminal_step in terminal_steps.values():
local_id = terminal_step.agent_id
global_id = get_global_agent_id(worker_id, local_id)
# Iterate over all the decision steps, first gather all the teammate obs
# and then create the trajectories
# Iterate over all the decision steps, first gather all the group obs
# and then create the trajectories. _add_to_group_status
# stores Group statuses in a common data structure self.group_status
self._gather_group_obs(ongoing_step, worker_id)
self._add_to_group_status(ongoing_step, worker_id)
for ongoing_step in decision_steps.values():
local_id = ongoing_step.agent_id
self._process_step(

[_gid], take_action_outputs["action"]
)
def _gather_group_obs(
def _add_to_group_status(
"""
Takes a TerminalStep or DecisionStep and adds the information in it
to self.group_status. This information can then be retrieved
when constructing trajectories to get the status of group mates.
:param step: TerminalStep or DecisionStep
:param worker_id: Worker ID of this particular environment. Used to generate a
global group id.
"""
global_agent_id = get_global_agent_id(worker_id, step.agent_id)
stored_decision_step, idx = self.last_step_result.get(
global_agent_id, (None, None)

)
if stored_decision_step is not None and stored_take_action_outputs is not None:
# 0, the default group_id, means that the agent doesn't belong to an agent group.
# If 0, don't add any groupmate information.
if step.group_id > 0:
global_group_id = get_global_group_id(worker_id, step.group_id)
stored_actions = stored_take_action_outputs["action"]

if stored_decision_step is not None and stored_take_action_outputs is not None:
obs = stored_decision_step.obs
if self.policy.use_recurrent:
memory = self.policy.retrieve_memories([global_agent_id])[0, :]
memory = self.policy.retrieve_previous_memories([global_agent_id])[0, :]
else:
memory = None
done = terminated # Since this is an ongoing step

# Assemble teammate_obs. If none saved, then it will be an empty list.
group_statuses = []
for _id, _obs in self.group_status[global_group_id].items():
for _id, _mate_status in self.group_status[global_group_id].items():
group_statuses.append(_obs)
group_statuses.append(_mate_status)
experience = AgentExperience(
obs=obs,

):
next_obs = step.obs
next_group_obs = []
for _id, _exp in self.current_group_obs[global_group_id].items():
for _id, _obs in self.current_group_obs[global_group_id].items():
next_group_obs.append(_exp)
next_group_obs.append(_obs)
trajectory = Trajectory(
steps=self.experience_buffers[global_agent_id],
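
The group bookkeeping in this hunk can be sketched as follows. This is a minimal illustration, not the AgentProcessor code: `Step`, `global_id`, `add_to_group_status` and `groupmates` are hypothetical names, the id string format is an assumption, and leaving the agent itself out of its own groupmate list is also an assumption of the sketch. The part taken from the diff is that a `group_id` of 0 means "no group" and that statuses are stored per global group id.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Step(NamedTuple):
    """Hypothetical stand-in for a DecisionStep/TerminalStep."""
    agent_id: int
    group_id: int
    obs: List[float]


def global_id(prefix: str, worker_id: int, local_id: int) -> str:
    # Assumed id format; only uniqueness across workers matters here.
    return f"{prefix}_{worker_id}-{local_id}"


# Maps global group id -> (global agent id -> latest status of that agent).
group_status: Dict[str, Dict[str, List[float]]] = defaultdict(dict)


def add_to_group_status(step: Step, worker_id: int) -> None:
    # 0 is the default group_id: the agent belongs to no group, store nothing.
    if step.group_id > 0:
        gid = global_id("group", worker_id, step.group_id)
        aid = global_id("agent", worker_id, step.agent_id)
        group_status[gid][aid] = step.obs


def groupmates(step: Step, worker_id: int) -> List[List[float]]:
    # Collect the stored statuses of every other agent in the same group.
    gid = global_id("group", worker_id, step.group_id)
    aid = global_id("agent", worker_id, step.agent_id)
    return [s for _id, s in group_status.get(gid, {}).items() if _id != aid]


steps = [Step(1, 1, [0.1]), Step(2, 1, [0.2]), Step(3, 0, [0.3])]
for s in steps:
    add_to_group_status(s, worker_id=0)
mates = groupmates(steps[0], worker_id=0)
```

Gathering all statuses first and only then building experiences, as the comments in the hunk describe, means every agent sees a complete picture of its group for that step.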

53
ml-agents/mlagents/trainers/buffer.py


ENVIRONMENT_REWARDS = "environment_rewards"
MASKS = "masks"
MEMORY = "memory"
CRITIC_MEMORY = "critic_memory"
PREV_ACTION = "prev_action"
ADVANTAGES = "advantages"

class AgentBufferField(list):
"""
AgentBufferField is a list of numpy arrays. When an agent collects a field, you can add it to its
AgentBufferField with the append method.
AgentBufferField is a list of numpy arrays, or List[np.ndarray] for group entries.
When an agent collects a field, you can add it to its AgentBufferField with the append method.
"""
def __init__(self):

def __str__(self):
def __str__(self) -> str:
return str(np.array(self).shape)
def append(self, element: np.ndarray, padding_value: float = 0.0) -> None:

super().append(element)
self.padding_value = padding_value
def set(self, data):
def set(self, data: List[BufferEntry]) -> None:
Sets the list of np.array to the input data
:param data: The np.array list to be set.
Sets the list of BufferEntry to the input data
:param data: The BufferEntry list to be set.
"""
self[:] = []
self[:] = data

batch_size: int = None,
training_length: Optional[int] = 1,
sequential: bool = True,
) -> np.ndarray:
) -> List[BufferEntry]:
"""
Retrieve the last batch_size elements of length training_length
from the list of np.array

Resets the AgentBufferField
"""
self[:] = []
def padded_to_batch(
self, pad_value: float = 0, dtype: np.dtype = np.float32
) -> Union[np.ndarray, List[np.ndarray]]:
"""
Converts this AgentBufferField (which is a List[BufferEntry]) into a numpy array
with first dimension equal to the length of this AgentBufferField. If this AgentBufferField
contains a List[List[BufferEntry]] (i.e., in the case of group observations), return a List
containing numpy arrays or tensors, of length equal to the maximum length of an entry.
Entries with fewer elements than that length are padded with pad_value.
:param pad_value: Value to pad List AgentBufferFields, when there are less than the maximum
number of agents present.
:param dtype: Dtype of output numpy array.
:return: Numpy array or List of numpy arrays representing this AgentBufferField, where the first
dimension is equal to the length of the AgentBufferField.
"""
if len(self) > 0 and not isinstance(self[0], list):
return np.asanyarray(self, dtype=dtype)
shape = None
for _entry in self:
# _entry could be an empty list if there are no group agents in this
# step. Find the first non-empty list and use that shape.
if _entry:
shape = _entry[0].shape
break
# If there were no groupmate agents in the entire batch, return an empty List.
if shape is None:
return []
# Convert to numpy arrays while padding missing entries with pad_value
new_list = list(
map(
lambda x: np.asanyarray(x, dtype=dtype),
itertools.zip_longest(*self, fillvalue=np.full(shape, pad_value)),
)
)
return new_list
class AgentBuffer(MutableMapping):
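
The padding step in `padded_to_batch` relies on `itertools.zip_longest` to turn a per-step list of groupmate entries into a per-groupmate list of steps. A minimal sketch of that idea, using plain Python lists instead of numpy arrays so it runs with the standard library only (`pad_group_entries` is a hypothetical name, not part of the buffer module):

```python
from itertools import zip_longest
from typing import List

Entry = List[float]


def pad_group_entries(steps: List[List[Entry]], pad_value: float = 0.0) -> List[List[Entry]]:
    """Transpose [step][agent] into [agent][step], padding absent agents."""
    # Find the first non-empty step to learn the size of one entry.
    size = None
    for entry in steps:
        if entry:
            size = len(entry[0])
            break
    # No groupmates anywhere in the batch: return an empty list.
    if size is None:
        return []
    filler = [pad_value] * size
    # zip_longest pads the shorter steps so every agent slot spans all steps.
    return [list(per_agent) for per_agent in zip_longest(*steps, fillvalue=filler)]


steps = [
    [[1.0, 1.0], [2.0, 2.0]],  # two groupmates present
    [[3.0, 3.0]],              # one groupmate has died
    [],                        # no groupmates this step
]
padded = pad_group_entries(steps)
```

The result has one element per groupmate slot, each as long as the number of steps, which matches the shapes asserted in `test_obsutil_group_from_buffer`.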

15
ml-agents/mlagents/trainers/policy/policy.py


self.network_settings: NetworkSettings = trainer_settings.network_settings
self.seed = seed
self.previous_action_dict: Dict[str, np.ndarray] = {}
self.previous_memory_dict: Dict[str, np.ndarray] = {}
self.memory_dict: Dict[str, np.ndarray] = {}
self.normalize = trainer_settings.network_settings.normalize
self.use_recurrent = self.network_settings.memory is not None

if memory_matrix is None:
return
# Pass old memories into previous_memory_dict
for agent_id in agent_ids:
if agent_id in self.memory_dict:
self.previous_memory_dict[agent_id] = self.memory_dict[agent_id]
for index, agent_id in enumerate(agent_ids):
self.memory_dict[agent_id] = memory_matrix[index, :]

memory_matrix[index, :] = self.memory_dict[agent_id]
return memory_matrix
def retrieve_previous_memories(self, agent_ids: List[str]) -> np.ndarray:
memory_matrix = np.zeros((len(agent_ids), self.m_size), dtype=np.float32)
for index, agent_id in enumerate(agent_ids):
if agent_id in self.previous_memory_dict:
memory_matrix[index, :] = self.previous_memory_dict[agent_id]
return memory_matrix
if agent_id in self.previous_memory_dict:
self.previous_memory_dict.pop(agent_id)
def make_empty_previous_action(self, num_agents: int) -> np.ndarray:
"""
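
The previous-memory bookkeeping added to the policy can be illustrated with a small stand-alone store. This is a sketch under assumptions: `MemoryStore` and its method names are hypothetical, and plain lists replace the numpy matrices used in the diff. The behaviour shown is the one visible above: saving new memories first moves the current ones into a "previous" dict, and agents without a previous memory get zeros.

```python
from typing import Dict, List, Optional


class MemoryStore:
    """Hypothetical stand-in for the policy's memory dictionaries."""

    def __init__(self, m_size: int) -> None:
        self.m_size = m_size
        self.memory: Dict[str, List[float]] = {}
        self.previous_memory: Dict[str, List[float]] = {}

    def save(self, agent_ids: List[str], memory_matrix: Optional[List[List[float]]]) -> None:
        if memory_matrix is None:
            return
        # Pass old memories into the previous dict before overwriting them.
        for agent_id in agent_ids:
            if agent_id in self.memory:
                self.previous_memory[agent_id] = self.memory[agent_id]
        for index, agent_id in enumerate(agent_ids):
            self.memory[agent_id] = memory_matrix[index]

    def retrieve_previous(self, agent_ids: List[str]) -> List[List[float]]:
        # Unknown agents fall back to an all-zero memory row.
        zeros = [0.0] * self.m_size
        return [list(self.previous_memory.get(a, zeros)) for a in agent_ids]

    def remove(self, agent_ids: List[str]) -> None:
        for agent_id in agent_ids:
            self.memory.pop(agent_id, None)
            self.previous_memory.pop(agent_id, None)


store = MemoryStore(m_size=2)
store.save(["a"], [[1.0, 2.0]])
first = store.retrieve_previous(["a"])
store.save(["a"], [[3.0, 4.0]])
second = store.retrieve_previous(["a"])
```

Keeping the previous memory lets the agent processor pair an observation with the memory that was fed into the network for that step, rather than the memory the step produced.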

2
ml-agents/mlagents/trainers/ppo/trainer.py


int(self.hyperparameters.batch_size / self.policy.sequence_length), 1
)
advantages = self.update_buffer[BufferKey.ADVANTAGES].get_batch()
advantages = np.array(self.update_buffer[BufferKey.ADVANTAGES].get_batch())
self.update_buffer[BufferKey.ADVANTAGES].set(
(advantages - advantages.mean()) / (advantages.std() + 1e-10)
)
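
Because `get_batch` is now annotated as returning a list, the trainer wraps the result in `np.array` before calling `.mean()` and `.std()`. The normalization itself can be sketched with the standard library (`normalize_advantages` is a hypothetical helper; `pstdev` is used because numpy's default `std` is the population standard deviation):

```python
from statistics import fmean, pstdev
from typing import List


def normalize_advantages(advantages: List[float], eps: float = 1e-10) -> List[float]:
    """Shift to zero mean and scale to unit standard deviation."""
    mean = fmean(advantages)
    std = pstdev(advantages)
    # eps guards against division by zero when all advantages are equal.
    return [(a - mean) / (std + eps) for a in advantages]


normalized = normalize_advantages([1.0, 2.0, 3.0, 4.0])
constant = normalize_advantages([5.0, 5.0, 5.0])
```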

62
ml-agents/mlagents/trainers/sac/optimizer_torch.py


reward_signal_configs = trainer_params.reward_signals
reward_signal_names = [key.value for key, _ in reward_signal_configs.items()]
if policy.shared_critic:
self.value_network = policy.actor
else:
self.value_network = ValueNetwork(
reward_signal_names,
policy.behavior_spec.observation_specs,
policy.network_settings,
)
raise UnityTrainerException("SAC does not support SharedActorCritic")
self._critic = ValueNetwork(
reward_signal_names,
policy.behavior_spec.observation_specs,
policy.network_settings,
)
hyperparameters: SACSettings = cast(SACSettings, trainer_params.hyperparameters)
self.tau = hyperparameters.tau

self.policy.behavior_spec.observation_specs,
policy_network_settings,
)
ModelUtils.soft_update(self.value_network, self.target_network, 1.0)
ModelUtils.soft_update(self._critic, self.target_network, 1.0)
# We create one entropy coefficient per action, whether discrete or continuous.
_disc_log_ent_coef = torch.nn.Parameter(

)
policy_params = list(self.policy.actor.parameters())
value_params = list(self.q_network.parameters()) + list(
self.value_network.parameters()
self._critic.parameters()
)
logger.debug("value_vars")

@property
def critic(self):
return self.value_network
return self._critic
self.value_network.to(device)
self._critic.to(device)
self.q_network.to(device)
def sac_q_loss(

for i in range(0, len(batch[BufferKey.MEMORY]), self.policy.sequence_length)
]
# LSTM shouldn't have sequence length <1, but stop it from going out of the index if true.
value_memories_list = [
ModelUtils.list_to_tensor(batch[BufferKey.CRITIC_MEMORY][i])
for i in range(
0, len(batch[BufferKey.CRITIC_MEMORY]), self.policy.sequence_length
)
]
next_memories_list = [
next_value_memories_list = [
batch[BufferKey.MEMORY][i]
batch[BufferKey.CRITIC_MEMORY][i]
offset, len(batch[BufferKey.MEMORY]), self.policy.sequence_length
offset, len(batch[BufferKey.CRITIC_MEMORY]), self.policy.sequence_length
next_memories = torch.stack(next_memories_list).unsqueeze(0)
value_memories = torch.stack(value_memories_list).unsqueeze(0)
next_value_memories = torch.stack(next_value_memories_list).unsqueeze(0)
next_memories = None
value_memories = None
next_value_memories = None
torch.zeros_like(next_memories) if next_memories is not None else None
)
v_memories = (
torch.zeros_like(next_memories) if next_memories is not None else None
torch.zeros_like(next_value_memories)
if next_value_memories is not None
else None
)
# Copy normalizers from policy

self.target_network.network_body.copy_normalization(
self.policy.actor.network_body
)
self.value_network.network_body.copy_normalization(
self.policy.actor.network_body
)
self._critic.network_body.copy_normalization(self.policy.actor.network_body)
sampled_actions, log_probs, _, _, = self.policy.actor.get_action_and_stats(
current_obs,
masks=act_masks,

value_estimates, _ = self.value_network.critic_pass(
current_obs, v_memories, sequence_length=self.policy.sequence_length
value_estimates, _ = self._critic.critic_pass(
current_obs, value_memories, sequence_length=self.policy.sequence_length
)
cont_sampled_actions = sampled_actions.continuous_tensor

with torch.no_grad():
target_values, _ = self.target_network(
next_obs,
memories=next_memories,
memories=next_value_memories,
sequence_length=self.policy.sequence_length,
)
masks = ModelUtils.list_to_tensor(batch[BufferKey.MASKS], dtype=torch.bool)

policy_loss = self.sac_policy_loss(log_probs, q1p_out, masks)
entropy_loss = self.sac_entropy_loss(log_probs, masks)
total_value_loss = q1_loss + q2_loss
total_value_loss = q1_loss + q2_loss
total_value_loss = q1_loss + q2_loss + value_loss
total_value_loss += value_loss
decay_lr = self.decay_learning_rate.get_value(self.policy.get_current_step())
ModelUtils.update_learning_rate(self.policy_optimizer, decay_lr)

self.entropy_optimizer.step()
# Update target network
ModelUtils.soft_update(self.value_network, self.target_network, self.tau)
ModelUtils.soft_update(self._critic, self.target_network, self.tau)
update_stats = {
"Losses/Policy Loss": policy_loss.item(),
"Losses/Value Loss": value_loss.item(),
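
The two `soft_update` calls in this file differ only in the blend factor: 1.0 at construction time and `self.tau` after each update. A minimal sketch of that kind of update on plain lists of weights, assuming the usual Polyak form `target = tau * source + (1 - tau) * target` (this `soft_update` is a hypothetical stand-in, not the `ModelUtils` implementation):

```python
from typing import List


def soft_update(source: List[float], target: List[float], tau: float) -> None:
    """Blend source weights into target in place."""
    for i, (s, t) in enumerate(zip(source, target)):
        target[i] = tau * s + (1.0 - tau) * t


critic = [1.0, 2.0]
target = [0.0, 0.0]

# tau = 1.0 copies the critic into the target, as done at initialization.
soft_update(critic, target, 1.0)
hard_copy = list(target)

# A smaller tau moves the target only part of the way towards the critic.
critic = [3.0, 4.0]
soft_update(critic, target, 0.5)
```

The value 0.5 is chosen only to keep the arithmetic exact in the example; the trainer reads `tau` from the SAC hyperparameters.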

9
ml-agents/mlagents/trainers/sac/trainer.py


self.collected_rewards[name][agent_id] += np.sum(evaluate_result)
# Get all value estimates for reporting purposes
value_estimates, _ = self.optimizer.get_trajectory_value_estimates(
(
value_estimates,
_,
value_memories,
) = self.optimizer.get_trajectory_value_estimates(
if value_memories is not None:
agent_buffer_trajectory[BufferKey.CRITIC_MEMORY].set(value_memories)
for name, v in value_estimates.items():
self._stats_reporter.add_stat(
f"Policy/{self.optimizer.reward_signals[name].name.capitalize()} Value",

6
ml-agents/mlagents/trainers/tests/test_agent_processor.py


def create_mock_policy():
mock_policy = mock.Mock()
mock_policy.reward_signals = {}
mock_policy.retrieve_memories.return_value = np.zeros((1, 1), dtype=np.float32)
mock_policy.retrieve_previous_memories.return_value = np.zeros(
(1, 1), dtype=np.float32
)
mock_policy.retrieve_previous_action.return_value = np.zeros((1, 1), dtype=np.int32)
return mock_policy

fake_action_info = ActionInfo(
action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
env_action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
value=[0.1],
outputs=fake_action_outputs,
agent_ids=mock_decision_step.agent_id,
)

fake_action_info = ActionInfo(
action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
env_action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
value=[0.1],
outputs=fake_action_outputs,
agent_ids=mock_decision_step.agent_id,
)

23
ml-agents/mlagents/trainers/tests/test_buffer.py


dtype=np.float32,
)
)
b[BufferKey.GROUP_CONTINUOUS_ACTION].append(
[
np.array(
[
100 * fake_agent_id + 10 * step + 4,
100 * fake_agent_id + 10 * step + 5,
],
dtype=np.float32,
)
]
* 3
)
return b

agent_3_buffer = construct_fake_buffer(3)
# Test get_batch
a = agent_1_buffer[ObsUtil.get_name_at(0)].get_batch(
batch_size=2, training_length=1, sequential=True
)

# Test get_batch
a = agent_2_buffer[ObsUtil.get_name_at(0)].get_batch(
batch_size=2, training_length=3, sequential=True
)

]
),
)
# Test group entries return Lists of Lists
a = agent_2_buffer[BufferKey.GROUP_CONTINUOUS_ACTION].get_batch(
batch_size=2, training_length=1, sequential=True
)
for _group_entry in a:
assert len(_group_entry) == 3
agent_1_buffer.reset_agent()
assert agent_1_buffer.num_experiences == 0
update_buffer = AgentBuffer()

34
ml-agents/mlagents/trainers/tests/test_trajectory.py


import numpy as np
from mlagents.trainers.trajectory import GroupObsUtil
from mlagents.trainers.buffer import BufferKey, ObservationKeyPrefix
from mlagents.trainers.buffer import AgentBuffer, BufferKey, ObservationKeyPrefix
VEC_OBS_SIZE = 6
ACTION_SIZE = 4

for _key in wanted_group_keys:
for step in agentbuffer[_key]:
assert len(step) == 4
def test_obsutil_group_from_buffer():
buff = AgentBuffer()
# Create some obs
for _ in range(3):
buff[GroupObsUtil.get_name_at(0)].append(3 * [np.ones((5,), dtype=np.float32)])
# Some agents have died
for _ in range(2):
buff[GroupObsUtil.get_name_at(0)].append(1 * [np.ones((5,), dtype=np.float32)])
# Get the group obs, which will be a List of Lists of np.ndarray, where each element is the same
# length as the AgentBuffer but contains only one agent's obs. Dead agents are padded by
# NaNs.
gobs = GroupObsUtil.from_buffer(buff, 1)
# Agent 0 is full
agent_0_obs = gobs[0]
for obs in agent_0_obs:
assert obs.shape == (buff.num_experiences, 5)
assert not np.isnan(obs).any()
agent_1_obs = gobs[1]
for obs in agent_1_obs:
assert obs.shape == (buff.num_experiences, 5)
for i, _exp_obs in enumerate(obs):
if i >= 3:
assert np.isnan(_exp_obs).all()
else:
assert not np.isnan(_exp_obs).any()
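
The padding behaviour this test exercises can be sketched in isolation. The snippet below is a minimal stand-in, not the library's implementation: the free function `padded_to_batch` is a hypothetical name that mirrors the `padded_to_batch(pad_value=np.nan)` call shown in this diff. It assumes each buffer entry is a list of per-agent arrays with the same shape.

```python
import itertools
from typing import List

import numpy as np


def padded_to_batch(
    steps: List[List[np.ndarray]], pad_value: float = np.nan
) -> List[np.ndarray]:
    """Hypothetical stand-in: turn [time][agent] lists into [agent] arrays of shape (time, ...)."""
    # Find the shape of the first available entry so we know what to pad with.
    shape = None
    for step in steps:
        if step:
            shape = step[0].shape
            break
    if shape is None:
        return []
    pad = np.full(shape, pad_value, dtype=np.float32)
    # zip_longest transposes time-major data to agent-major data and fills
    # the slots of missing (e.g. dead) agents with the pad array.
    return [
        np.asarray(per_agent)
        for per_agent in itertools.zip_longest(*steps, fillvalue=pad)
    ]


# Three steps with three agents, then two steps where only one agent remains.
steps = [3 * [np.ones((5,), dtype=np.float32)] for _ in range(3)]
steps += [1 * [np.ones((5,), dtype=np.float32)] for _ in range(2)]
batch = padded_to_batch(steps)
```

Here `batch` holds one array per agent slot, each with the time dimension first, so a consumer can mask out NaN rows for agents that no longer exist.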

12
ml-agents/mlagents/trainers/tests/torch/test_ppo.py


RewardSignalUtil.value_estimates_key("extrinsic"),
],
)
# Copy memories to critic memories
copy_buffer_fields(update_buffer, BufferKey.MEMORY, [BufferKey.CRITIC_MEMORY])
return_stats = optimizer.update(
update_buffer,

RewardSignalUtil.value_estimates_key("curiosity"),
],
)
# Copy memories to critic memories
copy_buffer_fields(update_buffer, BufferKey.MEMORY, [BufferKey.CRITIC_MEMORY])
optimizer.update(
update_buffer,

action_spec=DISCRETE_ACTION_SPEC if discrete else CONTINUOUS_ACTION_SPEC,
max_step_complete=True,
)
run_out, final_value_out = optimizer.get_trajectory_value_estimates(
run_out, final_value_out, all_memories = optimizer.get_trajectory_value_estimates(
if all_memories is not None:
assert len(all_memories) == 15
run_out, final_value_out = optimizer.get_trajectory_value_estimates(
run_out, final_value_out, _ = optimizer.get_trajectory_value_estimates(
trajectory.to_agentbuffer(), trajectory.next_obs, done=True
)
for key, val in final_value_out.items():

# Check if we ignore terminal states properly
optimizer.reward_signals["extrinsic"].use_terminal_states = False
run_out, final_value_out = optimizer.get_trajectory_value_estimates(
run_out, final_value_out, _ = optimizer.get_trajectory_value_estimates(
trajectory.to_agentbuffer(), trajectory.next_obs, done=False
)
for key, val in final_value_out.items():

2
ml-agents/mlagents/trainers/tests/torch/test_sac.py


update_buffer[RewardSignalUtil.rewards_key("extrinsic")] = update_buffer[
BufferKey.ENVIRONMENT_REWARDS
]
# Mock out value memories
update_buffer[BufferKey.CRITIC_MEMORY] = update_buffer[BufferKey.MEMORY]
return_stats = optimizer.update(
update_buffer,
num_sequences=update_buffer.num_experiences // optimizer.policy.sequence_length,

4
ml-agents/mlagents/trainers/tests/torch/test_simple_rl.py


new_hyperparams = attr.evolve(
SAC_TORCH_CONFIG.hyperparameters,
batch_size=256,
learning_rate=1e-4,
learning_rate=3e-4,
buffer_init_steps=1000,
steps_per_update=2,
)

num_visual=1,
num_vector=0,
action_sizes=action_sizes,
step_size=0.2,
step_size=0.3,
)
bc_settings = BehavioralCloningSettings(demo_path=demo_path, steps=1500)
reward_signals = {

79
ml-agents/mlagents/trainers/torch/agent_action.py


import numpy as np
from mlagents.torch_utils import torch
from mlagents.trainers.buffer import AgentBuffer, BufferKey, AgentBufferField
from mlagents.trainers.buffer import AgentBuffer, BufferKey
from mlagents.trainers.torch.utils import ModelUtils
from mlagents_envs.base_env import ActionTuple

return AgentAction(continuous, discrete)
@staticmethod
def _padded_time_to_batch(
agent_buffer_field: AgentBufferField, dtype: torch.dtype = torch.float32
) -> List[torch.Tensor]:
"""
Pad actions and convert to tensor. Note that data is padded by 0's, not NaNs
as the observations are.
"""
action_shape = None
for _action in agent_buffer_field:
if _action:
action_shape = _action[0].shape
break
# If there were no critic obs at all
if action_shape is None:
return []
new_list = list(
map(
lambda x: ModelUtils.list_to_tensor(x, dtype=dtype),
itertools.zip_longest(
*agent_buffer_field, fillvalue=np.full(action_shape, 0)
),
)
)
return new_list
@staticmethod
def _group_from_buffer(
def _group_agent_action_from_buffer(
"""
Extracts continuous and discrete groupmate actions, as specified by BufferKey, and
returns a List of AgentActions that correspond to the groupmate's actions. List will
be of length equal to the maximum number of groupmates in the buffer. Wherever there are
fewer agents than the maximum, the actions are padded with 0's.
"""
discrete_tensors: List[torch.Tensor] = [] # type: ignore
discrete_tensors: List[torch.Tensor] = []
continuous_tensors = AgentAction._padded_time_to_batch(
buff[cont_action_key]
)
padded_batch = buff[cont_action_key].padded_to_batch()
continuous_tensors = [
ModelUtils.list_to_tensor(arr) for arr in padded_batch
]
discrete_tensors = AgentAction._padded_time_to_batch(
buff[disc_action_key], dtype=torch.long
)
padded_batch = buff[disc_action_key].padded_to_batch(dtype=np.long)
discrete_tensors = [
ModelUtils.list_to_tensor(arr, dtype=torch.long) for arr in padded_batch
]
actions_list = []
for _cont, _disc in itertools.zip_longest(

@staticmethod
def group_from_buffer(buff: AgentBuffer) -> List["AgentAction"]:
"""
A static method that accesses continuous and discrete action fields in an AgentBuffer
and constructs the corresponding AgentAction from the retrieved np arrays.
A static method that accesses next group continuous and discrete action fields in an AgentBuffer
and constructs a padded List of AgentActions that represent the group agent actions.
The List is of length equal to max number of groupmate agents in the buffer, and the AgentBuffer is
of the same length as the buffer. Empty spots (e.g. when agents die) are padded with 0.
:param buff: AgentBuffer of a batch or trajectory
:return: List of groupmate's AgentActions
return AgentAction._group_from_buffer(
return AgentAction._group_agent_action_from_buffer(
buff, BufferKey.GROUP_CONTINUOUS_ACTION, BufferKey.GROUP_DISCRETE_ACTION
)

A static method that accesses next continuous and discrete action fields in an AgentBuffer
and constructs the corresponding AgentAction from the retrieved np arrays.
A static method that accesses next group continuous and discrete action fields in an AgentBuffer
and constructs a padded List of AgentActions that represent the next group agent actions.
The List is of length equal to max number of groupmate agents in the buffer, and the AgentBuffer is
of the same length as the buffer. Empty spots (e.g. when agents die) are padded with 0.
:param buff: AgentBuffer of a batch or trajectory
:return: List of groupmate's AgentActions
return AgentAction._group_from_buffer(
return AgentAction._group_agent_action_from_buffer(
"""
Flatten this AgentAction into a single torch Tensor of dimension (batch, num_continuous + num_one_hot_discrete).
Discrete actions are converted into one-hot and concatenated with continuous actions.
:param discrete_branches: List of sizes for discrete actions.
:return: Tensor of flattened actions.
"""
discrete_oh = ModelUtils.actions_to_onehot(
self.discrete_tensor, discrete_branches
)

39
ml-agents/mlagents/trainers/trajectory.py


from typing import List, NamedTuple
import itertools
AgentBufferField,
ObservationKeyPrefix,
AgentBufferKey,
BufferKey,

return ObservationKeyPrefix.NEXT_GROUP_OBSERVATION, index
@staticmethod
def _padded_time_to_batch(
agent_buffer_field: AgentBufferField,
) -> List[np.ndarray]:
"""
Convert an AgentBufferField of List of obs, where one of the dimension is time and the other is number (e.g.
in the case of a variable number of critic observations) to a List of obs, where time is in the batch dimension
of the obs, and the List is the variable number of agents. For cases where there are varying number of agents,
pad the non-existent agents with NaN.
"""
# Find the first observation. This should be USUALLY O(1)
obs_shape = None
for _group_obs in agent_buffer_field:
if _group_obs:
obs_shape = _group_obs[0].shape
break
# If there were no critic obs at all
if obs_shape is None:
return []
new_list = list(
map(
lambda x: np.asanyarray(x),
itertools.zip_longest(
*agent_buffer_field, fillvalue=np.full(obs_shape, np.nan)
),
)
)
return new_list
@staticmethod
def _transpose_list_of_lists(
list_list: List[List[np.ndarray]],
) -> List[List[np.ndarray]]:

separated_obs: List[np.array] = []
for i in range(num_obs):
separated_obs.append(
GroupObsUtil._padded_time_to_batch(batch[GroupObsUtil.get_name_at(i)])
batch[GroupObsUtil.get_name_at(i)].padded_to_batch(pad_value=np.nan)
)
# separated_obs contains a List(num_obs) of Lists(num_agents), we want to flip
# that and get a List(num_agents) of Lists(num_obs)

separated_obs: List[np.array] = []
for i in range(num_obs):
separated_obs.append(
GroupObsUtil._padded_time_to_batch(
batch[GroupObsUtil.get_name_at_next(i)]
batch[GroupObsUtil.get_name_at_next(i)].padded_to_batch(
pad_value=np.nan
)
)
# separated_obs contains a List(num_obs) of Lists(num_agents), we want to flip

1
utils/make_readme_table.py


ReleaseInfo("release_10", "1.6.0", "0.22.0", "November 18, 2020"),
ReleaseInfo("release_11", "1.7.0", "0.23.0", "December 21, 2020"),
ReleaseInfo("release_12", "1.7.2", "0.23.0", "December 22, 2020"),
ReleaseInfo("release_13", "1.8.0", "0.24.0", "February 17, 2021"),
# Verified releases
ReleaseInfo("", "1.0.6", "0.16.1", "November 16, 2020", is_verified=True),
ReleaseInfo("", "1.0.5", "0.16.1", "September 23, 2020", is_verified=True),

69
utils/validate_release_links.py


import os
import re
import subprocess
import tempfile
MATCH_ANY = re.compile(r"(?s).*")
"README.md": re.compile(r"\*\*Release [0-9]+\*\*"),
"docs/Versioning.md": None,
"com.unity.ml-agents/CHANGELOG.md": None,
"utils/make_readme_table.py": None,
"utils/validate_release_links.py": None,
"README.md": re.compile(r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"),
"docs/Versioning.md": MATCH_ANY,
"com.unity.ml-agents/CHANGELOG.md": MATCH_ANY,
"utils/make_readme_table.py": MATCH_ANY,
"utils/validate_release_links.py": MATCH_ANY,
}

raise RuntimeError("Can't determine release tag")
def check_file(filename: str, global_allow_pattern: Pattern) -> List[str]:
def check_file(
filename: str, global_allow_pattern: Pattern, release_tag: str
) -> List[str]:
with open(filename) as f:
for line in f:
if not RELEASE_PATTERN.search(line):
continue
with tempfile.TemporaryDirectory() as tempdir:
if not os.path.exists(tempdir):
os.makedirs(tempdir)
new_file_name = os.path.join(tempdir, os.path.basename(filename))
with open(new_file_name, "w+") as new_file:
# default to match everything if there is nothing in the ALLOW_LIST
allow_list_pattern = ALLOW_LIST.get(filename, None)
with open(filename) as f:
for line in f:
keep_line = True
keep_line = not RELEASE_PATTERN.search(line)
keep_line |= global_allow_pattern.search(line) is not None
keep_line |= (
allow_list_pattern is not None
and allow_list_pattern.search(line) is not None
)
if global_allow_pattern.search(line):
continue
if filename in ALLOW_LIST:
if ALLOW_LIST[filename] is None or ALLOW_LIST[filename].search(line):
continue
if keep_line:
new_file.write(line)
else:
bad_lines.append(f"{filename}: {line}")
new_file.write(
re.sub(r"release_[0-9]+", fr"{release_tag}", line)
)
if bad_lines:
if os.path.exists(filename):
os.remove(filename)
os.rename(new_file_name, filename)
bad_lines.append(f"{filename}: {line.strip()}")
def check_all_files(allow_pattern: Pattern) -> List[str]:
def check_all_files(allow_pattern: Pattern, release_tag: str) -> List[str]:
"""
Validate all files tracked by git.
:param allow_pattern:

for file_name in git_ls_files():
if "localized" in file_name or os.path.splitext(file_name)[1] not in file_types:
continue
bad_lines += check_file(file_name, allow_pattern)
bad_lines += check_file(file_name, allow_pattern, release_tag)
return bad_lines

print(f"Release tag: {release_tag}")
allow_pattern = re.compile(f"{release_tag}(_docs)*")
bad_lines = check_all_files(allow_pattern)
bad_lines = check_all_files(allow_pattern, release_tag)
for line in bad_lines:
print(line)
print("*************************************************************")
f"Found lines referring to previous release. Either update the files, or add an exclusion to {__file__}"
"This script attempted to fix the above errors. Please double "
+ "check them to make sure the replacements were done correctly"
for line in bad_lines:
print(line)
sys.exit(1 if bad_lines else 0)
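
The keep-or-rewrite decision that this change introduces in `check_file` can be illustrated without touching the filesystem. This is a rough sketch under stated assumptions: `rewrite_lines` is a hypothetical helper, and the `RELEASE_PATTERN` below is a guess, since its real definition is not part of the hunks shown here.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Assumption: the real RELEASE_PATTERN is not visible in this diff.
RELEASE_PATTERN = re.compile(r"release_[0-9]+")


def rewrite_lines(
    lines: List[str], release_tag: str, allow_pattern: Optional[Pattern] = None
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (rewritten lines, offending lines)."""
    # Lines that already mention the current tag are accepted as-is.
    current = re.compile(re.escape(release_tag) + r"(_docs)*")
    fixed: List[str] = []
    bad: List[str] = []
    for line in lines:
        keep = RELEASE_PATTERN.search(line) is None
        keep = keep or current.search(line) is not None
        keep = keep or (
            allow_pattern is not None and allow_pattern.search(line) is not None
        )
        if keep:
            fixed.append(line)
        else:
            # Record the stale line, then point it at the current release tag.
            bad.append(line.strip())
            fixed.append(re.sub(r"release_[0-9]+", release_tag, line))
    return fixed, bad


fixed, bad = rewrite_lines(
    ["see release_12 docs\n", "see release_13 docs\n", "no tag\n"], "release_13"
)
```

In this sketch only the line referring to `release_12` is reported and rewritten; the other two pass through unchanged.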

1001
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx
File diff is too large to display

15
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx.meta


fileFormatVersion: 2
guid: 28ccdfd7cb3d941ce8af0ab89e06130a
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj
11400002: model data
externalObjects: {}
userData:
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
optimizeModel: 1
forceArbitraryBatchSize: 1
treatErrorsAsWarnings: 0
importMode: 1

49
com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs


namespace Unity.MLAgents.Actuators
{
/// <summary>
/// Identifiers for "built in" actuator types.
/// These are only used for analytics, and should not be used for any runtime decisions.
///
/// NOTE: Do not renumber these, since the values are used for analytics. Renaming is allowed though.
/// </summary>
public enum BuiltInActuatorType
{
/// <summary>
/// Default actuator type if it cannot be determined.
/// </summary>
Unknown = 0,
/// <summary>
/// VectorActuator used by the Agent
/// </summary>
AgentVectorActuator = 1,
/// <summary>
/// Corresponds to <see cref="VectorActuator"/>
/// </summary>
VectorActuator = 2,
/// <summary>
/// Corresponds to the Match3Actuator in com.unity.ml-agents.extensions.
/// </summary>
Match3Actuator = 3,
/// <summary>
/// Corresponds to the InputActionActuator in com.unity.ml-agents.extensions.
/// </summary>
InputActionActuator = 4,
}
/// <summary>
/// Interface for actuators that are provided as part of ML-Agents.
/// User-implemented actuators don't need to use this interface.
/// </summary>
internal interface IBuiltInActuator
{
/// <summary>
/// Return the corresponding BuiltInActuatorType for the actuator.
/// </summary>
/// <returns>A BuiltInActuatorType corresponding to the actuator.</returns>
BuiltInActuatorType GetBuiltInActuatorType();
}
}

3
com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs.meta


fileFormatVersion: 2
guid: e3d7ef9a9a5043549cc5c0bbee520810
timeCreated: 1613514041

52
ml-agents/mlagents/trainers/tests/torch/test_agent_action.py


import numpy as np
from mlagents.torch_utils import torch
from mlagents.trainers.buffer import AgentBuffer, BufferKey
from mlagents.trainers.torch.agent_action import AgentAction
def test_agent_action_group_from_buffer():
buff = AgentBuffer()
# Create some actions
for _ in range(3):
buff[BufferKey.GROUP_CONTINUOUS_ACTION].append(
3 * [np.ones((5,), dtype=np.float32)]
)
buff[BufferKey.GROUP_DISCRETE_ACTION].append(
3 * [np.ones((4,), dtype=np.float32)]
)
# Some agents have died
for _ in range(2):
buff[BufferKey.GROUP_CONTINUOUS_ACTION].append(
1 * [np.ones((5,), dtype=np.float32)]
)
buff[BufferKey.GROUP_DISCRETE_ACTION].append(
1 * [np.ones((4,), dtype=np.float32)]
)
# Get the group actions, which will be a List of AgentAction, where each element is the same
# length as the AgentBuffer but contains only one agent's actions. Dead agents are padded by
# 0s.
gact = AgentAction.group_from_buffer(buff)
# Agent 0 is full
agent_0_act = gact[0]
assert agent_0_act.continuous_tensor.shape == (buff.num_experiences, 5)
assert agent_0_act.discrete_tensor.shape == (buff.num_experiences, 4)
agent_1_act = gact[1]
assert agent_1_act.continuous_tensor.shape == (buff.num_experiences, 5)
assert agent_1_act.discrete_tensor.shape == (buff.num_experiences, 4)
assert (agent_1_act.continuous_tensor[0:3] > 0).all()
assert (agent_1_act.continuous_tensor[3:] == 0).all()
assert (agent_1_act.discrete_tensor[0:3] > 0).all()
assert (agent_1_act.discrete_tensor[3:] == 0).all()
def test_to_flat():
aa = AgentAction(
torch.tensor([[1.0, 1.0, 1.0]]), [torch.tensor([2]), torch.tensor([1])]
)
flattened_actions = aa.to_flat([3, 3])
assert torch.eq(
flattened_actions, torch.tensor([[1, 1, 1, 0, 0, 1, 0, 1, 0]])
).all()
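
The flattening checked by `test_to_flat` can be reproduced with plain NumPy. This is an illustrative sketch only: the free function `to_flat` below is a hypothetical NumPy stand-in for the torch-based `AgentAction.to_flat`, which one-hot encodes each discrete branch and concatenates the result with the continuous actions.

```python
from typing import List

import numpy as np


def to_flat(
    continuous: np.ndarray, discrete: List[np.ndarray], branches: List[int]
) -> np.ndarray:
    """Hypothetical NumPy stand-in: (batch, num_continuous + sum(branches))."""
    # Indexing an identity matrix by action index yields one one-hot row per sample.
    one_hots = [
        np.eye(size, dtype=continuous.dtype)[idx]
        for idx, size in zip(discrete, branches)
    ]
    return np.concatenate([continuous] + one_hots, axis=1)


flat = to_flat(
    np.array([[1.0, 1.0, 1.0]]), [np.array([2]), np.array([1])], [3, 3]
)
# flat is [[1, 1, 1, 0, 0, 1, 0, 1, 0]], the same values the test above expects.
```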

1001
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn
File diff is too large to display

11
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn.meta


fileFormatVersion: 2
guid: 48d14da88fea74d0693c691c6e3f2e34
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj
11400002: model data
externalObjects: {}
userData:
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 19ed1486aa27d4903b34839f37b8f69f, type: 3}