
Merge branch 'master' into asymm-envs

/asymm-envs
Andrew Cohen, 4 years ago
Current commit
59a60c1e
185 files changed, with 1744 insertions and 918 deletions
  1. 12
      .circleci/config.yml
  2. 7
      .pre-commit-config.yaml
  3. 6
      .yamato/com.unity.ml-agents-pack.yml
  4. 55
      .yamato/com.unity.ml-agents-test.yml
  5. 8
      Project/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs
  6. 8
      Project/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs
  7. 6
      Project/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs
  8. 7
      Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorAgent.cs
  9. 10
      Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorSettings.cs
  10. 9
      Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs
  11. 20
      Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridArea.cs
  12. 2
      Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridSettings.cs
  13. 4
      Project/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs
  14. 20
      Project/Assets/ML-Agents/Examples/PushBlock/Scripts/PushAgentBasic.cs
  15. 2
      Project/Assets/ML-Agents/Examples/Pyramids/Scripts/PyramidAgent.cs
  16. 13
      Project/Assets/ML-Agents/Examples/Reacher/Scripts/ReacherAgent.cs
  17. 8
      Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/ModelOverrider.cs
  18. 9
      Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/ProjectSettingsOverrides.cs
  19. 8
      Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/SensorBase.cs
  20. 6
      Project/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs
  21. 10
      Project/Assets/ML-Agents/Examples/Walker/Scripts/WalkerAgent.cs
  22. 12
      Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs
  23. 6
      Project/ProjectSettings/DynamicsManager.asset
  24. 47
      com.unity.ml-agents/CHANGELOG.md
  25. 6
      com.unity.ml-agents/CONTRIBUTING.md
  26. 78
      com.unity.ml-agents/Documentation~/com.unity.ml-agents.md
  27. 7
      com.unity.ml-agents/Editor/AgentEditor.cs
  28. 6
      com.unity.ml-agents/Editor/BehaviorParametersEditor.cs
  29. 10
      com.unity.ml-agents/Editor/BrainParametersDrawer.cs
  30. 4
      com.unity.ml-agents/Editor/DemonstrationDrawer.cs
  31. 2
      com.unity.ml-agents/Editor/RayPerceptionSensorComponentBaseEditor.cs
  32. 136
      com.unity.ml-agents/Runtime/Academy.cs
  33. 622
      com.unity.ml-agents/Runtime/Agent.cs
  34. 39
      com.unity.ml-agents/Runtime/Communicator/GrpcExtensions.cs
  35. 14
      com.unity.ml-agents/Runtime/Communicator/ICommunicator.cs
  36. 11
      com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs
  37. 13
      com.unity.ml-agents/Runtime/DecisionRequester.cs
  38. 45
      com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs
  39. 7
      com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs
  40. 51
      com.unity.ml-agents/Runtime/DiscreteActionMasker.cs
  41. 52
      com.unity.ml-agents/Runtime/Grpc/CommunicatorObjects/UnityRlInitializationInput.cs
  42. 58
      com.unity.ml-agents/Runtime/Grpc/CommunicatorObjects/UnityRlInitializationOutput.cs
  43. 18
      com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs
  44. 12
      com.unity.ml-agents/Runtime/Inference/GeneratorImpl.cs
  45. 4
      com.unity.ml-agents/Runtime/Inference/TensorApplier.cs
  46. 38
      com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs
  47. 61
      com.unity.ml-agents/Runtime/Policies/BrainParameters.cs
  48. 6
      com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs
  49. 12
      com.unity.ml-agents/Runtime/Sensors/CameraSensor.cs
  50. 20
      com.unity.ml-agents/Runtime/Sensors/CameraSensorComponent.cs
  51. 6
      com.unity.ml-agents/Runtime/Sensors/ISensor.cs
  52. 112
      com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensor.cs
  53. 2
      com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponent2D.cs
  54. 8
      com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponent3D.cs
  55. 56
      com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponentBase.cs
  56. 6
      com.unity.ml-agents/Runtime/Sensors/RenderTextureSensor.cs
  57. 22
      com.unity.ml-agents/Runtime/Sensors/RenderTextureSensorComponent.cs
  58. 12
      com.unity.ml-agents/Runtime/Sensors/StackingSensor.cs
  59. 14
      com.unity.ml-agents/Runtime/Sensors/VectorSensor.cs
  60. 8
      com.unity.ml-agents/Runtime/Sensors/ObservationWriter.cs
  61. 58
      com.unity.ml-agents/Runtime/SideChannels/EngineConfigurationChannel.cs
  62. 30
      com.unity.ml-agents/Runtime/SideChannels/FloatPropertiesChannel.cs
  63. 2
      com.unity.ml-agents/Runtime/SideChannels/RawBytesChannel.cs
  64. 17
      com.unity.ml-agents/Runtime/SideChannels/SideChannel.cs
  65. 45
      com.unity.ml-agents/Runtime/SideChannels/StatsSideChannel.cs
  66. 2
      com.unity.ml-agents/Runtime/SideChannels/EnvironmentParametersChannel.cs.meta
  67. 16
      com.unity.ml-agents/Runtime/Utilities.cs
  68. 2
      com.unity.ml-agents/Tests/Editor/BehaviorParameterTests.cs
  69. 34
      com.unity.ml-agents/Tests/Editor/DemonstrationTests.cs
  70. 26
      com.unity.ml-agents/Tests/Editor/EditModeTestActionMasker.cs
  71. 8
      com.unity.ml-agents/Tests/Editor/EditModeTestInternalBrainTensorGenerator.cs
  72. 35
      com.unity.ml-agents/Tests/Editor/MLAgentsEditModeTest.cs
  73. 18
      com.unity.ml-agents/Tests/Editor/ModelRunnerTest.cs
  74. 34
      com.unity.ml-agents/Tests/Editor/ParameterLoaderTest.cs
  75. 42
      com.unity.ml-agents/Tests/Editor/PublicAPI/PublicApiValidation.cs
  76. 10
      com.unity.ml-agents/Tests/Editor/Sensor/CameraSensorComponentTest.cs
  77. 4
      com.unity.ml-agents/Tests/Editor/Sensor/CameraSensorTest.cs
  78. 6
      com.unity.ml-agents/Tests/Editor/Sensor/FloatVisualSensorTests.cs
  79. 88
      com.unity.ml-agents/Tests/Editor/Sensor/RayPerceptionSensorTests.cs
  80. 6
      com.unity.ml-agents/Tests/Editor/Sensor/RenderTextureSensorComponentTests.cs
  81. 4
      com.unity.ml-agents/Tests/Editor/Sensor/RenderTextureSensorTests.cs
  82. 2
      com.unity.ml-agents/Tests/Editor/Sensor/SensorShapeValidatorTests.cs
  83. 4
      com.unity.ml-agents/Tests/Editor/Sensor/VectorSensorTests.cs
  84. 8
      com.unity.ml-agents/Tests/Editor/Sensor/ObservationWriterTests.cs
  85. 32
      com.unity.ml-agents/Tests/Editor/SideChannelTests.cs
  86. 33
      com.unity.ml-agents/Tests/Runtime/RuntimeAPITest.cs
  87. 4
      com.unity.ml-agents/package.json
  88. 30
      config/trainer_config.yaml
  89. 2
      docs/API-Reference.md
  90. 10
      docs/Custom-SideChannels.md
  91. 4
      docs/FAQ.md
  92. 6
      docs/Learning-Environment-Design.md
  93. 25
      docs/Learning-Environment-Examples.md
  94. 26
      docs/ML-Agents-Overview.md
  95. 42
      docs/Migrating.md
  96. 36
      docs/Python-API.md
  97. 2
      docs/Readme.md
  98. 32
      docs/Training-Curriculum-Learning.md
  99. 2
      docs/Training-Imitation-Learning.md
  100. 2
      docs/Training-PPO.md

12
.circleci/config.yml


python373:
docker:
- image: circleci/python:3.7.3
python382:
docker:
- image: circleci/python:3.8.2
jobs:
build_python:

- run:
name: Install Dependencies
command: |
# Need ruby for search-and-replace
sudo apt-get update
sudo apt-get install ruby-full
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip

executor: python373
pyversion: 3.7.3
# Test python 3.7 with the newest supported versions
pip_constraints: test_constraints_max_tf2_version.txt
- build_python:
name: python_3.8.2+tf2.2
executor: python382
pyversion: 3.8.2
# Test python 3.8 with the newest edge versions
pip_constraints: test_constraints_max_tf2_version.txt
- markdown_link_check
- pre-commit

7
.pre-commit-config.yaml


)$
args: [--score=n]
- repo: https://github.com/mattlqx/pre-commit-search-and-replace
rev: v1.0.3
hooks:
- id: search-and-replace
types: [markdown]
exclude: ".*localized.*"
# "Local" hooks, see https://pre-commit.com/#repository-local-hooks
- repo: local
hooks:

6
.yamato/com.unity.ml-agents-pack.yml


pack:
name: Pack
agent:
type: Unity::VM
image: package-ci/ubuntu:stable
flavor: b1.large
type: Unity::VM::osx
image: package-ci/mac:stable
flavor: b1.small
commands:
- npm install upm-ci-utils@stable -g --registry https://artifactory.prd.cds.internal.unity3d.com/artifactory/api/npm/upm-npm
- upm-ci package pack --package-path com.unity.ml-agents

55
.yamato/com.unity.ml-agents-test.yml


- version: 2020.1
coverageOptions: --enable-code-coverage --code-coverage-options 'generateHtmlReport;assemblyFilters:+Unity.ML-Agents'
minCoveragePct: 72
- version: 2020.2
coverageOptions: --enable-code-coverage --code-coverage-options 'generateHtmlReport;assemblyFilters:+Unity.ML-Agents'
minCoveragePct: 72
trunk_editor:
- version: trunk
coverageOptions: --enable-code-coverage --code-coverage-options 'generateHtmlReport;assemblyFilters:+Unity.ML-Agents'
minCoveragePct: 72
test_platforms:
- name: win
type: Unity::VM

flavor: b1.medium
---
all_package_tests:
name: Run All Combinations of Editors/Platforms Tests
dependencies:
{% for editor in test_editors %}
{% for platform in test_platforms %}
- .yamato/com.unity.ml-agents-test.yml#test_{{ platform.name }}_{{ editor.version }}
{% endfor %}
{% endfor %}
{% for editor in trunk_editor %}
{% for platform in test_platforms %}
- .yamato/com.unity.ml-agents-test.yml#test_{{ platform.name }}_{{ editor.version }}
{% endfor %}
{% endfor %}
triggers:
cancel_old_ci: true
recurring:
- branch: master
frequency: daily
{% for editor in test_editors %}
{% for platform in test_platforms %}
test_{{ platform.name }}_{{ editor.version }}:

- .yamato/com.unity.ml-agents-pack.yml#pack
triggers:
cancel_old_ci: true
{% if platform.name == "mac" %}
{% endif %}
{% endfor %}
{% endfor %}
{% endfor %}
{% for editor in trunk_editor %}
{% for platform in test_platforms %}
test_{{ platform.name }}_trunk:
name : com.unity.ml-agents test {{ editor.version }} on {{ platform.name }}
agent:
type: {{ platform.type }}
image: {{ platform.image }}
flavor: {{ platform.flavor}}
commands:
- python -m pip install unity-downloader-cli --extra-index-url https://artifactory.eu-cph-1.unityops.net/api/pypi/common-python/simple
- unity-downloader-cli -u trunk -c editor --wait --fast
- npm install upm-ci-utils@stable -g --registry https://artifactory.prd.cds.internal.unity3d.com/artifactory/api/npm/upm-npm
- upm-ci package test -u {{ editor.version }} --package-path com.unity.ml-agents {{ editor.coverageOptions }}
- python ml-agents/tests/yamato/check_coverage_percent.py upm-ci~/test-results/ {{ editor.minCoveragePct }}
artifacts:
logs:
paths:
- "upm-ci~/test-results/**/*"
dependencies:
- .yamato/com.unity.ml-agents-pack.yml#pack
triggers:
cancel_old_ci: true
{% endfor %}
{% endfor %}

8
Project/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs


[Header("Specific to Ball3D")]
public GameObject ball;
Rigidbody m_BallRb;
FloatPropertiesChannel m_ResetParams;
EnvironmentParameters m_ResetParams;
m_ResetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

public void SetBall()
{
//Set the attributes of the ball by fetching the information from the academy
m_BallRb.mass = m_ResetParams.GetPropertyWithDefault("mass", 1.0f);
var scale = m_ResetParams.GetPropertyWithDefault("scale", 1.0f);
m_BallRb.mass = m_ResetParams.GetWithDefault("mass", 1.0f);
var scale = m_ResetParams.GetWithDefault("scale", 1.0f);
ball.transform.localScale = new Vector3(scale, scale, scale);
}
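This same substitution repeats across the example agents that follow: the `FloatPropertiesChannel` fetched through `SideChannelUtils` becomes `Academy.Instance.EnvironmentParameters`, and `GetPropertyWithDefault` becomes `GetWithDefault`. A minimal sketch of the migrated pattern (the class name and `ball` field are illustrative, not taken from any single example):

```csharp
using MLAgents;      // namespace used on this branch
using UnityEngine;

public class ResetParamsAgent : Agent
{
    public GameObject ball;
    EnvironmentParameters m_ResetParams;

    public override void Initialize()
    {
        // Previously: m_ResetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
        m_ResetParams = Academy.Instance.EnvironmentParameters;
    }

    public void SetBall()
    {
        // Previously: m_ResetParams.GetPropertyWithDefault("scale", 1.0f)
        var scale = m_ResetParams.GetWithDefault("scale", 1.0f);
        ball.transform.localScale = new Vector3(scale, scale, scale);
    }
}
```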

8
Project/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs


[Header("Specific to Ball3DHard")]
public GameObject ball;
Rigidbody m_BallRb;
FloatPropertiesChannel m_ResetParams;
EnvironmentParameters m_ResetParams;
m_ResetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

public void SetBall()
{
//Set the attributes of the ball by fetching the information from the academy
m_BallRb.mass = m_ResetParams.GetPropertyWithDefault("mass", 1.0f);
var scale = m_ResetParams.GetPropertyWithDefault("scale", 1.0f);
m_BallRb.mass = m_ResetParams.GetWithDefault("mass", 1.0f);
var scale = m_ResetParams.GetWithDefault("scale", 1.0f);
ball.transform.localScale = new Vector3(scale, scale, scale);
}

6
Project/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs


int m_NumberJumps = 20;
int m_JumpLeft = 20;
FloatPropertiesChannel m_ResetParams;
EnvironmentParameters m_ResetParams;
public override void Initialize()
{

m_ResetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

public void SetTargetScale()
{
var targetScale = m_ResetParams.GetPropertyWithDefault("target_scale", 1.0f);
var targetScale = m_ResetParams.GetWithDefault("target_scale", 1.0f);
target.transform.localScale = new Vector3(targetScale, targetScale, targetScale);
}

7
Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorAgent.cs


public bool contribute;
public bool useVectorObs;
EnvironmentParameters m_ResetParams;
public override void Initialize()
{

m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

public void SetLaserLengths()
{
m_LaserLength = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().GetPropertyWithDefault("laser_length", 1.0f);
m_LaserLength = m_ResetParams.GetWithDefault("laser_length", 1.0f);
float agentScale = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().GetPropertyWithDefault("agent_scale", 1.0f);
float agentScale = m_ResetParams.GetWithDefault("agent_scale", 1.0f);
gameObject.transform.localScale = new Vector3(agentScale, agentScale, agentScale);
}

10
Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorSettings.cs


using System;
using MLAgents.SideChannels;
public class FoodCollectorSettings : MonoBehaviour
{

public int totalScore;
public Text scoreText;
StatsSideChannel m_statsSideChannel;
StatsRecorder m_Recorder;
m_statsSideChannel = SideChannelUtils.GetSideChannel<StatsSideChannel>();
m_Recorder = Academy.Instance.StatsRecorder;
public void EnvironmentReset()
private void EnvironmentReset()
{
ClearObjects(GameObject.FindGameObjectsWithTag("food"));
ClearObjects(GameObject.FindGameObjectsWithTag("badFood"));

// need to send every Update() call.
if ((Time.frameCount % 100)== 0)
{
m_statsSideChannel?.AddStat("TotalScore", totalScore);
m_Recorder.Add("TotalScore", totalScore);
}
}
}
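Custom statistics follow the same pattern: instead of fetching a `StatsSideChannel` from `SideChannelUtils` and calling `AddStat`, the settings script now reads `Academy.Instance.StatsRecorder` and calls `Add`. A hedged sketch of the new usage (the 100-frame throttle mirrors the hunk above; the class name is illustrative):

```csharp
using MLAgents;
using UnityEngine;

public class ScoreReporter : MonoBehaviour
{
    public int totalScore;
    StatsRecorder m_Recorder;

    void Awake()
    {
        // Previously: m_statsSideChannel = SideChannelUtils.GetSideChannel<StatsSideChannel>();
        m_Recorder = Academy.Instance.StatsRecorder;
    }

    void Update()
    {
        // Stats are aggregated on the trainer side, so there is no need to
        // send a value on every Update() call.
        if (Time.frameCount % 100 == 0)
        {
            // Previously: m_statsSideChannel?.AddStat("TotalScore", totalScore);
            m_Recorder.Add("TotalScore", totalScore);
        }
    }
}
```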

9
Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs


const int k_Left = 3;
const int k_Right = 4;
EnvironmentParameters m_ResetParams;
public override void Initialize()
{
m_ResetParams = Academy.Instance.EnvironmentParameters;
}
public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
{
// Mask the necessary actions if selected by the user.

var positionX = (int)transform.position.x;
var positionZ = (int)transform.position.z;
var maxPosition = (int)SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().GetPropertyWithDefault("gridSize", 5f) - 1;
var maxPosition = (int)m_ResetParams.GetWithDefault("gridSize", 5f) - 1;
if (positionX == 0)
{

20
Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridArea.cs


public GameObject trueAgent;
FloatPropertiesChannel m_ResetParameters;
Camera m_AgentCam;
public GameObject goalPref;

Vector3 m_InitialPosition;
EnvironmentParameters m_ResetParams;
m_ResetParameters = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
m_Objects = new[] { goalPref, pitPref };

m_InitialPosition = transform.position;
}
public void SetEnvironment()
private void SetEnvironment()
transform.position = m_InitialPosition * (m_ResetParameters.GetPropertyWithDefault("gridSize", 5f) + 1);
transform.position = m_InitialPosition * (m_ResetParams.GetWithDefault("gridSize", 5f) + 1);
for (var i = 0; i < (int)m_ResetParameters.GetPropertyWithDefault("numObstacles", 1); i++)
for (var i = 0; i < (int)m_ResetParams.GetWithDefault("numObstacles", 1); i++)
for (var i = 0; i < (int)m_ResetParameters.GetPropertyWithDefault("numGoals", 1f); i++)
for (var i = 0; i < (int)m_ResetParams.GetWithDefault("numGoals", 1f); i++)
var gridSize = (int)m_ResetParameters.GetPropertyWithDefault("gridSize", 5f);
var gridSize = (int)m_ResetParams.GetWithDefault("gridSize", 5f);
m_Plane.transform.localScale = new Vector3(gridSize / 10.0f, 1f, gridSize / 10.0f);
m_Plane.transform.localPosition = new Vector3((gridSize - 1) / 2f, -0.5f, (gridSize - 1) / 2f);
m_Sn.transform.localScale = new Vector3(1, 1, gridSize + 2);

public void AreaReset()
{
var gridSize = (int)m_ResetParameters.GetPropertyWithDefault("gridSize", 5f);
var gridSize = (int)m_ResetParams.GetWithDefault("gridSize", 5f);
foreach (var actor in actorObjs)
{
DestroyImmediate(actor);

{
numbers.Add(Random.Range(0, gridSize * gridSize));
}
var numbersA = Enumerable.ToArray(numbers);
var numbersA = numbers.ToArray();
for (var i = 0; i < players.Length; i++)
{

2
Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridSettings.cs


public void Awake()
{
SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().RegisterCallback("gridSize", f =>
Academy.Instance.EnvironmentParameters.RegisterCallback("gridSize", f =>
{
MainCamera.transform.position = new Vector3(-(f - 1) / 2f, f * 1.25f, -(f - 1) / 2f);
MainCamera.orthographicSize = (f + 5f) / 2f;
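Callbacks that used to be registered on the `FloatPropertiesChannel` now go through `Academy.Instance.EnvironmentParameters.RegisterCallback`, as this hunk and the `ProjectSettingsOverrides` hunk below show. Restated as a complete, compilable sketch (the `MainCamera` field name follows `GridSettings`; everything else is illustrative):

```csharp
using MLAgents;
using UnityEngine;

public class GridCameraSettings : MonoBehaviour
{
    public Camera MainCamera;

    public void Awake()
    {
        // Previously registered on SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().
        Academy.Instance.EnvironmentParameters.RegisterCallback("gridSize", f =>
        {
            // Re-frame the camera whenever the trainer sends a new gridSize value.
            MainCamera.transform.position = new Vector3(-(f - 1) / 2f, f * 1.25f, -(f - 1) / 2f);
            MainCamera.orthographicSize = (f + 5f) / 2f;
        });
    }
}
```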

4
Project/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs


{
if (useVectorObs)
{
sensor.AddObservation(StepCount / (float)maxStep);
sensor.AddObservation(StepCount / (float)MaxStep);
}
}

public override void OnActionReceived(float[] vectorAction)
{
AddReward(-1f / maxStep);
AddReward(-1f / MaxStep);
MoveAgent(vectorAction);
}

20
Project/Assets/ML-Agents/Examples/PushBlock/Scripts/PushAgentBasic.cs


/// </summary>
Renderer m_GroundRenderer;
private EnvironmentParameters m_ResetParams;
void Awake()
{
m_PushBlockSettings = FindObjectOfType<PushBlockSettings>();

m_GroundRenderer = ground.GetComponent<Renderer>();
// Starting material
m_GroundMaterial = m_GroundRenderer.material;
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

MoveAgent(vectorAction);
// Penalty given each step to encourage agent to finish task quickly.
AddReward(-1f / maxStep);
AddReward(-1f / MaxStep);
}
public override void Heuristic(float[] actionsOut)

public void SetGroundMaterialFriction()
{
var resetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
groundCollider.material.dynamicFriction = resetParams.GetPropertyWithDefault("dynamic_friction", 0);
groundCollider.material.staticFriction = resetParams.GetPropertyWithDefault("static_friction", 0);
groundCollider.material.dynamicFriction = m_ResetParams.GetWithDefault("dynamic_friction", 0);
groundCollider.material.staticFriction = m_ResetParams.GetWithDefault("static_friction", 0);
var resetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
var scale = resetParams.GetPropertyWithDefault("block_scale", 2);
var scale = m_ResetParams.GetWithDefault("block_scale", 2);
m_BlockRb.drag = resetParams.GetPropertyWithDefault("block_drag", 0.5f);
m_BlockRb.drag = m_ResetParams.GetWithDefault("block_drag", 0.5f);
public void SetResetParameters()
private void SetResetParameters()
{
SetGroundMaterialFriction();
SetBlockProperties();

2
Project/Assets/ML-Agents/Examples/Pyramids/Scripts/PyramidAgent.cs


public override void OnActionReceived(float[] vectorAction)
{
AddReward(-1f / maxStep);
AddReward(-1f / MaxStep);
MoveAgent(vectorAction);
}

13
Project/Assets/ML-Agents/Examples/Reacher/Scripts/ReacherAgent.cs


// Frequency of the cosine deviation of the goal along the vertical dimension
float m_DeviationFreq;
private EnvironmentParameters m_ResetParams;
/// <summary>
/// Collect the rigidbodies of the reacher in order to reuse them for
/// observations and actions.

m_RbA = pendulumA.GetComponent<Rigidbody>();
m_RbB = pendulumB.GetComponent<Rigidbody>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

public void SetResetParameters()
{
var fp = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_GoalSize = fp.GetPropertyWithDefault("goal_size", 5);
m_GoalSpeed = Random.Range(-1f, 1f) * fp.GetPropertyWithDefault("goal_speed", 1);
m_Deviation = fp.GetPropertyWithDefault("deviation", 0);
m_DeviationFreq = fp.GetPropertyWithDefault("deviation_freq", 0);
m_GoalSize = m_ResetParams.GetWithDefault("goal_size", 5);
m_GoalSpeed = Random.Range(-1f, 1f) * m_ResetParams.GetWithDefault("goal_speed", 1);
m_Deviation = m_ResetParams.GetWithDefault("deviation", 0);
m_DeviationFreq = m_ResetParams.GetWithDefault("deviation_freq", 0);
}
}

8
Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/ModelOverrider.cs


if (m_MaxEpisodes > 0)
{
// For Agents without maxSteps, exit as soon as we've hit the target number of episodes.
// For Agents that specify maxStep, also make sure we've gone at least that many steps.
// For Agents that specify MaxStep, also make sure we've gone at least that many steps.
if (m_Agent.CompletedEpisodes >= m_MaxEpisodes && m_NumSteps > m_MaxEpisodes * m_Agent.maxStep)
if (m_Agent.CompletedEpisodes >= m_MaxEpisodes && m_NumSteps > m_MaxEpisodes * m_Agent.MaxStep)
{
Application.Quit(0);
}

if (!m_BehaviorNameOverrides.ContainsKey(behaviorName))
{
Debug.Log($"No override for behaviorName {behaviorName}");
Debug.Log($"No override for BehaviorName {behaviorName}");
return null;
}

{
m_Agent.LazyInitialize();
var bp = m_Agent.GetComponent<BehaviorParameters>();
var name = bp.behaviorName;
var name = bp.BehaviorName;
var nnModel = GetModelForBehaviorName(name);
Debug.Log($"Overriding behavior {name} for agent with model {nnModel?.name}");

9
Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/ProjectSettingsOverrides.cs


namespace MLAgentsExamples
{
/// <summary>
/// A helper class for the ML-Agents example scenes to override various
/// global settings, and restore them afterwards.
/// This can modify some Physics and time-stepping properties, so you
/// shouldn't copy it into your project unless you know what you're doing.
/// </summary>
public class ProjectSettingsOverrides : MonoBehaviour
{
// Original values

Physics.defaultSolverVelocityIterations = solverVelocityIterations;
// Make sure the Academy singleton is initialized first, since it will create the SideChannels.
var academy = Academy.Instance;
SideChannelUtils.GetSideChannel<FloatPropertiesChannel>().RegisterCallback("gravity", f => { Physics.gravity = new Vector3(0, -f, 0); });
Academy.Instance.EnvironmentParameters.RegisterCallback("gravity", f => { Physics.gravity = new Vector3(0, -f, 0); });
}
public void OnDestroy()

8
Project/Assets/ML-Agents/Examples/SharedAssets/Scripts/SensorBase.cs


/// <summary>
/// Default implementation of Write interface. This creates a temporary array,
/// calls WriteObservation, and then writes the results to the WriteAdapter.
/// calls WriteObservation, and then writes the results to the ObservationWriter.
/// <param name="adapter"></param>
/// <param name="writer"></param>
public virtual int Write(WriteAdapter adapter)
public virtual int Write(ObservationWriter writer)
{
// TODO reuse buffer for similar agents, don't call GetObservationShape()
var numFloats = this.ObservationSize();

adapter.AddRange(buffer);
writer.AddRange(buffer);
return numFloats;
}
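For authors of custom sensors, the visible change is the parameter type of `Write()`: `WriteAdapter` is renamed to `ObservationWriter` (see the changelog entry further down). A rough sketch of how an override of `SensorBase.Write` changes; the namespaces (`MLAgents.Sensors`, `MLAgentsExamples`) and the exact shape of `SensorBase` are assumptions based on this changeset rather than a verified API listing:

```csharp
using MLAgents.Sensors;   // assumed home of ObservationWriter on this branch
using MLAgentsExamples;   // assumed home of the example SensorBase

public abstract class MyCustomSensor : SensorBase
{
    // Before: public override int Write(WriteAdapter adapter)
    // After:  only the parameter type changes; AddRange and the return value stay the same.
    public override int Write(ObservationWriter writer)
    {
        var buffer = new float[this.ObservationSize()];
        WriteObservation(buffer);
        writer.AddRange(buffer);
        return buffer.Length;
    }
}
```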

6
Project/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs


HitWall m_BallScript;
TennisArea m_Area;
float m_InvertMult;
FloatPropertiesChannel m_ResetParams;
EnvironmentParameters m_ResetParams;
Vector3 m_Down = new Vector3(0f, -100f, 0f);
Vector3 zAxis = new Vector3(0f, 0f, 1f);
const float k_Angle = 90f;

m_Area = myArea.GetComponent<TennisArea>();
var canvas = GameObject.Find(k_CanvasName);
GameObject scoreBoard;
m_ResetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
if (invertX)
{
scoreBoard = canvas.transform.Find(k_ScoreBoardBName).gameObject;

public void SetBall()
{
scale = m_ResetParams.GetPropertyWithDefault("scale", .5f);
scale = m_ResetParams.GetWithDefault("scale", .5f);
ball.transform.localScale = new Vector3(scale, scale, scale);
}

10
Project/Assets/ML-Agents/Examples/Walker/Scripts/WalkerAgent.cs


Rigidbody m_ChestRb;
Rigidbody m_SpineRb;
FloatPropertiesChannel m_ResetParams;
EnvironmentParameters m_ResetParams;
public override void Initialize()
{

m_ChestRb = chest.GetComponent<Rigidbody>();
m_SpineRb = spine.GetComponent<Rigidbody>();
m_ResetParams = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
SetResetParameters();
}

public void SetTorsoMass()
{
m_ChestRb.mass = m_ResetParams.GetPropertyWithDefault("chest_mass", 8);
m_SpineRb.mass = m_ResetParams.GetPropertyWithDefault("spine_mass", 10);
m_HipsRb.mass = m_ResetParams.GetPropertyWithDefault("hip_mass", 15);
m_ChestRb.mass = m_ResetParams.GetWithDefault("chest_mass", 8);
m_SpineRb.mass = m_ResetParams.GetWithDefault("spine_mass", 10);
m_HipsRb.mass = m_ResetParams.GetWithDefault("hip_mass", 15);
}
public void SetResetParameters()

12
Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs


Vector3 m_JumpTargetPos;
Vector3 m_JumpStartingPos;
FloatPropertiesChannel m_FloatProperties;
EnvironmentParameters m_ResetParams;
public override void Initialize()
{

spawnArea.SetActive(false);
m_FloatProperties = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
m_ResetParams = Academy.Instance.EnvironmentParameters;
}
// Begin the jump sequence

{
localScale = new Vector3(
localScale.x,
m_FloatProperties.GetPropertyWithDefault("no_wall_height", 0),
m_ResetParams.GetWithDefault("no_wall_height", 0),
localScale.z);
wall.transform.localScale = localScale;
SetModel("SmallWallJump", noWallBrain);

localScale = new Vector3(
localScale.x,
m_FloatProperties.GetPropertyWithDefault("small_wall_height", 4),
m_ResetParams.GetWithDefault("small_wall_height", 4),
localScale.z);
wall.transform.localScale = localScale;
SetModel("SmallWallJump", smallWallBrain);

var min = m_FloatProperties.GetPropertyWithDefault("big_wall_min_height", 8);
var max = m_FloatProperties.GetPropertyWithDefault("big_wall_max_height", 8);
var min = m_ResetParams.GetWithDefault("big_wall_min_height", 8);
var max = m_ResetParams.GetWithDefault("big_wall_max_height", 8);
var height = min + Random.value * (max - min);
localScale = new Vector3(
localScale.x,

6
Project/ProjectSettings/DynamicsManager.asset


--- !u!55 &1
PhysicsManager:
m_ObjectHideFlags: 0
serializedVersion: 7
serializedVersion: 10
m_Gravity: {x: 0, y: -9.81, z: 0}
m_DefaultMaterial: {fileID: 0}
m_BounceThreshold: 2

m_LayerCollisionMatrix: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffebffffffddffffffeffffffff5fffffffbffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
m_AutoSimulation: 1
m_AutoSyncTransforms: 1
m_ReuseCollisionCallbacks: 0
m_ClothInterCollisionSettingsToggle: 0
m_ContactPairsMode: 0
m_BroadphaseType: 0

m_WorldSubdivisions: 8
m_FrictionType: 0
m_EnableEnhancedDeterminism: 0
m_EnableUnifiedHeightmaps: 1

47
com.unity.ml-agents/CHANGELOG.md


[Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Major Changes
### Minor Changes
### Bug Fixes
## [1.0.0-preview] - 2020-05-06
- Added new 3-joint Worm ragdoll environment. (#3798)
- Introduced the `SideChannelUtils` to register, unregister and access side
channels.
- `Academy.FloatProperties` was removed, please use
`SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](../docs/Python-API.md).
- The low level Python API has changed. You can look at the document

`AgentAction` and `AgentReset` have been removed.
- The GhostTrainer has been extended to support asymmetric games and the
asymmetric example environment Strikers Vs. Goalie has been added.
- The SideChannel API has changed (#3833, #3660) :
- Introduced the `SideChannelManager` to register, unregister and access side
channels.
- `EnvironmentParameters` replaces the default `FloatProperties`.
You can access the `EnvironmentParameters` with
`Academy.Instance.EnvironmentParameters` on C# and create an
`EnvironmentParametersChannel` on Python
- `SideChannel.OnMessageReceived` is now a protected method (was public)
- SideChannel IncomingMessages methods now take an optional default argument,
which is used when trying to read more data than the message contains.
- Added a feature to allow sending stats from C# environments to TensorBoard
(and other python StatsWriters). To do this from your code, use
`Academy.Instance.StatsRecorder.Add(key, value)`(#3660)
- CameraSensorComponent.m_Grayscale and RenderTextureSensorComponent.m_Grayscale
were changed from `public` to `private` (#3808).
- The `UnityEnv` class from the `gym-unity` package was renamed

- Public fields and properties on several classes were renamed to follow Unity's
C# style conventions. All public fields and properties now use "PascalCase"
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- Added a feature to allow sending stats from C# environments to TensorBoard
(and other python StatsWriters). To do this from your code, use
`SideChannelUtils.GetSideChannel<StatsSideChannel>().AddStat(key, value)`
(#3660)
- SideChannel IncomingMessages methods now take an optional default argument,
which is used when trying to read more data than the message contains.
- The way that UnityEnvironment decides the port was changed. If no port is
specified, the behavior will depend on the `file_name` parameter. If it is
`None`, 5004 (the editor port) will be used; otherwise 5005 (the base

- Running `mlagents-learn` with the same `--run-id` twice will no longer
overwrite the existing files. (#3705)
- `StackingSensor` was changed from `internal` visibility to `public`
- Academy.InferenceSeed property was added. This is used to initialize the
random number generator in ModelRunner, and is incremented for each ModelRunner. (#3823)
- Model updates can now happen asynchronously with environment steps for better performance. (#3690)
- `num_updates` and `train_interval` for SAC were replaced with `steps_per_update`. (#3690)
- Added `Agent.GetObservations()`, which returns a read-only view of the observations
added in `CollectObservations()`. (#3825)
- Model updates can now happen asynchronously with environment steps for better performance. (#3690)
- `num_updates` and `train_interval` for SAC were replaced with `steps_per_update`. (#3690)
- `WriteAdapter` was renamed to `ObservationWriter`. If you have a custom `ISensor` implementation,
you will need to change the signature of its `Write()` method. (#3834)
- The maximum compatible version of tensorflow was changed to allow tensorflow 2.1 and 2.2. This
will allow use with python 3.8 using tensorflow 2.2.0rc3.
- `UnityRLCapabilities` was added to help inform users when RL features are mismatched between C# and Python packages. (#3831)
### Bug Fixes
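The PascalCase rename noted above (#3828) is what drives many of the mechanical example-scene edits earlier in this diff, e.g. `maxStep` → `MaxStep` and `behaviorName` → `BehaviorName`. A tiny before/after sketch of the per-step reward shaping used by several example agents:

```csharp
using MLAgents;

public class TimePenaltyAgent : Agent
{
    public override void OnActionReceived(float[] vectorAction)
    {
        // Before the rename: AddReward(-1f / maxStep);
        AddReward(-1f / MaxStep);
    }
}
```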

6
com.unity.ml-agents/CONTRIBUTING.md


# Contribution Guidelines
Thank you for your interest in contributing to the ML-Agents toolkit! We are
Thank you for your interest in contributing to the ML-Agents Toolkit! We are
ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
ML-Agents Toolkit. To facilitate your contributions, we've outlined a brief set
of guidelines to ensure that your extensions can be easily integrated.
## Communication

Second, before starting on a project that you intend to contribute to the
ML-Agents toolkit (whether environments or modifications to the codebase), we
ML-Agents Toolkit (whether environments or modifications to the codebase), we
**strongly** recommend posting on our
[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
and briefly outlining the changes you plan to make. This will enable us to

78
com.unity.ml-agents/Documentation~/com.unity.ml-agents.md


# About ML-Agents package (`com.unity.ml-agents`)
The Unity ML-Agents package contains the C# SDK for the
[Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
The Unity ML-Agents package contains the C# SDK for the [Unity ML-Agents Toolkit].
The package provides the ability for any Unity scene to be converted into a learning
environment where character behaviors can be trained using a variety of machine learning
algorithms. Additionally, it enables any trained behavior to be embedded back into the Unity
scene. More specifically, the package provides the following core functionalities:
* Define Agents: entities whose behavior will be learned. Agents are entities
that generate observations (through sensors), take actions and receive rewards from
the environment.
The package allows you to convert any Unity scene to into a learning
environment and train character behaviors using a variety of machine learning
algorithms. Additionally, it allows you to embed these trained behaviors back into
Unity scenes to control your characters. More specifically, the package provides
the following core functionalities:
* Define Agents: entities, or characters, whose behavior will be learned. Agents are entities
that generate observations (through sensors), take actions, and receive rewards from
the environment.
share the same Behavior and a scene may have multiple Behaviors.
* Record demonstrations of an agent within the Editor. These demonstrations can be
valuable to train a behavior for that agent.
* Embedding a trained behavior into the scene via the
[Unity Inference Engine](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html).
Thus an Agent can switch from a learning behavior to an inference behavior.
share the same Behavior and a scene may have multiple Behaviors.
* Record demonstrations of an agent within the Editor. You can use demonstrations
to help train a behavior for that agent.
* Embedding a trained behavior into the scene via the [Unity Inference Engine].
Embedded behaviors allow you to switch an Agent between learning and inference.
Note that this package does not contain the machine learning algorithms for training
behaviors. It relies on a Python package to orchestrate the training. This package
only enables instrumenting a Unity scene and setting it up for training, and then
embedding the trained model back into your Unity scene.
## Preview package
This package is available as a preview, so it is not ready for production use.
The features and documentation in this package might change before it is verified for release.
Note that the *ML-Agents* package does not contain the machine learning algorithms for training
behaviors. The *ML-Agents* package only supports instrumenting a Unity scene, setting it up for
training, and then embedding the trained model back into your Unity scene. The machine learning
algorithms that orchestrate training are part of the companion [Python package].
## Package contents

|*Runtime*|Contains core C# APIs for integrating ML-Agents into your Unity scene. |
|*Tests*|Contains the unit tests for the package.|
<a name="Installation"></a>
<a name="Installation"></a>
To install this package, follow the instructions in the
[Package Manager documentation](https://docs.unity3d.com/Manual/upm-ui-install.html).
To install this *ML-Agents* package, follow the instructions in the [Package Manager documentation].
To install the Python package to enable training behaviors, follow the instructions on our
[GitHub repository](https://github.com/Unity-Technologies/ml-agents/blob/latest_release/docs/Installation.md).
To install the companion Python package to enable training behaviors, follow the
[installation instructions] on our [GitHub repository].
This version of the Unity ML-Agents package is compatible with the following versions of the Unity Editor:
This version of the Unity ML-Agents package is compatible with the following versions of the
Unity Editor:
* 2018.4 and later (recommended)
* 2018.4 and later
## Known limitations

Currently the speed of the game physics can only be increased to 100x real-time.
The Academy also moves in time with FixedUpdate() rather than Update(), so game
behavior implemented in Update() may be out of sync with the agent decision
making. See
[Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
for more information.
making. See [Execution Order of Event Functions] for more information.
You can control the frequency of Academy stepping by calling
`Academy.Instance.DisableAutomaticStepping()`, and then calling

If you are new to the Unity ML-Agents package, or have a question after reading
the documentation, you can checkout our
[GitHub Repository](https://github.com/Unity-Technologies/ml-agents), which
also includes a number of ways to
[connect with us](https://github.com/Unity-Technologies/ml-agents#community-and-feedback)
including our [ML-Agents Forum](https://forum.unity.com/forums/ml-agents.453/).
[GitHub Repository], which also includes a number of ways to [connect with us]
including our [ML-Agents Forum].
[Unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[Unity Inference Engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[Package Manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/latest_release/docs/Installation.md
[GitHub Repository]: https://github.com/Unity-Technologies/ml-agents
[Python package]: https://github.com/Unity-Technologies/ml-agents
[Execution Order of Event Functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html
[connect with us]: https://github.com/Unity-Technologies/ml-agents#community-and-feedback
[ML-Agents Forum]: https://forum.unity.com/forums/ml-agents.453/
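The documentation hunk above mentions controlling the frequency of Academy stepping manually (the sentence is cut off at the hunk boundary). A hedged sketch of what that looks like with the members visible in this changeset (`DisableAutomaticStepping`, `AutomaticSteppingEnabled`, `EnvironmentStep`); the MonoBehaviour wrapper and its field are illustrative:

```csharp
using MLAgents;
using UnityEngine;

public class ManualAcademyStepper : MonoBehaviour
{
    public int stepsPerFixedUpdate = 1;   // illustrative knob, not an ML-Agents setting

    void Awake()
    {
        // Take stepping out of the Academy's built-in FixedUpdate-driven stepper.
        Academy.Instance.DisableAutomaticStepping();
    }

    void FixedUpdate()
    {
        if (!Academy.Instance.AutomaticSteppingEnabled)
        {
            for (var i = 0; i < stepsPerFixedUpdate; i++)
            {
                // Advance the Academy (and every Agent) explicitly.
                Academy.Instance.EnvironmentStep();
            }
        }
    }
}
```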

7
com.unity.ml-agents/Editor/AgentEditor.cs


var serializedAgent = serializedObject;
serializedAgent.Update();
var maxSteps = serializedAgent.FindProperty(
"maxStep");
var maxSteps = serializedAgent.FindProperty("MaxStep");
new GUIContent(
"Max Step", "The per-agent maximum number of steps."));
new GUIContent("Max Step", "The per-agent maximum number of steps.")
);
serializedAgent.ApplyModifiedProperties();

6
com.unity.ml-agents/Editor/BehaviorParametersEditor.cs


var model = (NNModel)serializedObject.FindProperty("m_Model").objectReferenceValue;
var behaviorParameters = (BehaviorParameters)target;
SensorComponent[] sensorComponents;
if (behaviorParameters.useChildSensors)
if (behaviorParameters.UseChildSensors)
{
sensorComponents = behaviorParameters.GetComponentsInChildren<SensorComponent>();
}

}
var brainParameters = behaviorParameters.brainParameters;
var brainParameters = behaviorParameters.BrainParameters;
if (model != null)
{
barracudaModel = ModelLoader.Load(model);

var failedChecks = Inference.BarracudaModelParamLoader.CheckModel(
barracudaModel, brainParameters, sensorComponents, behaviorParameters.behaviorType
barracudaModel, brainParameters, sensorComponents, behaviorParameters.BehaviorType
);
foreach (var check in failedChecks)
{

10
com.unity.ml-agents/Editor/BrainParametersDrawer.cs


// The height of a line in the Unity Inspectors
const float k_LineHeight = 17f;
const int k_VecObsNumLine = 3;
const string k_ActionSizePropName = "vectorActionSize";
const string k_ActionTypePropName = "vectorActionSpaceType";
const string k_ActionDescriptionPropName = "vectorActionDescriptions";
const string k_VecObsPropName = "vectorObservationSize";
const string k_NumVecObsPropName = "numStackedVectorObservations";
const string k_ActionSizePropName = "VectorActionSize";
const string k_ActionTypePropName = "VectorActionSpaceType";
const string k_ActionDescriptionPropName = "VectorActionDescriptions";
const string k_VecObsPropName = "VectorObservationSize";
const string k_NumVecObsPropName = "NumStackedVectorObservations";
/// <inheritdoc />
public override float GetPropertyHeight(SerializedProperty property, GUIContent label)

4
com.unity.ml-agents/Editor/DemonstrationDrawer.cs


/// </summary>
void MakeActionsProperty(SerializedProperty property)
{
var actSizeProperty = property.FindPropertyRelative("vectorActionSize");
var actSpaceTypeProp = property.FindPropertyRelative("vectorActionSpaceType");
var actSizeProperty = property.FindPropertyRelative("VectorActionSize");
var actSpaceTypeProp = property.FindPropertyRelative("VectorActionSpaceType");
var vecActSizeLabel =
actSizeProperty.displayName + ": " + BuildIntArrayLabel(actSizeProperty);

2
com.unity.ml-agents/Editor/RayPerceptionSensorComponentBaseEditor.cs


// it is not editable during play mode.
EditorGUI.BeginDisabledGroup(!EditorUtilities.CanUpdateModelProperties());
{
EditorGUILayout.PropertyField(so.FindProperty("m_ObservationStacks"), true);
EditorGUILayout.PropertyField(so.FindProperty("m_ObservationStacks"), new GUIContent("Stacked Raycasts"), true);
}
EditorGUI.EndDisabledGroup();

136
com.unity.ml-agents/Runtime/Academy.cs


* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a
* Unity scene, please browse our documentation pages on GitHub:
* https://github.com/Unity-Technologies/ml-agents/blob/master/docs/
* https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/
*/
namespace MLAgents

}
/// <summary>
/// An Academy is where Agent objects go to train their behaviors.
/// The Academy singleton manages agent training and decision making.
/// When an academy is run, it can either be in inference or training mode.
/// The mode is determined by the presence or absence of a Communicator. In
/// the presence of a communicator, the academy is run in training mode where
/// the states and observations of each agent are sent through the
/// communicator. In the absence of a communicator, the academy is run in
/// inference mode where the agent behavior is determined by the Policy
/// attached to it.
/// Access the Academy singleton through the <see cref="Instance"/>
/// property. The Academy instance is initialized the first time it is accessed (which will
/// typically be by the first <see cref="Agent"/> initialized in a scene).
///
/// At initialization, the Academy attempts to connect to the Python training process through
/// the external communicator. If successful, the training process can train <see cref="Agent"/>
/// instances. When you set an agent's <see cref="BehaviorParameters.behaviorType"/> setting
/// to <see cref="BehaviorType.Default"/>, the agent exchanges data with the training process
/// to make decisions. If no training process is available, agents with the default behavior
/// fall back to inference or heuristic decisions. (You can also set agents to always use
/// inference or heuristics.)
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
"docs/Learning-Environment-Design.md")]

/// on each side, although we may allow some flexibility in the future.
/// This should be incremented whenever a change is made to the communication protocol.
/// </summary>
const string k_ApiVersion = "0.16.0";
const string k_ApiVersion = "0.17.0";
internal const string k_PackageVersion = "0.15.1-preview";
internal const string k_PackageVersion = "1.0.0-preview";
const int k_EditorTrainingPort = 5004;

static Lazy<Academy> s_Lazy = new Lazy<Academy>(() => new Academy());
/// <summary>
/// True if the Academy is initialized, false otherwise.
/// Reports whether the Academy has been initialized yet.
/// <value><c>True</c> if the Academy is initialized, <c>false</c> otherwise.</value>
public static bool IsInitialized
{
get { return s_Lazy.IsValueCreated; }

/// The singleton Academy object.
/// </summary>
/// <value>Getting the instance initializes the Academy, if necessary.</value>
/// Returns whether or not the communicator is on.
/// Reports whether or not the communicator is on.
/// <returns>
/// <c>true</c>, if communicator is on, <c>false</c> otherwise.
/// </returns>
/// <seealso cref="ICommunicator"/>
/// <value>
/// <c>True</c>, if communicator is on, <c>false</c> otherwise.
/// </value>
public bool IsCommunicatorOn
{
get { return Communicator != null; }

// Flag used to keep track of the first time the Academy is reset.
bool m_HadFirstReset;
// Random seed used for inference.
int m_InferenceSeed;
/// <summary>
/// Set the random seed used for inference. This should be set before any Agents are added
/// to the scene. The seed is passed to the ModelRunner constructor, and incremented each
/// time a new ModelRunner is created.
/// </summary>
public int InferenceSeed
{
set { m_InferenceSeed = value; }
}
/// <summary>
/// Returns the RLCapabilities of the python client that the unity process is connected to.
/// </summary>
internal UnityRLCapabilities TrainerCapabilities { get; set; }
// to facilitate synchronization. More specifically, it ensure
// that all the agents performs their steps in a consistent order (i.e. no
// to facilitate synchronization. More specifically, it ensures
// that all the agents perform their steps in a consistent order (i.e. no
// agent can act based on a decision before another agent has had a chance
// to request a decision).

// Signals to all the listeners that the academy is being destroyed
internal event Action DestroyAction;
// Signals the Agent that a new step is about to start.
// Signals to the Agent that a new step is about to start.
// This will mark the Agent as Done if it has reached its maxSteps.
internal event Action AgentIncrementStep;

/// <summary>
/// Determines whether or not the Academy is automatically stepped during the FixedUpdate phase.
/// </summary>
/// <value>Set <c>true</c> to enable automatic stepping; <c>false</c> to disable.</value>
public bool AutomaticSteppingEnabled
{
get { return m_FixedUpdateStepper != null; }

#endif
}
}
private EnvironmentParameters m_EnvironmentParameters;
private StatsRecorder m_StatsRecorder;
/// Initializes the environment, configures it and initialized the Academy.
/// Returns the <see cref="EnvironmentParameters"/> instance. If training
/// features such as Curriculum Learning or Environment Parameter Randomization are used,
/// then the values of the parameters generated from the training process can be
/// retrieved here.
/// </summary>
/// <returns></returns>
public EnvironmentParameters EnvironmentParameters
{
get { return m_EnvironmentParameters; }
}
/// <summary>
/// Returns the <see cref="StatsRecorder"/> instance. This instance can be used
/// to record any statistics from the Unity environment.
/// </summary>
/// <returns></returns>
public StatsRecorder StatsRecorder
{
get { return m_StatsRecorder; }
}
/// <summary>
/// Initializes the environment, configures it and initializes the Academy.
/// </summary>
void InitializeEnvironment()
{

EnableAutomaticStepping();
SideChannelUtils.RegisterSideChannel(new EngineConfigurationChannel());
SideChannelUtils.RegisterSideChannel(new FloatPropertiesChannel());
SideChannelUtils.RegisterSideChannel(new StatsSideChannel());
SideChannelsManager.RegisterSideChannel(new EngineConfigurationChannel());
m_EnvironmentParameters = new EnvironmentParameters();
m_StatsRecorder = new StatsRecorder();
// Try to launch the communicator by using the arguments passed at launch
var port = ReadPortFromArgs();

unityCommunicationVersion = k_ApiVersion,
unityPackageVersion = k_PackageVersion,
name = "AcademySingleton",
CSharpCapabilities = new UnityRLCapabilities()
// We might have inference-only Agents, so set the seed for them too.
m_InferenceSeed = unityRlInitParameters.seed;
TrainerCapabilities = unityRlInitParameters.TrainerCapabilities;
TrainerCapabilities.WarnOnPythonMissingBaseRLCapabilities();
}
catch
{

}
/// <summary>
/// Returns the current episode counter.
/// The current episode count.
/// <returns>
/// <value>
/// </returns>
/// </value>
public int EpisodeCount
{
get { return m_EpisodeCount; }

/// Returns the current step counter (within the current episode).
/// The current step count (within the current episode).
/// <returns>
/// <value>
/// </returns>
/// </value>
public int StepCount
{
get { return m_StepCount; }

/// Returns the total step counter.
/// Returns the total step count.
/// <returns>
/// <value>
/// </returns>
/// </value>
public int TotalStepCount
{
get { return m_TotalStepCount; }

}
/// <summary>
/// Performs a single environment update to the Academy, and Agent
/// Performs a single environment update of the Academy and Agent
/// objects within the environment.
/// </summary>
public void EnvironmentStep()

// If the communicator is not on, we need to clear the SideChannel sending queue
if (!IsCommunicatorOn)
{
SideChannelUtils.GetSideChannelMessage();
SideChannelsManager.GetSideChannelMessage();
}
using (TimerStack.Instance.Scoped("AgentAct"))

/// NNModel and the InferenceDevice as provided.
/// </summary>
/// <param name="model">The NNModel the ModelRunner must use.</param>
/// <param name="brainParameters">The brainParameters used to create the ModelRunner.</param>
/// <param name="brainParameters">The BrainParameters used to create the ModelRunner.</param>
/// <param name="inferenceDevice">
/// The inference device (CPU or GPU) the ModelRunner will use.
/// </param>

var modelRunner = m_ModelRunners.Find(x => x.HasModel(model, inferenceDevice));
if (modelRunner == null)
{
modelRunner = new ModelRunner(
model, brainParameters, inferenceDevice);
modelRunner = new ModelRunner(model, brainParameters, inferenceDevice, m_InferenceSeed);
m_InferenceSeed++;
}
return modelRunner;
}

Communicator?.Dispose();
Communicator = null;
SideChannelUtils.UnregisterAllSideChannels();
m_EnvironmentParameters.Dispose();
m_StatsRecorder.Dispose();
SideChannelsManager.UnregisterAllSideChannels(); // unregister custom side channels
if (m_ModelRunners != null)
{

622
com.unity.ml-agents/Runtime/Agent.cs


using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using UnityEngine.Serialization;
namespace MLAgents
{

public float[] storedVectorActions;
/// <summary>
/// For discrete control, specifies the actions that the agent cannot take. Is true if
/// the action is masked.
/// For discrete control, specifies the actions that the agent cannot take.
/// An element of the mask array is <c>true</c> if the action is prohibited.
/// Current agent reward.
/// The current agent reward.
/// </summary>
public float reward;

}
/// <summary>
/// Agent MonoBehaviour class that is attached to a Unity GameObject, making it
/// an Agent. An agent produces observations and takes actions in the
/// environment. Observations are determined by the cameras attached
/// to the agent in addition to the vector observations implemented by the
/// user in <see cref="Agent.CollectObservations(VectorSensor)"/>.
/// On the other hand, actions are determined by decisions produced by a Policy.
/// Currently, this class is expected to be extended to implement the desired agent behavior.
/// An agent is an actor that can observe its environment, decide on the
/// best course of action using those observations, and execute those actions
/// within the environment.
/// Simply speaking, an agent roams through an environment and at each step
/// of the environment extracts its current observation, sends them to its
/// policy and in return receives an action. In practice,
/// however, an agent need not send its observation at every step since very
/// little may have changed between successive steps.
/// Use the Agent class as the subclass for implementing your own agents. Add
/// your Agent implementation to a [GameObject] in the [Unity scene] that serves
/// as the agent's environment.
/// At any step, an agent may be considered done due to a variety of reasons:
/// - The agent reached an end state within its environment.
/// - The agent reached the maximum # of steps (i.e. timed out).
/// - The academy reached the maximum # of steps (forced agent to be done).
/// Agents in an environment operate in *steps*. At each step, an agent collects observations,
/// passes them to its decision-making policy, and receives an action vector in response.
/// Here, an agent reaches an end state if it completes its task successfully
/// or somehow fails along the way. In the case where an agent is done before
/// the academy, it either resets and restarts, or just lingers until the
/// academy is done.
/// Agents make observations using <see cref="ISensor"/> implementations. The ML-Agents
/// API provides implementations for visual observations (<see cref="CameraSensor"/>)
/// raycast observations (<see cref="RayPerceptionSensor"/>), and arbitrary
/// data observations (<see cref="VectorSensor"/>). You can add the
/// <see cref="CameraSensorComponent"/> and <see cref="RayPerceptionSensorComponent2D"/> or
/// <see cref="RayPerceptionSensorComponent3D"/> components to an agent's [GameObject] to use
/// those sensor types. You can implement the <see cref="CollectObservations(VectorSensor)"/>
/// function in your Agent subclass to use a vector observation. The Agent class calls this
/// function before it uses the observation vector to make a decision. (If you only use
/// visual or raycast observations, you do not need to implement
/// <see cref="CollectObservations"/>.)
/// An important note regarding steps and episodes is due. Here, an agent step
/// corresponds to an academy step, which also corresponds to Unity
/// environment step (i.e. each FixedUpdate call). This is not the case for
/// episodes. The academy controls the global episode count and each agent
/// controls its own local episode count and can reset and start a new local
/// episode independently (based on its own experience). Thus an academy
/// (global) episode can be viewed as the upper-bound on an agents episode
/// length and that within a single global episode, an agent may have completed
/// multiple local episodes. Consequently, if an agent max step is
/// set to a value larger than the academy max steps value, then the academy
/// value takes precedence (since the agent max step will never be reached).
/// Assign a decision making policy to an agent using a <see cref="BehaviorParameters"/>
/// component attached to the agent's [GameObject]. The <see cref="BehaviorType"/> setting
/// determines how decisions are made:
/// Lastly, note that at any step the policy to the agent is allowed to
/// change model with <see cref="SetModel"/>.
/// * <see cref="BehaviorType.Default"/>: decisions are made by the external process,
/// when connected. Otherwise, decisions are made using inference. If no inference model
/// is specified in the BehaviorParameters component, then heuristic decision
/// making is used.
/// * <see cref="BehaviorType.InferenceOnly"/>: decisions are always made using the trained
/// model specified in the <see cref="BehaviorParameters"/> component.
/// * <see cref="BehaviorType.HeuristicOnly"/>: when a decision is needed, the agent's
/// <see cref="Heuristic"/> function is called. Your implementation is responsible for
/// providing the appropriate action.
/// Implementation-wise, it is required that this class is extended and the
/// virtual methods overridden. For sample implementations of agent behavior,
/// see the Examples/ directory within this Unity project.
/// To trigger an agent decision automatically, you can attach a <see cref="DecisionRequester"/>
/// component to the Agent game object. You can also call the agent's <see cref="RequestDecision"/>
/// function manually. You only need to call <see cref="RequestDecision"/> when the agent is
/// in a position to act upon the decision. In many cases, this will be every [FixedUpdate]
/// callback, but could be less frequent. For example, an agent that hops around its environment
/// can only take an action when it touches the ground, so several frames might elapse between
/// one decision and the need for the next.
///
/// Use the <see cref="OnActionReceived"/> function to implement the actions your agent can take,
/// such as moving to reach a goal or interacting with its environment.
///
/// When you call <see cref="EndEpisode"/> on an agent or the agent reaches its <see cref="maxStep"/> count,
/// its current episode ends. You can reset the agent -- or remove it from the
/// environment -- by implementing the <see cref="OnEpisodeBegin"/> function. An agent also
/// becomes done when the <see cref="Academy"/> resets the environment, which only happens when
/// the <see cref="Academy"/> receives a reset signal from an external process via the
/// <see cref="Academy.Communicator"/>.
///
/// The Agent class extends the Unity [MonoBehaviour] class. You can implement the
/// standard [MonoBehaviour] functions as needed for your agent. Since an agent's
/// observations and actions typically take place during the [FixedUpdate] phase, you should
/// only use the [MonoBehaviour.Update] function for cosmetic purposes. If you override the [MonoBehaviour]
/// methods, [OnEnable()] or [OnDisable()], always call the base Agent class implementations.
///
/// You can implement the <see cref="Heuristic"/> function to specify agent actions using
/// your own heuristic algorithm. Implementing a heuristic function can be useful
/// for debugging. For example, you can use keyboard input to select agent actions in
/// order to manually control an agent's behavior.
///
/// Note that you can change the inference model assigned to an agent at any step
/// by calling <see cref="SetModel"/>.
///
/// See [Agents] and [Reinforcement Learning in Unity] in the [Unity ML-Agents Toolkit manual] for
/// more information on creating and training agents.
///
/// For sample implementations of agent behavior, see the examples available in the
/// [Unity ML-Agents Toolkit] on Github.
///
/// [MonoBehaviour]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.html
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Unity scene]: https://docs.unity3d.com/Manual/CreatingScenes.html
/// [FixedUpdate]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.FixedUpdate.html
/// [MonoBehaviour.Update]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.Update.html
/// [OnEnable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnEnable.html
/// [OnDisable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnDisable.html
/// [OnBeforeSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnBeforeSerialize.html
/// [OnAfterSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnAfterSerialize.html
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design.md
/// [Unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Readme.md
///
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
"docs/Learning-Environment-Design-Agents.md")]

IPolicy m_Brain;
BehaviorParameters m_PolicyFactory;
/// This code is here to make the upgrade path for users using maxStep
/// This code is here to make the upgrade path for users using MaxStep
/// easier. We will hook into the Serialization code and make sure that
/// agentParameters.maxStep and this.maxStep are in sync.
[Serializable]

/// <summary>
/// The maximum number of steps the agent takes before being done.
/// </summary>
/// <value>The maximum steps for an agent to take before it resets; or 0 for
/// unlimited steps.</value>
/// If set to 0, the agent can only be set to done programmatically (or
/// when the Academy is done).
/// If set to any positive integer, the agent will be set to done after
/// that many steps. Note that setting the max step to a value greater
/// than the academy max step value renders it useless.
/// The max step value determines the maximum length of an agent's episodes.
/// Set to a positive integer to limit the episode length to that many steps.
/// Set to 0 for unlimited episode length.
///
/// When an episode ends and a new one begins, the Agent object's
/// <seealso cref="OnEpisodeBegin"/> function is called. You can implement
/// <see cref="OnEpisodeBegin"/> to reset the agent or remove it from the
/// environment. An agent's episode can also end if you call its <seealso cref="EndEpisode"/>
/// method or an external process resets the environment through the <see cref="Academy"/>.
///
/// Consider limiting the number of steps in an episode to avoid wasting time during
/// training. If you set the max step value to a reasonable estimate of the time it should
/// take to complete a task, then agents that haven’t succeeded in that time frame will
/// reset and start a new training episode rather than continue to fail.
[HideInInspector] public int maxStep;
/// <example>
/// To use a step limit when training while allowing agents to run without resetting
/// outside of training, you can set the max step to 0 in <see cref="Initialize"/>
/// if the <see cref="Academy"/> is not connected to an external process.
/// <code>
/// using MLAgents;
///
/// public class MyAgent : Agent
/// {
/// public override void Initialize()
/// {
/// if (!Academy.Instance.IsCommunicatorOn)
/// {
/// this.MaxStep = 0;
/// }
/// }
/// }
/// </code>
/// **Note:** in general, you should limit the differences between the code you execute
/// during training and the code you run during inference.
/// </example>
[FormerlySerializedAs("maxStep")]
[HideInInspector] public int MaxStep;
/// Current Agent information (message sent to Brain).
AgentInfo m_Info;

/// <summary>
/// Called when the attached <see cref="GameObject"/> becomes enabled and active.
/// </summary>
/// <remarks>
/// This function initializes the Agent instance, if it hasn't been initialized yet.
/// Always call the base Agent class version of this function if you implement `OnEnable()`
/// in your own Agent subclasses.
/// </remarks>
/// <example>
/// <code>
/// protected override void OnEnable()
/// {
/// base.OnEnable();
/// // additional OnEnable logic...
/// }
/// </code>
/// </example>
protected virtual void OnEnable()
{
LazyInitialize();

/// <inheritdoc cref="OnBeforeSerialize"/>
/// Called by Unity immediately before serializing this object.
/// <remarks>
/// The Agent class uses OnBeforeSerialize() for internal housekeeping. Call the
/// base class implementation if you need your own custom serialization logic.
///
/// See [OnBeforeSerialize] for more information.
///
/// [OnBeforeSerialize]: https://docs.unity3d.com/ScriptReference/ISerializationCallbackReceiver.OnBeforeSerialize.html
/// </remarks>
/// <example>
/// <code>
/// public new void OnBeforeSerialize()
/// {
/// base.OnBeforeSerialize();
/// // additional serialization logic...
/// }
/// </code>
/// </example>
// Manages a serialization upgrade issue from v0.13 to v0.14 where maxStep moved
// Manages a serialization upgrade issue from v0.13 to v0.14 where MaxStep moved
if (maxStep == 0 && maxStep != agentParameters.maxStep && !hasUpgradedFromAgentParameters)
if (MaxStep == 0 && MaxStep != agentParameters.maxStep && !hasUpgradedFromAgentParameters)
maxStep = agentParameters.maxStep;
MaxStep = agentParameters.maxStep;
/// <inheritdoc cref="OnAfterDeserialize"/>
/// Called by Unity immediately after deserializing this object.
/// <remarks>
/// The Agent class uses OnAfterDeserialize() for internal housekeeping. Call the
/// base class implementation if you need your own custom deserialization logic.
///
/// See [OnAfterDeserialize] for more information.
///
/// [OnAfterDeserialize]: https://docs.unity3d.com/ScriptReference/ISerializationCallbackReceiver.OnAfterDeserialize.html
/// </remarks>
/// <example>
/// <code>
/// public new void OnAfterDeserialize()
/// {
/// base.OnAfterDeserialize();
/// // additional deserialization logic...
/// }
/// </code>
/// </example>
// Manages a serialization upgrade issue from v0.13 to v0.14 where maxStep moved
// Manages a serialization upgrade issue from v0.13 to v0.14 where MaxStep moved
if (maxStep == 0 && maxStep != agentParameters.maxStep && !hasUpgradedFromAgentParameters)
if (MaxStep == 0 && MaxStep != agentParameters.maxStep && !hasUpgradedFromAgentParameters)
maxStep = agentParameters.maxStep;
MaxStep = agentParameters.maxStep;
}
hasUpgradedFromAgentParameters = true;
}

/// </summary>
/// <remarks>
/// This function calls your <seealso cref="Initialize"/> implementation, if one exists.
/// </remarks>
public void LazyInitialize()
{
if (m_Initialized)

}
/// <summary>
/// Reason that the Agent is being considered "done"
/// The reason that the Agent has been set to "done".
/// The <see cref="Done"/> method was called.
/// The <see cref="EndEpisode"/> method was called.
/// </summary>
DoneCalled,

MaxStepReached,
/// <summary>
/// The Agent was disabled
/// The Agent was disabled.
/// </summary>
Disabled,
}

/// </summary>
/// <remarks>
/// Always call the base Agent class version of this function if you implement `OnDisable()`
/// in your own Agent subclasses.
/// </remarks>
/// <example>
/// <code>
/// protected override void OnDisable()
/// {
/// base.OnDisable();
/// // additional OnDisable logic...
/// }
/// </code>
/// </example>
/// <seealso cref="OnEnable"/>
protected virtual void OnDisable()
{
DemonstrationWriters.Clear();

}
/// <summary>
/// Updates the Model for the agent. Any model currently assigned to the
/// agent will be replaced with the provided one. If the arguments are
/// identical to the current parameters of the agent, the model will
/// remain unchanged.
/// Updates the Model assigned to this Agent instance.
/// <remarks>
/// If the agent already has an assigned model, that model is replaced with the
/// the provided one. However, if you call this function with arguments that are
/// identical to the current parameters of the agent, then no changes are made.
///
/// **Note:** the <paramref name="behaviorName"/> parameter is ignored when not training.
/// The <paramref name="model"/> and <paramref name="inferenceDevice"/> parameters
/// are ignored when not using inference.
/// </remarks>
/// <param name = "inferenceDevice"> Define on what device the model
/// <param name = "inferenceDevice"> Define the device on which the model
/// will be run.</param>
public void SetModel(
string behaviorName,

if (behaviorName == m_PolicyFactory.behaviorName &&
model == m_PolicyFactory.model &&
inferenceDevice == m_PolicyFactory.inferenceDevice)
if (behaviorName == m_PolicyFactory.BehaviorName &&
model == m_PolicyFactory.Model &&
inferenceDevice == m_PolicyFactory.InferenceDevice)
m_PolicyFactory.model = model;
m_PolicyFactory.inferenceDevice = inferenceDevice;
m_PolicyFactory.behaviorName = behaviorName;
m_PolicyFactory.Model = model;
m_PolicyFactory.InferenceDevice = inferenceDevice;
m_PolicyFactory.BehaviorName = behaviorName;
ReloadPolicy();
}

/// Overrides the current step reward of the agent and updates the episode
/// reward accordingly.
/// </summary>
/// <remarks>
/// This function replaces any rewards given to the agent during the current step.
/// Use <see cref="AddReward(float)"/> to incrementally change the reward rather than
/// overriding it.
///
/// Typically, you assign rewards in the Agent subclass's <see cref="OnActionReceived(float[])"/>
/// implementation after carrying out the received action and evaluating its success.
///
/// Rewards are used during reinforcement learning; they are ignored during inference.
///
/// See [Agents - Rewards] for general advice on implementing rewards and [Reward Signals]
/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Reward-Signals.md
/// </remarks>
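/// <example>
/// As a sketch, an agent rewarded only for reaching a goal might overwrite the step
/// reward once the goal test succeeds (the MoveAgent and ReachedGoal helpers are
/// illustrative, not part of the Agent API):
/// <code>
/// public override void OnActionReceived(float[] vectorAction)
/// {
///     MoveAgent(vectorAction);   // illustrative movement helper
///     if (ReachedGoal())         // illustrative goal test
///     {
///         SetReward(1.0f);
///         EndEpisode();
///     }
/// }
/// </code>
/// </example>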
/// <param name="reward">The new value of the reward.</param>
public void SetReward(float reward)
{

/// <summary>
/// Increments the step and episode rewards by the provided value.
/// </summary>
/// <remarks>Use a positive reward to reinforce desired behavior. You can use a
/// negative reward to penalize mistakes. Use <seealso cref="SetReward(float)"/> to
/// set the reward assigned to the current step with a specific value rather than
/// increasing or decreasing it.
///
/// Typically, you assign rewards in the Agent subclass's <see cref="OnActionReceived(float[])"/>
/// implementation after carrying out the received action and evaluating its success.
///
/// Rewards are used during reinforcement learning; they are ignored during inference.
///
/// See [Agents - Rewards] for general advice on implementing rewards and [Reward Signals]
/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Reward-Signals.md
///</remarks>
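/// <example>
/// A common sketch is a small per-step penalty that encourages the agent to finish
/// quickly, combined with a larger reward when the task is completed (the ReachedGoal
/// helper is illustrative):
/// <code>
/// public override void OnActionReceived(float[] vectorAction)
/// {
///     AddReward(-0.001f);        // small time penalty every step
///     if (ReachedGoal())         // illustrative goal test
///     {
///         AddReward(1.0f);
///         EndEpisode();
///     }
/// }
/// </code>
/// </example>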
/// <param name="increment">Incremental reward value.</param>
public void AddReward(float increment)
{

void UpdateRewardStats()
{
var gaugeName = $"{m_PolicyFactory.behaviorName}.CumulativeReward";
var gaugeName = $"{m_PolicyFactory.BehaviorName}.CumulativeReward";
/// Sets the done flag to true.
/// Sets the done flag to true and resets the agent.
/// <seealso cref="OnEpisodeBegin"/>
public void EndEpisode()
{
NotifyAgentDone(DoneReason.DoneCalled);

/// <summary>
/// Is called when the agent must request the brain for a new decision.
/// Requests a new decision for this agent.
/// <remarks>
/// Call `RequestDecision()` whenever an agent needs a decision. You often
/// want to request a decision every environment step. However, if an agent
/// cannot use the decision every step, then you can request a decision less
/// frequently.
///
/// You can add a <seealso cref="DecisionRequester"/> component to the agent's
/// [GameObject] to drive the agent's decision making. When you use this component,
/// do not call `RequestDecision()` separately.
///
/// Note that this function calls <seealso cref="RequestAction"/>; you do not need to
/// call both functions at the same time.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
public void RequestDecision()
{
m_RequestDecision = true;

/// <summary>
/// Is called then the agent must perform a new action.
/// Requests an action for this agent.
/// <remarks>
/// Call `RequestAction()` to repeat the previous action returned by the agent's
/// most recent decision. A new decision is not requested. When you call this function,
/// the Agent instance invokes <seealso cref="OnActionReceived(float[])"/> with the
/// existing action vector.
///
/// You can use `RequestAction()` in situations where an agent must take an action
/// every update, but doesn't need to make a decision as often. For example, an
/// agent that moves through its environment might need to apply an action to keep
/// moving, but only needs to make a decision to change course or speed occasionally.
///
/// You can add a <seealso cref="DecisionRequester"/> component to the agent's
/// [GameObject] to drive the agent's decision making and action frequency. When you
/// use this component, do not call `RequestAction()` separately.
///
/// Note that <seealso cref="RequestDecision"/> calls `RequestAction()`; you do not need to
/// call both functions at the same time.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
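/// <example>
/// One possible pattern, sketched here, repeats the previous action every physics step
/// while requesting a new decision only occasionally (the step counter field and the
/// period of 5 steps are illustrative):
/// <code>
/// int m_StepsSinceDecision;
///
/// void FixedUpdate()
/// {
///     if (m_StepsSinceDecision >= 5)
///     {
///         m_StepsSinceDecision = 0;
///         RequestDecision();     // also triggers RequestAction()
///     }
///     else
///     {
///         m_StepsSinceDecision++;
///         RequestAction();       // repeat the previous action
///     }
/// }
/// </code>
/// </example>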
public void RequestAction()
{
m_RequestAction = true;

/// at the end of an episode.
void ResetData()
{
var param = m_PolicyFactory.brainParameters;
var param = m_PolicyFactory.BrainParameters;
m_ActionMasker = new DiscreteActionMasker(param);
// If we haven't initialized vectorActions, initialize to 0. This should only
// happen during the creation of the Agent. In subsequent episodes, vectorAction

m_Action.vectorActions = new float[param.numActions];
m_Info.storedVectorActions = new float[param.numActions];
m_Action.vectorActions = new float[param.NumActions];
m_Info.storedVectorActions = new float[param.NumActions];
/// Initializes the agent, called once when the agent is enabled. Can be
/// left empty if there is no special, unique set-up behavior for the
/// agent.
/// Implement `Initialize()` to perform one-time initialization or set up of the
/// Agent instance.
/// One sample use is to store local references to other objects in the
/// scene which would facilitate computing this agents observation.
/// `Initialize()` is called once when the agent is first enabled. If, for example,
/// the Agent object needs references to other [GameObject]s in the scene, you
/// can collect and store those references here.
///
/// Note that <seealso cref="OnEpisodeBegin"/> is called at the start of each of
/// the agent's "episodes". You can use that function for items that need to be reset
/// for each episode.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
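/// <example>
/// A typical sketch caches component and scene references once, for later use when
/// collecting observations or applying actions (the field names and the "Target"
/// object name are illustrative):
/// <code>
/// Rigidbody m_AgentRb;
/// Transform m_Target;
///
/// public override void Initialize()
/// {
///     m_AgentRb = GetComponent<Rigidbody>();
///     m_Target = GameObject.Find("Target").transform;  // "Target" is illustrative
/// }
/// </code>
/// </example>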
public virtual void Initialize(){}
public virtual void Initialize() {}
/// When the Agent uses Heuristics, it will call this method every time it
/// needs an action. This can be used for debugging or controlling the agent
/// with keyboard. This can also be useful to record demonstrations for imitation learning.
/// Implement `Heuristic()` to choose an action for this agent using a custom heuristic.
/// <param name="actionsOut">An array corresponding to the next action of the Agent</param>
/// <remarks>
/// Implement this function to provide custom decision making logic or to support manual
/// control of an agent using keyboard, mouse, or game controller input.
///
/// Your heuristic implementation can use any decision making logic you specify. Assign decision
/// values to the float[] array, <paramref name="actionsOut"/>, passed to your function as a parameter.
/// Add values to the array at the same indexes as they are used in your
/// <seealso cref="OnActionReceived(float[])"/> function, which receives this array and
/// implements the corresponding agent behavior. See [Actions] for more information
/// about agent actions.
///
/// An agent calls this `Heuristic()` function to make a decision when you set its behavior
/// type to <see cref="BehaviorType.HeuristicOnly"/>. The agent also calls this function if
/// you set its behavior type to <see cref="BehaviorType.Default"/> when the
/// <see cref="Academy"/> is not connected to an external training process and you do not
/// assign a trained model to the agent.
///
/// To perform imitation learning, implement manual control of the agent in the `Heuristic()`
/// function so that you can record the demonstrations required for the imitation learning
/// algorithms. (Attach a [Demonstration Recorder] component to the agent's [GameObject] to
/// record the demonstration session to a file.)
///
/// Even when you don’t plan to use heuristic decisions for an agent or imitation learning,
/// implementing a simple heuristic function can aid in debugging agent actions and interactions
/// with its environment.
///
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Training-Imitation-Learning.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#actions
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// <example>
/// The following example illustrates a `Heuristic()` function that provides WASD-style
/// keyboard control for an agent that can move in two dimensions as well as jump. See
/// [Input Manager] for more information about the built-in Unity input functions.
/// You can also use the [Input System package], which provides a more flexible and
/// configurable input system.
/// <code>
/// public override void Heuristic(float[] actionsOut)
/// {
/// actionsOut[0] = Input.GetAxis("Horizontal");
/// actionsOut[1] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
/// actionsOut[2] = Input.GetAxis("Vertical");
/// }
/// </code>
/// [Input Manager]: https://docs.unity3d.com/Manual/class-InputManager.html
/// [Input System package]: https://docs.unity3d.com/Packages/com.unity.inputsystem@1.0/manual/index.html
/// </example>
/// <param name="actionsOut">Array for the output actions.</param>
/// <seealso cref="OnActionReceived(float[])"/>
public virtual void Heuristic(float[] actionsOut)
{
Debug.LogWarning("Heuristic method called but not implemented. Returning placeholder actions.");

{
// Get all attached sensor components
SensorComponent[] attachedSensorComponents;
if (m_PolicyFactory.useChildSensors)
if (m_PolicyFactory.UseChildSensors)
{
attachedSensorComponents = GetComponentsInChildren<SensorComponent>();
}

}
// Support legacy CollectObservations
var param = m_PolicyFactory.brainParameters;
if (param.vectorObservationSize > 0)
var param = m_PolicyFactory.BrainParameters;
if (param.VectorObservationSize > 0)
collectObservationsSensor = new VectorSensor(param.vectorObservationSize);
if (param.numStackedVectorObservations > 1)
collectObservationsSensor = new VectorSensor(param.VectorObservationSize);
if (param.NumStackedVectorObservations > 1)
collectObservationsSensor, param.numStackedVectorObservations);
collectObservationsSensor, param.NumStackedVectorObservations);
sensors.Add(stackingSensor);
}
else

}
using (TimerStack.Instance.Scoped("CollectDiscreteActionMasks"))
{
if (m_PolicyFactory.brainParameters.vectorActionSpaceType == SpaceType.Discrete)
if (m_PolicyFactory.BrainParameters.VectorActionSpaceType == SpaceType.Discrete)
{
CollectDiscreteActionMasks(m_ActionMasker);
}

}
/// <summary>
/// Collects the vector observations of the agent.
/// The agent observation describes the current environment from the
/// perspective of the agent.
/// Implement `CollectObservations()` to collect the vector observations of
/// the agent for the step. The agent observation describes the current
/// environment from the perspective of the agent.
/// An agents observation is any environment information that helps
/// the Agent achieve its goal. For example, for a fighting Agent, its
/// An agent's observation is any environment information that helps
/// the agent achieve its goal. For example, for a fighting agent, its
/// Recall that an Agent may attach vector or visual observations.
/// Vector observations are added by calling the provided helper methods
/// on the VectorSensor input:
///
/// You can use a combination of vector, visual, and raycast observations for an
/// agent. If you only use visual or raycast observations, you do not need to
/// implement a `CollectObservations()` function.
///
/// Add vector observations to the <paramref name="sensor"/> parameter passed to
/// this method by calling the <seealso cref="VectorSensor"/> helper methods:
/// - <see cref="VectorSensor.AddObservation(int)"/>
/// - <see cref="VectorSensor.AddObservation(float)"/>
/// - <see cref="VectorSensor.AddObservation(Vector3)"/>

/// - <see cref="VectorSensor.AddObservation(IEnumerable{float})"/>
/// - <see cref="VectorSensor.AddOneHotObservation(int, int)"/>
/// Depending on your environment, any combination of these helpers can
/// be used. They just need to be used in the exact same order each time
/// this method is called and the resulting size of the vector observation
/// needs to match the vectorObservationSize attribute of the linked Brain.
/// Visual observations are implicitly added from the cameras attached to
/// the Agent.
///
/// You can use any combination of these helper functions to build the agent's
/// vector of observations. You must build the vector in the same order
/// each time `CollectObservations()` is called and the length of the vector
/// must always be the same. In addition, the length of the observation must
/// match the <see cref="BrainParameters.VectorObservationSize"/>
/// attribute of the linked Brain, which is set in the Editor on the
/// **Behavior Parameters** component attached to the agent's [GameObject].
///
/// For more information about observations, see [Observations and Sensors].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// </remarks>
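/// <example>
/// As a sketch, an agent that tracks its own velocity and the offset to a target might
/// observe six floats (the m_AgentRb and m_Target references are illustrative, cached
/// elsewhere in the agent):
/// <code>
/// public override void CollectObservations(VectorSensor sensor)
/// {
///     sensor.AddObservation(m_Target.position - transform.position); // 3 values
///     sensor.AddObservation(m_AgentRb.velocity);                     // 3 values
/// }
/// </code>
/// With this layout, the Vector Observation Space Size on the **Behavior Parameters**
/// component would need to be 6.
/// </example>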
public virtual void CollectObservations(VectorSensor sensor)
{

/// Collects the masks for discrete actions.
/// When using discrete actions, the agent will not perform the masked action.
/// Returns a read-only view of the observations that were generated in
/// <see cref="CollectObservations(VectorSensor)"/>. This is mainly useful inside of a
/// <see cref="Heuristic(float[])"/> method to avoid recomputing the observations.
/// </summary>
/// <returns>A read-only view of the observations list.</returns>
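/// <example>
/// A minimal sketch of reusing the observations inside a heuristic; the index used here
/// assumes an observation layout chosen in your own
/// <see cref="CollectObservations(VectorSensor)"/> implementation:
/// <code>
/// public override void Heuristic(float[] actionsOut)
/// {
///     var obs = GetObservations();
///     // e.g. steer toward the target using the first observation value
///     actionsOut[0] = obs[0] > 0f ? 1.0f : -1.0f;
/// }
/// </code>
/// </example>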
public ReadOnlyCollection<float> GetObservations()
{
return collectObservationsSensor.GetObservations();
}
/// <summary>
/// Implement `CollectDiscreteActionMasks()` to collect the masks for discrete
/// actions. When using discrete actions, the agent will not perform the masked
/// action.
/// </summary>
/// <param name="actionMasker">
/// The action masker for the agent.

/// action by masking it with <see cref="DiscreteActionMasker.SetMask(int, IEnumerable{int})"/>
/// action by masking it with <see cref="DiscreteActionMasker.SetMask(int, IEnumerable{int})"/>.
///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Design-Agents.md#actions
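/// <example>
/// As a sketch, if branch 0 controls movement and action index 1 means "move left", the
/// agent could disable that choice whenever it is already against the left wall (the
/// m_AtLeftWall flag is illustrative):
/// <code>
/// public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
/// {
///     if (m_AtLeftWall)   // illustrative flag maintained elsewhere
///     {
///         actionMasker.SetMask(0, new[] { 1 });
///     }
/// }
/// </code>
/// </example>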
/// <seealso cref="OnActionReceived(float[])"/>
/// Specifies the agent behavior at every step based on the provided
/// action.
/// Implement `OnActionReceived()` to specify agent behavior at every step, based
/// on the provided action.
/// <remarks>
/// An action is passed to this function in the form of an array vector. Your
/// implementation must use the array to direct the agent's behavior for the
/// current step.
///
/// You decide how many elements you need in the action array to control your
/// agent and what each element means. For example, if you want to apply a
/// force to move an agent around the environment, you can arbitrarily pick
/// three values in the action array to use as the force components. During
/// training, the agent's policy learns to set those particular elements of
/// the array to maximize the training rewards the agent receives. (Of course,
/// if you implement a <seealso cref="Heuristic"/> function, it must use the same
/// elements of the action array for the same purpose since there is no learning
/// involved.)
///
/// Actions for an agent can be either *Continuous* or *Discrete*. Specify which
/// type of action space an agent uses, along with the size of the action array,
/// in the <see cref="BrainParameters"/> of the agent's associated
/// <see cref="BehaviorParameters"/> component.
///
/// When an agent uses the continuous action space, the values in the action
/// array are floating point numbers. You should clamp the values to the range,
/// -1..1, to increase numerical stability during training.
///
/// When an agent uses the discrete action space, the values in the action array
/// are integers that each represent a specific, discrete action. For example,
/// you could define a set of discrete actions such as:
///
/// <code>
/// 0 = Do nothing
/// 1 = Move one space left
/// 2 = Move one space right
/// 3 = Move one space up
/// 4 = Move one space down
/// </code>
///
/// When making a decision, the agent picks one of the five actions and puts the
/// corresponding integer value in the action vector. For example, if the agent
/// decided to move left, the action vector parameter would contain an array with
/// a single element with the value 1.
///
/// You can define multiple sets, or branches, of discrete actions to allow an
/// agent to perform simultaneous, independent actions. For example, you could
/// use one branch for movement and another branch for throwing a ball left, right,
/// up, or down, to allow the agent to do both in the same step.
///
/// The action vector of a discrete action space contains one element for each
/// branch. The value of each element is the integer representing the chosen
/// action for that branch. The agent always chooses one action for each
/// branch.
///
/// When you use the discrete action space, you can prevent the training process
/// or the neural network model from choosing specific actions in a step by
/// implementing the <see cref="CollectDiscreteActionMasks(DiscreteActionMasker)"/>
/// function. For example, if your agent is next to a wall, you could mask out any
/// actions that would result in the agent trying to move into the wall.
///
/// For more information about implementing agent actions see [Agents - Actions].
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
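/// <example>
/// As a sketch of the continuous case, a rolling agent might interpret two action values
/// as force components (the m_AgentRb reference and the force scale are illustrative):
/// <code>
/// public override void OnActionReceived(float[] vectorAction)
/// {
///     var force = new Vector3(
///         Mathf.Clamp(vectorAction[0], -1f, 1f),
///         0f,
///         Mathf.Clamp(vectorAction[1], -1f, 1f));
///     m_AgentRb.AddForce(force * 10f);   // 10f is an illustrative force scale
/// }
/// </code>
/// </example>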
/// Vector action. Note that for discrete actions, the provided array
/// will be of length 1.
/// An array containing the action vector. The length of the array is specified
/// by the <see cref="BrainParameters"/> of the agent's associated
/// <see cref="BehaviorParameters"/> component.
public virtual void OnActionReceived(float[] vectorAction){}
public virtual void OnActionReceived(float[] vectorAction) {}
/// Specifies the agent behavior when being reset, which can be due to
/// the agent or Academy being done (i.e. completion of local or global
/// episode).
/// Implement `OnEpisodeBegin()` to set up an Agent instance at the beginning
/// of an episode.
public virtual void OnEpisodeBegin(){}
/// <seealso cref="Initialize"/>
/// <seealso cref="EndEpisode"/>
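/// <example>
/// A minimal sketch that re-places the agent and clears its velocity at the start of
/// every episode (the m_AgentRb and m_StartPosition fields are illustrative):
/// <code>
/// public override void OnEpisodeBegin()
/// {
///     transform.position = m_StartPosition;      // illustrative cached start position
///     m_AgentRb.velocity = Vector3.zero;
///     m_AgentRb.angularVelocity = Vector3.zero;
/// }
/// </code>
/// </example>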
public virtual void OnEpisodeBegin() {}
/// Returns the last action that was decided on by the Agent
/// Returns the last action that was decided on by the Agent.
/// The last action that was decided by the Agent (or null if no decision has been made)
/// The last action that was decided by the Agent (or null if no decision has been made).
/// <seealso cref="OnActionReceived(float[])"/>
public float[] GetAction()
{
return m_Action.vectorActions;

/// An internal reset method that updates internal data structures in
/// addition to calling <see cref="AgentReset"/>.
/// addition to calling <see cref="OnEpisodeBegin"/>.
/// </summary>
void _AgentReset()
{

/// <summary>
/// Scales continuous action from [-1, 1] to arbitrary range.
/// </summary>
/// <param name="rawAction"></param>
/// <param name="min"></param>
/// <param name="max"></param>
/// <returns></returns>
/// <param name="rawAction">The input action value.</param>
/// <param name="min">The minimum output value.</param>
/// <param name="max">The maximum output value.</param>
/// <returns>The <paramref name="rawAction"/> scaled from [-1,1] to
/// [<paramref name="min"/>, <paramref name="max"/>].</returns>
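/// <example>
/// For example, with min = 0 and max = 10, a raw action of 0.5 is scaled to 7.5
/// (midpoint 5 plus 0.5 times the half-range 5):
/// <code>
/// var speed = ScaleAction(vectorAction[0], 0f, 10f);
/// </code>
/// </example>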
protected static float ScaleAction(float rawAction, float min, float max)
{
var middle = (min + max) / 2;

/// <summary>
/// Signals the agent that it must sent its decision to the brain.
/// Signals the agent that it must send its decision to the brain.
/// </summary>
void SendInfo()
{

OnActionReceived(m_Action.vectorActions);
}
if ((m_StepCount >= maxStep) && (maxStep > 0))
if ((m_StepCount >= MaxStep) && (MaxStep > 0))
{
NotifyAgentDone(DoneReason.MaxStepReached);
_AgentReset();

39
com.unity.ml-agents/Runtime/Communicator/GrpcExtensions.cs


{
var brainParametersProto = new BrainParametersProto
{
VectorActionSize = { bp.vectorActionSize },
VectorActionSize = { bp.VectorActionSize },
(SpaceTypeProto)bp.vectorActionSpaceType,
(SpaceTypeProto)bp.VectorActionSpaceType,
brainParametersProto.VectorActionDescriptions.AddRange(bp.vectorActionDescriptions);
brainParametersProto.VectorActionDescriptions.AddRange(bp.VectorActionDescriptions);
return brainParametersProto;
}

{
var bp = new BrainParameters
{
vectorActionSize = bpp.VectorActionSize.ToArray(),
vectorActionDescriptions = bpp.VectorActionDescriptions.ToArray(),
vectorActionSpaceType = (SpaceType)bpp.VectorActionSpaceType
VectorActionSize = bpp.VectorActionSize.ToArray(),
VectorActionDescriptions = bpp.VectorActionDescriptions.ToArray(),
VectorActionSpaceType = (SpaceType)bpp.VectorActionSpaceType
};
return bp;
}

seed = inputProto.Seed,
pythonLibraryVersion = inputProto.PackageVersion,
pythonCommunicationVersion = inputProto.CommunicationVersion,
TrainerCapabilities = inputProto.Capabilities.ToRLCapabilities()
};
}

}
/// <summary>
/// Generate an ObservationProto for the sensor using the provided WriteAdapter.
/// Generate an ObservationProto for the sensor using the provided ObservationWriter.
/// <param name="writeAdapter"></param>
/// <param name="observationWriter"></param>
public static ObservationProto GetObservationProto(this ISensor sensor, WriteAdapter writeAdapter)
public static ObservationProto GetObservationProto(this ISensor sensor, ObservationWriter observationWriter)
{
var shape = sensor.GetObservationShape();
ObservationProto observationProto = null;

floatDataProto.Data.Add(0.0f);
}
writeAdapter.SetTarget(floatDataProto.Data, sensor.GetObservationShape(), 0);
sensor.Write(writeAdapter);
observationWriter.SetTarget(floatDataProto.Data, sensor.GetObservationShape(), 0);
sensor.Write(observationWriter);
observationProto = new ObservationProto
{

return observationProto;
}
#endregion
public static UnityRLCapabilities ToRLCapabilities(this UnityRLCapabilitiesProto proto)
{
return new UnityRLCapabilities
{
m_BaseRLCapabilities = proto.BaseRLCapabilities
};
}
public static UnityRLCapabilitiesProto ToProto(this UnityRLCapabilities rlCaps)
{
return new UnityRLCapabilitiesProto
{
BaseRLCapabilities = rlCaps.m_BaseRLCapabilities
};
}
}
}

14
com.unity.ml-agents/Runtime/Communicator/ICommunicator.cs


/// The version of the communication API.
/// </summary>
public string unityCommunicationVersion;
/// <summary>
/// The RL capabilities of the C# codebase.
/// </summary>
public UnityRLCapabilities CSharpCapabilities;
/// An RNG seed sent from the python process to Unity.
/// A random number generator (RNG) seed sent from the python process to Unity.
/// </summary>
public int seed;

/// The version of the communication API that python is using.
/// </summary>
public string pythonCommunicationVersion;
/// <summary>
/// The RL capabilities of the Trainer codebase.
/// </summary>
public UnityRLCapabilities TrainerCapabilities;
}
internal struct UnityRLInputParameters
{

}
/// <summary>
/// Delegate for handling quite events sent back from the communicator.
/// Delegate for handling quit events sent back from the communicator.
/// </summary>
internal delegate void QuitCommandHandler();

11
com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs


List<string> m_BehaviorNames = new List<string>();
bool m_NeedCommunicateThisStep;
WriteAdapter m_WriteAdapter = new WriteAdapter();
ObservationWriter m_ObservationWriter = new ObservationWriter();
Dictionary<string, SensorShapeValidator> m_SensorShapeValidators = new Dictionary<string, SensorShapeValidator>();
Dictionary<string, List<int>> m_OrderedAgentsRequestingDecisions = new Dictionary<string, List<int>>();

{
Name = initParameters.name,
PackageVersion = initParameters.unityPackageVersion,
CommunicationVersion = initParameters.unityCommunicationVersion
CommunicationVersion = initParameters.unityCommunicationVersion,
Capabilities = initParameters.CSharpCapabilities.ToProto()
};
UnityInputProto input;

void UpdateEnvironmentWithInput(UnityRLInputProto rlInput)
{
SideChannelUtils.ProcessSideChannelData(rlInput.SideChannel.ToArray());
SideChannelsManager.ProcessSideChannelData(rlInput.SideChannel.ToArray());
SendCommandEvent(rlInput.Command);
}

{
foreach (var sensor in sensors)
{
var obsProto = sensor.GetObservationProto(m_WriteAdapter);
var obsProto = sensor.GetObservationProto(m_ObservationWriter);
agentInfoProto.Observations.Add(obsProto);
}
}

message.RlInitializationOutput = tempUnityRlInitializationOutput;
}
byte[] messageAggregated = SideChannelUtils.GetSideChannelMessage();
byte[] messageAggregated = SideChannelsManager.GetSideChannelMessage();
message.RlOutput.SideChannel = ByteString.CopyFrom(messageAggregated);
var input = Exchange(message);

13
com.unity.ml-agents/Runtime/DecisionRequester.cs


namespace MLAgents
{
/// <summary>
/// A component that when attached to an Agent will automatically request decisions from it
/// at regular intervals.
/// The DecisionRequester component automatically requests decisions for an
/// <see cref="Agent"/> instance at regular intervals.
/// <remarks>
/// Attach a DecisionRequester component to the same [GameObject] as the
/// <see cref="Agent"/> component.
///
/// The DecisionRequester component provides a convenient and flexible way to
/// trigger the agent decision making process. Without a DecisionRequester,
/// your <see cref="Agent"/> implementation must manually call its
/// <seealso cref="Agent.RequestDecision"/> function.
/// </remarks>
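/// <example>
/// The component is normally added in the Editor alongside the Agent. As a sketch, it
/// can also be attached from code (the agentGameObject reference and the period value
/// are illustrative, and this assumes the component's DecisionPeriod field):
/// <code>
/// var requester = agentGameObject.AddComponent<DecisionRequester>();
/// requester.DecisionPeriod = 5;   // illustrative: request a decision every 5 steps
/// </code>
/// </example>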
[AddComponentMenu("ML Agents/Decision Requester", (int)MenuGroup.Default)]
[RequireComponent(typeof(Agent))]
public class DecisionRequester : MonoBehaviour

45
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs


using UnityEngine;
using System.IO;
using MLAgents.Policies;
using UnityEngine.Serialization;
/// Demonstration Recorder Component.
/// The Demonstration Recorder component facilitates the recording of demonstrations
/// used for imitation learning.
/// <remarks>Add this component to the [GameObject] containing an <see cref="Agent"/>
/// to enable recording the agent for imitation learning. You must implement the
/// <see cref="Agent.Heuristic"/> function of the agent to provide manual control
/// in order to record demonstrations.
///
/// See [Imitation Learning - Recording Demonstrations] for more information.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Training-Imitation-Learning.md#recording-demonstrations
/// </remarks>
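/// <example>
/// As a sketch, a recorder added from code could be configured through the fields this
/// component exposes (the demonstration name used here is illustrative):
/// <code>
/// var recorder = gameObject.AddComponent<DemonstrationRecorder>();
/// recorder.Record = true;
/// recorder.DemonstrationName = "AgentDemo";   // illustrative name
/// </code>
/// </example>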
[RequireComponent(typeof(Agent))]
[AddComponentMenu("ML Agents/Demonstration Recorder", (int)MenuGroup.Default)]
public class DemonstrationRecorder : MonoBehaviour

/// </summary>
[FormerlySerializedAs("record")]
public bool record;
public bool Record;
[FormerlySerializedAs("demonstrationName")]
public string demonstrationName;
public string DemonstrationName;
[FormerlySerializedAs("demonstrationDirectory")]
public string demonstrationDirectory;
public string DemonstrationDirectory;
DemonstrationWriter m_DemoWriter;
internal const int MaxNameLength = 16;

void Update()
{
if (record)
if (Record)
{
LazyInitialize();
}

m_FileSystem = fileSystem ?? new FileSystem();
var behaviorParams = GetComponent<BehaviorParameters>();
if (string.IsNullOrEmpty(demonstrationName))
if (string.IsNullOrEmpty(DemonstrationName))
demonstrationName = behaviorParams.behaviorName;
DemonstrationName = behaviorParams.BehaviorName;
if (string.IsNullOrEmpty(demonstrationDirectory))
if (string.IsNullOrEmpty(DemonstrationDirectory))
demonstrationDirectory = Path.Combine(Application.dataPath, k_DefaultDirectoryName);
DemonstrationDirectory = Path.Combine(Application.dataPath, k_DefaultDirectoryName);
demonstrationName = SanitizeName(demonstrationName, MaxNameLength);
var filePath = MakeDemonstrationFilePath(m_FileSystem, demonstrationDirectory, demonstrationName);
DemonstrationName = SanitizeName(DemonstrationName, MaxNameLength);
var filePath = MakeDemonstrationFilePath(m_FileSystem, DemonstrationDirectory, DemonstrationName);
var stream = m_FileSystem.File.Create(filePath);
m_DemoWriter = new DemonstrationWriter(stream);

}
/// <summary>
/// Gets a unique path for the demonstrationName in the demonstrationDirectory.
/// Gets a unique path for the DemonstrationName in the DemonstrationDirectory.
/// </summary>
/// <param name="fileSystem"></param>
/// <param name="demonstrationDirectory"></param>

{
var behaviorParams = GetComponent<BehaviorParameters>();
demoWriter.Initialize(
demonstrationName,
behaviorParams.brainParameters,
behaviorParams.fullyQualifiedBehaviorName
DemonstrationName,
behaviorParams.BrainParameters,
behaviorParams.FullyQualifiedBehaviorName
);
m_Agent.DemonstrationWriters.Add(demoWriter);
}

7
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs


/// <summary>
/// Responsible for writing demonstration data to stream (typically a file stream).
/// </summary>
/// <seealso cref="DemonstrationRecorder"/>
/// Number of bytes reserved for the Demonstration metadata at the start of the demo file.
/// Number of bytes reserved for the <see cref="Demonstration"/> metadata at the start of the demo file.
/// </summary>
internal const int MetaDataBytes = 32;

WriteAdapter m_WriteAdapter = new WriteAdapter();
ObservationWriter m_ObservationWriter = new ObservationWriter();
/// <summary>
/// Create a DemonstrationWriter that will write to the specified stream.

var agentProto = info.ToInfoActionPairProto();
foreach (var sensor in sensors)
{
agentProto.AgentInfo.Observations.Add(sensor.GetObservationProto(m_WriteAdapter));
agentProto.AgentInfo.Observations.Add(sensor.GetObservationProto(m_ObservationWriter));
}
agentProto.WriteDelimitedTo(m_Writer);

51
com.unity.ml-agents/Runtime/DiscreteActionMasker.cs


namespace MLAgents
{
/// <summary>
/// The DiscreteActionMasker class represents a set of masked (disallowed) actions and
/// provides utilities for setting and retrieving them.
/// </summary>
/// <remarks>
/// may be illegal (e.g. the King in Chess taking a move to the left if it is already in the
/// left side of the board). This class represents the set of masked actions and provides
/// the utilities for setting and retrieving them.
/// </summary>
/// may be illegal. For example, if an agent is adjacent to a wall or other obstacle
/// you could mask any actions that direct the agent to move into the blocked space.
/// </remarks>
public class DiscreteActionMasker
{
/// When using discrete control, is the starting indices of the actions

}
/// <summary>
/// Modifies an action mask for discrete control agents. When used, the agent will not be
/// able to perform the actions passed as argument at the next decision for the specified
/// action branch. The actionIndices correspond to the action options the agent will
/// be unable to perform.
/// Modifies an action mask for discrete control agents.
/// <param name="branch">The branch for which the actions will be masked</param>
/// <param name="actionIndices">The indices of the masked actions</param>
/// <remarks>
/// When used, the agent will not be able to perform the actions passed as argument
/// at the next decision for the specified action branch. The actionIndices correspond
/// to the action options the agent will be unable to perform.
///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>
if (branch >= m_BrainParameters.vectorActionSize.Length)
if (branch >= m_BrainParameters.VectorActionSize.Length)
var totalNumberActions = m_BrainParameters.vectorActionSize.Sum();
var totalNumberActions = m_BrainParameters.VectorActionSize.Sum();
// By default, the masks are null. If we want to specify a new mask, we initialize
// the actionMasks with trues.

// indices for each branch.
if (m_StartingActionIndices == null)
{
m_StartingActionIndices = Utilities.CumSum(m_BrainParameters.vectorActionSize);
m_StartingActionIndices = Utilities.CumSum(m_BrainParameters.VectorActionSize);
if (actionIndex >= m_BrainParameters.vectorActionSize[branch])
if (actionIndex >= m_BrainParameters.VectorActionSize[branch])
{
throw new UnityAgentsException(
"Invalid Action Masking: Action Mask is too large for specified branch.");

}
/// <summary>
/// Get the current mask for an agent
/// Get the current mask for an agent.
/// </summary>
/// <returns>A mask for the agent. A boolean array of length equal to the total number of
/// actions.</returns>

void AssertMask()
{
// Action Masks can only be used in Discrete Control.
if (m_BrainParameters.vectorActionSpaceType != SpaceType.Discrete)
if (m_BrainParameters.VectorActionSpaceType != SpaceType.Discrete)
var numBranches = m_BrainParameters.vectorActionSize.Length;
var numBranches = m_BrainParameters.VectorActionSize.Length;
for (var branchIndex = 0; branchIndex < numBranches; branchIndex++)
{
if (AreAllActionsMasked(branchIndex))

}
/// <summary>
/// Resets the current mask for an agent
/// Resets the current mask for an agent.
/// </summary>
internal void ResetMask()
{

}
/// <summary>
/// Checks if all the actions in the input branch are masked
/// Checks if all the actions in the input branch are masked.
/// <param name="branch"> The index of the branch to check</param>
/// <returns> True if all the actions of the branch are masked</returns>
/// <param name="branch"> The index of the branch to check.</param>
/// <returns> True if all the actions of the branch are masked.</returns>
bool AreAllActionsMasked(int branch)
{
if (m_CurrentMask == null)

52
com.unity.ml-agents/Runtime/Grpc/CommunicatorObjects/UnityRlInitializationInput.cs


string.Concat(
"CkZtbGFnZW50c19lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL3VuaXR5X3Js",
"X2luaXRpYWxpemF0aW9uX2lucHV0LnByb3RvEhRjb21tdW5pY2F0b3Jfb2Jq",
"ZWN0cyJnCh9Vbml0eVJMSW5pdGlhbGl6YXRpb25JbnB1dFByb3RvEgwKBHNl",
"ZWQYASABKAUSHQoVY29tbXVuaWNhdGlvbl92ZXJzaW9uGAIgASgJEhcKD3Bh",
"Y2thZ2VfdmVyc2lvbhgDIAEoCUIfqgIcTUxBZ2VudHMuQ29tbXVuaWNhdG9y",
"T2JqZWN0c2IGcHJvdG8z"));
"ZWN0cxo1bWxhZ2VudHNfZW52cy9jb21tdW5pY2F0b3Jfb2JqZWN0cy9jYXBh",
"YmlsaXRpZXMucHJvdG8irQEKH1VuaXR5UkxJbml0aWFsaXphdGlvbklucHV0",
"UHJvdG8SDAoEc2VlZBgBIAEoBRIdChVjb21tdW5pY2F0aW9uX3ZlcnNpb24Y",
"AiABKAkSFwoPcGFja2FnZV92ZXJzaW9uGAMgASgJEkQKDGNhcGFiaWxpdGll",
"cxgEIAEoCzIuLmNvbW11bmljYXRvcl9vYmplY3RzLlVuaXR5UkxDYXBhYmls",
"aXRpZXNQcm90b0IfqgIcTUxBZ2VudHMuQ29tbXVuaWNhdG9yT2JqZWN0c2IG",
"cHJvdG8z"));
new pbr::FileDescriptor[] { },
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.CapabilitiesReflection.Descriptor, },
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.UnityRLInitializationInputProto), global::MLAgents.CommunicatorObjects.UnityRLInitializationInputProto.Parser, new[]{ "Seed", "CommunicationVersion", "PackageVersion" }, null, null, null)
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.UnityRLInitializationInputProto), global::MLAgents.CommunicatorObjects.UnityRLInitializationInputProto.Parser, new[]{ "Seed", "CommunicationVersion", "PackageVersion", "Capabilities" }, null, null, null)
}));
}
#endregion

seed_ = other.seed_;
communicationVersion_ = other.communicationVersion_;
packageVersion_ = other.packageVersion_;
Capabilities = other.capabilities_ != null ? other.Capabilities.Clone() : null;
_unknownFields = pb::UnknownFieldSet.Clone(other._unknownFields);
}

}
}
/// <summary>Field number for the "capabilities" field.</summary>
public const int CapabilitiesFieldNumber = 4;
private global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto capabilities_;
/// <summary>
/// The RL Capabilities of the Python trainer.
/// </summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto Capabilities {
get { return capabilities_; }
set {
capabilities_ = value;
}
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override bool Equals(object other) {
return Equals(other as UnityRLInitializationInputProto);

if (Seed != other.Seed) return false;
if (CommunicationVersion != other.CommunicationVersion) return false;
if (PackageVersion != other.PackageVersion) return false;
if (!object.Equals(Capabilities, other.Capabilities)) return false;
return Equals(_unknownFields, other._unknownFields);
}

if (Seed != 0) hash ^= Seed.GetHashCode();
if (CommunicationVersion.Length != 0) hash ^= CommunicationVersion.GetHashCode();
if (PackageVersion.Length != 0) hash ^= PackageVersion.GetHashCode();
if (capabilities_ != null) hash ^= Capabilities.GetHashCode();
if (_unknownFields != null) {
hash ^= _unknownFields.GetHashCode();
}

output.WriteRawTag(26);
output.WriteString(PackageVersion);
}
if (capabilities_ != null) {
output.WriteRawTag(34);
output.WriteMessage(Capabilities);
}
if (_unknownFields != null) {
_unknownFields.WriteTo(output);
}

}
if (PackageVersion.Length != 0) {
size += 1 + pb::CodedOutputStream.ComputeStringSize(PackageVersion);
}
if (capabilities_ != null) {
size += 1 + pb::CodedOutputStream.ComputeMessageSize(Capabilities);
}
if (_unknownFields != null) {
size += _unknownFields.CalculateSize();

if (other.PackageVersion.Length != 0) {
PackageVersion = other.PackageVersion;
}
if (other.capabilities_ != null) {
if (capabilities_ == null) {
capabilities_ = new global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto();
}
Capabilities.MergeFrom(other.Capabilities);
}
_unknownFields = pb::UnknownFieldSet.MergeFrom(_unknownFields, other._unknownFields);
}

}
case 26: {
PackageVersion = input.ReadString();
break;
}
case 34: {
if (capabilities_ == null) {
capabilities_ = new global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto();
}
input.ReadMessage(capabilities_);
break;
}
}

58
com.unity.ml-agents/Runtime/Grpc/CommunicatorObjects/UnityRlInitializationOutput.cs


string.Concat(
"CkdtbGFnZW50c19lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL3VuaXR5X3Js",
"X2luaXRpYWxpemF0aW9uX291dHB1dC5wcm90bxIUY29tbXVuaWNhdG9yX29i",
"amVjdHMaOW1sYWdlbnRzX2VudnMvY29tbXVuaWNhdG9yX29iamVjdHMvYnJh",
"aW5fcGFyYW1ldGVycy5wcm90byLGAQogVW5pdHlSTEluaXRpYWxpemF0aW9u",
"T3V0cHV0UHJvdG8SDAoEbmFtZRgBIAEoCRIdChVjb21tdW5pY2F0aW9uX3Zl",
"cnNpb24YAiABKAkSEAoIbG9nX3BhdGgYAyABKAkSRAoQYnJhaW5fcGFyYW1l",
"dGVycxgFIAMoCzIqLmNvbW11bmljYXRvcl9vYmplY3RzLkJyYWluUGFyYW1l",
"dGVyc1Byb3RvEhcKD3BhY2thZ2VfdmVyc2lvbhgHIAEoCUoECAYQB0IfqgIc",
"TUxBZ2VudHMuQ29tbXVuaWNhdG9yT2JqZWN0c2IGcHJvdG8z"));
"amVjdHMaNW1sYWdlbnRzX2VudnMvY29tbXVuaWNhdG9yX29iamVjdHMvY2Fw",
"YWJpbGl0aWVzLnByb3RvGjltbGFnZW50c19lbnZzL2NvbW11bmljYXRvcl9v",
"YmplY3RzL2JyYWluX3BhcmFtZXRlcnMucHJvdG8ijAIKIFVuaXR5UkxJbml0",
"aWFsaXphdGlvbk91dHB1dFByb3RvEgwKBG5hbWUYASABKAkSHQoVY29tbXVu",
"aWNhdGlvbl92ZXJzaW9uGAIgASgJEhAKCGxvZ19wYXRoGAMgASgJEkQKEGJy",
"YWluX3BhcmFtZXRlcnMYBSADKAsyKi5jb21tdW5pY2F0b3Jfb2JqZWN0cy5C",
"cmFpblBhcmFtZXRlcnNQcm90bxIXCg9wYWNrYWdlX3ZlcnNpb24YByABKAkS",
"RAoMY2FwYWJpbGl0aWVzGAggASgLMi4uY29tbXVuaWNhdG9yX29iamVjdHMu",
"VW5pdHlSTENhcGFiaWxpdGllc1Byb3RvSgQIBhAHQh+qAhxNTEFnZW50cy5D",
"b21tdW5pY2F0b3JPYmplY3RzYgZwcm90bzM="));
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.BrainParametersReflection.Descriptor, },
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.CapabilitiesReflection.Descriptor, global::MLAgents.CommunicatorObjects.BrainParametersReflection.Descriptor, },
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.UnityRLInitializationOutputProto), global::MLAgents.CommunicatorObjects.UnityRLInitializationOutputProto.Parser, new[]{ "Name", "CommunicationVersion", "LogPath", "BrainParameters", "PackageVersion" }, null, null, null)
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.UnityRLInitializationOutputProto), global::MLAgents.CommunicatorObjects.UnityRLInitializationOutputProto.Parser, new[]{ "Name", "CommunicationVersion", "LogPath", "BrainParameters", "PackageVersion", "Capabilities" }, null, null, null)
}));
}
#endregion

logPath_ = other.logPath_;
brainParameters_ = other.brainParameters_.Clone();
packageVersion_ = other.packageVersion_;
Capabilities = other.capabilities_ != null ? other.Capabilities.Clone() : null;
_unknownFields = pb::UnknownFieldSet.Clone(other._unknownFields);
}

}
}
/// <summary>Field number for the "capabilities" field.</summary>
public const int CapabilitiesFieldNumber = 8;
private global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto capabilities_;
/// <summary>
/// The RL Capabilities of the C# package.
/// </summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto Capabilities {
get { return capabilities_; }
set {
capabilities_ = value;
}
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override bool Equals(object other) {
return Equals(other as UnityRLInitializationOutputProto);

if (LogPath != other.LogPath) return false;
if(!brainParameters_.Equals(other.brainParameters_)) return false;
if (PackageVersion != other.PackageVersion) return false;
if (!object.Equals(Capabilities, other.Capabilities)) return false;
return Equals(_unknownFields, other._unknownFields);
}

if (LogPath.Length != 0) hash ^= LogPath.GetHashCode();
hash ^= brainParameters_.GetHashCode();
if (PackageVersion.Length != 0) hash ^= PackageVersion.GetHashCode();
if (capabilities_ != null) hash ^= Capabilities.GetHashCode();
if (_unknownFields != null) {
hash ^= _unknownFields.GetHashCode();
}

output.WriteRawTag(58);
output.WriteString(PackageVersion);
}
if (capabilities_ != null) {
output.WriteRawTag(66);
output.WriteMessage(Capabilities);
}
if (_unknownFields != null) {
_unknownFields.WriteTo(output);
}

if (PackageVersion.Length != 0) {
size += 1 + pb::CodedOutputStream.ComputeStringSize(PackageVersion);
}
if (capabilities_ != null) {
size += 1 + pb::CodedOutputStream.ComputeMessageSize(Capabilities);
}
if (_unknownFields != null) {
size += _unknownFields.CalculateSize();
}

brainParameters_.Add(other.brainParameters_);
if (other.PackageVersion.Length != 0) {
PackageVersion = other.PackageVersion;
}
if (other.capabilities_ != null) {
if (capabilities_ == null) {
capabilities_ = new global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto();
}
Capabilities.MergeFrom(other.Capabilities);
}
_unknownFields = pb::UnknownFieldSet.MergeFrom(_unknownFields, other._unknownFields);
}

}
case 58: {
PackageVersion = input.ReadString();
break;
}
case 66: {
if (capabilities_ == null) {
capabilities_ = new global::MLAgents.CommunicatorObjects.UnityRLCapabilitiesProto();
}
input.ReadMessage(capabilities_);
break;
}
}

18
com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs


var tensorsNames = GetInputTensors(model).Select(x => x.name).ToList();
// If there is no Vector Observation Input but the Brain Parameters expect one.
if ((brainParameters.vectorObservationSize != 0) &&
if ((brainParameters.VectorObservationSize != 0) &&
(!tensorsNames.Contains(TensorNames.VectorObservationPlaceholder)))
{
failedModelChecks.Add(

static string CheckVectorObsShape(
BrainParameters brainParameters, TensorProxy tensorProxy, SensorComponent[] sensorComponents)
{
var vecObsSizeBp = brainParameters.vectorObservationSize;
var numStackedVector = brainParameters.numStackedVectorObservations;
var vecObsSizeBp = brainParameters.VectorObservationSize;
var numStackedVector = brainParameters.NumStackedVectorObservations;
var totalVecObsSizeT = tensorProxy.shape[tensorProxy.shape.Length - 1];
var totalVectorSensorSize = 0;

static string CheckPreviousActionShape(
BrainParameters brainParameters, TensorProxy tensorProxy, SensorComponent[] sensorComponents)
{
var numberActionsBp = brainParameters.vectorActionSize.Length;
var numberActionsBp = brainParameters.VectorActionSize.Length;
var numberActionsT = tensorProxy.shape[tensorProxy.shape.Length - 1];
if (numberActionsBp != numberActionsT)
{

return failedModelChecks;
}
if (isContinuous == ModelActionType.Continuous &&
brainParameters.vectorActionSpaceType != SpaceType.Continuous)
brainParameters.VectorActionSpaceType != SpaceType.Continuous)
{
failedModelChecks.Add(
"Model has been trained using Continuous Control but the Brain Parameters " +

if (isContinuous == ModelActionType.Discrete &&
brainParameters.vectorActionSpaceType != SpaceType.Discrete)
brainParameters.VectorActionSpaceType != SpaceType.Discrete)
{
failedModelChecks.Add(
"Model has been trained using Discrete Control but the Brain Parameters " +

var tensorTester = new Dictionary<string, Func<BrainParameters, TensorShape, int, string>>();
if (brainParameters.vectorActionSpaceType == SpaceType.Continuous)
if (brainParameters.VectorActionSpaceType == SpaceType.Continuous)
{
tensorTester[TensorNames.ActionOutput] = CheckContinuousActionOutputShape;
}

static string CheckDiscreteActionOutputShape(
BrainParameters brainParameters, TensorShape shape, int modelActionSize)
{
var bpActionSize = brainParameters.vectorActionSize.Sum();
var bpActionSize = brainParameters.VectorActionSize.Sum();
if (modelActionSize != bpActionSize)
{
return "Action Size of the model does not match. The BrainParameters expect " +

static string CheckContinuousActionOutputShape(
BrainParameters brainParameters, TensorShape shape, int modelActionSize)
{
var bpActionSize = brainParameters.vectorActionSize[0];
var bpActionSize = brainParameters.VectorActionSize[0];
if (modelActionSize != bpActionSize)
{
return "Action Size of the model does not match. The BrainParameters expect " +

12
com.unity.ml-agents/Runtime/Inference/GeneratorImpl.cs


{
readonly ITensorAllocator m_Allocator;
List<int> m_SensorIndices = new List<int>();
WriteAdapter m_WriteAdapter = new WriteAdapter();
ObservationWriter m_ObservationWriter = new ObservationWriter();
public VectorObservationGenerator(ITensorAllocator allocator)
{

foreach (var sensorIndex in m_SensorIndices)
{
var sensor = info.sensors[sensorIndex];
m_WriteAdapter.SetTarget(tensorProxy, agentIndex, tensorOffset);
var numWritten = sensor.Write(m_WriteAdapter);
m_ObservationWriter.SetTarget(tensorProxy, agentIndex, tensorOffset);
var numWritten = sensor.Write(m_ObservationWriter);
tensorOffset += numWritten;
}
Debug.AssertFormat(

{
readonly int m_SensorIndex;
readonly ITensorAllocator m_Allocator;
WriteAdapter m_WriteAdapter = new WriteAdapter();
ObservationWriter m_ObservationWriter = new ObservationWriter();
public VisualObservationInputGenerator(
int sensorIndex, ITensorAllocator allocator)

}
else
{
m_WriteAdapter.SetTarget(tensorProxy, agentIndex, 0);
sensor.Write(m_WriteAdapter);
m_ObservationWriter.SetTarget(tensorProxy, agentIndex, 0);
sensor.Write(m_ObservationWriter);
}
agentIndex++;
}

4
com.unity.ml-agents/Runtime/Inference/TensorApplier.cs


Dictionary<int, List<float>> memories,
object barracudaModel = null)
{
if (bp.vectorActionSpaceType == SpaceType.Continuous)
if (bp.VectorActionSpaceType == SpaceType.Continuous)
{
m_Dict[TensorNames.ActionOutput] = new ContinuousActionOutputApplier();
}

new DiscreteActionOutputApplier(bp.vectorActionSize, seed, allocator);
new DiscreteActionOutputApplier(bp.VectorActionSize, seed, allocator);
}
m_Dict[TensorNames.RecurrentOutput] = new MemoryOutputApplier(memories);

38
com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs


InferenceOnly
}
/// The Factory to generate policies.
/// A component for setting an <seealso cref="Agent"/> instance's behavior and
/// brain properties.
/// <remarks>At runtime, this component generates the agent's policy objects
/// according to the settings you specified in the Editor.</remarks>
[AddComponentMenu("ML Agents/Behavior Parameters", (int)MenuGroup.Default)]
public class BehaviorParameters : MonoBehaviour
{

/// <summary>
/// The associated <see cref="BrainParameters"/> for this behavior.
/// The associated <see cref="Policies.BrainParameters"/> for this behavior.
public BrainParameters brainParameters
public BrainParameters BrainParameters
{
get { return m_BrainParameters; }
internal set { m_BrainParameters = value; }

/// <summary>
/// The neural network model used when in inference mode.
/// This should not be set at runtime; use <see cref="Agent.SetModel(string,NNModel,InferenceDevice)"/>
/// This should not be set at runtime; use <see cref="Agent.SetModel(string,NNModel,Policies.InferenceDevice)"/>
public NNModel model
public NNModel Model
{
get { return m_Model; }
set { m_Model = value; UpdateAgentPolicy(); }

/// <summary>
/// How inference is performed for this Agent's model.
/// This should not be set at runtime; use <see cref="Agent.SetModel(string,NNModel,InferenceDevice)"/>
/// This should not be set at runtime; use <see cref="Agent.SetModel(string,NNModel,Policies.InferenceDevice)"/>
public InferenceDevice inferenceDevice
public InferenceDevice InferenceDevice
{
get { return m_InferenceDevice; }
set { m_InferenceDevice = value; UpdateAgentPolicy();}

/// <summary>
/// The BehaviorType for the Agent.
/// </summary>
public BehaviorType behaviorType
public BehaviorType BehaviorType
{
get { return m_BehaviorType; }
set { m_BehaviorType = value; UpdateAgentPolicy(); }

/// <summary>
/// The name of this behavior, which is used as a base name. See
/// <see cref="fullyQualifiedBehaviorName"/> for the full name.
/// This should not be set at runtime; use <see cref="Agent.SetModel(string,NNModel,InferenceDevice)"/>
/// <see cref="FullyQualifiedBehaviorName"/> for the full name.
/// This should not be set at runtime; use <see cref="Agent.SetModel(string,NNModel,Policies.InferenceDevice)"/>
public string behaviorName
public string BehaviorName
{
get { return m_BehaviorName; }
set { m_BehaviorName = value; UpdateAgentPolicy(); }

/// Whether or not to use all the sensor components attached to child GameObjects of the agent.
/// Note that changing this after the Agent has been initialized will not have any effect.
/// </summary>
public bool useChildSensors
public bool UseChildSensors
{
get { return m_UseChildSensors; }
set { m_UseChildSensors = value; }

/// Returns the behavior name, concatenated with any other metadata (i.e. team id).
/// </summary>
public string fullyQualifiedBehaviorName
public string FullyQualifiedBehaviorName
{
get { return m_BehaviorName + "?team=" + TeamId; }
}

switch (m_BehaviorType)
{
case BehaviorType.HeuristicOnly:
return new HeuristicPolicy(heuristic, m_BrainParameters.numActions);
return new HeuristicPolicy(heuristic, m_BrainParameters.NumActions);
case BehaviorType.InferenceOnly:
{
if (m_Model == null)

case BehaviorType.Default:
if (Academy.Instance.IsCommunicatorOn)
{
return new RemotePolicy(m_BrainParameters, fullyQualifiedBehaviorName);
return new RemotePolicy(m_BrainParameters, FullyQualifiedBehaviorName);
}
if (m_Model != null)
{

{
return new HeuristicPolicy(heuristic, m_BrainParameters.numActions);
return new HeuristicPolicy(heuristic, m_BrainParameters.NumActions);
return new HeuristicPolicy(heuristic, m_BrainParameters.numActions);
return new HeuristicPolicy(heuristic, m_BrainParameters.NumActions);
}
}
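
The properties above move from camelCase to PascalCase (BrainParameters, Model, InferenceDevice, BehaviorType, BehaviorName, UseChildSensors, FullyQualifiedBehaviorName). A hedged usage sketch with the new names follows; the behavior name is hypothetical, and per the comments above Model and InferenceDevice should not be assigned at runtime (use Agent.SetModel instead).

// Hedged sketch of configuring the renamed BehaviorParameters properties.
using MLAgents.Policies;
using UnityEngine;

public class ConfigureBehaviorExample : MonoBehaviour
{
    void Awake()
    {
        var behaviorParameters = GetComponent<BehaviorParameters>();
        behaviorParameters.BehaviorName = "MyAgent";          // hypothetical name
        behaviorParameters.BehaviorType = BehaviorType.Default;
        behaviorParameters.UseChildSensors = true;            // no effect after Agent init, per the docs above

        // TeamId (if any) is appended automatically, e.g. "MyAgent?team=0".
        Debug.Log(behaviorParameters.FullyQualifiedBehaviorName);
    }
}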

61
com.unity.ml-agents/Runtime/Policies/BrainParameters.cs


using System;
using UnityEngine;
using UnityEngine.Serialization;
namespace MLAgents.Policies
{

}
/// <summary>
/// Holds information about the Brain. It defines what are the inputs and outputs of the
/// Holds information about the brain. It defines what are the inputs and outputs of the
/// <remarks>
/// Set brain parameters for an <see cref="Agent"/> instance using the
/// <seealso cref="BehaviorParameters"/> component attached to the agent's [GameObject].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// If continuous : The length of the float vector that represents the state.
/// If discrete : The number of possible values the state can take.
/// The size of the observation space.
public int vectorObservationSize = 1;
/// <remarks>An agent creates the observation vector in its
/// <see cref="Agent.CollectObservations(Sensors.VectorSensor)"/>
/// implementation.</remarks>
/// <value>
/// The length of the vector containing observation values.
/// </value>
[FormerlySerializedAs("vectorObservationSize")]
public int VectorObservationSize = 1;
[Range(1, 50)] public int numStackedVectorObservations = 1;
[FormerlySerializedAs("numStackedVectorObservations")]
[Range(1, 50)] public int NumStackedVectorObservations = 1;
/// If continuous : The length of the float vector that represents the action.
/// If discrete : The number of possible values the action can take.
/// The size of the action space.
public int[] vectorActionSize = new[] {1};
/// <remarks>The size specified is interpreted differently depending on whether
/// the agent uses the continuous or the discrete action space.</remarks>
/// <value>
/// For the continuous action space: the length of the float vector that represents
/// the action.
/// For the discrete action space: the number of branches in the action space.
/// </value>
[FormerlySerializedAs("vectorActionSize")]
public int[] VectorActionSize = new[] {1};
public string[] vectorActionDescriptions;
[FormerlySerializedAs("vectorActionDescriptions")]
public string[] VectorActionDescriptions;
public SpaceType vectorActionSpaceType = SpaceType.Discrete;
[FormerlySerializedAs("vectorActionSpaceType")]
public SpaceType VectorActionSpaceType = SpaceType.Discrete;
public int numActions
public int NumActions
switch (vectorActionSpaceType)
switch (VectorActionSpaceType)
return vectorActionSize.Length;
return VectorActionSize.Length;
return vectorActionSize[0];
return VectorActionSize[0];
default:
return 0;
}

{
return new BrainParameters
{
vectorObservationSize = vectorObservationSize,
numStackedVectorObservations = numStackedVectorObservations,
vectorActionSize = (int[])vectorActionSize.Clone(),
vectorActionDescriptions = (string[])vectorActionDescriptions.Clone(),
vectorActionSpaceType = vectorActionSpaceType
VectorObservationSize = VectorObservationSize,
NumStackedVectorObservations = NumStackedVectorObservations,
VectorActionSize = (int[])VectorActionSize.Clone(),
VectorActionDescriptions = (string[])VectorActionDescriptions.Clone(),
VectorActionSpaceType = VectorActionSpaceType
};
}
}
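
Read together, the NumActions fragments above return the number of branches for a discrete action space and the single branch length for a continuous one. A hedged reconstruction of the property, with the case labels inferred from the surrounding docs (only the switch and return lines are visible in the diff):

// Reconstructed from the NumActions fragments above; case labels are inferred.
public int NumActions
{
    get
    {
        switch (VectorActionSpaceType)
        {
            case SpaceType.Discrete:
                // One action per branch, e.g. VectorActionSize = {3, 2} -> 2 actions.
                return VectorActionSize.Length;
            case SpaceType.Continuous:
                // A single branch whose length is the number of continuous actions.
                return VectorActionSize[0];
            default:
                return 0;
        }
    }
}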

6
com.unity.ml-agents/Runtime/Policies/HeuristicPolicy.cs


bool m_Done;
bool m_DecisionRequested;
WriteAdapter m_WriteAdapter = new WriteAdapter();
ObservationWriter m_ObservationWriter = new ObservationWriter();
NullList m_NullList = new NullList();

{
if (sensor.GetCompressionType() == SensorCompressionType.None)
{
m_WriteAdapter.SetTarget(m_NullList, sensor.GetObservationShape(), 0);
sensor.Write(m_WriteAdapter);
m_ObservationWriter.SetTarget(m_NullList, sensor.GetObservationShape(), 0);
sensor.Write(m_ObservationWriter);
}
else
{

12
com.unity.ml-agents/Runtime/Sensors/CameraSensor.cs


/// <summary>
/// The Camera used for rendering the sensor observations.
/// </summary>
public Camera camera
public Camera Camera
{
get { return m_Camera; }
set { m_Camera = value; }

/// The compression type used by the sensor.
/// </summary>
public SensorCompressionType compressionType
public SensorCompressionType CompressionType
{
get { return m_CompressionType; }
set { m_CompressionType = value; }

}
/// <summary>
/// Writes out the generated, uncompressed image to the provided <see cref="WriteAdapter"/>.
/// Writes out the generated, uncompressed image to the provided <see cref="ObservationWriter"/>.
/// <param name="adapter">Where the observation is written to.</param>
/// <param name="writer">Where the observation is written to.</param>
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
var numWritten = Utilities.TextureToTensorProxy(texture, adapter, m_Grayscale);
var numWritten = Utilities.TextureToTensorProxy(texture, writer, m_Grayscale);
DestroyTexture(texture);
return numWritten;
}


20
com.unity.ml-agents/Runtime/Sensors/CameraSensorComponent.cs


/// <summary>
/// Camera object that provides the data to the sensor.
/// </summary>
public new Camera camera
public Camera Camera
{
get { return m_Camera; }
set { m_Camera = value; UpdateSensor(); }

/// Name of the generated <see cref="CameraSensor"/> object.
/// Note that changing this at runtime does not affect how the Agent sorts the sensors.
/// </summary>
public string sensorName
public string SensorName
{
get { return m_SensorName; }
set { m_SensorName = value; }

/// Width of the generated observation.
/// Note that changing this after the sensor is created has no effect.
/// </summary>
public int width
public int Width
{
get { return m_Width; }
set { m_Width = value; }

/// Height of the generated observation.
/// Note that changing this after the sensor is created has no effect.
/// </summary>
public int height
public int Height
{
get { return m_Height; }
set { m_Height = value; }

/// Whether to generate grayscale images or color.
/// Note that changing this after the sensor is created has no effect.
/// </summary>
public bool grayscale
public bool Grayscale
{
get { return m_Grayscale; }
set { m_Grayscale = value; }

/// <summary>
/// The compression type to use for the sensor.
/// </summary>
public SensorCompressionType compression
public SensorCompressionType CompressionType
{
get { return m_Compression; }
set { m_Compression = value; UpdateSensor(); }

/// <returns>The created <see cref="CameraSensor"/> object for this component.</returns>
public override ISensor CreateSensor()
{
m_Sensor = new CameraSensor(m_Camera, m_Width, m_Height, grayscale, m_SensorName, compression);
m_Sensor = new CameraSensor(m_Camera, m_Width, m_Height, Grayscale, m_SensorName, m_Compression);
return m_Sensor;
}

/// <returns>The observation shape of the associated <see cref="CameraSensor"/> object.</returns>
public override int[] GetObservationShape()
{
return CameraSensor.GenerateShape(m_Width, m_Height, grayscale);
return CameraSensor.GenerateShape(m_Width, m_Height, Grayscale);
}
/// <summary>

{
if (m_Sensor != null)
{
m_Sensor.camera = m_Camera;
m_Sensor.compressionType = m_Compression;
m_Sensor.Camera = m_Camera;
m_Sensor.CompressionType = m_Compression;
}
}
}
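
All CameraSensorComponent settings above follow the same camelCase-to-PascalCase rename. A hedged sketch of configuring the component before the sensor is created (later changes have no effect, per the comments above); the sensor name and resolution are hypothetical.

// Hedged sketch of the renamed CameraSensorComponent properties.
using MLAgents.Sensors;
using UnityEngine;

public class AddCameraSensorExample : MonoBehaviour
{
    void Awake()
    {
        var cameraSensorComponent = gameObject.AddComponent<CameraSensorComponent>();
        cameraSensorComponent.Camera = GetComponentInChildren<Camera>();
        cameraSensorComponent.SensorName = "FirstPersonCamera";  // hypothetical
        cameraSensorComponent.Width = 84;
        cameraSensorComponent.Height = 84;
        cameraSensorComponent.Grayscale = false;
        cameraSensorComponent.CompressionType = SensorCompressionType.PNG;
        // GetObservationShape() returns { Height, Width, Grayscale ? 1 : 3 }.
    }
}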

6
com.unity.ml-agents/Runtime/Sensors/ISensor.cs


int[] GetObservationShape();
/// <summary>
/// Write the observation data directly to the <see cref="WriteAdapter"/>.
/// Write the observation data directly to the <see cref="ObservationWriter"/>.
/// <param name="adapter">Where the observations will be written to.</param>
/// <param name="writer">Where the observations will be written to.</param>
int Write(WriteAdapter adapter);
int Write(ObservationWriter writer);
/// <summary>
/// Return a compressed representation of the observation. For small observations,

112
com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensor.cs


/// <summary>
/// Length of the rays to cast. This will be scaled up or down based on the scale of the transform.
/// </summary>
public float rayLength;
public float RayLength;
public IReadOnlyList<string> detectableTags;
public IReadOnlyList<string> DetectableTags;
public IReadOnlyList<float> angles;
public IReadOnlyList<float> Angles;
public float startOffset;
public float StartOffset;
public float endOffset;
public float EndOffset;
public float castRadius;
public float CastRadius;
public Transform transform;
public Transform Transform;
public RayPerceptionCastType castType;
public RayPerceptionCastType CastType;
public int layerMask;
public int LayerMask;
/// <summary>
/// Returns the expected number of floats in the output.

{
return (detectableTags.Count + 2) * angles.Count;
return (DetectableTags.Count + 2) * Angles.Count;
}
/// <summary>

/// <returns>A tuple of the start and end positions in world space.</returns>
public (Vector3 StartPositionWorld, Vector3 EndPositionWorld) RayExtents(int rayIndex)
{
var angle = angles[rayIndex];
var angle = Angles[rayIndex];
if (castType == RayPerceptionCastType.Cast3D)
if (CastType == RayPerceptionCastType.Cast3D)
startPositionLocal = new Vector3(0, startOffset, 0);
endPositionLocal = PolarToCartesian3D(rayLength, angle);
endPositionLocal.y += endOffset;
startPositionLocal = new Vector3(0, StartOffset, 0);
endPositionLocal = PolarToCartesian3D(RayLength, angle);
endPositionLocal.y += EndOffset;
endPositionLocal = PolarToCartesian2D(rayLength, angle);
endPositionLocal = PolarToCartesian2D(RayLength, angle);
var startPositionWorld = transform.TransformPoint(startPositionLocal);
var endPositionWorld = transform.TransformPoint(endPositionLocal);
var startPositionWorld = Transform.TransformPoint(startPositionLocal);
var endPositionWorld = Transform.TransformPoint(endPositionLocal);
return (StartPositionWorld : startPositionWorld, EndPositionWorld : endPositionWorld);
}

/// <summary>
/// Whether or not the ray hit anything.
/// </summary>
public bool hasHit;
public bool HasHit;
/// Whether or not the ray hit an object whose tag is in the input's detectableTags list.
/// Whether or not the ray hit an object whose tag is in the input's DetectableTags list.
public bool hitTaggedObject;
public bool HitTaggedObject;
/// The index of the hit object's tag in the detectableTags list, or -1 if there was no hit, or the
/// The index of the hit object's tag in the DetectableTags list, or -1 if there was no hit, or the
public int hitTagIndex;
public int HitTagIndex;
public float hitFraction;
public float HitFraction;
/// 1. A one-hot encoding for detectable tags. For example, if detectableTags.Length = n, the
/// 1. A one-hot encoding for detectable tags. For example, if DetectableTags.Length = n, the
/// first n elements of the sublist will be a one-hot encoding of the detectableTag that was hit, or
/// all zeroes otherwise.
/// 2. The 'numDetectableTags' element of the sublist will be 1 if the ray missed everything, or 0 if it hit

/// </summary>
/// <param name="numDetectableTags"></param>
/// <param name="rayIndex"></param>
/// <param name="buffer">Output buffer. The size must be equal to (numDetectableTags+2) * rayOutputs.Length</param>
/// <param name="buffer">Output buffer. The size must be equal to (numDetectableTags+2) * RayOutputs.Length</param>
if (hitTaggedObject)
if (HitTaggedObject)
buffer[bufferOffset + hitTagIndex] = 1f;
buffer[bufferOffset + HitTagIndex] = 1f;
buffer[bufferOffset + numDetectableTags] = hasHit ? 0f : 1f;
buffer[bufferOffset + numDetectableTags + 1] = hitFraction;
buffer[bufferOffset + numDetectableTags] = HasHit ? 0f : 1f;
buffer[bufferOffset + numDetectableTags + 1] = HitFraction;
}
}

public RayOutput[] rayOutputs;
public RayOutput[] RayOutputs;
}
/// <summary>

/// <summary>
/// Computes the ray perception observations and saves them to the provided
/// <see cref="WriteAdapter"/>.
/// <see cref="ObservationWriter"/>.
/// <param name="adapter">Where the ray perception observations are written to.</param>
/// <param name="writer">Where the ray perception observations are written to.</param>
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
var numRays = m_RayPerceptionInput.angles.Count;
var numDetectableTags = m_RayPerceptionInput.detectableTags.Count;
var numRays = m_RayPerceptionInput.Angles.Count;
var numDetectableTags = m_RayPerceptionInput.DetectableTags.Count;
if (m_DebugDisplayInfo != null)
{

rayOutput.ToFloatArray(numDetectableTags, rayIndex, m_Observations);
}
// Finally, add the observations to the WriteAdapter
adapter.AddRange(m_Observations);
// Finally, add the observations to the ObservationWriter
writer.AddRange(m_Observations);
}
return m_Observations.Length;
}

public static RayPerceptionOutput Perceive(RayPerceptionInput input)
{
RayPerceptionOutput output = new RayPerceptionOutput();
output.rayOutputs = new RayPerceptionOutput.RayOutput[input.angles.Count];
output.RayOutputs = new RayPerceptionOutput.RayOutput[input.Angles.Count];
for (var rayIndex = 0; rayIndex < input.angles.Count; rayIndex++)
for (var rayIndex = 0; rayIndex < input.Angles.Count; rayIndex++)
output.rayOutputs[rayIndex] = PerceiveSingleRay(input, rayIndex, out debugRay);
output.RayOutputs[rayIndex] = PerceiveSingleRay(input, rayIndex, out debugRay);
}
return output;

out DebugDisplayInfo.RayInfo debugRayOut
)
{
var unscaledRayLength = input.rayLength;
var unscaledCastRadius = input.castRadius;
var unscaledRayLength = input.RayLength;
var unscaledCastRadius = input.CastRadius;
var extents = input.RayExtents(rayIndex);
var startPositionWorld = extents.StartPositionWorld;

float hitFraction;
GameObject hitObject;
if (input.castType == RayPerceptionCastType.Cast3D)
if (input.CastType == RayPerceptionCastType.Cast3D)
scaledRayLength, input.layerMask);
scaledRayLength, input.LayerMask);
scaledRayLength, input.layerMask);
scaledRayLength, input.LayerMask);
}
// If scaledRayLength is 0, we still could have a hit with sphere casts (maybe?).

if (scaledCastRadius > 0f)
{
rayHit = Physics2D.CircleCast(startPositionWorld, scaledCastRadius, rayDirection,
scaledRayLength, input.layerMask);
scaledRayLength, input.LayerMask);
rayHit = Physics2D.Raycast(startPositionWorld, rayDirection, scaledRayLength, input.layerMask);
rayHit = Physics2D.Raycast(startPositionWorld, rayDirection, scaledRayLength, input.LayerMask);
}
castHit = rayHit;

var rayOutput = new RayPerceptionOutput.RayOutput
{
hasHit = castHit,
hitFraction = hitFraction,
hitTaggedObject = false,
hitTagIndex = -1
HasHit = castHit,
HitFraction = hitFraction,
HitTaggedObject = false,
HitTagIndex = -1
for (var i = 0; i < input.detectableTags.Count; i++)
for (var i = 0; i < input.DetectableTags.Count; i++)
if (hitObject.CompareTag(input.detectableTags[i]))
if (hitObject.CompareTag(input.DetectableTags[i]))
rayOutput.hitTaggedObject = true;
rayOutput.hitTagIndex = i;
rayOutput.HitTaggedObject = true;
rayOutput.HitTagIndex = i;
break;
}
}
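
The OutputSize fragment above gives (DetectableTags.Count + 2) * Angles.Count floats per observation, and RayPerceptionSensorComponentBase below derives the angle count as 2 * RaysPerDirection + 1. A worked arithmetic example:

// Worked example of the observation-size arithmetic above.
// Assumes two detectable tags ("Cube", "Sphere") and RaysPerDirection = 1.
var numTags = 2;
var numRays = 2 * 1 + 1;                  // 2 * RaysPerDirection + 1 = 3 angles
var floatsPerRay = numTags + 2;           // one-hot tags + "missed" flag + hit fraction
var outputSize = floatsPerRay * numRays;  // (2 + 2) * 3 = 12 floats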

2
com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponent2D.cs


public RayPerceptionSensorComponent2D()
{
// Set to the 2D defaults (just in case they ever diverge).
rayLayerMask = Physics2D.DefaultRaycastLayers;
RayLayerMask = Physics2D.DefaultRaycastLayers;
}
/// <inheritdoc/>

8
com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponent3D.cs


/// <summary>
/// Ray start is offset up or down by this amount.
/// </summary>
public float startVerticalOffset
public float StartVerticalOffset
{
get => m_StartVerticalOffset;
set { m_StartVerticalOffset = value; UpdateSensor(); }

/// <summary>
/// Ray end is offset up or down by this amount.
/// </summary>
public float endVerticalOffset
public float EndVerticalOffset
{
get => m_EndVerticalOffset;
set { m_EndVerticalOffset = value; UpdateSensor(); }

/// <inheritdoc/>
public override float GetStartVerticalOffset()
{
return startVerticalOffset;
return StartVerticalOffset;
return endVerticalOffset;
return EndVerticalOffset;
}
}
}

56
com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponentBase.cs


/// The name of the Sensor that this component wraps.
/// Note that changing this at runtime does not affect how the Agent sorts the sensors.
/// </summary>
public string sensorName
public string SensorName
{
get { return m_SensorName; }
set { m_SensorName = value; }

/// List of tags in the scene to compare against.
/// Note that this should not be changed at runtime.
/// </summary>
public List<string> detectableTags
public List<string> DetectableTags
{
get { return m_DetectableTags; }
set { m_DetectableTags = value; }

/// Number of rays to the left and right of center.
/// Note that this should not be changed at runtime.
/// </summary>
public int raysPerDirection
public int RaysPerDirection
{
get { return m_RaysPerDirection; }
// Note: can't change at runtime

/// Cone size for rays. Using 90 degrees will cast rays to the left and right.
/// Greater than 90 degrees will go backwards.
/// </summary>
public float maxRayDegrees
public float MaxRayDegrees
{
get => m_MaxRayDegrees;
set { m_MaxRayDegrees = value; UpdateSensor(); }

/// <summary>
/// Radius of sphere to cast. Set to zero for raycasts.
/// </summary>
public float sphereCastRadius
public float SphereCastRadius
{
get => m_SphereCastRadius;
set { m_SphereCastRadius = value; UpdateSensor(); }

/// <summary>
/// Length of the rays to cast.
/// </summary>
public float rayLength
public float RayLength
{
get => m_RayLength;
set { m_RayLength = value; UpdateSensor(); }

/// <summary>
/// Controls which layers the rays can hit.
/// </summary>
public LayerMask rayLayerMask
public LayerMask RayLayerMask
{
get => m_RayLayerMask;
set { m_RayLayerMask = value; UpdateSensor(); }

[Range(1, 50)]
[Tooltip("Whether to stack previous observations. Using 1 means no previous observations.")]
[Tooltip("Number of raycast results that will be stacked before being fed to the neural network.")]
int m_ObservationStacks = 1;
/// <summary>

public int observationStacks
public int ObservationStacks
{
get { return m_ObservationStacks; }
set { m_ObservationStacks = value; }

/// <summary>
/// Get the RayPerceptionSensor that was created.
/// </summary>
public RayPerceptionSensor raySensor
public RayPerceptionSensor RaySensor
{
get => m_RaySensor;
}

m_RaySensor = new RayPerceptionSensor(m_SensorName, rayPerceptionInput);
if (observationStacks != 1)
if (ObservationStacks != 1)
var stackingSensor = new StackingSensor(m_RaySensor, observationStacks);
var stackingSensor = new StackingSensor(m_RaySensor, ObservationStacks);
return stackingSensor;
}

/// <returns></returns>
public override int[] GetObservationShape()
{
var numRays = 2 * raysPerDirection + 1;
var numRays = 2 * RaysPerDirection + 1;
var stacks = observationStacks > 1 ? observationStacks : 1;
var stacks = ObservationStacks > 1 ? ObservationStacks : 1;
return new[] { obsSize * stacks };
}

/// <returns></returns>
public RayPerceptionInput GetRayPerceptionInput()
{
var rayAngles = GetRayAngles(raysPerDirection, maxRayDegrees);
var rayAngles = GetRayAngles(RaysPerDirection, MaxRayDegrees);
rayPerceptionInput.rayLength = rayLength;
rayPerceptionInput.detectableTags = detectableTags;
rayPerceptionInput.angles = rayAngles;
rayPerceptionInput.startOffset = GetStartVerticalOffset();
rayPerceptionInput.endOffset = GetEndVerticalOffset();
rayPerceptionInput.castRadius = sphereCastRadius;
rayPerceptionInput.transform = transform;
rayPerceptionInput.castType = GetCastType();
rayPerceptionInput.layerMask = rayLayerMask;
rayPerceptionInput.RayLength = RayLength;
rayPerceptionInput.DetectableTags = DetectableTags;
rayPerceptionInput.Angles = rayAngles;
rayPerceptionInput.StartOffset = GetStartVerticalOffset();
rayPerceptionInput.EndOffset = GetEndVerticalOffset();
rayPerceptionInput.CastRadius = SphereCastRadius;
rayPerceptionInput.Transform = transform;
rayPerceptionInput.CastType = GetCastType();
rayPerceptionInput.LayerMask = RayLayerMask;
return rayPerceptionInput;
}

else
{
var rayInput = GetRayPerceptionInput();
for (var rayIndex = 0; rayIndex < rayInput.angles.Count; rayIndex++)
for (var rayIndex = 0; rayIndex < rayInput.Angles.Count; rayIndex++)
{
DebugDisplayInfo.RayInfo debugRay;
RayPerceptionSensor.PerceiveSingleRay(rayInput, rayIndex, out debugRay);

var startPositionWorld = rayInfo.worldStart;
var endPositionWorld = rayInfo.worldEnd;
var rayDirection = endPositionWorld - startPositionWorld;
rayDirection *= rayInfo.rayOutput.hitFraction;
rayDirection *= rayInfo.rayOutput.HitFraction;
var lerpT = rayInfo.rayOutput.hitFraction * rayInfo.rayOutput.hitFraction;
var lerpT = rayInfo.rayOutput.HitFraction * rayInfo.rayOutput.HitFraction;
var color = Color.Lerp(rayHitColor, rayMissColor, lerpT);
color.a *= alpha;
Gizmos.color = color;

if (rayInfo.rayOutput.hasHit)
if (rayInfo.rayOutput.HasHit)
{
var hitRadius = Mathf.Max(rayInfo.castRadius, .05f);
Gizmos.DrawWireSphere(startPositionWorld + rayDirection, hitRadius);

6
com.unity.ml-agents/Runtime/Sensors/RenderTextureSensor.cs


/// <summary>
/// The compression type used by the sensor.
/// </summary>
public SensorCompressionType compressionType
public SensorCompressionType CompressionType
{
get { return m_CompressionType; }
set { m_CompressionType = value; }

}
/// <inheritdoc/>
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
var numWritten = Utilities.TextureToTensorProxy(texture, adapter, m_Grayscale);
var numWritten = Utilities.TextureToTensorProxy(texture, writer, m_Grayscale);
DestroyTexture(texture);
return numWritten;
}

22
com.unity.ml-agents/Runtime/Sensors/RenderTextureSensorComponent.cs


RenderTextureSensor m_Sensor;
/// <summary>
/// The <see cref="RenderTexture"/> instance that the associated
/// The <see cref="UnityEngine.RenderTexture"/> instance that the associated
/// <see cref="RenderTextureSensor"/> wraps.
/// </summary>
[HideInInspector, SerializeField, FormerlySerializedAs("renderTexture")]

/// Stores the <see cref="RenderTexture"/> associated with this sensor.
/// Stores the <see cref="UnityEngine.RenderTexture"/> associated with this sensor.
public RenderTexture renderTexture
public RenderTexture RenderTexture
{
get { return m_RenderTexture; }
set { m_RenderTexture = value; }

/// Name of the generated <see cref="RenderTextureSensor"/>.
/// Note that changing this at runtime does not affect how the Agent sorts the sensors.
/// </summary>
public string sensorName
public string SensorName
{
get { return m_SensorName; }
set { m_SensorName = value; }

/// Whether the RenderTexture observation should be converted to grayscale or not.
/// Note that changing this after the sensor is created has no effect.
/// </summary>
public bool grayscale
public bool Grayscale
{
get { return m_Grayscale; }
set { m_Grayscale = value; }

/// <summary>
/// Compression type for the render texture observation.
/// </summary>
public SensorCompressionType compression
public SensorCompressionType CompressionType
{
get { return m_Compression; }
set { m_Compression = value; UpdateSensor(); }

public override ISensor CreateSensor()
{
m_Sensor = new RenderTextureSensor(renderTexture, grayscale, sensorName, compression);
m_Sensor = new RenderTextureSensor(RenderTexture, Grayscale, SensorName, m_Compression);
return m_Sensor;
}

var width = renderTexture != null ? renderTexture.width : 0;
var height = renderTexture != null ? renderTexture.height : 0;
var width = RenderTexture != null ? RenderTexture.width : 0;
var height = RenderTexture != null ? RenderTexture.height : 0;
return new[] { height, width, grayscale ? 1 : 3 };
return new[] { height, width, Grayscale ? 1 : 3 };
}
/// <summary>

{
if (m_Sensor != null)
{
m_Sensor.compressionType = m_Compression;
m_Sensor.CompressionType = m_Compression;
}
}
}

12
com.unity.ml-agents/Runtime/Sensors/StackingSensor.cs


float[][] m_StackedObservations;
int m_CurrentIndex;
WriteAdapter m_LocalAdapter = new WriteAdapter();
ObservationWriter m_LocalWriter = new ObservationWriter();
/// <summary>
/// Initializes the sensor.

}
/// <inheritdoc/>
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
// First, call the wrapped sensor's write method. Make sure to use our own adapter, not the passed one.
// First, call the wrapped sensor's write method. Make sure to use our own writer, not the passed one.
m_LocalAdapter.SetTarget(m_StackedObservations[m_CurrentIndex], wrappedShape, 0);
m_WrappedSensor.Write(m_LocalAdapter);
m_LocalWriter.SetTarget(m_StackedObservations[m_CurrentIndex], wrappedShape, 0);
m_WrappedSensor.Write(m_LocalWriter);
// Now write the saved observations (oldest first)
var numWritten = 0;

adapter.AddRange(m_StackedObservations[obsIndex], numWritten);
writer.AddRange(m_StackedObservations[obsIndex], numWritten);
numWritten += m_UnstackedObservationSize;
}

14
com.unity.ml-agents/Runtime/Sensors/VectorSensor.cs


using System.Collections.Generic;
using System.Collections.ObjectModel;
using UnityEngine;
namespace MLAgents.Sensors

}
/// <inheritdoc/>
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
{
var expectedObservations = m_Shape[0];
if (m_Observations.Count > expectedObservations)

m_Observations.Add(0);
}
}
adapter.AddRange(m_Observations);
writer.AddRange(m_Observations);
}
/// <summary>
/// Returns a read-only view of the observations that were added.
/// </summary>
/// <returns>A read-only view of the observations list.</returns>
internal ReadOnlyCollection<float> GetObservations()
{
return m_Observations.AsReadOnly();
}
/// <inheritdoc/>

8
com.unity.ml-agents/Runtime/Sensors/ObservationWriter.cs


/// <summary>
/// Allows sensors to write to both TensorProxy and float arrays/lists.
/// </summary>
public class WriteAdapter
public class ObservationWriter
{
IList<float> m_Data;
int m_Offset;

TensorShape m_TensorShape;
internal WriteAdapter() { }
internal ObservationWriter() { }
/// Set the adapter to write to an IList at the given channelOffset.
/// Set the writer to write to an IList at the given channelOffset.
/// </summary>
/// <param name="data">Float array or list that will be written to.</param>
/// <param name="shape">Shape of the observations to be written.</param>

}
/// <summary>
/// Set the adapter to write to a TensorProxy at the given batch and channel offset.
/// Set the writer to write to a TensorProxy at the given batch and channel offset.
/// </summary>
/// <param name="tensorProxy">Tensor proxy that will be written to.</param>
/// <param name="batchIndex">Batch index in the tensor proxy (i.e. the index of the Agent).</param>

58
com.unity.ml-agents/Runtime/SideChannels/EngineConfigurationChannel.cs


namespace MLAgents.SideChannels
{
public class EngineConfigurationChannel : SideChannel
internal class EngineConfigurationChannel : SideChannel
private enum ConfigurationType : int
{
ScreenResolution = 0,
QualityLevel = 1,
TimeScale = 2,
TargetFrameRate = 3,
CaptureFrameRate = 4
}
const string k_EngineConfigId = "e951342c-4f7e-11ea-b238-784f4387d1f7";
/// <summary>

}
/// <inheritdoc/>
public override void OnMessageReceived(IncomingMessage msg)
protected override void OnMessageReceived(IncomingMessage msg)
var width = msg.ReadInt32();
var height = msg.ReadInt32();
var qualityLevel = msg.ReadInt32();
var timeScale = msg.ReadFloat32();
var targetFrameRate = msg.ReadInt32();
timeScale = Mathf.Clamp(timeScale, 1, 100);
Screen.SetResolution(width, height, false);
QualitySettings.SetQualityLevel(qualityLevel, true);
Time.timeScale = timeScale;
Time.captureFramerate = 60;
Application.targetFrameRate = targetFrameRate;
var messageType = (ConfigurationType)msg.ReadInt32();
switch (messageType)
{
case ConfigurationType.ScreenResolution:
var width = msg.ReadInt32();
var height = msg.ReadInt32();
Screen.SetResolution(width, height, false);
break;
case ConfigurationType.QualityLevel:
var qualityLevel = msg.ReadInt32();
QualitySettings.SetQualityLevel(qualityLevel, true);
break;
case ConfigurationType.TimeScale:
var timeScale = msg.ReadFloat32();
timeScale = Mathf.Clamp(timeScale, 1, 100);
Time.timeScale = timeScale;
break;
case ConfigurationType.TargetFrameRate:
var targetFrameRate = msg.ReadInt32();
Application.targetFrameRate = targetFrameRate;
break;
case ConfigurationType.CaptureFrameRate:
var captureFrameRate = msg.ReadInt32();
Time.captureFramerate = captureFrameRate;
break;
default:
Debug.LogWarning(
"Unknown engine configuration received from Python. Make sure" +
" your Unity and Python versions are compatible.");
break;
}
}
}
}
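
The rewritten OnMessageReceived above parses one setting per message: an Int32 ConfigurationType followed by that setting's payload. Purely to illustrate the layout (Python is the actual sender, and the Write* names are assumed to mirror the Read* calls above), a TimeScale update would be composed as:

// Illustrative only; assumes OutgoingMessage.WriteInt32/WriteFloat32 mirror the
// ReadInt32/ReadFloat32 calls in the switch above.
using (var msg = new OutgoingMessage())
{
    msg.WriteInt32(2);      // ConfigurationType.TimeScale
    msg.WriteFloat32(20f);  // time scale value, clamped to [1, 100] on receipt
}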

30
com.unity.ml-agents/Runtime/SideChannels/FloatPropertiesChannel.cs


}
/// <inheritdoc/>
public override void OnMessageReceived(IncomingMessage msg)
protected override void OnMessageReceived(IncomingMessage msg)
{
var key = msg.ReadString();
var value = msg.ReadFloat32();

action?.Invoke(value);
}
/// <inheritdoc/>
public void SetProperty(string key, float value)
/// <summary>
/// Sets one of the float properties of the environment. This data will be sent to Python.
/// </summary>
/// <param name="key"> The string identifier of the property.</param>
/// <param name="value"> The float value of the property.</param>
public void Set(string key, float value)
{
m_FloatProperties[key] = value;
using (var msgOut = new OutgoingMessage())

action?.Invoke(value);
}
public float GetPropertyWithDefault(string key, float defaultValue)
/// <summary>
/// Get an Environment property with a default value. If there is a value for this property,
/// it will be returned, otherwise, the default value will be returned.
/// </summary>
/// <param name="key"> The string identifier of the property.</param>
/// <param name="defaultValue"> The default value of the property.</param>
/// <returns></returns>
public float GetWithDefault(string key, float defaultValue)
{
float valueOut;
bool hasKey = m_FloatProperties.TryGetValue(key, out valueOut);

/// <summary>
/// Registers an action to be performed every time the property is changed.
/// </summary>
/// <param name="key"> The string identifier of the property.</param>
/// <param name="action"> The action that ill be performed. Takes a float as input.</param>
public IList<string> ListProperties()
/// <summary>
/// Returns a list of all the string identifiers of the properties currently present.
/// </summary>
/// <returns> The list of string identifiers </returns>
public IList<string> Keys()
{
return new List<string>(m_FloatProperties.Keys);
}
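
SetProperty, GetPropertyWithDefault and ListProperties become Set, GetWithDefault and Keys. A hedged usage sketch; registration through SideChannelsManager follows the SideChannel docs below, and the property names are hypothetical.

// Hedged sketch of the renamed FloatPropertiesChannel API.
using MLAgents.SideChannels;

public static class FloatPropertiesExample
{
    public static void Run()
    {
        var floatProperties = new FloatPropertiesChannel();
        SideChannelsManager.RegisterSideChannel(floatProperties);

        floatProperties.Set("difficulty", 2.0f);                        // sent to Python
        var gravity = floatProperties.GetWithDefault("gravity", -9.8f); // default if never set

        foreach (var key in floatProperties.Keys())
        {
            UnityEngine.Debug.Log("property: " + key);
        }
    }
}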

2
com.unity.ml-agents/Runtime/SideChannels/RawBytesChannel.cs


}
/// <inheritdoc/>
public override void OnMessageReceived(IncomingMessage msg)
protected override void OnMessageReceived(IncomingMessage msg)
{
m_MessagesReceived.Add(msg.GetRawBytes());
}

17
com.unity.ml-agents/Runtime/SideChannels/SideChannel.cs


/// Side channels provide an alternative mechanism of sending/receiving data from Unity
/// to Python that is outside of the traditional machine learning loop. ML-Agents provides
/// some specific implementations of side channels, but users can create their own.
///
/// To create your own, you'll need to create two, new mirrored classes, one in Unity (by
/// extending <see cref="SideChannel"/>) and another in Python by extending a Python class
/// also called SideChannel. Then, within your project, use
/// <see cref="SideChannelsManager.RegisterSideChannel"/> and
/// <see cref="SideChannelsManager.UnregisterSideChannel"/> to register and unregister your
/// custom side channel.
/// </summary>
public abstract class SideChannel
{

protected set;
}
internal void ProcessMessage(byte[] msg)
{
using (var incomingMsg = new IncomingMessage(msg))
{
OnMessageReceived(incomingMsg);
}
}
public abstract void OnMessageReceived(IncomingMessage msg);
protected abstract void OnMessageReceived(IncomingMessage msg);
/// <summary>
/// Queues a message to be sent to Python during the next simulation step.
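
The remarks above describe pairing a Unity-side SideChannel with a mirrored Python class. A hedged sketch of the Unity half follows; the channel id, message contents, and the QueueMessageToSend/WriteString names are assumptions inferred from the summaries and the Read*/OutgoingMessage usage visible in this diff.

// Hedged sketch of a custom side channel; the UUID and payload are hypothetical.
using System;
using MLAgents.SideChannels;
using UnityEngine;

public class StringLogSideChannel : SideChannel
{
    public StringLogSideChannel()
    {
        ChannelId = new Guid("621f0a70-4f87-11ea-a6bf-784f4387d1f7"); // hypothetical UUID
    }

    // OnMessageReceived is now protected (see the change above), so it is
    // overridden rather than called from outside the channel.
    protected override void OnMessageReceived(IncomingMessage msg)
    {
        Debug.Log("From Python: " + msg.ReadString());
    }

    public void SendDebugStatement(string statement)
    {
        using (var msgOut = new OutgoingMessage())
        {
            msgOut.WriteString(statement);
            QueueMessageToSend(msgOut);
        }
    }
}

// Registered once, e.g. during startup:
//     SideChannelsManager.RegisterSideChannel(new StringLogSideChannel());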

45
com.unity.ml-agents/Runtime/SideChannels/StatsSideChannel.cs


namespace MLAgents.SideChannels
{
/// <summary>
/// Determines the behavior of how multiple stats within the same summary period are combined.
/// A Side Channel for sending <see cref="StatsRecorder"/> data.
public enum StatAggregationMethod
{
/// <summary>
/// Values within the summary period are averaged before reporting.
/// Note that values from the same C# environment in the same step may replace each other.
/// </summary>
Average = 0,
/// <summary>
/// Only the most recent value is reported.
/// To avoid conflicts between multiple environments, the ML Agents environment will only
/// keep stats from worker index 0.
/// </summary>
MostRecent = 1
}
/// <summary>
/// Add stats (key-value pairs) for reporting. The ML Agents environment will send these to a StatsReporter
/// instance, which means the values will appear in the Tensorboard summary, as well as trainer gauges.
/// Note that stats are only written every summary_frequency steps; See <see cref="StatAggregationMethod"/>
/// for options on how multiple values are handled.
/// </summary>
public class StatsSideChannel : SideChannel
internal class StatsSideChannel : SideChannel
/// Initializes the side channel with the provided channel ID.
/// The constructor is internal because only one instance is
/// supported at a time, and is created by the Academy.
/// Initializes the side channel. The constructor is internal because only one instance is
/// supported at a time.
/// </summary>
internal StatsSideChannel()
{

/// <summary>
/// Add a stat value for reporting. This will appear in the Tensorboard summary and trainer gauges.
/// You can nest stats in Tensorboard with "/".
/// Note that stats are only written to Tensorboard each summary_frequency steps; if a stat is
/// received multiple times, only the most recent version is used.
/// To avoid conflicts between multiple environments, only stats from worker index 0 are used.
/// Add a stat value for reporting.
/// <param name="value">The stat value. You can nest stats in Tensorboard by using "/". </param>
/// <param name="value">The stat value.</param>
public void AddStat(
string key, float value, StatAggregationMethod aggregationMethod = StatAggregationMethod.Average
)
public void AddStat(string key, float value, StatAggregationMethod aggregationMethod)
{
using (var msg = new OutgoingMessage())
{

}
/// <inheritdoc/>
public override void OnMessageReceived(IncomingMessage msg)
protected override void OnMessageReceived(IncomingMessage msg)
{
throw new UnityAgentsException("StatsSideChannel should never receive messages.");
}

2
com.unity.ml-agents/Runtime/SideChannels/EnvironmentParametersChannel.cs.meta


fileFormatVersion: 2
guid: 2506dff31271f49298fbff21e13fa8b6
guid: a849760d5bec946b884984e35c66fcfa
MonoImporter:
externalObjects: {}
serializedVersion: 2

16
com.unity.ml-agents/Runtime/Utilities.cs


internal static class Utilities
{
/// <summary>
/// Puts a Texture2D into a WriteAdapter.
/// Puts a Texture2D into an ObservationWriter.
/// <param name="adapter">
/// Adapter to fill with Texture data.
/// <param name="obsWriter">
/// Writer to fill with Texture data.
/// </param>
/// <param name="grayScale">
/// If set to <c>true</c> the textures will be converted to grayscale before

internal static int TextureToTensorProxy(
Texture2D texture,
WriteAdapter adapter,
ObservationWriter obsWriter,
bool grayScale)
{
var width = texture.width;

var currentPixel = texturePixels[(height - h - 1) * width + w];
if (grayScale)
{
adapter[h, w, 0] =
obsWriter[h, w, 0] =
adapter[h, w, 0] = currentPixel.r / 255.0f;
adapter[h, w, 1] = currentPixel.g / 255.0f;
adapter[h, w, 2] = currentPixel.b / 255.0f;
obsWriter[h, w, 0] = currentPixel.r / 255.0f;
obsWriter[h, w, 1] = currentPixel.g / 255.0f;
obsWriter[h, w, 2] = currentPixel.b / 255.0f;
}
}
}

2
com.unity.ml-agents/Tests/Editor/BehaviorParameterTests.cs


{
var gameObj = new GameObject();
var bp = gameObj.AddComponent<BehaviorParameters>();
bp.behaviorType = BehaviorType.InferenceOnly;
bp.BehaviorType = BehaviorType.InferenceOnly;
Assert.Throws<UnityAgentsException>(() =>
{

34
com.unity.ml-agents/Tests/Editor/DemonstrationTests.cs


var gameobj = new GameObject("gameObj");
var bp = gameobj.AddComponent<BehaviorParameters>();
bp.brainParameters.vectorObservationSize = 3;
bp.brainParameters.numStackedVectorObservations = 2;
bp.brainParameters.vectorActionDescriptions = new[] { "TestActionA", "TestActionB" };
bp.brainParameters.vectorActionSize = new[] { 2, 2 };
bp.brainParameters.vectorActionSpaceType = SpaceType.Discrete;
bp.BrainParameters.VectorObservationSize = 3;
bp.BrainParameters.NumStackedVectorObservations = 2;
bp.BrainParameters.VectorActionDescriptions = new[] { "TestActionA", "TestActionB" };
bp.BrainParameters.VectorActionSize = new[] { 2, 2 };
bp.BrainParameters.VectorActionSpaceType = SpaceType.Discrete;
var agent = gameobj.AddComponent<TestAgent>();

demoRec.record = true;
demoRec.demonstrationName = k_DemoName;
demoRec.demonstrationDirectory = k_DemoDirectory;
demoRec.Record = true;
demoRec.DemonstrationName = k_DemoName;
demoRec.DemonstrationDirectory = k_DemoDirectory;
var demoWriter = demoRec.LazyInitialize(fileSystem);
Assert.IsTrue(fileSystem.Directory.Exists(k_DemoDirectory));

{
var agentGo1 = new GameObject("TestAgent");
var bpA = agentGo1.AddComponent<BehaviorParameters>();
bpA.brainParameters.vectorObservationSize = 3;
bpA.brainParameters.numStackedVectorObservations = 1;
bpA.brainParameters.vectorActionDescriptions = new[] { "TestActionA", "TestActionB" };
bpA.brainParameters.vectorActionSize = new[] { 2, 2 };
bpA.brainParameters.vectorActionSpaceType = SpaceType.Discrete;
bpA.BrainParameters.VectorObservationSize = 3;
bpA.BrainParameters.NumStackedVectorObservations = 1;
bpA.BrainParameters.VectorActionDescriptions = new[] { "TestActionA", "TestActionB" };
bpA.BrainParameters.VectorActionSize = new[] { 2, 2 };
bpA.BrainParameters.VectorActionSpaceType = SpaceType.Discrete;
agentGo1.AddComponent<ObservationAgent>();
var agent1 = agentGo1.GetComponent<ObservationAgent>();

var fileSystem = new MockFileSystem();
demoRecorder.demonstrationDirectory = k_DemoDirectory;
demoRecorder.demonstrationName = "TestBrain";
demoRecorder.record = true;
demoRecorder.DemonstrationDirectory = k_DemoDirectory;
demoRecorder.DemonstrationName = "TestBrain";
demoRecorder.Record = true;
demoRecorder.LazyInitialize(fileSystem);
var agentEnableMethod = typeof(Agent).GetMethod("OnEnable",

var obs = agentInfoProto.Observations[2]; // skip dummy sensors
{
var vecObs = obs.FloatData.Data;
Assert.AreEqual(bpA.brainParameters.vectorObservationSize, vecObs.Count);
Assert.AreEqual(bpA.BrainParameters.VectorObservationSize, vecObs.Count);
for (var i = 0; i < vecObs.Count; i++)
{
Assert.AreEqual((float)i + 1, vecObs[i]);

26
com.unity.ml-agents/Tests/Editor/EditModeTestActionMasker.cs


public void FailsWithContinuous()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.Continuous;
bp.vectorActionSize = new[] {4};
bp.VectorActionSpaceType = SpaceType.Continuous;
bp.VectorActionSize = new[] {4};
var masker = new DiscreteActionMasker(bp);
masker.SetMask(0, new[] {0});
Assert.Catch<UnityAgentsException>(() => masker.GetMask());

public void NullMask()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.Discrete;
bp.VectorActionSpaceType = SpaceType.Discrete;
var masker = new DiscreteActionMasker(bp);
var mask = masker.GetMask();
Assert.IsNull(mask);

public void FirstBranchMask()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.Discrete;
bp.vectorActionSize = new[] {4, 5, 6};
bp.VectorActionSpaceType = SpaceType.Discrete;
bp.VectorActionSize = new[] {4, 5, 6};
var masker = new DiscreteActionMasker(bp);
var mask = masker.GetMask();
Assert.IsNull(mask);

{
var bp = new BrainParameters
{
vectorActionSpaceType = SpaceType.Discrete,
vectorActionSize = new[] { 4, 5, 6 }
VectorActionSpaceType = SpaceType.Discrete,
VectorActionSize = new[] { 4, 5, 6 }
};
var masker = new DiscreteActionMasker(bp);
masker.SetMask(1, new[] {1, 2, 3});

{
var bp = new BrainParameters
{
vectorActionSpaceType = SpaceType.Discrete,
vectorActionSize = new[] { 4, 5, 6 }
VectorActionSpaceType = SpaceType.Discrete,
VectorActionSize = new[] { 4, 5, 6 }
};
var masker = new DiscreteActionMasker(bp);
masker.SetMask(1, new[] {1, 2, 3});

{
var bp = new BrainParameters
{
vectorActionSpaceType = SpaceType.Discrete,
vectorActionSize = new[] { 4, 5, 6 }
VectorActionSpaceType = SpaceType.Discrete,
VectorActionSize = new[] { 4, 5, 6 }
};
var masker = new DiscreteActionMasker(bp);

public void MultipleMaskEdit()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.Discrete;
bp.vectorActionSize = new[] {4, 5, 6};
bp.VectorActionSpaceType = SpaceType.Discrete;
bp.VectorActionSize = new[] {4, 5, 6};
var masker = new DiscreteActionMasker(bp);
masker.SetMask(0, new[] {0, 1});
masker.SetMask(0, new[] {3});

8
com.unity.ml-agents/Tests/Editor/EditModeTestInternalBrainTensorGenerator.cs


{
var goA = new GameObject("goA");
var bpA = goA.AddComponent<BehaviorParameters>();
bpA.brainParameters.vectorObservationSize = 3;
bpA.brainParameters.numStackedVectorObservations = 1;
bpA.BrainParameters.VectorObservationSize = 3;
bpA.BrainParameters.NumStackedVectorObservations = 1;
bpB.brainParameters.vectorObservationSize = 3;
bpB.brainParameters.numStackedVectorObservations = 1;
bpB.BrainParameters.VectorObservationSize = 3;
bpB.BrainParameters.NumStackedVectorObservations = 1;
var agentB = goB.AddComponent<TestAgent>();
var agents = new List<TestAgent> { agentA, agentB };

35
com.unity.ml-agents/Tests/Editor/MLAgentsEditModeTest.cs


internal class TestPolicy : IPolicy
{
public Action OnRequestDecision;
private WriteAdapter m_Adapter = new WriteAdapter();
ObservationWriter m_ObsWriter = new ObservationWriter();
sensor.GetObservationProto(m_Adapter);
sensor.GetObservationProto(m_ObsWriter);
}
OnRequestDecision?.Invoke();
}

public override void Heuristic(float[] actionsOut)
{
var obs = GetObservations();
actionsOut[0] = obs[0];
heuristicCalls++;
}
}

return new[] { 0 };
}
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
{
numWriteCalls++;
// No-op

Assert.AreEqual(0, aca.EpisodeCount);
Assert.AreEqual(0, aca.StepCount);
Assert.AreEqual(0, aca.TotalStepCount);
Assert.AreNotEqual(null, SideChannelUtils.GetSideChannel<FloatPropertiesChannel>());
Assert.AreNotEqual(null, SideChannelsManager.GetSideChannel<EnvironmentParametersChannel>());
Assert.AreNotEqual(null, SideChannelsManager.GetSideChannel<EngineConfigurationChannel>());
Assert.AreNotEqual(null, SideChannelsManager.GetSideChannel<StatsSideChannel>());
// Check that Dispose is idempotent
aca.Dispose();

[Test]
public void TestAcademyDispose()
{
var floatProperties1 = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
var envParams1 = SideChannelsManager.GetSideChannel<EnvironmentParametersChannel>();
var engineParams1 = SideChannelsManager.GetSideChannel<EngineConfigurationChannel>();
var statsParams1 = SideChannelsManager.GetSideChannel<StatsSideChannel>();
var floatProperties2 = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
var envParams2 = SideChannelsManager.GetSideChannel<EnvironmentParametersChannel>();
var engineParams2 = SideChannelsManager.GetSideChannel<EngineConfigurationChannel>();
var statsParams2 = SideChannelsManager.GetSideChannel<StatsSideChannel>();
Assert.AreNotEqual(floatProperties1, floatProperties2);
Assert.AreNotEqual(envParams1, envParams2);
Assert.AreNotEqual(engineParams1, engineParams2);
Assert.AreNotEqual(statsParams1, statsParams2);
}
[Test]

var agentGo1 = new GameObject("TestAgent");
agentGo1.AddComponent<TestAgent>();
var behaviorParameters = agentGo1.GetComponent<BehaviorParameters>();
behaviorParameters.brainParameters.numStackedVectorObservations = 3;
behaviorParameters.BrainParameters.NumStackedVectorObservations = 3;
var agent1 = agentGo1.GetComponent<TestAgent>();
var aca = Academy.Instance;
agent1.LazyInitialize();

decisionRequester.Awake();
agent1.maxStep = 20;
agent1.MaxStep = 20;
agent2.LazyInitialize();
agent1.LazyInitialize();

for (var i = 0; i < 50; i++)
{
expectedAgent1ActionForEpisode += 1;
if (expectedAgent1ActionForEpisode == agent1.maxStep || i == 0)
if (expectedAgent1ActionForEpisode == agent1.MaxStep || i == 0)
{
expectedAgent1ActionForEpisode = 0;
}

decisionRequester.Awake();
const int maxStep = 6;
agent1.maxStep = maxStep;
agent1.MaxStep = maxStep;
agent1.LazyInitialize();
var expectedAgentStepCount = 0;

Assert.AreEqual(numSteps, agent1.heuristicCalls);
Assert.AreEqual(numSteps, agent1.sensor1.numWriteCalls);
Assert.AreEqual(numSteps, agent1.sensor2.numCompressedCalls);
// Make sure the Heuristic method read the observation and set the action
Assert.AreEqual(agent1.collectObservationsCallsForEpisode, agent1.GetAction()[0]);
}
}

18
com.unity.ml-agents/Tests/Editor/ModelRunnerTest.cs


private BrainParameters GetContinuous2vis8vec2actionBrainParameters()
{
var validBrainParameters = new BrainParameters();
validBrainParameters.vectorObservationSize = 8;
validBrainParameters.vectorActionSize = new int[] { 2 };
validBrainParameters.numStackedVectorObservations = 1;
validBrainParameters.vectorActionSpaceType = SpaceType.Continuous;
validBrainParameters.VectorObservationSize = 8;
validBrainParameters.VectorActionSize = new int[] { 2 };
validBrainParameters.NumStackedVectorObservations = 1;
validBrainParameters.VectorActionSpaceType = SpaceType.Continuous;
return validBrainParameters;
}

validBrainParameters.vectorObservationSize = 0;
validBrainParameters.vectorActionSize = new int[] { 2, 3 };
validBrainParameters.numStackedVectorObservations = 1;
validBrainParameters.vectorActionSpaceType = SpaceType.Discrete;
validBrainParameters.VectorObservationSize = 0;
validBrainParameters.VectorActionSize = new int[] { 2, 3 };
validBrainParameters.NumStackedVectorObservations = 1;
validBrainParameters.VectorActionSpaceType = SpaceType.Discrete;
return validBrainParameters;
}

Assert.IsNotNull(modelRunner.GetAction(1));
Assert.IsNotNull(modelRunner.GetAction(2));
Assert.IsNull(modelRunner.GetAction(3));
Assert.AreEqual(brainParameters.vectorActionSize.Count(), modelRunner.GetAction(1).Count());
Assert.AreEqual(brainParameters.VectorActionSize.Count(), modelRunner.GetAction(1).Count());
modelRunner.Dispose();
}
}

34
com.unity.ml-agents/Tests/Editor/ParameterLoaderTest.cs


return new int[] {m_Height, m_Width, m_Channels };
}
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
adapter[i] = 0.0f;
writer[i] = 0.0f;
}
return m_Width * m_Height * m_Channels;
}

private BrainParameters GetContinuous2vis8vec2actionBrainParameters()
{
var validBrainParameters = new BrainParameters();
validBrainParameters.vectorObservationSize = 8;
validBrainParameters.vectorActionSize = new int[] { 2 };
validBrainParameters.numStackedVectorObservations = 1;
validBrainParameters.vectorActionSpaceType = SpaceType.Continuous;
validBrainParameters.VectorObservationSize = 8;
validBrainParameters.VectorActionSize = new int[] { 2 };
validBrainParameters.NumStackedVectorObservations = 1;
validBrainParameters.VectorActionSpaceType = SpaceType.Continuous;
return validBrainParameters;
}

validBrainParameters.vectorObservationSize = 0;
validBrainParameters.vectorActionSize = new int[] { 2, 3 };
validBrainParameters.numStackedVectorObservations = 1;
validBrainParameters.vectorActionSpaceType = SpaceType.Discrete;
validBrainParameters.VectorObservationSize = 0;
validBrainParameters.VectorActionSize = new int[] { 2, 3 };
validBrainParameters.NumStackedVectorObservations = 1;
validBrainParameters.VectorActionSpaceType = SpaceType.Discrete;
return validBrainParameters;
}

var model = ModelLoader.Load(continuous2vis8vec2actionModel);
var brainParameters = GetContinuous2vis8vec2actionBrainParameters();
brainParameters.vectorObservationSize = 9; // Invalid observation
brainParameters.VectorObservationSize = 9; // Invalid observation
brainParameters.numStackedVectorObservations = 2;// Invalid stacking
brainParameters.NumStackedVectorObservations = 2;// Invalid stacking
errors = BarracudaModelParamLoader.CheckModel(model, brainParameters, new SensorComponent[] { sensor_21_20_3, sensor_20_22_3 });
Assert.Greater(errors.Count(), 0);
}

var model = ModelLoader.Load(discrete1vis0vec_2_3action_recurrModel);
var brainParameters = GetDiscrete1vis0vec_2_3action_recurrModelBrainParameters();
brainParameters.vectorObservationSize = 1; // Invalid observation
brainParameters.VectorObservationSize = 1; // Invalid observation
var errors = BarracudaModelParamLoader.CheckModel(model, brainParameters, new SensorComponent[] { sensor_21_20_3 });
Assert.Greater(errors.Count(), 0);
}

var model = ModelLoader.Load(continuous2vis8vec2actionModel);
var brainParameters = GetContinuous2vis8vec2actionBrainParameters();
brainParameters.vectorActionSize = new int[] { 3 }; // Invalid action
brainParameters.VectorActionSize = new int[] { 3 }; // Invalid action
brainParameters.vectorActionSpaceType = SpaceType.Discrete;// Invalid SpaceType
brainParameters.VectorActionSpaceType = SpaceType.Discrete;// Invalid SpaceType
errors = BarracudaModelParamLoader.CheckModel(model, brainParameters, new SensorComponent[] { sensor_21_20_3, sensor_20_22_3 });
Assert.Greater(errors.Count(), 0);
}

var model = ModelLoader.Load(discrete1vis0vec_2_3action_recurrModel);
var brainParameters = GetDiscrete1vis0vec_2_3action_recurrModelBrainParameters();
brainParameters.vectorActionSize = new int[] { 3, 3 }; // Invalid action
brainParameters.VectorActionSize = new int[] { 3, 3 }; // Invalid action
brainParameters.vectorActionSpaceType = SpaceType.Continuous;// Invalid SpaceType
brainParameters.VectorActionSpaceType = SpaceType.Continuous;// Invalid SpaceType
errors = BarracudaModelParamLoader.CheckModel(model, brainParameters, new SensorComponent[] { sensor_21_20_3 });
Assert.Greater(errors.Count(), 0);
}

42
com.unity.ml-agents/Tests/Editor/PublicAPI/PublicApiValidation.cs


var height = 16;
var sensorComponent = gameObject.AddComponent<CameraSensorComponent>();
sensorComponent.camera = Camera.main;
sensorComponent.sensorName = "camera1";
sensorComponent.width = width;
sensorComponent.height = height;
sensorComponent.grayscale = true;
sensorComponent.Camera = Camera.main;
sensorComponent.SensorName = "camera1";
sensorComponent.Width = width;
sensorComponent.Height = height;
sensorComponent.Grayscale = true;
Assert.AreEqual("camera1", sensorComponent.sensorName);
Assert.AreEqual(width, sensorComponent.width);
Assert.AreEqual(height, sensorComponent.height);
Assert.IsTrue(sensorComponent.grayscale);
Assert.AreEqual("camera1", sensorComponent.SensorName);
Assert.AreEqual(width, sensorComponent.Width);
Assert.AreEqual(height, sensorComponent.Height);
Assert.IsTrue(sensorComponent.Grayscale);
}
[Test]

var width = 24;
var height = 16;
var texture = new RenderTexture(width, height, 0);
sensorComponent.renderTexture = texture;
sensorComponent.sensorName = "rtx1";
sensorComponent.grayscale = true;
sensorComponent.RenderTexture = texture;
sensorComponent.SensorName = "rtx1";
sensorComponent.Grayscale = true;
Assert.AreEqual("rtx1", sensorComponent.sensorName);
Assert.IsTrue(sensorComponent.grayscale);
Assert.AreEqual("rtx1", sensorComponent.SensorName);
Assert.IsTrue(sensorComponent.Grayscale);
}
[Test]

var sensorComponent = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
sensorComponent.sensorName = "ray3d";
sensorComponent.detectableTags = new List<string> { "Player", "Respawn" };
sensorComponent.raysPerDirection = 3;
sensorComponent.maxRayDegrees = 30;
sensorComponent.sphereCastRadius = .1f;
sensorComponent.rayLayerMask = 0;
sensorComponent.observationStacks = 2;
sensorComponent.SensorName = "ray3d";
sensorComponent.DetectableTags = new List<string> { "Player", "Respawn" };
sensorComponent.RaysPerDirection = 3;
sensorComponent.MaxRayDegrees = 30;
sensorComponent.SphereCastRadius = .1f;
sensorComponent.RayLayerMask = 0;
sensorComponent.ObservationStacks = 2;
sensorComponent.CreateSensor();
}

10
com.unity.ml-agents/Tests/Editor/Sensor/CameraSensorComponentTest.cs


var agentGameObj = new GameObject("agent");
var cameraComponent = agentGameObj.AddComponent<CameraSensorComponent>();
cameraComponent.camera = camera;
cameraComponent.height = height;
cameraComponent.width = width;
cameraComponent.grayscale = grayscale;
cameraComponent.compression = compression;
cameraComponent.Camera = camera;
cameraComponent.Height = height;
cameraComponent.Width = width;
cameraComponent.Grayscale = grayscale;
cameraComponent.CompressionType = compression;
var expectedShape = new[] { height, width, grayscale ? 1 : 3 };
Assert.AreEqual(expectedShape, cameraComponent.GetObservationShape());

4
com.unity.ml-agents/Tests/Editor/Sensor/CameraSensorTest.cs


var camera = Camera.main;
var sensor = new CameraSensor(camera, width, height, grayscale, "TestCameraSensor", compression);
var writeAdapter = new WriteAdapter();
var obs = sensor.GetObservationProto(writeAdapter);
var obsWriter = new ObservationWriter();
var obs = sensor.GetObservationProto(obsWriter);
Assert.AreEqual((int) compression, (int) obs.CompressionType);
var expectedShape = new[] { height, width, grayscale ? 1 : 3 };

6
com.unity.ml-agents/Tests/Editor/Sensor/FloatVisualSensorTests.cs


return null;
}
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
{
using (TimerStack.Instance.Scoped("Float2DSensor.Write"))
{

{
adapter[h, w, 0] = floatData[h, w];
writer[h, w, 0] = floatData[h, w];
}
}
var numWritten = Height * Width;

}
var output = new float[12];
var writer = new WriteAdapter();
var writer = new ObservationWriter();
writer.SetTarget(output, sensor.GetObservationShape(), 0);
sensor.Write(writer);
for (var i = 0; i < 9; i++)

88
com.unity.ml-agents/Tests/Editor/Sensor/RayPerceptionSensorTests.cs


var obj = new GameObject("agent");
var perception = obj.AddComponent<RayPerceptionSensorComponent3D>();
perception.raysPerDirection = 1;
perception.maxRayDegrees = 45;
perception.rayLength = 20;
perception.detectableTags = new List<string>();
perception.detectableTags.Add(k_CubeTag);
perception.detectableTags.Add(k_SphereTag);
perception.RaysPerDirection = 1;
perception.MaxRayDegrees = 45;
perception.RayLength = 20;
perception.DetectableTags = new List<string>();
perception.DetectableTags.Add(k_CubeTag);
perception.DetectableTags.Add(k_SphereTag);
perception.sphereCastRadius = castRadius;
perception.SphereCastRadius = castRadius;
var expectedObs = (2 * perception.raysPerDirection + 1) * (perception.detectableTags.Count + 2);
var expectedObs = (2 * perception.RaysPerDirection + 1) * (perception.DetectableTags.Count + 2);
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
writer.SetTarget(outputBuffer, sensor.GetObservationShape(), 0);
var numWritten = sensor.Write(writer);

// Hit is at z=9.0 in world space, ray length is 20
Assert.That(
outputBuffer[3], Is.EqualTo((9.5f - castRadius) / perception.rayLength).Within(.0005f)
outputBuffer[3], Is.EqualTo((9.5f - castRadius) / perception.RayLength).Within(.0005f)
);
// Spheres are at 5,0,5 and 5,0,-5, so 5*sqrt(2) units from origin

Assert.AreEqual(0.0f, outputBuffer[5]); // missed sphere
Assert.AreEqual(0.0f, outputBuffer[6]); // hit unknown tag -> all 0
Assert.That(
outputBuffer[7], Is.EqualTo(expectedHitLengthWorldSpace / perception.rayLength).Within(.0005f)
outputBuffer[7], Is.EqualTo(expectedHitLengthWorldSpace / perception.RayLength).Within(.0005f)
);
Assert.AreEqual(0.0f, outputBuffer[8]); // missed cube

outputBuffer[11], Is.EqualTo(expectedHitLengthWorldSpace / perception.rayLength).Within(.0005f)
outputBuffer[11], Is.EqualTo(expectedHitLengthWorldSpace / perception.RayLength).Within(.0005f)
);
}
}

var obj = new GameObject("agent");
var perception = obj.AddComponent<RayPerceptionSensorComponent3D>();
perception.raysPerDirection = 0;
perception.maxRayDegrees = 45;
perception.rayLength = 20;
perception.detectableTags = new List<string>();
perception.detectableTags.Add(k_CubeTag);
perception.detectableTags.Add(k_SphereTag);
perception.RaysPerDirection = 0;
perception.MaxRayDegrees = 45;
perception.RayLength = 20;
perception.DetectableTags = new List<string>();
perception.DetectableTags.Add(k_CubeTag);
perception.DetectableTags.Add(k_SphereTag);
var expectedObs = (2 * perception.raysPerDirection + 1) * (perception.detectableTags.Count + 2);
var expectedObs = (2 * perception.RaysPerDirection + 1) * (perception.DetectableTags.Count + 2);
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
writer.SetTarget(outputBuffer, sensor.GetObservationShape(), 0);
var numWritten = sensor.Write(writer);

var obj = new GameObject("agent");
var perception = obj.AddComponent<RayPerceptionSensorComponent3D>();
perception.raysPerDirection = 0;
perception.rayLength = 20;
perception.detectableTags = new List<string>();
perception.RaysPerDirection = 0;
perception.RayLength = 20;
perception.DetectableTags = new List<string>();
var filterCubeLayers = new[] { false, true };
foreach (var filterCubeLayer in filterCubeLayers)

{
layerMask &= ~(1 << cubeFiltered.layer);
}
perception.rayLayerMask = layerMask;
perception.RayLayerMask = layerMask;
var expectedObs = (2 * perception.raysPerDirection + 1) * (perception.detectableTags.Count + 2);
var expectedObs = (2 * perception.RaysPerDirection + 1) * (perception.DetectableTags.Count + 2);
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
writer.SetTarget(outputBuffer, sensor.GetObservationShape(), 0);
var numWritten = sensor.Write(writer);

{
// Hit the far cube because close was filtered.
Assert.That(outputBuffer[outputBuffer.Length - 1],
Is.EqualTo((9.5f - perception.sphereCastRadius) / perception.rayLength).Within(.0005f)
Is.EqualTo((9.5f - perception.SphereCastRadius) / perception.RayLength).Within(.0005f)
);
}
else

Is.EqualTo((4.5f - perception.sphereCastRadius) / perception.rayLength).Within(.0005f)
Is.EqualTo((4.5f - perception.SphereCastRadius) / perception.RayLength).Within(.0005f)
);
}
}

var perception = obj.AddComponent<RayPerceptionSensorComponent3D>();
obj.transform.localScale = new Vector3(2, 2, 2);
perception.raysPerDirection = 0;
perception.maxRayDegrees = 45;
perception.rayLength = 20;
perception.detectableTags = new List<string>();
perception.detectableTags.Add(k_CubeTag);
perception.RaysPerDirection = 0;
perception.MaxRayDegrees = 45;
perception.RayLength = 20;
perception.DetectableTags = new List<string>();
perception.DetectableTags.Add(k_CubeTag);
perception.sphereCastRadius = castRadius;
perception.SphereCastRadius = castRadius;
var expectedObs = (2 * perception.raysPerDirection + 1) * (perception.detectableTags.Count + 2);
var expectedObs = (2 * perception.RaysPerDirection + 1) * (perception.DetectableTags.Count + 2);
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
writer.SetTarget(outputBuffer, sensor.GetObservationShape(), 0);
var numWritten = sensor.Write(writer);

// Hit is at z=9.0 in world space, ray length was 20
// But scale increases the cast size and the ray length
var scaledRayLength = 2 * perception.rayLength;
var scaledRayLength = 2 * perception.RayLength;
var scaledCastRadius = 2 * castRadius;
Assert.That(
outputBuffer[2], Is.EqualTo((9.5f - scaledCastRadius) / scaledRayLength).Within(.0005f)

var obj = new GameObject("agent");
var perception = obj.AddComponent<RayPerceptionSensorComponent3D>();
perception.raysPerDirection = 0;
perception.rayLength = 0.0f;
perception.sphereCastRadius = .5f;
perception.detectableTags = new List<string>();
perception.detectableTags.Add(k_CubeTag);
perception.RaysPerDirection = 0;
perception.RayLength = 0.0f;
perception.SphereCastRadius = .5f;
perception.DetectableTags = new List<string>();
perception.DetectableTags.Add(k_CubeTag);
var expectedObs = (2 * perception.raysPerDirection + 1) * (perception.detectableTags.Count + 2);
var expectedObs = (2 * perception.RaysPerDirection + 1) * (perception.DetectableTags.Count + 2);
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
writer.SetTarget(outputBuffer, sensor.GetObservationShape(), 0);
var numWritten = sensor.Write(writer);

6
com.unity.ml-agents/Tests/Editor/Sensor/RenderTextureSensorComponentTests.cs


var agentGameObj = new GameObject("agent");
var renderTexComponent = agentGameObj.AddComponent<RenderTextureSensorComponent>();
renderTexComponent.renderTexture = texture;
renderTexComponent.grayscale = grayscale;
renderTexComponent.compression = compression;
renderTexComponent.RenderTexture = texture;
renderTexComponent.Grayscale = grayscale;
renderTexComponent.CompressionType = compression;
var expectedShape = new[] { height, width, grayscale ? 1 : 3 };
Assert.AreEqual(expectedShape, renderTexComponent.GetObservationShape());

4
com.unity.ml-agents/Tests/Editor/Sensor/RenderTextureSensorTests.cs


var texture = new RenderTexture(width, height, 0);
var sensor = new RenderTextureSensor(texture, grayscale, "TestCameraSensor", compression);
var writeAdapter = new WriteAdapter();
var obs = sensor.GetObservationProto(writeAdapter);
var obsWriter = new ObservationWriter();
var obs = sensor.GetObservationProto(obsWriter);
Assert.AreEqual((int)compression, (int)obs.CompressionType);
var expectedShape = new[] { height, width, grayscale ? 1 : 3 };

2
com.unity.ml-agents/Tests/Editor/Sensor/SensorShapeValidatorTests.cs


return null;
}
public int Write(WriteAdapter adapter)
public int Write(ObservationWriter writer)
{
return this.ObservationSize();
}

4
com.unity.ml-agents/Tests/Editor/Sensor/VectorSensorTests.cs


}
Assert.AreEqual(fill, output[0]);
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
// Make sure WriteAdapter didn't touch anything
// Make sure ObservationWriter didn't touch anything
Assert.AreEqual(fill, output[0]);
sensor.Write(writer);

8
com.unity.ml-agents/Tests/Editor/Sensor/ObservationWriterTests.cs


namespace MLAgents.Tests
{
public class WriteAdapterTests
public class ObservationWriterTests
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
var buffer = new[] { 0f, 0f, 0f };
var shape = new[] { 3 };

[Test]
public void TestWritesToTensor()
{
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
var t = new TensorProxy
{
valueType = TensorProxy.TensorType.FloatingPoint,

[Test]
public void TestWritesToTensor3D()
{
WriteAdapter writer = new WriteAdapter();
ObservationWriter writer = new ObservationWriter();
var t = new TensorProxy
{
valueType = TensorProxy.TensorType.FloatingPoint,

32
com.unity.ml-agents/Tests/Editor/SideChannelTests.cs


ChannelId = new Guid("6afa2c06-4f82-11ea-b238-784f4387d1f7");
}
public override void OnMessageReceived(IncomingMessage msg)
protected override void OnMessageReceived(IncomingMessage msg)
{
messagesReceived.Add(msg.ReadInt32());
}

intSender.SendInt(5);
intSender.SendInt(6);
byte[] fakeData = SideChannelUtils.GetSideChannelMessage(dictSender);
SideChannelUtils.ProcessSideChannelData(dictReceiver, fakeData);
byte[] fakeData = SideChannelsManager.GetSideChannelMessage(dictSender);
SideChannelsManager.ProcessSideChannelData(dictReceiver, fakeData);
Assert.AreEqual(intReceiver.messagesReceived[0], 4);
Assert.AreEqual(intReceiver.messagesReceived[1], 5);

strSender.SendRawBytes(Encoding.ASCII.GetBytes(str1));
strSender.SendRawBytes(Encoding.ASCII.GetBytes(str2));
byte[] fakeData = SideChannelUtils.GetSideChannelMessage(dictSender);
SideChannelUtils.ProcessSideChannelData(dictReceiver, fakeData);
byte[] fakeData = SideChannelsManager.GetSideChannelMessage(dictSender);
SideChannelsManager.ProcessSideChannelData(dictReceiver, fakeData);
var messages = strReceiver.GetAndClearReceivedMessages();

var dictSender = new Dictionary<Guid, SideChannel> { { propB.ChannelId, propB } };
propA.RegisterCallback(k1, f => { wasCalled++; });
var tmp = propB.GetPropertyWithDefault(k2, 3.0f);
var tmp = propB.GetWithDefault(k2, 3.0f);
propB.SetProperty(k2, 1.0f);
tmp = propB.GetPropertyWithDefault(k2, 3.0f);
propB.Set(k2, 1.0f);
tmp = propB.GetWithDefault(k2, 3.0f);
byte[] fakeData = SideChannelUtils.GetSideChannelMessage(dictSender);
SideChannelUtils.ProcessSideChannelData(dictReceiver, fakeData);
byte[] fakeData = SideChannelsManager.GetSideChannelMessage(dictSender);
SideChannelsManager.ProcessSideChannelData(dictReceiver, fakeData);
tmp = propA.GetPropertyWithDefault(k2, 3.0f);
tmp = propA.GetWithDefault(k2, 3.0f);
propB.SetProperty(k1, 1.0f);
propB.Set(k1, 1.0f);
fakeData = SideChannelUtils.GetSideChannelMessage(dictSender);
SideChannelUtils.ProcessSideChannelData(dictReceiver, fakeData);
fakeData = SideChannelsManager.GetSideChannelMessage(dictSender);
SideChannelsManager.ProcessSideChannelData(dictReceiver, fakeData);
var keysA = propA.ListProperties();
var keysA = propA.Keys();
var keysB = propA.ListProperties();
var keysB = propA.Keys();
Assert.AreEqual(2, keysB.Count);
Assert.IsTrue(keysB.Contains(k1));
Assert.IsTrue(keysB.Contains(k2));

33
com.unity.ml-agents/Tests/Runtime/RuntimeAPITest.cs


[UnityTest]
public IEnumerator RuntimeApiTestWithEnumeratorPasses()
{
Academy.Instance.InferenceSeed = 1337;
behaviorParams.brainParameters.vectorObservationSize = 3;
behaviorParams.brainParameters.numStackedVectorObservations = 2;
behaviorParams.brainParameters.vectorActionDescriptions = new[] { "TestActionA", "TestActionB" };
behaviorParams.brainParameters.vectorActionSize = new[] { 2, 2 };
behaviorParams.brainParameters.vectorActionSpaceType = SpaceType.Discrete;
behaviorParams.behaviorName = "TestBehavior";
behaviorParams.BrainParameters.VectorObservationSize = 3;
behaviorParams.BrainParameters.NumStackedVectorObservations = 2;
behaviorParams.BrainParameters.VectorActionDescriptions = new[] { "TestActionA", "TestActionB" };
behaviorParams.BrainParameters.VectorActionSize = new[] { 2, 2 };
behaviorParams.BrainParameters.VectorActionSpaceType = SpaceType.Discrete;
behaviorParams.BehaviorName = "TestBehavior";
behaviorParams.useChildSensors = true;
behaviorParams.UseChildSensors = true;
behaviorParams.behaviorType = BehaviorType.Default;
behaviorParams.BehaviorType = BehaviorType.Default;
sensorComponent.sensorName = "ray3d";
sensorComponent.detectableTags = new List<string> { "Player", "Respawn" };
sensorComponent.raysPerDirection = 3;
sensorComponent.SensorName = "ray3d";
sensorComponent.DetectableTags = new List<string> { "Player", "Respawn" };
sensorComponent.RaysPerDirection = 3;
// Make a StackingSensor that wraps the RayPerceptionSensorComponent3D
// This isn't necessarily practical, just to ensure that it can be done

// ISensor isn't set up yet.
Assert.IsNull(sensorComponent.raySensor);
Assert.IsNull(sensorComponent.RaySensor);
behaviorParams.behaviorType = BehaviorType.HeuristicOnly;
behaviorParams.BehaviorType = BehaviorType.HeuristicOnly;
// Agent needs to be added after everything else is setup.
var agent = gameObject.AddComponent<PublicApiAgent>();

// Initialization should set up the sensors
Assert.IsNotNull(sensorComponent.raySensor);
Assert.IsNotNull(sensorComponent.RaySensor);
var otherDevice = behaviorParams.inferenceDevice == InferenceDevice.CPU ? InferenceDevice.GPU : InferenceDevice.CPU;
agent.SetModel(behaviorParams.behaviorName, behaviorParams.model, otherDevice);
var otherDevice = behaviorParams.InferenceDevice == InferenceDevice.CPU ? InferenceDevice.GPU : InferenceDevice.CPU;
agent.SetModel(behaviorParams.BehaviorName, behaviorParams.Model, otherDevice);
agent.AddReward(1.0f);

4
com.unity.ml-agents/package.json


{
"name": "com.unity.ml-agents",
"displayName": "ML Agents",
"version": "0.15.1-preview",
"version": "1.0.0-preview",
}
}

30
config/trainer_config.yaml


strength: 1.0
gamma: 0.995
WormDynamic:
normalize: true
num_epoch: 3
time_horizon: 1000
batch_size: 2024
buffer_size: 20240
max_steps: 3.5e6
summary_freq: 30000
num_layers: 3
hidden_units: 512
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995
WormStatic:
normalize: true
num_epoch: 3
time_horizon: 1000
batch_size: 2024
buffer_size: 20240
max_steps: 3.5e6
summary_freq: 30000
num_layers: 3
hidden_units: 512
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.995
Walker:
normalize: true
num_epoch: 3

2
docs/API-Reference.md


doxygen dox-ml-agents.conf
```
`dox-ml-agents.conf` is a Doxygen configuration file for the ML-Agents toolkit
`dox-ml-agents.conf` is a Doxygen configuration file for the ML-Agents Toolkit
that includes the classes that have been properly formatted. The generated HTML
files will be placed in the `html/` subdirectory. Open `index.html` within that
subdirectory to navigate to the API reference home. Note that `html/` is already

10
docs/Custom-SideChannels.md


You can create your own side channel in C# and Python and use it to communicate
custom data structures between the two. This can be useful for situations in
which the data to be sent is too complex or structured for the built-in
`FloatPropertiesChannel`, or is not related to any specific agent, and therefore
`EnvironmentParameters`, or is not related to any specific agent, and therefore
inappropriate as an agent observation.
## Overview

`base.QueueMessageToSend(msg)` method inside the side channel, and call the
`OutgoingMessage.Dispose()` method.
To register a side channel on the Unity side, call `SideChannelUtils.RegisterSideChannel` with the side channel
To register a side channel on the Unity side, call `SideChannelManager.RegisterSideChannel` with the side channel
as only argument.
### Python side

// When a Debug.Log message is created, we send it to the stringChannel
Application.logMessageReceived += stringChannel.SendDebugStatementToPython;
// The channel must be registered with the SideChannelUtils class
SideChannelUtils.RegisterSideChannel(stringChannel);
// The channel must be registered with the SideChannelManager class
SideChannelManager.RegisterSideChannel(stringChannel);
}
public void OnDestroy()

if (Academy.IsInitialized){
SideChannelUtils.UnregisterSideChannel(stringChannel);
SideChannelManager.UnregisterSideChannel(stringChannel);
}
}
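For context, a minimal C# side channel matching the API surface shown above (a constructor-assigned `ChannelId`, a protected `OnMessageReceived`, and `QueueMessageToSend`) might look like the sketch below; the GUID and message layout are illustrative, and the namespace may differ slightly by package version.

```csharp
using System;
using UnityEngine;
using MLAgents.SideChannels; // namespace may differ by package version

public class StringLogSideChannel : SideChannel
{
    public StringLogSideChannel()
    {
        // Must match the channel id used on the Python side (illustrative GUID).
        ChannelId = new Guid("621f0a70-4f87-11ea-a6bf-784f4387d1f7");
    }

    // Data sent from Python arrives here.
    protected override void OnMessageReceived(IncomingMessage msg)
    {
        var received = msg.ReadString();
        Debug.Log("From Python: " + received);
    }

    // Hooked to Application.logMessageReceived above; forwards log lines to Python.
    public void SendDebugStatementToPython(string logString, string stackTrace, LogType type)
    {
        using (var msg = new OutgoingMessage())
        {
            msg.WriteString(logString);
            QueueMessageToSend(msg);
        }
    }
}
```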

4
docs/FAQ.md


- You're using 32-bit python instead of 64-bit. See the answer
[here](https://stackoverflow.com/a/1405971/224264) for how to tell which you
have installed.
- You're using python 3.8. Tensorflow plans to release packages for this as soon
as possible; see
[this issue](https://github.com/tensorflow/tensorflow/issues/33374) for more
details.
- You have the `tensorflow-gpu` package installed. This is equivalent to
`tensorflow`; however, `pip` doesn't recognize this. The best way to resolve
this is to update to `tensorflow==1.15.0` which provides GPU support in the

6
docs/Learning-Environment-Design.md


the agent learns during training, it optimizes its decision making so that it
receives the maximum reward over time.
The ML-Agents toolkit uses a reinforcement learning technique called
The ML-Agents Toolkit uses a reinforcement learning technique called
[Proximal Policy Optimization (PPO)](https://blog.openai.com/openai-baselines-ppo/).
PPO uses a neural network to approximate the ideal function that maps an agent's
observations to the best action an agent can take in a given state. The

## Organizing the Unity Scene
To train and use the ML-Agents toolkit in a Unity scene, the scene can contain as many Agent subclasses as you need.
To train and use the ML-Agents Toolkit in a Unity scene, the scene can contain as many Agent subclasses as you need.
Agent instances should be attached to the GameObject representing that Agent.
### Academy

## Environments
An _environment_ in the ML-Agents toolkit can be any scene built in Unity. The
An _environment_ in the ML-Agents Toolkit can be any scene built in Unity. The
Unity scene provides the environment in which agents observe, act, and learn.
How you set up the Unity scene to serve as a learning environment really depends
on your goal. You may be trying to solve a specific reinforcement learning

25
docs/Learning-Environment-Examples.md


- Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
- Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
## Worm
![Worm](images/worm.png)
* Set-up: A worm with a head and 3 body segments.
* Goal: The agent must move its body toward the goal direction.
* `WormStaticTarget` - Goal direction is always forward.
* `WormDynamicTarget`- Goal direction is randomized.
* Agents: The environment contains 10 agents with the same Behavior Parameters.
* Agent Reward Function (independent):
* +0.01 times body velocity in the goal direction.
* +0.01 times body direction alignment with goal direction.
* Behavior Parameters:
* Vector Observation space: 57 variables corresponding to position, rotation,
velocity, and angular velocities of each limb plus the acceleration and
angular acceleration of the body.
* Vector Action space: (Continuous) Size of 9, corresponding to target
rotations for joints.
* Visual Observations: None
* Float Properties: None
* Benchmark Mean Reward for `WormStaticTarget`: 200
* Benchmark Mean Reward for `WormDynamicTarget`: 150
## Food Collector
![Collector](images/foodCollector.png)

Behavior Parameters : SoccerTwos.
- Agent Reward Function (dependent):
- (1 - `accumulated time penalty`) When ball enters opponent's goal `accumulated time penalty` is incremented by
(1 / `maxStep`) every fixed update and is reset to 0 at the beginning of an episode.
(1 / `MaxStep`) every fixed update and is reset to 0 at the beginning of an episode.
- -1 When ball enters team's goal.
- Behavior Parameters:
- Vector Observation space: 336 corresponding to 11 ray-casts forward distributed over 120 degrees

26
docs/ML-Agents-Overview.md


VR/AR games. These trained agents can be used for multiple purposes, including
controlling NPC behavior (in a variety of settings such as multi-agent and
adversarial), automated testing of game builds and evaluating different game
design decisions pre-release. The ML-Agents toolkit is mutually beneficial for
design decisions pre-release. The ML-Agents Toolkit is mutually beneficial for
both game developers and AI researchers as it provides a central platform where
advances in AI can be evaluated on Unity’s rich environments and then made
accessible to the wider research and game developer communities.

transition to the ML-Agents toolkit easier, we provide several background pages
transition to the ML-Agents Toolkit easier, we provide several background pages
that include overviews and helpful resources on the [Unity
Engine](Background-Unity.md), [machine learning](Background-Machine-Learning.md)
and [TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing

The remainder of this page contains a deep dive into ML-Agents, its key
components, different training modes and scenarios. By the end of it, you should
have a good sense of _what_ the ML-Agents toolkit allows you to do. The
have a good sense of _what_ the ML-Agents Toolkit allows you to do. The
subsequent documentation pages provide examples of _how_ to use ML-Agents.
## Running Example: Training NPC Behaviors

**training phase**, while playing the game with an NPC that is using its learned
policy is called the **inference phase**.
The ML-Agents toolkit provides all the necessary tools for using Unity as the
The ML-Agents Toolkit provides all the necessary tools for using Unity as the
environment. In the next few sections, we discuss how the ML-Agents toolkit
environment. In the next few sections, we discuss how the ML-Agents Toolkit
The ML-Agents toolkit is a Unity plugin that contains three high-level
The ML-Agents Toolkit is a Unity plugin that contains three high-level
components:
- **Learning Environment** - which contains the Unity scene and all the game

border="10" />
</p>
_Example block diagram of ML-Agents toolkit for our sample game._
_Example block diagram of ML-Agents Toolkit for our sample game._
We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role
We have yet to discuss how the ML-Agents Toolkit trains behaviors, and what role
the Python API and External Communicator play. Before we dive into those
details, let's summarize the earlier components. Each character is attached to
an Agent, and each Agent has a Behavior. The Behavior can be thought of as a function

### Built-in Training and Inference
As mentioned previously, the ML-Agents toolkit ships with several
As mentioned previously, the ML-Agents Toolkit ships with several
implementations of state-of-the-art algorithms for training intelligent agents.
More specifically, during training, all the medics in the
scene send their observations to the Python API through the External

In the previous mode, the Agents were used for training to generate
a TensorFlow model that the Agents can later use. However,
any user of the ML-Agents toolkit can leverage their own algorithms for
any user of the ML-Agents Toolkit can leverage their own algorithms for
training. In this case, the behaviors of all the Agents in the scene
will be controlled within Python.
You can even turn your environment into a [gym.](../gym-unity/README.md)

as the environment gradually increases in complexity. In our example, we can
imagine first training the medic when each team only contains one player, and
then iteratively increasing the number of players (i.e. the environment
complexity). The ML-Agents toolkit supports setting custom environment
complexity). The ML-Agents Toolkit supports setting custom environment
parameters within the Academy. This allows elements of the environment related
to difficulty or complexity to be dynamically adjusted based on training
progress.

## Additional Features
Beyond the flexible training scenarios available, the ML-Agents toolkit includes
Beyond the flexible training scenarios available, the ML-Agents Toolkit includes
additional features which improve the flexibility and interpretability of the
training process.

## Summary and Next Steps
To briefly summarize: The ML-Agents toolkit enables games and simulations built
To briefly summarize: The ML-Agents Toolkit enables games and simulations built
in Unity to serve as the platform for training intelligent agents. It is
designed to enable a large variety of training modes and scenarios and comes
packed with several features to enable researchers and developers to leverage

42
docs/Migrating.md


# Migrating
## Migrating from 0.15 to latest
## Migrating from Release 1 to latest
### Important changes
### Steps to Migrate
## Migrating from 0.15 to Release 1
### Important changes

- The signature of `Agent.Heuristic()` was changed to take a `float[]` as a
parameter, instead of returning the array. This was done to prevent a common
source of error where users would return arrays of the wrong size (see the sketch after this list).
- The SideChannel API has changed (#3833, #3660) :
- Introduced the `SideChannelManager` to register, unregister and access side
channels.
- `EnvironmentParameters` replaces the default `FloatProperties`.
You can access the `EnvironmentParameters` with
`Academy.Instance.EnvironmentParameters` in C# and create an
`EnvironmentParametersChannel` in Python.
- `SideChannel.OnMessageReceived` is now a protected method (was public)
- SideChannel IncomingMessages methods now take an optional default argument,
which is used when trying to read more data than the message contains.
- Added a feature to allow sending stats from C# environments to TensorBoard
(and other python StatsWriters). To do this from your code, use
`Academy.Instance.StatsRecorder.Add(key, value)` (#3660)
- Public fields and properties on several classes were renamed to follow Unity's
C# style conventions. All public fields and properties now use "PascalCase"
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- `WriteAdapter` was renamed to `ObservationWriter`. (#3834)
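For example, the `Agent.Heuristic()` change above means a heuristic that previously allocated and returned its action array now writes into the buffer it is given. A before/after sketch (the input axes and action values are illustrative):

```csharp
// Fragments from an Agent subclass; using UnityEngine assumed.

// Before (0.15): the heuristic returned a newly allocated action array.
public override float[] Heuristic()
{
    var actions = new float[2];
    actions[0] = Input.GetAxis("Horizontal");
    actions[1] = Input.GetAxis("Vertical");
    return actions;
}

// After (Release 1): the array is passed in, already sized to the action space,
// and the heuristic fills it in place.
public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetAxis("Vertical");
}
```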
### Steps to Migrate

- To force-overwrite files from a pre-existing run, add the `--force`
command-line flag.
- The Jupyter notebooks have been removed from the repository.
- `Academy.FloatProperties` was removed.
- `Academy.RegisterSideChannel` and `Academy.UnregisterSideChannel` were
removed.
- Replace `Academy.FloatProperties` with
`SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()`.
- Replace `Academy.RegisterSideChannel` with
`SideChannelUtils.RegisterSideChannel()`.
- Replace `Academy.UnregisterSideChannel` with
`SideChannelUtils.UnregisterSideChannel`.
- If you used `SideChannels` you must:
- Replace `Academy.FloatProperties` with `Academy.Instance.EnvironmentParameters`.
- `Academy.RegisterSideChannel` and `Academy.UnregisterSideChannel` were
removed. Use `SideChannelManager.RegisterSideChannel` and
`SideChannelManager.UnregisterSideChannel` instead.
- Update uses of "camelCase" fields and properties to "PascalCase".
- If you have a custom `ISensor` implementation, you will need to change the signature of
its `Write()` method to use `ObservationWriter` instead of `WriteAdapter`.
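As a concrete illustration of the last two points, a custom sensor's `Write()` override migrates roughly as follows (the `m_Observations` field is an illustrative float array, not part of the API):

```csharp
// Before: the override took a WriteAdapter.
public int Write(WriteAdapter adapter)
{
    for (var i = 0; i < m_Observations.Length; i++)
    {
        adapter[i] = m_Observations[i];
    }
    return m_Observations.Length;
}

// After: identical logic, but the parameter type is ObservationWriter.
// Renamed engine-side fields (e.g. sensorComponent.sensorName ->
// sensorComponent.SensorName) switch to PascalCase at the same time.
public int Write(ObservationWriter writer)
{
    for (var i = 0; i < m_Observations.Length; i++)
    {
        writer[i] = m_Observations[i];
    }
    return m_Observations.Length;
}
```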
## Migrating from 0.14 to 0.15

36
docs/Python-API.md


`EngineConfigurationChannel` has two methods :
* `set_configuration_parameters` which takes the following arguments:
* `width`: Defines the width of the display. Default 80.
* `height`: Defines the height of the display. Default 80.
* `quality_level`: Defines the quality level of the simulation. Default 1.
* `time_scale`: Defines the multiplier for the deltatime in the simulation. If set to a higher value, time will pass faster in the simulation but the physics may perform unpredictably. Default 20.
* `target_frame_rate`: Instructs simulation to try to render at a specified frame rate. Default -1.
* `width`: Defines the width of the display. (Must be set alongside height)
* `height`: Defines the height of the display. (Must be set alongside width)
* `quality_level`: Defines the quality level of the simulation.
* `time_scale`: Defines the multiplier for the deltatime in the simulation. If set to a higher value, time will pass faster in the simulation but the physics may perform unpredictably.
* `target_frame_rate`: Instructs simulation to try to render at a specified frame rate.
* `capture_frame_rate`: Instructs the simulation to consider time between updates to always be constant, regardless of the actual frame rate.
* `set_configuration` with argument config which is an `EngineConfig`
NamedTuple object.

...
```
#### FloatPropertiesChannel
The `FloatPropertiesChannel` will allow you to get and set pre-defined numerical values in the environment. This can be useful for adjusting environment-specific settings, or for reading non-agent related information from the environment. You can call `get_property` and `set_property` on the side channel to read and write properties.
#### EnvironmentParameters
The `EnvironmentParameters` will allow you to get and set pre-defined numerical values in the environment. This can be useful for adjusting environment-specific settings, or for reading non-agent related information from the environment. You can call `set_float_parameter` on the side channel to write parameters from Python.
`FloatPropertiesChannel` has three methods:
`EnvironmentParametersChannel` has one method:
* `set_property` Sets a property in the Unity Environment.
* `set_float_parameter` Sets a float parameter in the Unity Environment.
* `get_property` Gets a property in the Unity Environment. If the property was not found, will return None.
* key: The string identifier of the property.
* `list_properties` Returns a list of all the string identifiers of the properties
from mlagents_envs.side_channel.float_properties_channel import FloatPropertiesChannel
from mlagents_envs.side_channel.environment_parameters_channel import EnvironmentParametersChannel
channel = FloatPropertiesChannel()
channel = EnvironmentParametersChannel()
channel.set_property("parameter_1", 2.0)
channel.set_float_parameter("parameter_1", 2.0)
readout_value = channel.get_property("parameter_2")
...
```

var sharedProperties = SideChannelUtils.GetSideChannel<FloatPropertiesChannel>();
float property1 = sharedProperties.GetPropertyWithDefault("parameter_1", 0.0f);
var envParameters = Academy.Instance.EnvironmentParameters;
float property1 = envParameters.GetWithDefault("parameter_1", 0.0f);
```
#### Custom side channels

2
docs/Readme.md


## Translations
To make the Unity ML-Agents toolkit accessible to the global research and Unity
To make the Unity ML-Agents Toolkit accessible to the global research and Unity
developer communities, we're attempting to create and maintain translations of
our documentation. We've started with translating a subset of the documentation
to one language (Chinese), but we hope to continue translating more pages and to

32
docs/Training-Curriculum-Learning.md


# Training with Curriculum Learning
## Sample Environment
Curriculum learning is a feature of ML-Agents which allows for the properties of environments to be changed during the training process to aid in learning.
## An Instructional Example
*[**Note**: The example provided below is for instructional purposes, and was based on an early version of the [Wall Jump example environment](Example-Environments.md). As such, it is not possible to directly replicate the results here using that environment.]*
Imagine a task in which an agent needs to scale a wall to arrive at a goal. The
starting point when training an agent to accomplish this task will be a random

then the agent can easily learn to accomplish the task. From there, we can
slowly add to the difficulty of the task by increasing the size of the wall
until the agent can complete the initially near-impossible task of scaling the
wall. We have included an environment to demonstrate this with ML-Agents,
called __Wall Jump__.
wall.
_Demonstration of a curriculum training scenario in which a progressively taller
_Demonstration of a hypothetical curriculum training scenario in which a progressively taller
To see curriculum learning in action, observe the two learning curves below. Each
displays the reward over time for an agent trained using PPO with the same set of
training hyperparameters. The difference is that one agent was trained using the
full-height wall version of the task, and the other agent was trained using the
curriculum version of the task. As you can see, without using curriculum
learning the agent has a lot of difficulty. We think that by using well-crafted
curricula, agents trained using reinforcement learning will be able to
accomplish tasks that would otherwise be much more difficult.
![Log](images/curriculum_progress.png)
## How-To
Each group of Agents under the same `Behavior Name` in an environment can have

In order to define the curricula, the first step is to decide which parameters of
the environment will vary. In the case of the Wall Jump environment,
the height of the wall is what varies. We define this as a `Shared Float Property`
that can be accessed in `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()`, and by doing
the height of the wall is what varies. We define this as an `Environment Parameter`
that can be accessed in `Academy.Instance.EnvironmentParameters`, and by doing
so it becomes adjustable via the Python API.
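On the C# side, the agent or area script can then read the current value at the start of each episode. A minimal sketch, assuming the curriculum exposes a parameter named `wall_height` (the parameter name, default value, and class name are illustrative):

```csharp
using MLAgents; // namespace may differ by package version
using UnityEngine;

public class WallJumpArea : MonoBehaviour
{
    public Transform wall; // illustrative: the wall whose height the curriculum controls

    // Apply the current lesson's wall height when the episode/area resets.
    public void ResetArea()
    {
        var envParameters = Academy.Instance.EnvironmentParameters;
        var wallHeight = envParameters.GetWithDefault("wall_height", 1.0f);
        wall.localScale = new Vector3(wall.localScale.x, wallHeight, wall.localScale.z);
    }
}
```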
Rather than adjusting it by hand, we will create a YAML file which
describes the structure of the curricula. Within it, we can specify which

Once we have specified our metacurriculum and curricula, we can launch
`mlagents-learn` using the `--curriculum` flag to point to the config file
for our curricula and PPO will train using Curriculum Learning. For example,
to train agents in the Wall Jump environment with curriculum learning, we can run:
to train agents in the Wall Jump environment with curriculum learning, you can run:
We can then keep track of the current lessons and progress via TensorBoard.
You can then keep track of the current lessons and progress via TensorBoard.
__Note__: If you are resuming a training session that uses curriculum, please pass the number of the last-reached lesson using the `--lesson` flag when running `mlagents-learn`.

2
docs/Training-Imitation-Learning.md


width="700" border="0" />
</p>
The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
The ML-Agents Toolkit provides two features that enable your agent to learn from demonstrations.
In most scenarios, you can combine these two features.
* GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to

2
docs/Training-PPO.md


Furthermore, we could mix reward signals to help the learning process.
Using `reward_signals` allows you to define [reward signals.](Reward-Signals.md)
The ML-Agents toolkit provides three reward signals by default, the Extrinsic (environment)
The ML-Agents Toolkit provides three reward signals by default, the Extrinsic (environment)
reward signal, the Curiosity reward signal, which can be used to encourage exploration in
sparse extrinsic reward environments, and the GAIL reward signal. Please see [Reward Signals](Reward-Signals.md)
for additional details.

Some files were not shown because too many files changed in this diff
