
Merge remote-tracking branch 'origin/develop' into try-tf2-support

/develop-gpu-test
Chris Elion, 5 years ago
Current commit: 3d8a70fb
321 files changed, with 3,716 insertions and 2,690 deletions
  1. 67  .circleci/config.yml
  2. 19  .pre-commit-config.yaml
  3. 4  SURVEY.md
  4. 61  UnitySDK/Assets/ML-Agents/Editor/AgentEditor.cs
  5. 214  UnitySDK/Assets/ML-Agents/Editor/BrainParametersDrawer.cs
  6. 30  UnitySDK/Assets/ML-Agents/Editor/DemonstrationDrawer.cs
  7. 2  UnitySDK/Assets/ML-Agents/Editor/Tests/DemonstrationTests.cs
  8. 26  UnitySDK/Assets/ML-Agents/Editor/Tests/EditModeTestInternalBrainTensorGenerator.cs
  9. 113  UnitySDK/Assets/ML-Agents/Editor/Tests/MLAgentsEditModeTest.cs
  10. 29  UnitySDK/Assets/ML-Agents/Examples/3DBall/Prefabs/3DBall.prefab
  11. 24  UnitySDK/Assets/ML-Agents/Examples/3DBall/Prefabs/3DBallHardNew.prefab
  12. 47  UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBall.unity
  13. 34  UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs
  14. 25  UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs
  15. 24  UnitySDK/Assets/ML-Agents/Examples/Basic/Prefabs/Basic.prefab
  16. 13  UnitySDK/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs
  17. 24  UnitySDK/Assets/ML-Agents/Examples/Bouncer/Prefabs/Environment.prefab
  18. 10  UnitySDK/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs
  19. 25  UnitySDK/Assets/ML-Agents/Examples/Crawler/Prefabs/DynamicPlatform.prefab
  20. 24  UnitySDK/Assets/ML-Agents/Examples/Crawler/Prefabs/FixedPlatform.prefab
  21. 128  UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Prefabs/FoodCollectorArea.prefab
  22. 868  UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Prefabs/VisualFoodCollectorArea.prefab
  23. 92  UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/VisualFoodCollector.unity
  24. 102  UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorAgent.cs
  25. 849  UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scenes/GridWorld.unity
  26. 113  UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAcademy.cs
  27. 32  UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs
  28. 24  UnitySDK/Assets/ML-Agents/Examples/Hallway/Prefabs/SymbolFinderArea.prefab
  29. 43  UnitySDK/Assets/ML-Agents/Examples/Hallway/Prefabs/VisualSymbolFinderArea.prefab
  30. 79  UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/VisualHallway.unity
  31. 57  UnitySDK/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs
  32. 24  UnitySDK/Assets/ML-Agents/Examples/PushBlock/Prefabs/PushBlockArea.prefab
  33. 37  UnitySDK/Assets/ML-Agents/Examples/PushBlock/Prefabs/PushBlockVisualArea.prefab
  34. 59  UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/VisualPushBlock.unity
  35. 21  UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scripts/PushAgentBasic.cs
  36. 24  UnitySDK/Assets/ML-Agents/Examples/Pyramids/Prefabs/AreaPB.prefab
  37. 55  UnitySDK/Assets/ML-Agents/Examples/Pyramids/Prefabs/VisualAreaPyramids.prefab
  38. 73  UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/VisualPyramids.unity
  39. 65  UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scripts/PyramidAgent.cs
  40. 24  UnitySDK/Assets/ML-Agents/Examples/Reacher/Prefabs/Agent.prefab
  41. 9  UnitySDK/Assets/ML-Agents/Examples/SharedAssets/Scripts/RayPerception.cs
  42. 9  UnitySDK/Assets/ML-Agents/Examples/SharedAssets/Scripts/RayPerception2D.cs
  43. 4  UnitySDK/Assets/ML-Agents/Examples/SharedAssets/Scripts/RayPerception3D.cs
  44. 96  UnitySDK/Assets/ML-Agents/Examples/Soccer/Prefabs/SoccerFieldTwos.prefab
  45. 48  UnitySDK/Assets/ML-Agents/Examples/Tennis/Prefabs/TennisArea.prefab
  46. 9  UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs
  47. 24  UnitySDK/Assets/ML-Agents/Examples/Walker/Prefabs/WalkerPair.prefab
  48. 30  UnitySDK/Assets/ML-Agents/Examples/WallJump/Prefabs/WallJumpArea.prefab
  49. 8  UnitySDK/Assets/ML-Agents/Examples/WallJump/Scenes/WallJump.unity
  50. 37  UnitySDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs
  51. 2  UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/Barracuda.md
  52. 2  UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/LICENSE.md
  53. 10  UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/ReleaseNotes.md
  54. 120  UnitySDK/Assets/ML-Agents/Scripts/Academy.cs
  55. 316  UnitySDK/Assets/ML-Agents/Scripts/Agent.cs
  56. 4  UnitySDK/Assets/ML-Agents/Scripts/DemonstrationRecorder.cs
  57. 69  UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/AgentInfo.cs
  58. 44  UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/BrainParameters.cs
  59. 9  UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/SpaceType.cs
  60. 2  UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/CompressedObservation.cs.meta
  61. 56  UnitySDK/Assets/ML-Agents/Scripts/Grpc/GrpcExtensions.cs
  62. 181  UnitySDK/Assets/ML-Agents/Scripts/Grpc/RpcCommunicator.cs
  63. 22  UnitySDK/Assets/ML-Agents/Scripts/ICommunicator.cs
  64. 354  UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/BarracudaModelParamLoader.cs
  65. 16  UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/GeneratorImpl.cs
  66. 23  UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/TensorGenerator.cs
  67. 35  UnitySDK/Assets/ML-Agents/Scripts/Timer.cs
  68. 69  UnitySDK/Assets/ML-Agents/Scripts/Utilities.cs
  69. 4  UnitySDK/README.md
  70. 8  config/gail_config.yaml
  71. 2  config/offline_bc_config.yaml
  72. 53  config/sac_trainer_config.yaml
  73. 42  config/trainer_config.yaml
  74. 2  docs/Background-TensorFlow.md
  75. 75  docs/Basic-Guide.md
  76. 20  docs/Creating-Custom-Protobuf-Messages.md
  77. 2  docs/FAQ.md
  78. 4  docs/Feature-Memory.md
  79. 118  docs/Getting-Started-with-Balance-Ball.md
  80. 12  docs/Glossary.md
  81. 16  docs/Installation.md
  82. 7  docs/Learning-Environment-Best-Practices.md
  83. 212  docs/Learning-Environment-Create-New.md
  84. 2  docs/Learning-Environment-Design-Academy.md
  85. 206  docs/Learning-Environment-Design-Agents.md
  86. 66  docs/Learning-Environment-Design.md
  87. 76  docs/Learning-Environment-Examples.md
  88. 17  docs/Learning-Environment-Executable.md
  89. 97  docs/ML-Agents-Overview.md
  90. 9  docs/Migrating.md
  91. 6  docs/Readme.md
  92. 2  docs/Reward-Signals.md
  93. 35  docs/Training-Behavioral-Cloning.md
  94. 14  docs/Training-Curriculum-Learning.md
  95. 70  docs/Training-Generalized-Reinforcement-Learning-Agents.md
  96. 26  docs/Training-ML-Agents.md
  97. 2  docs/Training-PPO.md
  98. 2  docs/Training-SAC.md
  99. 2  docs/Training-Using-Concurrent-Unity-Instances.md
  100. 5  docs/Training-on-Amazon-Web-Service.md

67  .circleci/config.yml


. venv/bin/activate
pip install --upgrade pip
pip install grpcio-tools==1.13.0 --progress-bar=off
pip install mypy-protobuf --progress-bar=off
pip install mypy-protobuf==1.16.0 --progress-bar=off
- save_cache:
paths:
- ./venv

path: /tmp/proto.patch
destination: proto.patch
deploy:
parameters:
directory:
type: string
description: Local directory to use for publishing (e.g. ml-agents)
docker:
- image: circleci/python:3.6
steps:
- checkout
- run:
name: install python dependencies
command: |
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install setuptools wheel twine
- run:
name: verify git tag vs. version
command: |
python3 -m venv venv
. venv/bin/activate
cd << parameters.directory >>
python setup.py verify
- run:
name: create packages
command: |
. venv/bin/activate
cd << parameters.directory >>
python setup.py sdist
python setup.py bdist_wheel
- run:
name: upload to pypi
# To upload to test, just add the following flag to twine upload:
# --repository-url https://test.pypi.org/legacy/
# and change the username to "mlagents-test"
command: |
. venv/bin/activate
cd << parameters.directory >>
twine upload -u mlagents -p $PYPI_PASSWORD dist/*
version: 2
workflow:
jobs:
- build_python:

pip_constraints: test_constraints_max_tf2_version.txt
- markdown_link_check
- protobuf_generation_check
- deploy:
name: deploy ml-agents-envs
directory: ml-agents-envs
filters:
tags:
only: /[0-9]+(\.[0-9]+)*(\.dev[0-9]+)*/
branches:
ignore: /.*/
- deploy:
name: deploy ml-agents
directory: ml-agents
filters:
tags:
only: /[0-9]+(\.[0-9]+)*(\.dev[0-9]+)*/
branches:
ignore: /.*/
- deploy:
name: deploy gym-unity
directory: gym-unity
filters:
tags:
only: /[0-9]+(\.[0-9]+)*(\.dev[0-9]+)*/
branches:
ignore: /.*/
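
For context, the comment in the "upload to pypi" step above describes how the same job could target TestPyPI instead of PyPI. A minimal sketch of that variant follows; the step name and the TEST_PYPI_PASSWORD variable are illustrative assumptions and are not part of this commit:

      - run:
          name: upload to test pypi
          # Per the comment above: point twine at TestPyPI and switch to the "mlagents-test" account.
          # TEST_PYPI_PASSWORD is a hypothetical environment variable for this sketch.
          command: |
            . venv/bin/activate
            cd << parameters.directory >>
            twine upload -u mlagents-test -p $TEST_PYPI_PASSWORD --repository-url https://test.pypi.org/legacy/ dist/*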

19  .pre-commit-config.yaml


.*_pb2.py|
.*_pb2_grpc.py
)$
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.720
hooks:

name: mypy-gym-unity
files: "gym-unity/.*"
args: [--ignore-missing-imports, --disallow-incomplete-defs]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.2.3
hooks:

.*_pb2.py|
.*_pb2_grpc.py
)$
additional_dependencies: [flake8-tidy-imports]
# flake8-tidy-imports is used for banned-modules, not actually tidying
additional_dependencies: [flake8-comprehensions, flake8-tidy-imports]
- id: trailing-whitespace
name: trailing-whitespace-markdown
types: [markdown]
- repo: https://github.com/pre-commit/pygrep-hooks
rev: v1.4.1 # Use the ref you want to point at
hooks:
- id: python-check-mock-methods
# "Local" hooks, see https://pre-commit.com/#repository-local-hooks
- repo: local
hooks:

exclude: ".*localized.*"
# Only run manually, e.g. pre-commit run --hook-stage manual markdown-link-check
stages: [manual]
- id: validate-versions
name: validate library versions
language: script
entry: utils/validate_versions.py
files: ".*/setup.py"

4  SURVEY.md


# Unity ML-Agents Toolkit Survey
Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to let us know about it.
Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to let us know about it.
[Fill out the survey](https://goo.gl/forms/qFMYSYr5TlINvG6f1)
[Fill out the survey](https://goo.gl/forms/qFMYSYr5TlINvG6f1)

61  UnitySDK/Assets/ML-Agents/Editor/AgentEditor.cs


using UnityEngine;
using UnityEditor;
using Barracuda;
/*
This code is meant to modify the behavior of the inspector on Brain Components.
Depending on the type of brain that is used, the available fields will be modified in the inspector accordingly.
*/
/*
This code is meant to modify the behavior of the inspector on Agent Components.
*/
[CustomEditor(typeof(Agent), true)]
[CanEditMultipleObjects]
public class AgentEditor : Editor

var serializedAgent = serializedObject;
serializedAgent.Update();
var brain = serializedAgent.FindProperty("brain");
var actionsPerDecision = serializedAgent.FindProperty(
"agentParameters.numberOfActionsBetweenDecisions");
var maxSteps = serializedAgent.FindProperty(

var isOdd = serializedAgent.FindProperty(
"agentParameters.onDemandDecision");
var cameras = serializedAgent.FindProperty(
"agentParameters.agentCameras");
var renderTextures = serializedAgent.FindProperty(
"agentParameters.agentRenderTextures");
EditorGUILayout.PropertyField(brain);
if (cameras.arraySize > 0 && renderTextures.arraySize > 0)
{
EditorGUILayout.HelpBox("Brain visual observations created by first getting all cameras then all render textures.", MessageType.Info);
}
EditorGUILayout.LabelField("Agent Cameras");
for (var i = 0; i < cameras.arraySize; i++)
{
EditorGUILayout.PropertyField(
cameras.GetArrayElementAtIndex(i),
new GUIContent("Camera " + (i + 1) + ": "));
}
EditorGUILayout.BeginHorizontal();
if (GUILayout.Button("Add Camera", EditorStyles.miniButton))
{
cameras.arraySize++;
}
if (GUILayout.Button("Remove Camera", EditorStyles.miniButton))
{
cameras.arraySize--;
}
EditorGUILayout.EndHorizontal();
EditorGUILayout.LabelField("Agent RenderTextures");
for (var i = 0; i < renderTextures.arraySize; i++)
{
EditorGUILayout.PropertyField(
renderTextures.GetArrayElementAtIndex(i),
new GUIContent("RenderTexture " + (i + 1) + ": "));
}
EditorGUILayout.BeginHorizontal();
if (GUILayout.Button("Add RenderTextures", EditorStyles.miniButton))
{
renderTextures.arraySize++;
}
if (GUILayout.Button("Remove RenderTextures", EditorStyles.miniButton))
{
renderTextures.arraySize--;
}
EditorGUILayout.EndHorizontal();
EditorGUILayout.PropertyField(

214  UnitySDK/Assets/ML-Agents/Editor/BrainParametersDrawer.cs


// The height of a line in the Unity Inspectors
private const float k_LineHeight = 17f;
private const int k_VecObsNumLine = 3;
private const string k_CamResPropName = "cameraResolutions";
private const string k_CamWidthPropName = "width";
private const string k_CamHeightPropName = "height";
private const string k_CamGrayPropName = "blackAndWhite";
private const int k_DefaultCameraWidth = 84;
private const int k_DefaultCameraHeight = 84;
private const bool k_DefaultCameraGray = false;
if (property.isExpanded)
{
return k_LineHeight +
GetHeightDrawVectorObservation() +
GetHeightDrawVisualObservation(property) +
GetHeightDrawVectorAction(property) +
GetHeightDrawVectorActionDescriptions(property);
}
return k_LineHeight;
return GetHeightDrawVectorObservation() +
GetHeightDrawVectorAction(property);
property.isExpanded = EditorGUI.Foldout(position, property.isExpanded, label);
position.y += k_LineHeight;
if (property.isExpanded)
{
EditorGUI.BeginProperty(position, label, property);
EditorGUI.indentLevel++;
// Vector Observations
DrawVectorObservation(position, property);
position.y += GetHeightDrawVectorObservation();
EditorGUI.BeginProperty(position, label, property);
EditorGUI.indentLevel++;
//Visual Observations
DrawVisualObservations(position, property);
position.y += GetHeightDrawVisualObservation(property);
// Vector Observations
DrawVectorObservation(position, property);
position.y += GetHeightDrawVectorObservation();
// Vector Action
DrawVectorAction(position, property);
position.y += GetHeightDrawVectorAction(property);
// Vector Action
DrawVectorAction(position, property);
position.y += GetHeightDrawVectorAction(property);
// Vector Action Descriptions
DrawVectorActionDescriptions(position, property);
position.y += GetHeightDrawVectorActionDescriptions(property);
EditorGUI.EndProperty();
}
EditorGUI.EndProperty();
EditorGUI.indentLevel = indent;
}

}
/// <summary>
/// Draws the Visual Observations parameters for the Brain Parameters
/// </summary>
/// <param name="position">Rectangle on the screen to use for the property GUI.</param>
/// <param name="property">The SerializedProperty of the BrainParameters
/// to make the custom GUI for.</param>
private static void DrawVisualObservations(Rect position, SerializedProperty property)
{
EditorGUI.LabelField(position, "Visual Observations");
position.y += k_LineHeight;
var quarter = position.width / 4;
var resolutions = property.FindPropertyRelative(k_CamResPropName);
DrawVisualObsButtons(position, resolutions);
position.y += k_LineHeight;
// Display the labels for the columns : Index, Width, Height and Gray
var indexRect = new Rect(position.x, position.y, quarter, position.height);
var widthRect = new Rect(position.x + quarter, position.y, quarter, position.height);
var heightRect = new Rect(position.x + 2 * quarter, position.y, quarter, position.height);
var bwRect = new Rect(position.x + 3 * quarter, position.y, quarter, position.height);
EditorGUI.indentLevel++;
if (resolutions.arraySize > 0)
{
EditorGUI.LabelField(indexRect, "Index");
indexRect.y += k_LineHeight;
EditorGUI.LabelField(widthRect, "Width");
widthRect.y += k_LineHeight;
EditorGUI.LabelField(heightRect, "Height");
heightRect.y += k_LineHeight;
EditorGUI.LabelField(bwRect, "Gray");
bwRect.y += k_LineHeight;
}
// Iterate over the resolutions
for (var i = 0; i < resolutions.arraySize; i++)
{
EditorGUI.LabelField(indexRect, "Obs " + i);
indexRect.y += k_LineHeight;
var res = resolutions.GetArrayElementAtIndex(i);
var w = res.FindPropertyRelative("width");
w.intValue = EditorGUI.IntField(widthRect, w.intValue);
widthRect.y += k_LineHeight;
var h = res.FindPropertyRelative("height");
h.intValue = EditorGUI.IntField(heightRect, h.intValue);
heightRect.y += k_LineHeight;
var bw = res.FindPropertyRelative("blackAndWhite");
bw.boolValue = EditorGUI.Toggle(bwRect, bw.boolValue);
bwRect.y += k_LineHeight;
}
EditorGUI.indentLevel--;
}
/// <summary>
/// Draws the buttons to add and remove the visual observations parameters
/// </summary>
/// <param name="position">Rectangle on the screen to use for the property GUI.</param>
/// <param name="resolutions">The SerializedProperty of the resolution array
/// to make the custom GUI for.</param>
private static void DrawVisualObsButtons(Rect position, SerializedProperty resolutions)
{
var widthEighth = position.width / 8;
var addButtonRect = new Rect(position.x + widthEighth, position.y,
3 * widthEighth, position.height);
var removeButtonRect = new Rect(position.x + 4 * widthEighth, position.y,
3 * widthEighth, position.height);
if (resolutions.arraySize == 0)
{
addButtonRect.width *= 2;
}
// Display the buttons
if (GUI.Button(addButtonRect, "Add New", EditorStyles.miniButton))
{
resolutions.arraySize += 1;
var newRes = resolutions.GetArrayElementAtIndex(resolutions.arraySize - 1);
newRes.FindPropertyRelative(k_CamWidthPropName).intValue = k_DefaultCameraWidth;
newRes.FindPropertyRelative(k_CamHeightPropName).intValue = k_DefaultCameraHeight;
newRes.FindPropertyRelative(k_CamGrayPropName).boolValue = k_DefaultCameraGray;
}
if (resolutions.arraySize > 0)
{
if (GUI.Button(removeButtonRect, "Remove Last", EditorStyles.miniButton))
{
resolutions.arraySize -= 1;
}
}
}
/// <summary>
/// The Height required to draw the Visual Observations parameters
/// </summary>
/// <returns>The height of the drawer of the Visual Observations </returns>
private static float GetHeightDrawVisualObservation(SerializedProperty property)
{
var visObsSize = property.FindPropertyRelative(k_CamResPropName).arraySize + 2;
if (property.FindPropertyRelative(k_CamResPropName).arraySize > 0)
{
visObsSize += 1;
}
return k_LineHeight * visObsSize;
}
/// <summary>
/// Draws the Vector Actions parameters for the Brain Parameters
/// </summary>
/// <param name="position">Rectangle on the screen to use for the property GUI.</param>

actionSize += 1;
}
return actionSize * k_LineHeight;
}
/// <summary>
/// Draws the Vector Actions descriptions for the Brain Parameters
/// </summary>
/// <param name="position">Rectangle on the screen to use for the property GUI.</param>
/// <param name="property">The SerializedProperty of the BrainParameters
/// to make the custom GUI for.</param>
private static void DrawVectorActionDescriptions(Rect position, SerializedProperty property)
{
var bpVectorActionType = property.FindPropertyRelative(k_ActionTypePropName);
var vecActionSize = property.FindPropertyRelative(k_ActionSizePropName);
var numberOfDescriptions = 0;
if (bpVectorActionType.enumValueIndex == 1)
{
numberOfDescriptions = vecActionSize.GetArrayElementAtIndex(0).intValue;
}
else
{
numberOfDescriptions = vecActionSize.arraySize;
}
EditorGUI.indentLevel++;
var vecActionDescriptions =
property.FindPropertyRelative(k_ActionDescriptionPropName);
vecActionDescriptions.arraySize = numberOfDescriptions;
if (bpVectorActionType.enumValueIndex == 1)
{
//Continuous case :
EditorGUI.PropertyField(
position,
vecActionDescriptions,
new GUIContent("Action Descriptions",
"A list of strings used to name the available actionsm for the Brain."),
true);
position.y += k_LineHeight;
}
else
{
// Discrete case :
EditorGUI.PropertyField(
position,
vecActionDescriptions,
new GUIContent("Branch Descriptions",
"A list of strings used to name the available branches for the Brain."),
true);
position.y += k_LineHeight;
}
}
/// <summary>
/// The Height required to draw the Action Descriptions
/// </summary>
/// <returns>The height of the drawer of the Action Descriptions </returns>
private static float GetHeightDrawVectorActionDescriptions(SerializedProperty property)
{
var descriptionSize = 1;
if (property.FindPropertyRelative(k_ActionDescriptionPropName).isExpanded)
{
var descriptions = property.FindPropertyRelative(k_ActionDescriptionPropName);
descriptionSize += descriptions.arraySize + 1;
}
return descriptionSize * k_LineHeight;
}
}
}

30  UnitySDK/Assets/ML-Agents/Editor/DemonstrationDrawer.cs


return actionLabel.ToString();
}
/// <summary>
/// Constructs complex label for each CameraResolution object.
/// An example of this could be `[ 84 X 84 ]`
/// for a single camera with 84 pixels height and width.
/// </summary>
private static string BuildCameraResolutionLabel(SerializedProperty cameraArray)
{
var numCameras = cameraArray.arraySize;
var cameraLabel = new StringBuilder("[ ");
for (var i = 0; i < numCameras; i++)
{
var camHeightPropName =
cameraArray.GetArrayElementAtIndex(i).FindPropertyRelative("height");
cameraLabel.Append(camHeightPropName.intValue);
cameraLabel.Append(" X ");
var camWidthPropName =
cameraArray.GetArrayElementAtIndex(i).FindPropertyRelative("width");
cameraLabel.Append(camWidthPropName.intValue);
if (i < numCameras - 1)
{
cameraLabel.Append(", ");
}
}
cameraLabel.Append(" ]");
return cameraLabel.ToString();
}
/// <summary>
/// Renders Inspector UI for Brain Parameters of Demonstration.

var vecObsSizeProp = property.FindPropertyRelative("vectorObservationSize");
var numStackedProp = property.FindPropertyRelative("numStackedVectorObservations");
var actSizeProperty = property.FindPropertyRelative("vectorActionSize");
var camResProp = property.FindPropertyRelative("cameraResolutions");
var actSpaceTypeProp = property.FindPropertyRelative("vectorActionSpaceType");
var vecObsSizeLabel = vecObsSizeProp.displayName + ": " + vecObsSizeProp.intValue;

var camResLabel = camResProp.displayName + ": " + BuildCameraResolutionLabel(camResProp);
var actSpaceTypeLabel = actSpaceTypeProp.displayName + ": " +
(SpaceType)actSpaceTypeProp.enumValueIndex;

EditorGUILayout.LabelField(camResLabel);
EditorGUILayout.LabelField(actSpaceTypeLabel);
}

2  UnitySDK/Assets/ML-Agents/Editor/Tests/DemonstrationTests.cs


{
vectorObservationSize = 3,
numStackedVectorObservations = 2,
cameraResolutions = new[] {new Resolution()},
vectorActionDescriptions = new[] {"TestActionA", "TestActionB"},
vectorActionSize = new[] {2, 2},
vectorActionSpaceType = SpaceType.Discrete

var agentInfo = new AgentInfo
{
reward = 1f,
visualObservations = new List<Texture2D>(),
actionMasks = new[] {false, true},
done = true,
id = 5,

26  UnitySDK/Assets/ML-Agents/Editor/Tests/EditModeTestInternalBrainTensorGenerator.cs


{
public class EditModeTestInternalBrainTensorGenerator
{
private class TestAgent : Agent
{
}
private static IEnumerable<Agent> GetFakeAgentInfos()
{
var goA = new GameObject("goA");

stackedVectorObservation = new[] {1f, 2f, 3f}.ToList(),
stackedVectorObservation = new[] { 1f, 2f, 3f }.ToList(),
storedVectorActions = new[] {1f, 2f},
storedVectorActions = new[] { 1f, 2f },
actionMasks = null
};
var goB = new GameObject("goB");

stackedVectorObservation = new[] {4f, 5f, 6f}.ToList(),
memories = new[] {1f, 1f, 1f}.ToList(),
storedVectorActions = new[] {3f, 4f},
actionMasks = new[] {true, false, false, false, false},
stackedVectorObservation = new[] { 4f, 5f, 6f }.ToList(),
memories = new[] { 1f, 1f, 1f }.ToList(),
storedVectorActions = new[] { 3f, 4f },
actionMasks = new[] { true, false, false, false, false },
return new List<Agent> {agentA, agentB};
return new List<Agent> { agentA, agentB };
}
[Test]

{
var inputTensor = new TensorProxy
{
shape = new long[] {2, 3}
shape = new long[] { 2, 3 }
};
const int batchSize = 4;
var agentInfos = GetFakeAgentInfos();

{
var inputTensor = new TensorProxy
{
shape = new long[] {2, 5}
shape = new long[] { 2, 5 }
};
const int batchSize = 4;
var agentInfos = GetFakeAgentInfos();

{
var inputTensor = new TensorProxy
{
shape = new long[] {2, 2},
shape = new long[] { 2, 2 },
valueType = TensorProxy.TensorType.Integer
};
const int batchSize = 4;

{
var inputTensor = new TensorProxy
{
shape = new long[] {2, 5},
shape = new long[] { 2, 5 },
valueType = TensorProxy.TensorType.FloatingPoint
};
const int batchSize = 4;

113  UnitySDK/Assets/ML-Agents/Editor/Tests/MLAgentsEditModeTest.cs


using UnityEngine;
using NUnit.Framework;
using System.Reflection;
using MLAgents.Sensor;
using MLAgents.InferenceBrain;
namespace MLAgents.Tests
{

public override void InitializeAgent()
{
initializeAgentCalls += 1;
// Add in some custom sensors so we can confirm they get sorted as expected.
var sensor1 = new TestSensor("testsensor1");
var sensor2 = new TestSensor("testsensor2");
m_Sensors.Add(sensor2);
m_Sensors.Add(sensor1);
AddVectorObs(0f);
}
public override void AgentAction(float[] vectorAction, string textAction)

public override void AgentOnDone()
{
agentOnDoneCalls += 1;
}
public override float[] Heuristic()
{
return new float[0];
// This is an empty class for testing the behavior of agents and academy
// It is left empty because we are not testing any brain behavior
public class TestBrain : Brain
public class TestSensor : ISensor
public int numberOfCallsToInitialize;
public int numberOfCallsToDecideAction;
public static TestBrain Instantiate()
public string sensorName;
public TestSensor(string n)
{
sensorName = n;
}
public int[] GetFloatObservationShape()
{
return new[] { 1 };
}
public void WriteToTensor(TensorProxy tensorProxy, int agentIndex) { }
public byte[] GetCompressedObservation()
return CreateInstance<TestBrain>();
return null;
protected override void Initialize()
public CompressionType GetCompressionType()
numberOfCallsToInitialize++;
return CompressionType.None;
protected override void DecideAction()
public string GetName()
numberOfCallsToDecideAction++;
m_Agents.Clear();
return sensorName;
public class EditModeTestGeneration
{

//This will call the method even though it is private
var academyInitializeMethod = typeof(Academy).GetMethod("InitializeEnvironment",
BindingFlags.Instance | BindingFlags.NonPublic);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
Assert.AreEqual(1, aca.initializeAcademyCalls);
Assert.AreEqual(0, aca.GetEpisodeCount());
Assert.AreEqual(0, aca.GetStepCount());

acaGo.AddComponent<TestAcademy>();
var aca = acaGo.GetComponent<TestAcademy>();
aca.resetParameters = new ResetParameters();
var brain = TestBrain.Instantiate();
brain.brainParameters = new BrainParameters();
brain.brainParameters.vectorObservationSize = 0;
agent1.GiveBrain(brain);
agent2.GiveBrain(brain);
Assert.AreEqual(false, agent1.IsDone());
Assert.AreEqual(false, agent2.IsDone());

agentEnableMethod?.Invoke(agent2, new object[] { aca });
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
agentEnableMethod?.Invoke(agent1, new object[] { aca });
Assert.AreEqual(false, agent1.IsDone());

Assert.AreEqual(1, agent2.initializeAgentCalls);
Assert.AreEqual(0, agent1.agentActionCalls);
Assert.AreEqual(0, agent2.agentActionCalls);
// Make sure the sensors were sorted
Assert.AreEqual(agent1.m_Sensors[0].GetName(), "testsensor1");
Assert.AreEqual(agent1.m_Sensors[1].GetName(), "testsensor2");
}
}

aca.resetParameters = new ResetParameters();
var academyInitializeMethod = typeof(Academy).GetMethod("InitializeEnvironment",
BindingFlags.Instance | BindingFlags.NonPublic);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
var academyStepMethod = typeof(Academy).GetMethod("EnvironmentStep",
BindingFlags.Instance | BindingFlags.NonPublic);

{
numberReset += 1;
}
academyStepMethod?.Invoke(aca, new object[] {});
academyStepMethod?.Invoke(aca, new object[] { });
}
}

acaGo.AddComponent<TestAcademy>();
var aca = acaGo.GetComponent<TestAcademy>();
aca.resetParameters = new ResetParameters();
var brain = TestBrain.Instantiate();
var agentEnableMethod = typeof(Agent).GetMethod(

agent1.agentParameters = new AgentParameters();
agent2.agentParameters = new AgentParameters();
brain.brainParameters = new BrainParameters();
// We use event based so the agent will now try to send anything to the brain
agent1.agentParameters.onDemandDecision = false;
agent1.agentParameters.numberOfActionsBetweenDecisions = 2;

brain.brainParameters.vectorObservationSize = 0;
brain.brainParameters.cameraResolutions = new Resolution[0];
agent1.GiveBrain(brain);
agent2.GiveBrain(brain);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
var academyStepMethod = typeof(Academy).GetMethod(
"EnvironmentStep", BindingFlags.Instance | BindingFlags.NonPublic);

requestAction += 1;
agent2.RequestAction();
}
academyStepMethod?.Invoke(aca, new object[] {});
academyStepMethod?.Invoke(aca, new object[] { });
}
}
}

aca.resetParameters = new ResetParameters();
var academyInitializeMethod = typeof(Academy).GetMethod(
"InitializeEnvironment", BindingFlags.Instance | BindingFlags.NonPublic);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
var academyStepMethod = typeof(Academy).GetMethod(
"EnvironmentStep", BindingFlags.Instance | BindingFlags.NonPublic);

}
stepsSinceReset += 1;
academyStepMethod.Invoke(aca, new object[] {});
academyStepMethod.Invoke(aca, new object[] { });
}
}

acaGo.AddComponent<TestAcademy>();
var aca = acaGo.GetComponent<TestAcademy>();
aca.resetParameters = new ResetParameters();
var brain = TestBrain.Instantiate();
var agentEnableMethod = typeof(Agent).GetMethod(

agent1.agentParameters = new AgentParameters();
agent2.agentParameters = new AgentParameters();
brain.brainParameters = new BrainParameters();
// agent2 will request decisions only when RequestDecision is called
brain.brainParameters.vectorObservationSize = 0;
brain.brainParameters.cameraResolutions = new Resolution[0];
agent1.GiveBrain(brain);
agent2.GiveBrain(brain);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
var numberAgent1Reset = 0;
var numberAgent2Reset = 0;

agent2StepSinceReset += 1;
//Agent 1 is only initialized at step 2
if (i < 2)
{}
academyStepMethod?.Invoke(aca, new object[] {});
{ }
academyStepMethod?.Invoke(aca, new object[] { });
}
}
}

acaGo.AddComponent<TestAcademy>();
var aca = acaGo.GetComponent<TestAcademy>();
aca.resetParameters = new ResetParameters();
var brain = TestBrain.Instantiate();
var agentEnableMethod = typeof(Agent).GetMethod(

agent1.agentParameters = new AgentParameters();
agent2.agentParameters = new AgentParameters();
brain.brainParameters = new BrainParameters();
// We use event based so the agent will now try to send anything to the brain
agent1.agentParameters.onDemandDecision = false;
// agent1 will take an action at every step and request a decision every steps

//Here we specify that the agent does not reset when done
agent1.agentParameters.resetOnDone = false;
agent2.agentParameters.resetOnDone = false;
brain.brainParameters.vectorObservationSize = 0;
brain.brainParameters.cameraResolutions = new Resolution[0];
agent1.GiveBrain(brain);
agent2.GiveBrain(brain);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
agentEnableMethod?.Invoke(agent1, new object[] { aca });
var agent1ResetOnDone = 0;

}
academyStepMethod?.Invoke(aca, new object[] {});
academyStepMethod?.Invoke(aca, new object[] { });
}
}

acaGo.AddComponent<TestAcademy>();
var aca = acaGo.GetComponent<TestAcademy>();
aca.resetParameters = new ResetParameters();
var brain = TestBrain.Instantiate();
var agentEnableMethod = typeof(Agent).GetMethod(

agent1.agentParameters = new AgentParameters();
agent2.agentParameters = new AgentParameters();
brain.brainParameters = new BrainParameters();
// We use event based so the agent will now try to send anything to the brain
agent1.agentParameters.onDemandDecision = false;
agent1.agentParameters.numberOfActionsBetweenDecisions = 3;

agent1.agentParameters.maxStep = 20;
brain.brainParameters.vectorObservationSize = 0;
brain.brainParameters.cameraResolutions = new Resolution[0];
agent1.GiveBrain(brain);
agent2.GiveBrain(brain);
academyInitializeMethod?.Invoke(aca, new object[] {});
academyInitializeMethod?.Invoke(aca, new object[] { });
agentEnableMethod?.Invoke(agent1, new object[] { aca });

Assert.LessOrEqual(Mathf.Abs(i * 0.1f - agent2.GetCumulativeReward()), 0.05f);
academyStepMethod?.Invoke(aca, new object[] {});
academyStepMethod?.Invoke(aca, new object[] { });
agent1.AddReward(10f);
if ((i % 21 == 0) && (i > 0))

29  UnitySDK/Assets/ML-Agents/Examples/3DBall/Prefabs/3DBall.prefab


m_Component:
- component: {fileID: 4780098186595842}
- component: {fileID: 65010516625723872}
- component: {fileID: 114259948429386406}
- component: {fileID: 114368073295828880}
- component: {fileID: 114715123104194396}
m_Layer: 0
m_Name: Agent
m_TagString: Untagged

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114259948429386406
--- !u!114 &114368073295828880
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1424713891854676}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 8
numStackedVectorObservations: 1
vectorActionSize: 02000000
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: a0e8d1fda5a6f41be955d2b30479c2a1, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: 3DBall
--- !u!114 &114715123104194396
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}

m_Script: {fileID: 11500000, guid: aaba48bf82bee4751aa7b89569e57f73, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 383c589e8bb76464eadc2525b5b0f2c1, type: 2}
agentCameras: []
agentRenderTextures: []
maxStep: 5000
resetOnDone: 1
onDemandDecision: 0

24  UnitySDK/Assets/ML-Agents/Examples/3DBall/Prefabs/3DBallHardNew.prefab


m_Component:
- component: {fileID: 4895942152145390}
- component: {fileID: 65170961617201804}
- component: {fileID: 114284317994838100}
- component: {fileID: 114466000339026140}
m_Layer: 0
m_Name: Agent

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114284317994838100
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1829721031899636}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 5
numStackedVectorObservations: 9
vectorActionSize: 02000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: cee7d20369b814d549573de7e76c4a81, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: 3DBallHard
--- !u!114 &114466000339026140
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: edf26e11cf4ed42eaa3ffb7b91bb4676, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 4f74e089fbb75455ebf6f0495e30be6e, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

47  UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBall.unity


propertyPath: m_Name
value: 3DBall (7)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

propertyPath: m_Name
value: 3DBall (5)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_Name
value: 3DBall (6)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}

propertyPath: m_Name
value: 3DBall (3)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

propertyPath: m_Name
value: 3DBall (8)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_Name
value: 3DBall (9)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}

propertyPath: m_Name
value: 3DBall (10)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

propertyPath: m_Name
value: 3DBall (1)
objectReference: {fileID: 0}
- target: {fileID: 0}
propertyPath: m_BrainParameters.numStackedVectorObservations
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_Name
value: 3DBall (11)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}

propertyPath: m_Name
value: 3DBall (4)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
m_IsPrefabParent: 0

m_Script: {fileID: 11500000, guid: eb15e3c3d55e54abaafb74c635b6a458, type: 3}
m_Name:
m_EditorClassIdentifier:
broadcastHub:
brainsToControl:
- {fileID: 11400000, guid: 383c589e8bb76464eadc2525b5b0f2c1, type: 2}
m_TrainingConfiguration:
width: 300
height: 200

- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_Name
value: 3DBall (2)
objectReference: {fileID: 0}
- target: {fileID: 1321468028730240, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: cfa81c019162c4e3caf6e2999c6fdf48, type: 2}

34  UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs


public override void AgentAction(float[] vectorAction, string textAction)
{
if (brain.brainParameters.vectorActionSpaceType == SpaceType.Continuous)
var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
if ((gameObject.transform.rotation.z < 0.25f && actionZ > 0f) ||
(gameObject.transform.rotation.z > -0.25f && actionZ < 0f))
var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
}
if ((gameObject.transform.rotation.z < 0.25f && actionZ > 0f) ||
(gameObject.transform.rotation.z > -0.25f && actionZ < 0f))
{
gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
}
if ((gameObject.transform.rotation.x < 0.25f && actionX > 0f) ||
(gameObject.transform.rotation.x > -0.25f && actionX < 0f))
{
gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
}
if ((gameObject.transform.rotation.x < 0.25f && actionX > 0f) ||
(gameObject.transform.rotation.x > -0.25f && actionX < 0f))
{
gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
}
if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||

+ gameObject.transform.position;
//Reset the parameters when the Agent is reset.
SetResetParameters();
}
public override float[] Heuristic()
{
var action = new float[2];
action[0] = -Input.GetAxis("Horizontal");
action[1] = Input.GetAxis("Vertical");
return action;
}
public void SetBall()

25  UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs


public override void AgentAction(float[] vectorAction, string textAction)
{
if (brain.brainParameters.vectorActionSpaceType == SpaceType.Continuous)
var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
if ((gameObject.transform.rotation.z < 0.25f && actionZ > 0f) ||
(gameObject.transform.rotation.z > -0.25f && actionZ < 0f))
var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
}
if ((gameObject.transform.rotation.z < 0.25f && actionZ > 0f) ||
(gameObject.transform.rotation.z > -0.25f && actionZ < 0f))
{
gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
}
if ((gameObject.transform.rotation.x < 0.25f && actionX > 0f) ||
(gameObject.transform.rotation.x > -0.25f && actionX < 0f))
{
gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
}
if ((gameObject.transform.rotation.x < 0.25f && actionX > 0f) ||
(gameObject.transform.rotation.x > -0.25f && actionX < 0f))
{
gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
}
if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||

24  UnitySDK/Assets/ML-Agents/Examples/Basic/Prefabs/Basic.prefab


m_Component:
- component: {fileID: 4170723581433160}
- component: {fileID: 65968285873374238}
- component: {fileID: 114502619508238574}
- component: {fileID: 114827551040495112}
m_Layer: 0
m_Name: BasicAgent

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114502619508238574
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1263463520136984}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 20
numStackedVectorObservations: 1
vectorActionSize: 03000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: 53fa7c392ce3c492281be273668f6aaf, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Basic
--- !u!114 &114827551040495112
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 624480a72e46148118ab2e2d89b537de, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: e5cf0e35e16264ea483f8863e5115c3c, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

13  UnitySDK/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs


largeGoal.transform.position = new Vector3(m_LargeGoalPosition - 10f, 0f, 0f);
}
public override float[] Heuristic()
{
if (Input.GetKey(KeyCode.D))
{
return new float[] { 2 };
}
if (Input.GetKey(KeyCode.A))
{
return new float[] { 1 };
}
return new float[] { 0 };
}
public override void AgentOnDone()
{
}

24  UnitySDK/Assets/ML-Agents/Examples/Bouncer/Prefabs/Environment.prefab


- component: {fileID: 33085749764809866}
- component: {fileID: 65800894914404220}
- component: {fileID: 54030303118153432}
- component: {fileID: 114938751572484598}
- component: {fileID: 114878620968301562}
m_Layer: 0
m_Name: Agent

m_Script: {fileID: 11500000, guid: 0f09741cbce2e44bc88d3e92917eea0e, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 573920e3a672d40038169c7ffdbdca05, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

target: {fileID: 1160631129428284}
bodyObject: {fileID: 1680588139522898}
strength: 500
--- !u!114 &114938751572484598
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1604827395706042}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 6
numStackedVectorObservations: 3
vectorActionSize: 03000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: f5250a39cb2134db49b833e3c92527a1, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Bouncer

10  UnitySDK/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs


}
}
public override float[] Heuristic()
{
var action = new float[3];
action[0] = Input.GetAxis("Horizontal");
action[1] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
action[2] = Input.GetAxis("Vertical");
return action;
}
private void Update()
{
if (m_LookDir.magnitude > float.Epsilon)

25  UnitySDK/Assets/ML-Agents/Examples/Crawler/Prefabs/DynamicPlatform.prefab


serializedVersion: 5
m_Component:
- component: {fileID: 4313455366547514}
- component: {fileID: 114060650647145362}
- component: {fileID: 114590693924030052}
- component: {fileID: 114423363226357902}
m_Layer: 0

m_UseColorTemperature: 0
m_ShadowRadius: 0
m_ShadowAngle: 0
--- !u!114 &114060650647145362
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1515093357607024}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 126
numStackedVectorObservations: 1
vectorActionSize: 14000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: abc9c8f2180154ed7ba3f116ab0beb90, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: CrawlerDynamic
--- !u!114 &114157055237627828
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 0e3b44d36c7a047c4addb92457b12be5, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

target: {fileID: 4490950947783742}
ground: {fileID: 4684408634944056}
detectTargets: 1
targetIsStatic: 0
respawnTargetWhenTouched: 1
targetSpawnRadius: 40
body: {fileID: 4331762859142564}

24  UnitySDK/Assets/ML-Agents/Examples/Crawler/Prefabs/FixedPlatform.prefab


serializedVersion: 5
m_Component:
- component: {fileID: 4743084330461368}
- component: {fileID: 114727679958902886}
- component: {fileID: 114230237520033992}
- component: {fileID: 114375802757824636}
m_Layer: 0

m_Script: {fileID: 11500000, guid: 2f37c30a5e8d04117947188818902ef3, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 0505e961608004377974940ed17e03d5, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

penalizeGroundContact: 0
groundContactPenalty: 0
touchingGround: 0
--- !u!114 &114727679958902886
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1492298671135358}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 126
numStackedVectorObservations: 1
vectorActionSize: 14000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: 48982d8fa360a4ed0bb265495e4f378b, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: CrawlerStatic
--- !u!114 &114954029223843696
MonoBehaviour:
m_ObjectHideFlags: 1

128  UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Prefabs/FoodCollectorArea.prefab


- component: {fileID: 4419274671784554}
- component: {fileID: 65550728419070768}
- component: {fileID: 54936164982484646}
- component: {fileID: 114374774605792098}
- component: {fileID: 114762047763154270}
- component: {fileID: 114176228333253036}
m_Layer: 0

- component: {fileID: 4756368533889646}
- component: {fileID: 65905012397919158}
- component: {fileID: 54504078365531932}
- component: {fileID: 114522573150607728}
- component: {fileID: 114416645532260476}
- component: {fileID: 114711827726849508}
m_Layer: 0

- component: {fileID: 4426245476092464}
- component: {fileID: 65152194455140476}
- component: {fileID: 54961653455021136}
- component: {fileID: 114980787530065684}
- component: {fileID: 114192565006091356}
- component: {fileID: 114542632553128056}
m_Layer: 0

- component: {fileID: 4259834826122778}
- component: {fileID: 65761952312736034}
- component: {fileID: 54819001862035794}
- component: {fileID: 114878550018296316}
- component: {fileID: 114661830999747712}
- component: {fileID: 114189751434580810}
m_Layer: 0

- component: {fileID: 4137908820211030}
- component: {fileID: 65367560123033576}
- component: {fileID: 54895479068989492}
- component: {fileID: 114035338027591536}
- component: {fileID: 114821937036444478}
- component: {fileID: 114235147148547996}
m_Layer: 0

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114035338027591536
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1706274796045088}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 53
numStackedVectorObservations: 1
vectorActionSize: 03000000030000000300000002000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: d32fca21cf4c04536ab7f88eb9de83e0, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: FoodCollector
--- !u!114 &114176228333253036
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: c66e6845309d241c78a6d77ee2567928, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 9e7865ec29c894c2d8c1617b0fa392f9, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

m_Script: {fileID: 11500000, guid: c66e6845309d241c78a6d77ee2567928, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 9e7865ec29c894c2d8c1617b0fa392f9, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

badMaterial: {fileID: 2100000, guid: 88b9ae7af2c1748a0a1f63407587a601, type: 2}
goodMaterial: {fileID: 2100000, guid: c67450f290f3e4897bc40276a619e78d, type: 2}
frozenMaterial: {fileID: 2100000, guid: 66163cf35956a4be08e801b750c26f33, type: 2}
myLaser: {fileID: 1617924810425504}
myLaser: {fileID: 1081721624670010}
contribute: 0
useVectorObs: 1
--- !u!114 &114192565006091356

m_Script: {fileID: 11500000, guid: c66e6845309d241c78a6d77ee2567928, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 9e7865ec29c894c2d8c1617b0fa392f9, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

badMaterial: {fileID: 2100000, guid: 88b9ae7af2c1748a0a1f63407587a601, type: 2}
goodMaterial: {fileID: 2100000, guid: c67450f290f3e4897bc40276a619e78d, type: 2}
frozenMaterial: {fileID: 2100000, guid: 66163cf35956a4be08e801b750c26f33, type: 2}
myLaser: {fileID: 1045923826166930}
myLaser: {fileID: 1081721624670010}
--- !u!114 &114374774605792098
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1464820575638702}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 53
numStackedVectorObservations: 1
vectorActionSize: 03000000030000000300000002000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: d32fca21cf4c04536ab7f88eb9de83e0, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: FoodCollector
--- !u!114 &114416645532260476
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114522573150607728
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1495617568563208}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 53
numStackedVectorObservations: 1
vectorActionSize: 03000000030000000300000002000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: d32fca21cf4c04536ab7f88eb9de83e0, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: FoodCollector
--- !u!114 &114542632553128056
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: c66e6845309d241c78a6d77ee2567928, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 9e7865ec29c894c2d8c1617b0fa392f9, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

badMaterial: {fileID: 2100000, guid: 88b9ae7af2c1748a0a1f63407587a601, type: 2}
goodMaterial: {fileID: 2100000, guid: c67450f290f3e4897bc40276a619e78d, type: 2}
frozenMaterial: {fileID: 2100000, guid: 66163cf35956a4be08e801b750c26f33, type: 2}
myLaser: {fileID: 1421240237750412}
myLaser: {fileID: 1081721624670010}
contribute: 0
useVectorObs: 1
--- !u!114 &114661830999747712

m_Script: {fileID: 11500000, guid: c66e6845309d241c78a6d77ee2567928, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 9e7865ec29c894c2d8c1617b0fa392f9, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

badMaterial: {fileID: 2100000, guid: 88b9ae7af2c1748a0a1f63407587a601, type: 2}
goodMaterial: {fileID: 2100000, guid: c67450f290f3e4897bc40276a619e78d, type: 2}
frozenMaterial: {fileID: 2100000, guid: 66163cf35956a4be08e801b750c26f33, type: 2}
myLaser: {fileID: 1941433838307300}
myLaser: {fileID: 1081721624670010}
contribute: 0
useVectorObs: 1
--- !u!114 &114762047763154270

m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114878550018296316
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1672905243433088}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 53
numStackedVectorObservations: 1
vectorActionSize: 03000000030000000300000002000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: d32fca21cf4c04536ab7f88eb9de83e0, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: FoodCollector
--- !u!114 &114980787530065684
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1601500200010266}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 53
numStackedVectorObservations: 1
vectorActionSize: 03000000030000000300000002000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: d32fca21cf4c04536ab7f88eb9de83e0, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: FoodCollector
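The `vectorActionSize` fields in the prefab above (for example `03000000030000000300000002000000`) are Unity's text serialization of an `int[]`: consecutive little-endian 32-bit values, which here decode to `[3, 3, 3, 2]`, i.e. four discrete action branches for the FoodCollector behavior. A minimal decoding sketch (illustrative only; the class and method names are hypothetical, not part of the SDK):

```csharp
using System;
using System.Linq;

static class BrainParamDebug
{
    // Decodes Unity's hex serialization of an int[] field
    // (each value is 8 hex characters, little-endian).
    // Assumes a little-endian host, which holds on Unity's supported platforms.
    public static int[] DecodeIntArray(string hex)
    {
        return Enumerable.Range(0, hex.Length / 8)
            .Select(i => BitConverter.ToInt32(
                Enumerable.Range(0, 4)
                    .Select(b => Convert.ToByte(hex.Substring(i * 8 + b * 2, 2), 16))
                    .ToArray(), 0))
            .ToArray();
    }
}

// BrainParamDebug.DecodeIntArray("03000000030000000300000002000000") -> { 3, 3, 3, 2 }
```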

868
UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Prefabs/VisualFoodCollectorArea.prefab
File diff too large to display
View file

92
UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/VisualFoodCollector.unity


m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 3ce107b4a79bc4eef83afde434932a68, type: 2}
m_IsPrefabParent: 0
--- !u!1001 &868060419
Prefab:
m_ObjectHideFlags: 0
serializedVersion: 2
m_Modification:
m_TransformParent: {fileID: 0}
m_Modifications:
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalPosition.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalPosition.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalPosition.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.x
value: -0
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.y
value: -0
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.z
value: -0
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.w
value: 1
objectReference: {fileID: 0}
- target: {fileID: 4307641258646068, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_RootOrder
value: 5
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
m_IsPrefabParent: 0
--- !u!1 &1009000883
GameObject:
m_ObjectHideFlags: 0

m_OcclusionCulling: 1
m_StereoConvergence: 10
m_StereoSeparation: 0.022
--- !u!1001 &1081822017
Prefab:
m_ObjectHideFlags: 0
serializedVersion: 2
m_Modification:
m_TransformParent: {fileID: 0}
m_Modifications:
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalPosition.x
value: 0
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalPosition.y
value: 0
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalPosition.z
value: 0
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.x
value: -0
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.y
value: -0
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.z
value: -0
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_LocalRotation.w
value: 1
objectReference: {fileID: 0}
- target: {fileID: 4612263362188236, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
propertyPath: m_RootOrder
value: 5
objectReference: {fileID: 0}
- target: {fileID: 23446453883495642, guid: c85b585836e104587b4efdc4d8b9d62b,
type: 2}
propertyPath: m_Materials.Array.data[0]
value:
objectReference: {fileID: 2100000, guid: 580f2003972f64189826f085e2498080, type: 3}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: c85b585836e104587b4efdc4d8b9d62b, type: 2}
m_IsPrefabParent: 0
--- !u!1 &1086444495
GameObject:
m_ObjectHideFlags: 0

m_Script: {fileID: 11500000, guid: 4fe57113e76a5426297487dd6faadc5b, type: 3}
m_Name:
m_EditorClassIdentifier:
broadcastHub:
brainsToControl:
- {fileID: 11400000, guid: 24e823594179d48189b2c78003c50ce0, type: 2}
m_TrainingConfiguration:
width: 500
height: 500

102
UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorAgent.cs


if (!m_Frozen)
{
var shootCommand = false;
if (brain.brainParameters.vectorActionSpaceType == SpaceType.Continuous)
var forwardAxis = (int)act[0];
var rightAxis = (int)act[1];
var rotateAxis = (int)act[2];
var shootAxis = (int)act[3];
switch (forwardAxis)
dirToGo = transform.forward * Mathf.Clamp(act[0], -1f, 1f);
rotateDir = transform.up * Mathf.Clamp(act[1], -1f, 1f);
shootCommand = Mathf.Clamp(act[2], -1f, 1f) > 0.5f;
case 1:
dirToGo = transform.forward;
break;
case 2:
dirToGo = -transform.forward;
break;
else
switch (rightAxis)
var forwardAxis = (int)act[0];
var rightAxis = (int)act[1];
var rotateAxis = (int)act[2];
var shootAxis = (int)act[3];
case 1:
dirToGo = transform.right;
break;
case 2:
dirToGo = -transform.right;
break;
}
switch (forwardAxis)
{
case 1:
dirToGo = transform.forward;
break;
case 2:
dirToGo = -transform.forward;
break;
}
switch (rightAxis)
{
case 1:
dirToGo = transform.right;
break;
case 2:
dirToGo = -transform.right;
break;
}
switch (rotateAxis)
{
case 1:
rotateDir = -transform.up;
break;
case 2:
rotateDir = transform.up;
break;
}
switch (shootAxis)
{
case 1:
shootCommand = true;
break;
}
switch (rotateAxis)
{
case 1:
rotateDir = -transform.up;
break;
case 2:
rotateDir = transform.up;
break;
}
switch (shootAxis)
{
case 1:
shootCommand = true;
break;
}
if (shootCommand)
{

public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(vectorAction);
}
public override float[] Heuristic()
{
var action = new float[4];
if (Input.GetKey(KeyCode.D))
{
action[2] = 2f;
}
if (Input.GetKey(KeyCode.W))
{
action[0] = 1f;
}
if (Input.GetKey(KeyCode.A))
{
action[2] = 1f;
}
if (Input.GetKey(KeyCode.S))
{
action[0] = 2f;
}
action[3] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
return action;
}
public override void AgentReset()

849
UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scenes/GridWorld.unity
File diff too large to display
View file

113
UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAcademy.cs


using System.Collections.Generic;
using System.Linq;
[HideInInspector]
public List<GameObject> actorObjs;
[HideInInspector]
public int[] players;
public Camera MainCamera;
public GameObject trueAgent;
public int gridSize;
public GameObject camObject;
Camera m_Cam;
Camera m_AgentCam;
public GameObject agentPref;
public GameObject goalPref;
public GameObject pitPref;
GameObject[] m_Objects;
GameObject m_Plane;
GameObject m_Sn;
GameObject m_Ss;
GameObject m_Se;
GameObject m_Sw;
public override void InitializeAcademy()
public override void AcademyReset()
gridSize = (int)resetParameters["gridSize"];
m_Cam = camObject.GetComponent<Camera>();
m_Objects = new[] {agentPref, goalPref, pitPref};
m_AgentCam = GameObject.Find("agentCam").GetComponent<Camera>();
actorObjs = new List<GameObject>();
m_Plane = GameObject.Find("Plane");
m_Sn = GameObject.Find("sN");
m_Ss = GameObject.Find("sS");
m_Sw = GameObject.Find("sW");
m_Se = GameObject.Find("sE");
}
public void SetEnvironment()
{
m_Cam.transform.position = new Vector3(-((int)resetParameters["gridSize"] - 1) / 2f,
MainCamera.transform.position = new Vector3(-((int)resetParameters["gridSize"] - 1) / 2f,
m_Cam.orthographicSize = ((int)resetParameters["gridSize"] + 5f) / 2f;
var playersList = new List<int>();
for (var i = 0; i < (int)resetParameters["numObstacles"]; i++)
{
playersList.Add(2);
}
for (var i = 0; i < (int)resetParameters["numGoals"]; i++)
{
playersList.Add(1);
}
players = playersList.ToArray();
m_Plane.transform.localScale = new Vector3(gridSize / 10.0f, 1f, gridSize / 10.0f);
m_Plane.transform.position = new Vector3((gridSize - 1) / 2f, -0.5f, (gridSize - 1) / 2f);
m_Sn.transform.localScale = new Vector3(1, 1, gridSize + 2);
m_Ss.transform.localScale = new Vector3(1, 1, gridSize + 2);
m_Sn.transform.position = new Vector3((gridSize - 1) / 2f, 0.0f, gridSize);
m_Ss.transform.position = new Vector3((gridSize - 1) / 2f, 0.0f, -1);
m_Se.transform.localScale = new Vector3(1, 1, gridSize + 2);
m_Sw.transform.localScale = new Vector3(1, 1, gridSize + 2);
m_Se.transform.position = new Vector3(gridSize, 0.0f, (gridSize - 1) / 2f);
m_Sw.transform.position = new Vector3(-1, 0.0f, (gridSize - 1) / 2f);
m_AgentCam.orthographicSize = (gridSize) / 2f;
m_AgentCam.transform.position = new Vector3((gridSize - 1) / 2f, gridSize + 1f, (gridSize - 1) / 2f);
}
public override void AcademyReset()
{
foreach (var actor in actorObjs)
{
DestroyImmediate(actor);
}
SetEnvironment();
actorObjs.Clear();
var numbers = new HashSet<int>();
while (numbers.Count < players.Length + 1)
{
numbers.Add(Random.Range(0, gridSize * gridSize));
}
var numbersA = Enumerable.ToArray(numbers);
for (var i = 0; i < players.Length; i++)
{
var x = (numbersA[i]) / gridSize;
var y = (numbersA[i]) % gridSize;
var actorObj = Instantiate(m_Objects[players[i]]);
actorObj.transform.position = new Vector3(x, -0.25f, y);
actorObjs.Add(actorObj);
}
var xA = (numbersA[players.Length]) / gridSize;
var yA = (numbersA[players.Length]) % gridSize;
trueAgent.transform.position = new Vector3(xA, -0.25f, yA);
}
public override void AcademyStep()
{
MainCamera.orthographicSize = ((int)resetParameters["gridSize"] + 5f) / 2f;
}
}
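A quick worked check of the index math in `AcademyReset` above, which splits a random linear cell index into grid coordinates by integer division and modulo (the values below are illustrative):

```csharp
// Worked example of the cell-index mapping used in AcademyReset above.
int index = 7, gridSize = 5;
int x = index / gridSize;   // 1  (integer division)
int z = index % gridSize;   // 2
// The instantiated object is then placed at new Vector3(x, -0.25f, z) = (1, -0.25, 2).
```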

32
UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs


using UnityEngine;
using System.Linq;
using MLAgents;
using UnityEngine.Serialization;
private Academy m_Academy;
[FormerlySerializedAs("m_Area")]
private GridAcademy m_Academy;
public GridArea area;
public float timeBetweenDecisionsAtInference;
private float m_TimeSinceDecision;

public override void InitializeAgent()
{
m_Academy = FindObjectOfType(typeof(GridAcademy)) as GridAcademy;
m_Academy = FindObjectOfType<Academy>();
}
public override void CollectObservations()

// Prevents the agent from picking an action that would make it collide with a wall
var positionX = (int)transform.position.x;
var positionZ = (int)transform.position.z;
var maxPosition = m_Academy.gridSize - 1;
var maxPosition = (int)m_Academy.resetParameters["gridSize"] - 1;
if (positionX == 0)
{

}
}
public override float[] Heuristic()
{
if (Input.GetKey(KeyCode.D))
{
return new float[] { k_Right };
}
if (Input.GetKey(KeyCode.W))
{
return new float[] { k_Up };
}
if (Input.GetKey(KeyCode.A))
{
return new float[] { k_Left };
}
if (Input.GetKey(KeyCode.S))
{
return new float[] { k_Down };
}
return new float[] { k_NoAction };
}
m_Academy.AcademyReset();
area.AreaReset();
}
public void FixedUpdate()

24
UnitySDK/Assets/ML-Agents/Examples/Hallway/Prefabs/SymbolFinderArea.prefab


- component: {fileID: 4933884233896554}
- component: {fileID: 65639693558106190}
- component: {fileID: 54112968250075710}
- component: {fileID: 114907778469006590}
- component: {fileID: 114286701363010626}
- component: {fileID: 114569343444552314}
m_Layer: 0

m_Script: {fileID: 11500000, guid: b446afae240924105b36d07e8d17a608, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 533f2edd327794ca996d0320901b501c, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114907778469006590
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1471560210313468}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 36
numStackedVectorObservations: 3
vectorActionSize: 05000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: b3f3b601fa5e84185862261041525ea9, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Hallway

43
UnitySDK/Assets/ML-Agents/Examples/Hallway/Prefabs/VisualSymbolFinderArea.prefab


- component: {fileID: 4291041439716878}
- component: {fileID: 65678389736547598}
- component: {fileID: 54606255118850520}
- component: {fileID: 114090834606594908}
- component: {fileID: 114065716362190190}
m_Layer: 0
m_Name: Agent
m_TagString: agent

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114065716362190190
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1234267001558658}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 282f342c2ab144bf38be65d4d0c4e07d, type: 3}
m_Name:
m_EditorClassIdentifier:
camera: {fileID: 20961984019151212}
sensorName: CameraSensor
width: 84
height: 84
grayscale: 0
--- !u!114 &114090834606594908
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1234267001558658}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 0
numStackedVectorObservations: 1
vectorActionSize: 05000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: VisualHallway
--- !u!114 &114451776683649118
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: b446afae240924105b36d07e8d17a608, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: a36aad05c06144991a0a5e87de40d003, type: 2}
agentCameras:
- {fileID: 20961984019151212}
agentRenderTextures: []
maxStep: 3000
resetOnDone: 1
onDemandDecision: 0

79
UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/VisualHallway.unity


value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 32612447}
--- !u!20 &32612447 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 32612446}
--- !u!1 &255077123
GameObject:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 341018563}
--- !u!20 &341018563 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 341018562}
--- !u!1 &365376270
GameObject:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 721234460}
--- !u!20 &721234460 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 721234459}
--- !u!1001 &977960505
Prefab:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 977960506}
--- !u!20 &977960506 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 977960505}
--- !u!1 &1257687048
GameObject:
m_ObjectHideFlags: 0

propertyPath: m_RootOrder
value: 6
objectReference: {fileID: 0}
- target: {fileID: 114451776683649118, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: brain
value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: f2281a3adc3e640b490f89407c2e12d1, type: 2}
m_IsPrefabParent: 0

value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1388008249}
--- !u!20 &1388008249 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 1388008248}
--- !u!1001 &1436760868
Prefab:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1436760869}
--- !u!20 &1436760869 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 1436760868}
--- !u!1 &1574236047
GameObject:
m_ObjectHideFlags: 0

m_Script: {fileID: 11500000, guid: 40db664a3061b46a0a0628f90b2264f7, type: 3}
m_Name:
m_EditorClassIdentifier:
broadcastHub:
brainsToControl:
- {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2, type: 2}
m_TrainingConfiguration:
width: 128
height: 128

value:
objectReference: {fileID: 11400000, guid: fe56dd72ed38a4c2fb5419aba1e2d5f2,
type: 2}
- target: {fileID: 114516857402348526, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1746153507}
--- !u!20 &1746153507 stripped
Camera:
m_PrefabParentObject: {fileID: 20309822448307506, guid: f2281a3adc3e640b490f89407c2e12d1,
type: 2}
m_PrefabInternal: {fileID: 1746153506}
--- !u!1001 &2025898844
Prefab:
m_ObjectHideFlags: 0

57
UnitySDK/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs


var dirToGo = Vector3.zero;
var rotateDir = Vector3.zero;
if (brain.brainParameters.vectorActionSpaceType == SpaceType.Continuous)
var action = Mathf.FloorToInt(act[0]);
switch (action)
dirToGo = transform.forward * Mathf.Clamp(act[0], -1f, 1f);
rotateDir = transform.up * Mathf.Clamp(act[1], -1f, 1f);
}
else
{
var action = Mathf.FloorToInt(act[0]);
switch (action)
{
case 1:
dirToGo = transform.forward * 1f;
break;
case 2:
dirToGo = transform.forward * -1f;
break;
case 3:
rotateDir = transform.up * 1f;
break;
case 4:
rotateDir = transform.up * -1f;
break;
}
case 1:
dirToGo = transform.forward * 1f;
break;
case 2:
dirToGo = transform.forward * -1f;
break;
case 3:
rotateDir = transform.up * 1f;
break;
case 4:
rotateDir = transform.up * -1f;
break;
}
transform.Rotate(rotateDir, Time.deltaTime * 150f);
m_AgentRb.AddForce(dirToGo * m_Academy.agentRunSpeed, ForceMode.VelocityChange);

}
Done();
}
}
public override float[] Heuristic()
{
if (Input.GetKey(KeyCode.D))
{
return new float[] { 3 };
}
if (Input.GetKey(KeyCode.W))
{
return new float[] { 1 };
}
if (Input.GetKey(KeyCode.A))
{
return new float[] { 4 };
}
if (Input.GetKey(KeyCode.S))
{
return new float[] { 2 };
}
return new float[] { 0 };
}
public override void AgentReset()

24
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Prefabs/PushBlockArea.prefab


m_Component:
- component: {fileID: 4188187884171146}
- component: {fileID: 54817351390947638}
- component: {fileID: 114306175693660464}
- component: {fileID: 114505490781873732}
- component: {fileID: 114421647563711602}
- component: {fileID: 65880096262939968}

m_Name:
m_EditorClassIdentifier:
agent: {fileID: 0}
--- !u!114 &114306175693660464
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1489716781518988}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 70
numStackedVectorObservations: 3
vectorActionSize: 07000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: c60a63ad5dc0c4a029d7360054667457, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: PushBlock
--- !u!114 &114421647563711602
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: dea8c4f2604b947e6b7b97750dde87ca, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: e8b2d719f6a324b1abb68d8cf2859f5c, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

37
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Prefabs/PushBlockVisualArea.prefab


m_Component:
- component: {fileID: 4456685767774680}
- component: {fileID: 54790445914364846}
- component: {fileID: 114923027571458262}
- component: {fileID: 114650520402303970}
- component: {fileID: 114505118440755634}
m_Layer: 0
m_Name: Agent
m_TagString: agent

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114650520402303970
--- !u!114 &114505118440755634
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}

m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Script: {fileID: 11500000, guid: 282f342c2ab144bf38be65d4d0c4e07d, type: 3}
camera: {fileID: 20961401228419460}
sensorName: CameraSensor
width: 84
height: 84
grayscale: 0
--- !u!114 &114690277332619348
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: dea8c4f2604b947e6b7b97750dde87ca, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: cc62140bff6494e0399caaed0b56020d, type: 2}
agentCameras:
- {fileID: 20961401228419460}
agentRenderTextures: []
maxStep: 5000
resetOnDone: 1
onDemandDecision: 0

block: {fileID: 1609037632005304}
goalDetect: {fileID: 0}
useVectorObs: 0
--- !u!114 &114923027571458262
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1626010291821672}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 0
numStackedVectorObservations: 1
vectorActionSize: 07000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: VisualHallway

59
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/VisualPushBlock.unity


value:
objectReference: {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f,
type: 2}
- target: {fileID: 114024228081418500, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 116640260}
--- !u!20 &116640260 stripped
Camera:
m_PrefabParentObject: {fileID: 20961401228419460, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
m_PrefabInternal: {fileID: 116640259}
--- !u!1 &255077123
GameObject:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f,
type: 2}
- target: {fileID: 114024228081418500, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 731659952}
--- !u!20 &731659952 stripped
Camera:
m_PrefabParentObject: {fileID: 20961401228419460, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
m_PrefabInternal: {fileID: 731659951}
--- !u!1 &762086410
GameObject:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f,
type: 2}
- target: {fileID: 114024228081418500, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 912811241}
--- !u!20 &912811241 stripped
Camera:
m_PrefabParentObject: {fileID: 20961401228419460, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
m_PrefabInternal: {fileID: 912811240}
--- !u!1 &1009000883
GameObject:
m_ObjectHideFlags: 0

m_Script: {fileID: 11500000, guid: a2ca406dad5ec4ede8184998f4f9067d, type: 3}
m_Name:
m_EditorClassIdentifier:
broadcastHub:
brainsToControl:
- {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f, type: 2}
m_TrainingConfiguration:
width: 1280
height: 720

value:
objectReference: {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f,
type: 2}
- target: {fileID: 114024228081418500, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1878756100}
--- !u!20 &1878756100 stripped
Camera:
m_PrefabParentObject: {fileID: 20961401228419460, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
m_PrefabInternal: {fileID: 1878756099}
--- !u!1001 &1942601654
Prefab:
m_ObjectHideFlags: 0

value:
objectReference: {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f,
type: 2}
- target: {fileID: 114024228081418500, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1942601655}
--- !u!20 &1942601655 stripped
Camera:
m_PrefabParentObject: {fileID: 20961401228419460, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
m_PrefabInternal: {fileID: 1942601654}
--- !u!1001 &1954420364
Prefab:
m_ObjectHideFlags: 0

propertyPath: m_RootOrder
value: 6
objectReference: {fileID: 0}
- target: {fileID: 114812843792483960, guid: 9d9b85a2a80e74e5294bdfb248825335,
type: 2}
propertyPath: brain
value:
objectReference: {fileID: 11400000, guid: d359d2290a825421e930c94284994e3f,
type: 2}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 9d9b85a2a80e74e5294bdfb248825335, type: 2}
m_IsPrefabParent: 0

21
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scripts/PushAgentBasic.cs


AddReward(-1f / agentParameters.maxStep);
}
public override float[] Heuristic()
{
if (Input.GetKey(KeyCode.D))
{
return new float[] { 3 };
}
if (Input.GetKey(KeyCode.W))
{
return new float[] { 1 };
}
if (Input.GetKey(KeyCode.A))
{
return new float[] { 4 };
}
if (Input.GetKey(KeyCode.S))
{
return new float[] { 2 };
}
return new float[] { 0 };
}
/// <summary>
/// Resets the block position and velocities.
/// </summary>

24
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Prefabs/AreaPB.prefab


m_Component:
- component: {fileID: 4518417139497368}
- component: {fileID: 54596704247224538}
- component: {fileID: 114399412043818042}
- component: {fileID: 114937736047215868}
- component: {fileID: 114507422577425370}
- component: {fileID: 65345930959735878}

- {fileID: 1589816231338102}
numPyra: 1
range: 45
--- !u!114 &114399412043818042
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1131043459059966}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 172
numStackedVectorObservations: 1
vectorActionSize: 05000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: 9bafa731bfcbc4f0faa73c365e7af924, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Pyramids
--- !u!114 &114507422577425370
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: b8db44472779248d3be46895c4d562d5, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 7b7715ed1d436417db67026a47f17576, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

55
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Prefabs/VisualAreaPyramids.prefab


m_Component:
- component: {fileID: 4464253672231148}
- component: {fileID: 54125904932801864}
- component: {fileID: 114722927650955174}
- component: {fileID: 114027965503222182}
- component: {fileID: 114674665608406760}
m_Layer: 0
m_Name: Agent
m_TagString: agent

serializedVersion: 2
m_Size: {x: 1450.4971, y: 985.00024, z: 100}
m_Center: {x: -575.24854, y: 292.50012, z: -3144.2273}
--- !u!114 &114027965503222182
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1736680821577442}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114404304054259594
MonoBehaviour:
m_ObjectHideFlags: 1

- {fileID: 1625610554007742}
numPyra: 1
range: 45
--- !u!114 &114674665608406760
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1736680821577442}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 282f342c2ab144bf38be65d4d0c4e07d, type: 3}
m_Name:
m_EditorClassIdentifier:
camera: {fileID: 20712684238256298}
sensorName: CameraSensor
width: 84
height: 84
grayscale: 0
--- !u!114 &114722927650955174
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1736680821577442}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 0
numStackedVectorObservations: 1
vectorActionSize: 05000000
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: VisualPyramids
--- !u!114 &114741503533626942
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: b8db44472779248d3be46895c4d562d5, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 60f0ffcd08c3b43a6bdc746cfc0c4059, type: 2}
agentCameras:
- {fileID: 20712684238256298}
agentRenderTextures: []
maxStep: 5000
resetOnDone: 1
onDemandDecision: 0

73
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/VisualPyramids.unity


m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 0567215293abe487b932aec366b57c8e, type: 2}
m_IsPrefabParent: 0
--- !u!20 &177604012 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 309299717}
--- !u!1001 &281839921
Prefab:
m_ObjectHideFlags: 0

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 973199703}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 0567215293abe487b932aec366b57c8e, type: 2}
m_IsPrefabParent: 0

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 177604012}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 0567215293abe487b932aec366b57c8e, type: 2}
m_IsPrefabParent: 0

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1529303581}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 0567215293abe487b932aec366b57c8e, type: 2}
m_IsPrefabParent: 0

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 816767823}
--- !u!20 &816767823 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 816767822}
--- !u!20 &828837071 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 1728325040}
--- !u!20 &973199703 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 281839921}
--- !u!1 &1009000883
GameObject:
m_ObjectHideFlags: 0

m_OcclusionCulling: 1
m_StereoConvergence: 10
m_StereoSeparation: 0.022
--- !u!20 &1074152210 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 1818326666}
--- !u!1001 &1155497957
Prefab:
m_ObjectHideFlags: 0

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1194295937}
--- !u!20 &1194295937 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 1155497957}
--- !u!20 &1529303581 stripped
Camera:
m_PrefabParentObject: {fileID: 20712684238256298, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
m_PrefabInternal: {fileID: 714012435}
--- !u!1 &1574236047
GameObject:
m_ObjectHideFlags: 0

m_Script: {fileID: 11500000, guid: dba8df9c8b16946dc88d331a301d0ab3, type: 3}
m_Name:
m_EditorClassIdentifier:
broadcastHub:
brainsToControl:
- {fileID: 11400000, guid: 60f0ffcd08c3b43a6bdc746cfc0c4059, type: 2}
m_TrainingConfiguration:
width: 80
height: 80

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 828837071}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 0567215293abe487b932aec366b57c8e, type: 2}
m_IsPrefabParent: 0

propertyPath: m_IsActive
value: 0
objectReference: {fileID: 0}
- target: {fileID: 114538851081060382, guid: 0567215293abe487b932aec366b57c8e,
type: 2}
propertyPath: camera
value:
objectReference: {fileID: 1074152210}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 0567215293abe487b932aec366b57c8e, type: 2}
m_IsPrefabParent: 0

65
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scripts/PyramidAgent.cs


if (useVectorObs)
{
const float rayDistance = 35f;
float[] rayAngles = {20f, 90f, 160f, 45f, 135f, 70f, 110f};
float[] rayAngles1 = {25f, 95f, 165f, 50f, 140f, 75f, 115f};
float[] rayAngles2 = {15f, 85f, 155f, 40f, 130f, 65f, 105f};
float[] rayAngles = { 20f, 90f, 160f, 45f, 135f, 70f, 110f };
float[] rayAngles1 = { 25f, 95f, 165f, 50f, 140f, 75f, 115f };
float[] rayAngles2 = { 15f, 85f, 155f, 40f, 130f, 65f, 105f };
string[] detectableObjects = {"block", "wall", "goal", "switchOff", "switchOn", "stone"};
string[] detectableObjects = { "block", "wall", "goal", "switchOff", "switchOn", "stone" };
AddVectorObs(m_RayPer.Perceive(rayDistance, rayAngles, detectableObjects, 0f, 0f));
AddVectorObs(m_RayPer.Perceive(rayDistance, rayAngles1, detectableObjects, 0f, 5f));
AddVectorObs(m_RayPer.Perceive(rayDistance, rayAngles2, detectableObjects, 0f, 10f));

var dirToGo = Vector3.zero;
var rotateDir = Vector3.zero;
if (brain.brainParameters.vectorActionSpaceType == SpaceType.Continuous)
var action = Mathf.FloorToInt(act[0]);
switch (action)
dirToGo = transform.forward * Mathf.Clamp(act[0], -1f, 1f);
rotateDir = transform.up * Mathf.Clamp(act[1], -1f, 1f);
}
else
{
var action = Mathf.FloorToInt(act[0]);
switch (action)
{
case 1:
dirToGo = transform.forward * 1f;
break;
case 2:
dirToGo = transform.forward * -1f;
break;
case 3:
rotateDir = transform.up * 1f;
break;
case 4:
rotateDir = transform.up * -1f;
break;
}
case 1:
dirToGo = transform.forward * 1f;
break;
case 2:
dirToGo = transform.forward * -1f;
break;
case 3:
rotateDir = transform.up * 1f;
break;
case 4:
rotateDir = transform.up * -1f;
break;
}
transform.Rotate(rotateDir, Time.deltaTime * 200f);
m_AgentRb.AddForce(dirToGo * 2f, ForceMode.VelocityChange);

{
AddReward(-1f / agentParameters.maxStep);
MoveAgent(vectorAction);
}
public override float[] Heuristic()
{
if (Input.GetKey(KeyCode.D))
{
return new float[] { 3 };
}
if (Input.GetKey(KeyCode.W))
{
return new float[] { 1 };
}
if (Input.GetKey(KeyCode.A))
{
return new float[] { 4 };
}
if (Input.GetKey(KeyCode.S))
{
return new float[] { 2 };
}
return new float[] { 0 };
}
public override void AgentReset()

24
UnitySDK/Assets/ML-Agents/Examples/Reacher/Prefabs/Agent.prefab


serializedVersion: 5
m_Component:
- component: {fileID: 4067321601414524}
- component: {fileID: 114731167133171590}
- component: {fileID: 114955921823023820}
m_Layer: 0
m_Name: Agent

m_Interpolate: 0
m_Constraints: 0
m_CollisionDetection: 0
--- !u!114 &114731167133171590
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1395682910799436}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 33
numStackedVectorObservations: 1
vectorActionSize: 04000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: 0c779bd93060f405cbe4446e1dcbf2a6, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Reacher
--- !u!114 &114928491800121992
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 220b156e3b142406c8b76d4db981d044, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: aee5a4acc5804447682bf509557afa4f, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

9
UnitySDK/Assets/ML-Agents/Examples/SharedAssets/Scripts/RayPerception.cs


using System.Collections.Generic;
using UnityEngine;
public abstract class RayPerception : MonoBehaviour

public virtual List<float> Perceive(float rayDistance,
abstract public List<float> Perceive(float rayDistance,
float startOffset, float endOffset)
{
return m_PerceptionBuffer;
}
float startOffset=0.0f, float endOffset=0.0f);
/// <summary>
/// Converts degrees to radians.

9
UnitySDK/Assets/ML-Agents/Examples/SharedAssets/Scripts/RayPerception2D.cs


using System.Collections.Generic;
using UnityEngine;
namespace MLAgents

/// <param name="rayDistance">Radius of rays</param>
/// <param name="rayAngles">Angles of rays (starting from (1,0) on unit circle).</param>
/// <param name="detectableObjects">List of tags which correspond to object types agent can see</param>
public List<float> Perceive(float rayDistance,
float[] rayAngles, string[] detectableObjects)
/// <param name="startOffset">Unused</param>
/// <param name="endOffset">Unused</param>
public override List<float> Perceive(float rayDistance,
float[] rayAngles, string[] detectableObjects,
float startOffset=0.0f, float endOffset=0.0f)
{
m_PerceptionBuffer.Clear();
// For each ray sublist stores categorical information on detected object

4
UnitySDK/Assets/ML-Agents/Examples/SharedAssets/Scripts/RayPerception3D.cs


using System;
using System.Collections.Generic;
using UnityEngine;

/// <param name="endOffset">Ending height offset of ray from center of agent.</param>
public override List<float> Perceive(float rayDistance,
float[] rayAngles, string[] detectableObjects,
float startOffset, float endOffset)
float startOffset=0.0f, float endOffset=0.0f)
{
if (m_SubList == null || m_SubList.Length != detectableObjects.Length + 2)
m_SubList = new float[detectableObjects.Length + 2];

96
UnitySDK/Assets/ML-Agents/Examples/Soccer/Prefabs/SoccerFieldTwos.prefab


- component: {fileID: 4277721046484044}
- component: {fileID: 54348679551516588}
- component: {fileID: 135232974003521068}
- component: {fileID: 114734187185382186}
- component: {fileID: 114492261207303438}
- component: {fileID: 114692966630797794}
m_Layer: 13

- component: {fileID: 4485793831109164}
- component: {fileID: 54250052574815742}
- component: {fileID: 135154818167532598}
- component: {fileID: 114105115387635628}
- component: {fileID: 114698199869072806}
- component: {fileID: 114381244552195858}
m_Layer: 11

- component: {fileID: 4444285537983296}
- component: {fileID: 54609996481602788}
- component: {fileID: 135208952479003512}
- component: {fileID: 114387866097048300}
- component: {fileID: 114850431417842684}
- component: {fileID: 114965771318032104}
m_Layer: 13

- component: {fileID: 4002186104597906}
- component: {fileID: 54629836435839708}
- component: {fileID: 135133947297127334}
- component: {fileID: 114529615399004778}
- component: {fileID: 114284769194328828}
- component: {fileID: 114724674330921748}
m_Layer: 11

serializedVersion: 2
m_Size: {x: 30, y: 0.1, z: 16}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114105115387635628
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1124213441168130}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 112
numStackedVectorObservations: 1
vectorActionSize: 05000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Goalie
--- !u!114 &114273807544954564
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 2a2688ef4a36349f9aa010020c32d198, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 090fa5a8588f5433bb7f878e6f5ac954, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114387866097048300
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1131626411948014}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 112
numStackedVectorObservations: 1
vectorActionSize: 07000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Striker
--- !u!114 &114492261207303438
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 2a2688ef4a36349f9aa010020c32d198, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 29ed78b3e8fef4340b3a1f6954b88f18, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

agentRole: 0
area: {fileID: 114559182131992928}
agentRb: {fileID: 0}
--- !u!114 &114529615399004778
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1890219402901316}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 112
numStackedVectorObservations: 1
vectorActionSize: 05000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Goalie
--- !u!114 &114559182131992928
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 2a2688ef4a36349f9aa010020c32d198, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 090fa5a8588f5433bb7f878e6f5ac954, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114734187185382186
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1095606497496374}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 112
numStackedVectorObservations: 1
vectorActionSize: 07000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 0}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Striker
--- !u!114 &114850431417842684
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 2a2688ef4a36349f9aa010020c32d198, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 29ed78b3e8fef4340b3a1f6954b88f18, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

48
UnitySDK/Assets/ML-Agents/Examples/Tennis/Prefabs/TennisArea.prefab


- component: {fileID: 23050935508163814}
- component: {fileID: 54815576193067388}
- component: {fileID: 65276341973995358}
- component: {fileID: 114176423636690854}
- component: {fileID: 114915946461826994}
m_Layer: 0
m_Name: AgentA

- component: {fileID: 23268445935516234}
- component: {fileID: 54459681652844648}
- component: {fileID: 65280384434867516}
- component: {fileID: 114399072728845634}
- component: {fileID: 114800310164848628}
m_Layer: 0
m_Name: AgentB

serializedVersion: 2
m_Size: {x: 0.5, y: 8, z: 11}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114176423636690854
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1170495812642400}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 8
numStackedVectorObservations: 3
vectorActionSize: 02000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: c85010de4f32e4c88bac16d9688aaadc, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Tennis
--- !u!114 &114399072728845634
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1882383181950958}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 8
numStackedVectorObservations: 3
vectorActionSize: 02000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: c85010de4f32e4c88bac16d9688aaadc, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Tennis
--- !u!114 &114800310164848628
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: e51a3fb0b3186433ea84fc1e0549cc91, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 1674996276be448c2ad51fb139e21e05, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

m_Script: {fileID: 11500000, guid: e51a3fb0b3186433ea84fc1e0549cc91, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 1674996276be448c2ad51fb139e21e05, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

9
UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs


m_TextComponent.text = score.ToString();
}
public override float[] Heuristic()
{
var action = new float[2];
action[0] = Input.GetAxis("Horizontal");
action[1] = Input.GetKey(KeyCode.Space) ? 1f : 0f;
return action;
}
public override void AgentReset()
{
m_InvertMult = invertX ? -1f : 1f;

24
UnitySDK/Assets/ML-Agents/Examples/Walker/Prefabs/WalkerPair.prefab


serializedVersion: 5
m_Component:
- component: {fileID: 4821824385666130}
- component: {fileID: 114052351078996708}
- component: {fileID: 114363722412740164}
- component: {fileID: 114614375190687060}
m_Layer: 0

serializedVersion: 2
m_Size: {x: 1, y: 1, z: 1}
m_Center: {x: 0, y: 0, z: 0}
--- !u!114 &114052351078996708
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1800913799254612}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 212
numStackedVectorObservations: 1
vectorActionSize: 27000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 1
m_Model: {fileID: 11400000, guid: 693a2a44fd7c64d3ca80d7444f782520, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: Walker
--- !u!114 &114110225517277148
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: ccb0f85f0009540d7ad997952e2aed7b, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 3541a9a488cf54088a4526cff85512cc, type: 2}
agentParameters:
agentCameras: []
agentRenderTextures: []

30
UnitySDK/Assets/ML-Agents/Examples/WallJump/Prefabs/WallJumpArea.prefab


- component: {fileID: 4651390251185036}
- component: {fileID: 65193133000831296}
- component: {fileID: 54678503543725326}
- component: {fileID: 114898893333200490}
- component: {fileID: 114925928594762506}
- component: {fileID: 114092229367912210}
m_Layer: 0

m_Script: {fileID: 11500000, guid: bb172294dbbcc408286b156a2c4b553c, type: 3}
m_Name:
m_EditorClassIdentifier:
--- !u!114 &114898893333200490
MonoBehaviour:
m_ObjectHideFlags: 1
m_PrefabParentObject: {fileID: 0}
m_PrefabInternal: {fileID: 100100000}
m_GameObject: {fileID: 1195095783991828}
m_Enabled: 1
m_EditorHideFlags: 0
m_Script: {fileID: 11500000, guid: 5d1c4e0b1822b495aa52bc52839ecb30, type: 3}
m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
vectorObservationSize: 74
numStackedVectorObservations: 6
vectorActionSize: 03000000030000000300000002000000
cameraResolutions: []
vectorActionDescriptions: []
vectorActionSpaceType: 0
m_Model: {fileID: 11400000, guid: ef4a2c4f314e94d718e08c7c71b3c5f0, type: 3}
m_InferenceDevice: 0
m_UseHeuristic: 0
m_BehaviorName: SmallWallJump
--- !u!114 &114925928594762506
MonoBehaviour:
m_ObjectHideFlags: 1

m_Script: {fileID: 11500000, guid: 676fca959b8ee45539773905ca71afa1, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 0}
agentParameters:
agentCameras: []
agentRenderTextures: []

numberOfActionsBetweenDecisions: 5
noWallBrain: {fileID: 11400000, guid: 2069d6ef649a549feb29054d6af8a86f, type: 2}
smallWallBrain: {fileID: 11400000, guid: 2069d6ef649a549feb29054d6af8a86f, type: 2}
bigWallBrain: {fileID: 11400000, guid: b5f530c5bf8d64bf8a18df92e283bb9c, type: 2}
noWallBrain: {fileID: 11400000, guid: ef4a2c4f314e94d718e08c7c71b3c5f0, type: 3}
smallWallBrain: {fileID: 11400000, guid: ef4a2c4f314e94d718e08c7c71b3c5f0, type: 3}
bigWallBrain: {fileID: 11400000, guid: b036370dc05b9481bbcee7db40d40b5d, type: 3}
ground: {fileID: 1324926338613664}
spawnArea: {fileID: 1886170194660384}
goal: {fileID: 1982078136115924}

8
UnitySDK/Assets/ML-Agents/Examples/WallJump/Scenes/WallJump.unity


propertyPath: m_Name
value: Canvas - Watermark
objectReference: {fileID: 0}
- target: {fileID: 1537641056927260, guid: 3ce107b4a79bc4eef83afde434932a68, type: 2}
propertyPath: m_IsActive
value: 1
objectReference: {fileID: 0}
m_RemovedComponents: []
m_ParentPrefab: {fileID: 100100000, guid: 3ce107b4a79bc4eef83afde434932a68, type: 2}
m_IsPrefabParent: 0

m_Script: {fileID: 11500000, guid: 50b93afe82bc647b581a706891913e7f, type: 3}
m_Name:
m_EditorClassIdentifier:
broadcastHub:
brainsToControl:
- {fileID: 11400000, guid: 2069d6ef649a549feb29054d6af8a86f, type: 2}
- {fileID: 11400000, guid: b5f530c5bf8d64bf8a18df92e283bb9c, type: 2}
m_TrainingConfiguration:
width: 80
height: 80

37
UnitySDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs


using System.Collections;
using UnityEngine;
using MLAgents;
using Barracuda;
public class WallJumpAgent : Agent
{

public Brain noWallBrain;
public NNModel noWallBrain;
public Brain smallWallBrain;
public NNModel smallWallBrain;
public Brain bigWallBrain;
public NNModel bigWallBrain;
public GameObject ground;
public GameObject spawnArea;

}
}
public override float[] Heuristic()
{
var action = new float[4];
if (Input.GetKey(KeyCode.D))
{
action[1] = 2f;
}
if (Input.GetKey(KeyCode.W))
{
action[0] = 1f;
}
if (Input.GetKey(KeyCode.A))
{
action[1] = 1f;
}
if (Input.GetKey(KeyCode.S))
{
action[0] = 2f;
}
action[3] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
return action;
}
// Detect when the agent hits the goal
void OnTriggerStay(Collider col)
{

m_Academy.resetParameters["no_wall_height"],
localScale.z);
wall.transform.localScale = localScale;
GiveBrain(noWallBrain);
GiveModel("SmallWallJump", noWallBrain);
}
else if (config == 1)
{

localScale.z);
wall.transform.localScale = localScale;
GiveBrain(smallWallBrain);
GiveModel("SmallWallJump", smallWallBrain);
}
else
{

height,
localScale.z);
wall.transform.localScale = localScale;
GiveBrain(bigWallBrain);
GiveModel("BigWallJump", bigWallBrain);
}
}
}

2
UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/Barracuda.md


Tanh
```
P.S. some of these operations are under limited support and not all configurations are properly supported
P.P.S. Python 3.5 or 3.6 is recommended

2
UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/LICENSE.md


Barracuda cross-platform Neural Net engine copyright © 2018 Unity Technologies ApS
Licensed under the Unity Companion License for Unity-dependent projects--see [Unity Companion License](http://www.unity3d.com/legal/licenses/Unity_Companion_License).
Unless expressly provided otherwise, the Software under this license is made available strictly on an “AS IS” BASIS WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. Please review the license for details on these and other terms and conditions.

10
UnitySDK/Assets/ML-Agents/Plugins/Barracuda.Core/ReleaseNotes.md


- TF importer: made detection of actual output node from LSTM/GRU pattern more bullet proof by skipping Const nodes.
- TF importer: improved InstanceNormalization handling.
- TF importer: fixed SquareDifference pattern.
- TF importer: fixed Conv2DBackpropInput (transpose convolution) import.
- Fixed Conv2D performance regression on some GPUs.
- Fixed TextureAsTensorData.Download() to work properly with InterpretDepthAs.Channels.
- Fixed bug when identity/nop layers would reuse input as an output and later causing premature release of that tensor as part of intermediate data cleanup.

## 0.2.0
- Version bumped to 0.2.0 as it brings breaking API changes, for details look below.
- Significantly reduced temporary memory allocations by introducing internal allocator support. Now memory is re-used between layer execution as much as possible.
- Improved small workload performance on CSharp backend
- Added parallel implementation for multiple activation functions on CSharp backend

- Added `Summary()` method to `Worker`. Currently returns allocator information.
- Tabs to spaces! Aiming at higher salary (https://stackoverflow.blog/2017/06/15/developers-use-spaces-make-money-use-tabs/).
- Renamed worker type enum members: `CSharp` -> `CSharpRef`, `CSharpFast` -> `CSharp`, `Compute` -> `ComputeRef`, `ComputeFast` -> `Compute`.
- Implemented new optimized `ComputePrecompiled` worker. This worker caches Compute kernels and state beforehand to reduce CPU overhead.
- Added `ExecuteAsync()` to `IWorker` interface, it returns `IEnumerator`, which enables you to control how many layers to schedule per frame (one iteration == one layer).
- Added `Log` op support on Compute workers.
- Optimized activation functions and ScaleBias by accessing tensor as continuous array. Gained ~2.0ms on 4 batch MobileNet (MBP2016).
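A minimal sketch of how the `ExecuteAsync()` entry above could be driven from a Unity coroutine. This assumes `ExecuteAsync` accepts an input tensor the same way `Execute` does; `worker` and `input` are placeholders created elsewhere, not part of these notes:

```csharp
using System.Collections;
using UnityEngine;
using Barracuda;

public class AsyncInferenceExample : MonoBehaviour
{
    // Sketch only: spread Barracuda layer execution across frames.
    public IEnumerator RunInferenceOverFrames(IWorker worker, Tensor input)
    {
        var schedule = worker.ExecuteAsync(input); // returns IEnumerator; one iteration == one layer
        while (schedule.MoveNext())
        {
            yield return null;                     // roughly one layer per frame
        }
        var output = worker.PeekOutput();          // read the result once all layers have run
        // ... consume `output` here, then dispose tensors as appropriate.
    }
}
```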

- Fixed compilation issues on Xbox One.
- TexConv2D support was temporary disabled.
- Barracuda logging now can be configured via static fields of ``Barracuda.D`` class, it allows both disable specific logging levels or just disable stack trace collection (helps with performance when profiling).
- Compute Concat implementation now will fall back to C# implementation instead of throwing exception when unsupported configuration is encountered.
- Fixed several ``ComputeBuffer`` release issues.
- Added constructor for ``Tensor`` that allows to pass in data array.
- Improved Flatten handling in TensorFlow models.
- Added helper func ``ModelLoader.LoadFromStreamingAssets``.
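A minimal usage sketch for the `ModelLoader.LoadFromStreamingAssets` helper mentioned above; the file name is illustrative, and hooking the loaded model up to a worker is left to the worker-type notes earlier in this list:

```csharp
using UnityEngine;
using Barracuda;

public class StreamingAssetsModelExample : MonoBehaviour
{
    void Start()
    {
        // Sketch only: "FoodCollector.nn" is an illustrative file placed under
        // Assets/StreamingAssets by the project, not something shipped with Barracuda.
        var model = ModelLoader.LoadFromStreamingAssets("FoodCollector.nn");
        Debug.Log(model != null ? "Model loaded" : "Model not found");
    }
}
```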

120
UnitySDK/Assets/ML-Agents/Scripts/Academy.cs


using UnityEngine;
using System.Linq;
using System.Collections.Generic;
using MLAgents.InferenceBrain;
using Barracuda;
* The ML-Agents toolkit contains five entities: Academy, Brain, Agent, Communicator and
* Python API. The academy, and all its brains and connected agents live within
* a learning environment (herin called Environment), while the communicator
* The ML-Agents toolkit contains four entities: Academy, Agent, Communicator and
* Python API. The academy and connected agents live within
* a learning environment (herein called Environment), while the communicator
* manages the communication between the learning environment and the Python
* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a

}
/// <summary>
/// An Academy is where Agent objects go to train their behaviors. More
/// specifically, an academy is a collection of Brain objects and each agent
/// in a scene is attached to one brain (a single brain may be attached to
/// multiple agents). Currently, this class is expected to be extended to
/// An Academy is where Agent objects go to train their behaviors.
/// Currently, this class is expected to be extended to
/// implement the desired academy behavior.
/// </summary>
/// <remarks>

/// the states and observations of each agent are sent through the
/// communicator. In the absence of a communicator, the academy is run in
/// inference mode where the agent behavior is determined by the brain
/// attached to it (which may be internal, heuristic or player).
/// inference mode where the agent behavior is determined by the Policy
/// attached to it.
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
"docs/Learning-Environment-Design-Academy.md")]

/// </summary>
/// <remarks>
/// Default reset parameters are specified in the academy Editor, and can
/// be modified when training with an external Brain by passing a config
/// be modified when training by passing a config
/// dictionary at reset.
/// </remarks>
[SerializeField]

/// <returns>
/// <c>true</c>, if communicator is on, <c>false</c> otherwise.
/// </returns>
bool IsCommunicatorOn
public bool IsCommunicatorOn
{
get { return Communicator != null; }
}

/// or absence of a communicator. Furthermore, it can be modified by an
/// external Brain during reset via <see cref="SetIsInference"/>.
/// or absence of a communicator. Furthermore, it can be modified during
/// training via <see cref="SetIsInference"/>.
bool m_IsInference = true;
/// The number of episodes completed by the environment. Incremented

int m_TotalStepCount;
/// Flag that indicates whether the inference/training mode of the
/// environment was switched by the external Brain. This impacts the
/// environment was switched by the training process. This impacts the
private bool m_Initialized;
private List<ModelRunner> m_ModelRunners = new List<ModelRunner>();
// The Academy uses a series of events to communicate with agents and
// brains to facilitate synchronization. More specifically, it ensure
// The Academy uses a series of events to communicate with agents
// to facilitate synchronization. More specifically, it ensures
// Signals to all the Brains at each environment step so they can decide
// actions for their agents.
public event System.Action BrainDecideAction;
// Signals to all the Agents at each environment step so they can use
// their Policy to decide on their next action.
public event System.Action DecideAction;
// Signals to all the listeners that the academy is being destroyed
public event System.Action DestroyAction;

public event System.Action AgentResetIfDone;
// Signals to all the agents at each environment step so they can send
// their state to their Brain if they have requested a decision.
// their state to their Policy if they have requested a decision.
public event System.Action AgentSendState;
// Signals to all the agents at each environment step so they can act if

/// </summary>
void Awake()
{
InitializeEnvironment();
LazyInitialization();
}
public void LazyInitialization()
{
if (!m_Initialized)
{
InitializeEnvironment();
m_Initialized = true;
}
}
// Used to read Python-provided environment parameters

Communicator = null;
}
if (Communicator != null){
if (Communicator != null)
{
Communicator.QuitCommandReceived += OnQuitCommandReceived;
Communicator.ResetCommandReceived += OnResetCommand;
Communicator.RLInputReceived += OnRLInputReceived;

SetIsInference(!IsCommunicatorOn);
BrainDecideAction += () => {};
DestroyAction += () => {};
AgentSetStatus += i => {};
AgentResetIfDone += () => {};
AgentSendState += () => {};
AgentAct += () => {};
AgentForceReset += () => {};
DecideAction += () => { };
DestroyAction += () => { };
AgentSetStatus += i => { };
AgentResetIfDone += () => { };
AgentSendState += () => { };
AgentAct += () => { };
AgentForceReset += () => { };
ConfigureEnvironment();
}

void ForcedFullReset()
{
EnvironmentReset();
AgentForceReset();
AgentForceReset?.Invoke();
/// Performs a single environment update to the Academy, Brain and Agent
/// Performs a single environment update to the Academy and Agent
/// objects within the environment.
/// </summary>
void EnvironmentStep()

ForcedFullReset();
}
AgentSetStatus(m_StepCount);
AgentSetStatus?.Invoke(m_StepCount);
AgentResetIfDone();
AgentResetIfDone?.Invoke();
AgentSendState();
AgentSendState?.Invoke();
using (TimerStack.Instance.Scoped("BrainDecideAction"))
using (TimerStack.Instance.Scoped("DecideAction"))
BrainDecideAction();
DecideAction?.Invoke();
}
using (TimerStack.Instance.Scoped("AcademyStep"))

using (TimerStack.Instance.Scoped("AgentAct"))
{
AgentAct();
AgentAct?.Invoke();
}
m_StepCount += 1;

}
/// <summary>
/// Creates or retrieves an existing ModelRunner that uses the same
/// NNModel and the InferenceDevice as provided.
/// </summary>
/// <param name="model"> The NNModel the ModelRunner must use </param>
/// <param name="brainParameters"> The brainParameters used to create
/// the ModelRunner </param>
/// <param name="inferenceDevice"> The inference device (CPU or GPU)
/// the ModelRunner will use </param>
/// <returns> The ModelRunner compatible with the input settings</returns>
public ModelRunner GetOrCreateModelRunner(
NNModel model, BrainParameters brainParameters, InferenceDevice inferenceDevice)
{
var modelRunner = m_ModelRunners.Find(x => x.HasModel(model, inferenceDevice));
if (modelRunner == null)
{
modelRunner = new ModelRunner(
model, brainParameters, inferenceDevice);
m_ModelRunners.Add(modelRunner);
}
return modelRunner;
}
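A minimal, hypothetical use of the method above; the `academy` reference, `myModel` asset and `brainParams` values are placeholders that only illustrate the lookup-or-create behaviour.

// Hypothetical caller (e.g. a policy); these names are not fields in this file.
var runner = academy.GetOrCreateModelRunner(myModel, brainParams, InferenceDevice.CPU);
// Requesting the same model on the same device returns the cached instance.
var sameRunner = academy.GetOrCreateModelRunner(myModel, brainParams, InferenceDevice.CPU);
Debug.Assert(ReferenceEquals(runner, sameRunner));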
/// <summary>
/// Cleanup function
/// </summary>
protected virtual void OnDestroy()

Time.maximumDeltaTime = m_OriginalMaximumDeltaTime;
// Signal to listeners that the academy is being destroyed now
DestroyAction();
DestroyAction?.Invoke();
foreach (var mr in m_ModelRunners)
{
mr.Dispose();
}
// TODO - Pass worker ID or some other identifier,
// so that multiple envs won't overwrite each others stats.

316
UnitySDK/Assets/ML-Agents/Scripts/Agent.cs


using System.Collections.Generic;
using UnityEngine;
using Barracuda;
using MLAgents.Sensor;
namespace MLAgents

public List<float> stackedVectorObservation;
/// <summary>
/// Most recent agent camera (i.e. texture) observation.
/// Most recent compressed observations.
public List<Texture2D> visualObservations;
public List<CompressedObservation> compressedObservations;
/// <summary>
/// Most recent text observation.

/// TODO(cgoy): All references to protobuf objects should be removed.
/// </summary>
public CommunicatorObjects.CustomObservationProto customObservation;
/// <summary>
/// Remove the visual observations from memory. Call at each timestep
/// to avoid memory leaks.
/// </summary>
public void ClearVisualObs()
{
foreach (var obs in visualObservations)
{
Object.Destroy(obs);
}
visualObservations.Clear();
}
}
/// <summary>

public class AgentParameters
{
/// <summary>
/// The list of the Camera GameObjects the agent uses for visual
/// observations.
/// </summary>
public List<Camera> agentCameras = new List<Camera>();
/// <summary>
/// The list of the RenderTextures the agent uses for visual
/// observations.
/// </summary>
public List<RenderTexture> agentRenderTextures = new List<RenderTexture>();
/// <summary>
/// The maximum number of steps the agent takes before being done.
/// </summary>
/// <remarks>

/// environment. Observations are determined by the cameras attached
/// to the agent in addition to the vector observations implemented by the
/// user in <see cref="CollectObservations"/>. On the other hand, actions
/// are determined by decisions produced by a linked Brain. Currently, this
/// are determined by decisions produced by a Policy. Currently, this
/// linked brain and in return receives an action from its brain. In practice,
/// policy and in return receives an action. In practice,
/// little may have changed between sucessive steps. Currently, how often an
/// agent updates its brain with a fresh observation is determined by the
/// Academy.
/// little may have changed between successive steps.
///
/// At any step, an agent may be considered <see cref="m_Done"/>.
/// This could occur due to a variety of reasons:

/// set to a value larger than the academy max steps value, then the academy
/// value takes precedence (since the agent max step will never be reached).
///
/// Lastly, note that at any step the brain linked to the agent is allowed to
/// change programmatically with <see cref="GiveBrain"/>.
/// Lastly, note that at any step the policy attached to the agent is allowed to
/// change its model with <see cref="GiveModel"/>.
///
/// Implementation-wise, it is required that this class is extended and the
/// virtual methods overridden. For sample implementations of agent behavior,

"docs/Learning-Environment-Design-Agents.md")]
[System.Serializable]
[RequireComponent(typeof(BehaviorParameters))]
/// <summary>
/// The Brain attached to this agent. A brain can be attached either
/// directly from the Editor through AgentEditor or
/// programmatically through <see cref="GiveBrain"/>. It is OK for an agent
/// to not have a brain, as long as no decision is requested.
/// </summary>
[HideInInspector] public Brain brain;
private IPolicy m_Brain;
private BehaviorParameters m_PolicyFactory;
/// <summary>
/// Agent parameters specified within the Editor via AgentEditor.

public AgentInfo Info
{
get { return m_Info; }
set { m_Info = value; }
}
/// Current Agent action (message sent from Brain).

/// </summary>
private DemonstrationRecorder m_Recorder;
public List<ISensor> m_Sensors;
/// Monobehavior function that is called when the attached GameObject
/// becomes enabled or active.
void OnEnable()

academy.LazyInitialization();
OnEnableHelper(academy);
m_Recorder = GetComponent<DemonstrationRecorder>();

{
m_Info = new AgentInfo();
m_Action = new AgentAction();
m_Sensors = new List<ISensor>();
if (academy == null)
{

academy.AgentSetStatus += SetStatus;
academy.AgentResetIfDone += ResetIfDone;
academy.AgentSendState += SendInfo;
academy.DecideAction += DecideAction;
if (brain != null)
{
ResetData();
}
else
{
Debug.Log(
string.Format(
"The Agent component attached to the " +
"GameObject {0} was initialized without a brain.",
gameObject.name));
}
m_PolicyFactory = GetComponent<BehaviorParameters>();
m_Brain = m_PolicyFactory.GeneratePolicy(Heuristic);
ResetData();
InitializeSensors();
}
/// Monobehavior function that is called when the attached GameObject

academy.AgentSetStatus -= SetStatus;
academy.AgentResetIfDone -= ResetIfDone;
academy.AgentSendState -= SendInfo;
academy.DecideAction -= DecideAction;
m_Brain?.Dispose();
/// Updates the Brain for the agent. Any brain currently assigned to the
/// agent will be replaced with the provided one.
/// Updates the Model for the agent. Any model currently assigned to the
/// agent will be replaced with the provided one. If the arguments are
/// identical to the current parameters of the agent, the model will
/// remain unchanged.
/// <remarks>
/// The agent unsubscribes from its current brain (if it has one) and
/// subscribes to the provided brain. This enables contextual brains, that
/// is, updating the behaviour (hence brain) of the agent depending on
/// the context of the game. For example, we may utilize one (wandering)
/// brain when an agent is randomly exploring an open world, but switch
/// to another (fighting) brain when it comes into contact with an enemy.
/// </remarks>
/// <param name="givenBrain">New brain to subscribe this agent to</param>
public void GiveBrain(Brain givenBrain)
/// <param name="behaviorName"> The identifier of the behavior. This
/// will categorize the agent when training.
/// </param>
/// <param name="model"> The model to use for inference.</param>
/// <param name = "inferenceDevide"> Define on what device the model
/// will be run.</param>
public void GiveModel(
string behaviorName,
NNModel model,
InferenceDevice inferenceDevice = InferenceDevice.CPU)
brain = givenBrain;
ResetData();
m_PolicyFactory.GiveModel(behaviorName, model, inferenceDevice);
m_Brain?.Dispose();
m_Brain = m_PolicyFactory.GeneratePolicy(Heuristic);
}
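For illustration, a hedged example of swapping a trained model onto an agent at runtime; `updatedModel` is a placeholder NNModel asset and "Walker" is an arbitrary behavior name used only to categorize training data.

// Hypothetical caller code, not part of Agent.cs.
public NNModel updatedModel;

void SwapModel(Agent agent)
{
    agent.GiveModel("Walker", updatedModel, InferenceDevice.CPU);
}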
/// <summary>

/// at the end of an episode.
void ResetData()
{
if (brain == null)
{
return;
}
var param = brain.brainParameters;
var param = m_PolicyFactory.brainParameters;
m_ActionMasker = new ActionMasker(param);
// If we haven't initialized vectorActions, initialize to 0. This should only
// happen during the creation of the Agent. In subsequent episodes, vectorAction

new List<float>(param.vectorObservationSize);
m_Info.stackedVectorObservation =
new List<float>(param.vectorObservationSize
* brain.brainParameters.numStackedVectorObservations);
* param.numStackedVectorObservations);
m_Info.visualObservations = new List<Texture2D>();
m_Info.compressedObservations = new List<CompressedObservation>();
m_Info.customObservation = null;
}

{
}
/// <summary>
/// When the Agent uses Heuristics, it will call this method every time it
/// needs an action. This can be used for debugging or controlling the agent
/// with keyboard.
/// </summary>
/// <returns> A float array corresponding to the next action of the Agent
/// </returns>
public virtual float[] Heuristic()
{
throw new UnityAgentsException(string.Format(
"The Heuristic method was not implemented for the Agent on the " +
"{0} GameObject.",
gameObject.name));
}
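The default implementation above only throws; subclasses are expected to override it. Below is a minimal, hypothetical override that maps the default Unity input axes to a two-element continuous action; the axis names and action layout are assumptions for illustration.

// Hypothetical agent subclass demonstrating a keyboard-driven Heuristic.
public class ManualDriveAgent : Agent
{
    public override float[] Heuristic()
    {
        var action = new float[2];
        action[0] = Input.GetAxis("Horizontal"); // steering, assumed action 0
        action[1] = Input.GetAxis("Vertical");   // throttle, assumed action 1
        return action;
    }
}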
/// <summary>
/// Set up the list of ISensors on the Agent. By default, this will select any
/// SensorBase's attached to the Agent.
/// </summary>
public void InitializeSensors()
{
var attachedSensorComponents = GetComponents<SensorComponent>();
m_Sensors.Capacity += attachedSensorComponents.Length;
foreach (var component in attachedSensorComponents)
{
m_Sensors.Add(component.CreateSensor());
}
// Sort the sensors by name to ensure determinism
m_Sensors.Sort((x, y) => x.GetName().CompareTo(y.GetName()));
#if DEBUG
// Make sure the names are actually unique
for (var i = 0; i < m_Sensors.Count - 1; i++)
{
Debug.Assert(!m_Sensors[i].GetName().Equals(m_Sensors[i + 1].GetName()), "Sensor names must be unique.");
}
#endif
}
if (brain == null)
if (m_Brain == null)
{
return;
}

m_Info.storedTextActions = m_Action.textActions;
m_Info.vectorObservation.Clear();
m_Info.compressedObservations.Clear();
m_ActionMasker.ResetMask();
using (TimerStack.Instance.Scoped("CollectObservations"))
{

var param = brain.brainParameters;
var param = m_PolicyFactory.brainParameters;
"Vector Observation size mismatch between continuous " +
"agent {0} and brain {1}. " +
"Was Expecting {2} but received {3}. ",
gameObject.name, brain.name,
brain.brainParameters.vectorObservationSize,
"Vector Observation size mismatch in continuous " +
"agent {0}. " +
"Was Expecting {1} but received {2}. ",
gameObject.name,
param.vectorObservationSize,
m_Info.vectorObservation.Count));
}

m_Info.visualObservations.Clear();
var visualObservationCount = agentParameters.agentCameras.Count + agentParameters.agentRenderTextures.Count;
if (param.cameraResolutions.Length > visualObservationCount)
{
throw new UnityAgentsException(string.Format(
"Not enough cameras/renderTextures for agent {0} : Brain {1} expecting at " +
"least {2} cameras/renderTextures but only {3} were present.",
gameObject.name, brain.name,
brain.brainParameters.cameraResolutions.Length,
visualObservationCount));
}
//First add all cameras
for (var i = 0; i < agentParameters.agentCameras.Count; i++)
{
var obsTexture = ObservationToTexture(
agentParameters.agentCameras[i],
param.cameraResolutions[i].width,
param.cameraResolutions[i].height);
m_Info.visualObservations.Add(obsTexture);
}
//Then add all renderTextures
var camCount = agentParameters.agentCameras.Count;
for (var i = 0; i < agentParameters.agentRenderTextures.Count; i++)
{
var obsTexture = ObservationToTexture(
agentParameters.agentRenderTextures[i],
param.cameraResolutions[camCount + i].width,
param.cameraResolutions[camCount + i].height);
m_Info.visualObservations.Add(obsTexture);
}
brain.SubscribeAgentForDecision(this);
m_Brain.RequestDecision(this);
// This is a bit of a hack - if we're in inference mode, compressed observations won't be generated
// But we need these to be generated for the recorder. So generate them here.
if (m_Info.compressedObservations.Count == 0)
{
GenerateSensorData();
}
m_Recorder.WriteExperience(m_Info);
}

public void ClearVisualObservations()
/// <summary>
/// Generate data for each sensor and store it on the Agent's AgentInfo.
/// NOTE: At the moment, this is only called during training or when using a DemonstrationRecorder;
/// during inference the sensors are used to write directly to the Tensor data. This will likely change in the
/// future to be controlled by the type of brain being used.
/// </summary>
public void GenerateSensorData()
m_Info.ClearVisualObs();
// Generate data for all sensors
// TODO add bool argument indicating when to compress? For now, we always will compress.
for (var i = 0; i < m_Sensors.Count; i++)
{
var sensor = m_Sensors[i];
var compressedObs = new CompressedObservation
{
Data = sensor.GetCompressedObservation(),
Shape = sensor.GetFloatObservationShape(),
CompressionType = sensor.GetCompressionType()
};
m_Info.compressedObservations.Add(compressedObs);
}
}
/// <summary>

AgentOnDone();
}
if ((m_RequestAction) && (brain != null))
if ((m_RequestAction) && (m_Brain != null))
{
m_RequestAction = false;
AgentAction(m_Action.vectorActions, m_Action.textActions, m_Action.customAction);

}
}
/// <summary>
/// Converts a camera and corresponding resolution to a 2D texture.
/// </summary>
/// <returns>The 2D texture.</returns>
/// <param name="obsCamera">Camera.</param>
/// <param name="width">Width of resulting 2D texture.</param>
/// <param name="height">Height of resulting 2D texture.</param>
/// <returns name="texture2D">Texture2D to render to.</returns>
public static Texture2D ObservationToTexture(Camera obsCamera, int width, int height)
{
var texture2D = new Texture2D(width, height, TextureFormat.RGB24, false);
var oldRec = obsCamera.rect;
obsCamera.rect = new Rect(0f, 0f, 1f, 1f);
var depth = 24;
var format = RenderTextureFormat.Default;
var readWrite = RenderTextureReadWrite.Default;
var tempRt =
RenderTexture.GetTemporary(width, height, depth, format, readWrite);
var prevActiveRt = RenderTexture.active;
var prevCameraRt = obsCamera.targetTexture;
// render to offscreen texture (readonly from CPU side)
RenderTexture.active = tempRt;
obsCamera.targetTexture = tempRt;
obsCamera.Render();
texture2D.ReadPixels(new Rect(0, 0, texture2D.width, texture2D.height), 0, 0);
obsCamera.targetTexture = prevCameraRt;
obsCamera.rect = oldRec;
RenderTexture.active = prevActiveRt;
RenderTexture.ReleaseTemporary(tempRt);
return texture2D;
}
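A small, hypothetical usage of the helper above: render an agent camera to a fixed-resolution texture and encode it; the camera reference and resolution are placeholders.

// Hypothetical caller: capture an 84x84 RGB frame from an agent camera.
var frame = Agent.ObservationToTexture(agentCamera, 84, 84);
var png = frame.EncodeToPNG();   // e.g. for logging or demonstrations
Object.Destroy(frame);           // release the Texture2D to avoid leaks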
/// <summary>
/// Converts a RenderTexture and corresponding resolution to a 2D texture.
/// </summary>
/// <returns>The 2D texture.</returns>
/// <param name="obsTexture">RenderTexture.</param>
/// <param name="width">Width of resulting 2D texture.</param>
/// <param name="height">Height of resulting 2D texture.</param>
/// <returns name="texture2D">Texture2D to render to.</returns>
public static Texture2D ObservationToTexture(RenderTexture obsTexture, int width, int height)
void DecideAction()
var texture2D = new Texture2D(width, height, TextureFormat.RGB24, false);
if (width != texture2D.width || height != texture2D.height)
{
texture2D.Resize(width, height);
}
if (width != obsTexture.width || height != obsTexture.height)
{
throw new UnityAgentsException(string.Format(
"RenderTexture {0} : width/height is {1}/{2} brain is expecting {3}/{4}.",
obsTexture.name, obsTexture.width, obsTexture.height, width, height));
}
var prevActiveRt = RenderTexture.active;
RenderTexture.active = obsTexture;
texture2D.ReadPixels(new Rect(0, 0, texture2D.width, texture2D.height), 0, 0);
texture2D.Apply();
RenderTexture.active = prevActiveRt;
return texture2D;
m_Brain?.DecideAction();
}
/// <summary>

4
UnitySDK/Assets/ML-Agents/Scripts/DemonstrationRecorder.cs


demonstrationName = SanitizeName(demonstrationName, MaxNameLength);
m_DemoStore.Initialize(
demonstrationName,
m_RecordingAgent.brain.brainParameters,
m_RecordingAgent.brain.name);
GetComponent<BehaviorParameters>().brainParameters,
GetComponent<BehaviorParameters>().behaviorName);
Monitor.Log("Recording Demonstration of Agent: ", m_RecordingAgent.name);
}

69
UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/AgentInfo.cs


byte[] descriptorData = global::System.Convert.FromBase64String(
string.Concat(
"CjNtbGFnZW50cy9lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL2FnZW50X2lu",
"Zm8ucHJvdG8SFGNvbW11bmljYXRvcl9vYmplY3RzGjttbGFnZW50cy9lbnZz",
"L2NvbW11bmljYXRvcl9vYmplY3RzL2N1c3RvbV9vYnNlcnZhdGlvbi5wcm90",
"byLcAgoOQWdlbnRJbmZvUHJvdG8SIgoac3RhY2tlZF92ZWN0b3Jfb2JzZXJ2",
"YXRpb24YASADKAISGwoTdmlzdWFsX29ic2VydmF0aW9ucxgCIAMoDBIYChB0",
"ZXh0X29ic2VydmF0aW9uGAMgASgJEh0KFXN0b3JlZF92ZWN0b3JfYWN0aW9u",
"cxgEIAMoAhIbChNzdG9yZWRfdGV4dF9hY3Rpb25zGAUgASgJEhAKCG1lbW9y",
"aWVzGAYgAygCEg4KBnJld2FyZBgHIAEoAhIMCgRkb25lGAggASgIEhgKEG1h",
"eF9zdGVwX3JlYWNoZWQYCSABKAgSCgoCaWQYCiABKAUSEwoLYWN0aW9uX21h",
"c2sYCyADKAgSSAoSY3VzdG9tX29ic2VydmF0aW9uGAwgASgLMiwuY29tbXVu",
"aWNhdG9yX29iamVjdHMuQ3VzdG9tT2JzZXJ2YXRpb25Qcm90b0IfqgIcTUxB",
"Z2VudHMuQ29tbXVuaWNhdG9yT2JqZWN0c2IGcHJvdG8z"));
"Zm8ucHJvdG8SFGNvbW11bmljYXRvcl9vYmplY3RzGj9tbGFnZW50cy9lbnZz",
"L2NvbW11bmljYXRvcl9vYmplY3RzL2NvbXByZXNzZWRfb2JzZXJ2YXRpb24u",
"cHJvdG8aO21sYWdlbnRzL2VudnMvY29tbXVuaWNhdG9yX29iamVjdHMvY3Vz",
"dG9tX29ic2VydmF0aW9uLnByb3RvIpgDCg5BZ2VudEluZm9Qcm90bxIiChpz",
"dGFja2VkX3ZlY3Rvcl9vYnNlcnZhdGlvbhgBIAMoAhIYChB0ZXh0X29ic2Vy",
"dmF0aW9uGAMgASgJEh0KFXN0b3JlZF92ZWN0b3JfYWN0aW9ucxgEIAMoAhIb",
"ChNzdG9yZWRfdGV4dF9hY3Rpb25zGAUgASgJEhAKCG1lbW9yaWVzGAYgAygC",
"Eg4KBnJld2FyZBgHIAEoAhIMCgRkb25lGAggASgIEhgKEG1heF9zdGVwX3Jl",
"YWNoZWQYCSABKAgSCgoCaWQYCiABKAUSEwoLYWN0aW9uX21hc2sYCyADKAgS",
"SAoSY3VzdG9tX29ic2VydmF0aW9uGAwgASgLMiwuY29tbXVuaWNhdG9yX29i",
"amVjdHMuQ3VzdG9tT2JzZXJ2YXRpb25Qcm90bxJRChdjb21wcmVzc2VkX29i",
"c2VydmF0aW9ucxgNIAMoCzIwLmNvbW11bmljYXRvcl9vYmplY3RzLkNvbXBy",
"ZXNzZWRPYnNlcnZhdGlvblByb3RvSgQIAhADQh+qAhxNTEFnZW50cy5Db21t",
"dW5pY2F0b3JPYmplY3RzYgZwcm90bzM="));
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.CustomObservationReflection.Descriptor, },
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.CompressedObservationReflection.Descriptor, global::MLAgents.CommunicatorObjects.CustomObservationReflection.Descriptor, },
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.AgentInfoProto), global::MLAgents.CommunicatorObjects.AgentInfoProto.Parser, new[]{ "StackedVectorObservation", "VisualObservations", "TextObservation", "StoredVectorActions", "StoredTextActions", "Memories", "Reward", "Done", "MaxStepReached", "Id", "ActionMask", "CustomObservation" }, null, null, null)
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.AgentInfoProto), global::MLAgents.CommunicatorObjects.AgentInfoProto.Parser, new[]{ "StackedVectorObservation", "TextObservation", "StoredVectorActions", "StoredTextActions", "Memories", "Reward", "Done", "MaxStepReached", "Id", "ActionMask", "CustomObservation", "CompressedObservations" }, null, null, null)
}));
}
#endregion

[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public AgentInfoProto(AgentInfoProto other) : this() {
stackedVectorObservation_ = other.stackedVectorObservation_.Clone();
visualObservations_ = other.visualObservations_.Clone();
textObservation_ = other.textObservation_;
storedVectorActions_ = other.storedVectorActions_.Clone();
storedTextActions_ = other.storedTextActions_;

id_ = other.id_;
actionMask_ = other.actionMask_.Clone();
CustomObservation = other.customObservation_ != null ? other.CustomObservation.Clone() : null;
compressedObservations_ = other.compressedObservations_.Clone();
_unknownFields = pb::UnknownFieldSet.Clone(other._unknownFields);
}

get { return stackedVectorObservation_; }
}
/// <summary>Field number for the "visual_observations" field.</summary>
public const int VisualObservationsFieldNumber = 2;
private static readonly pb::FieldCodec<pb::ByteString> _repeated_visualObservations_codec
= pb::FieldCodec.ForBytes(18);
private readonly pbc::RepeatedField<pb::ByteString> visualObservations_ = new pbc::RepeatedField<pb::ByteString>();
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public pbc::RepeatedField<pb::ByteString> VisualObservations {
get { return visualObservations_; }
}
/// <summary>Field number for the "text_observation" field.</summary>
public const int TextObservationFieldNumber = 3;
private string textObservation_ = "";

}
}
/// <summary>Field number for the "compressed_observations" field.</summary>
public const int CompressedObservationsFieldNumber = 13;
private static readonly pb::FieldCodec<global::MLAgents.CommunicatorObjects.CompressedObservationProto> _repeated_compressedObservations_codec
= pb::FieldCodec.ForMessage(106, global::MLAgents.CommunicatorObjects.CompressedObservationProto.Parser);
private readonly pbc::RepeatedField<global::MLAgents.CommunicatorObjects.CompressedObservationProto> compressedObservations_ = new pbc::RepeatedField<global::MLAgents.CommunicatorObjects.CompressedObservationProto>();
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public pbc::RepeatedField<global::MLAgents.CommunicatorObjects.CompressedObservationProto> CompressedObservations {
get { return compressedObservations_; }
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override bool Equals(object other) {
return Equals(other as AgentInfoProto);

return true;
}
if(!stackedVectorObservation_.Equals(other.stackedVectorObservation_)) return false;
if(!visualObservations_.Equals(other.visualObservations_)) return false;
if (TextObservation != other.TextObservation) return false;
if(!storedVectorActions_.Equals(other.storedVectorActions_)) return false;
if (StoredTextActions != other.StoredTextActions) return false;

if (Id != other.Id) return false;
if(!actionMask_.Equals(other.actionMask_)) return false;
if (!object.Equals(CustomObservation, other.CustomObservation)) return false;
if(!compressedObservations_.Equals(other.compressedObservations_)) return false;
return Equals(_unknownFields, other._unknownFields);
}

hash ^= stackedVectorObservation_.GetHashCode();
hash ^= visualObservations_.GetHashCode();
if (TextObservation.Length != 0) hash ^= TextObservation.GetHashCode();
hash ^= storedVectorActions_.GetHashCode();
if (StoredTextActions.Length != 0) hash ^= StoredTextActions.GetHashCode();

if (Id != 0) hash ^= Id.GetHashCode();
hash ^= actionMask_.GetHashCode();
if (customObservation_ != null) hash ^= CustomObservation.GetHashCode();
hash ^= compressedObservations_.GetHashCode();
if (_unknownFields != null) {
hash ^= _unknownFields.GetHashCode();
}

[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public void WriteTo(pb::CodedOutputStream output) {
stackedVectorObservation_.WriteTo(output, _repeated_stackedVectorObservation_codec);
visualObservations_.WriteTo(output, _repeated_visualObservations_codec);
if (TextObservation.Length != 0) {
output.WriteRawTag(26);
output.WriteString(TextObservation);

output.WriteRawTag(98);
output.WriteMessage(CustomObservation);
}
compressedObservations_.WriteTo(output, _repeated_compressedObservations_codec);
if (_unknownFields != null) {
_unknownFields.WriteTo(output);
}

public int CalculateSize() {
int size = 0;
size += stackedVectorObservation_.CalculateSize(_repeated_stackedVectorObservation_codec);
size += visualObservations_.CalculateSize(_repeated_visualObservations_codec);
if (TextObservation.Length != 0) {
size += 1 + pb::CodedOutputStream.ComputeStringSize(TextObservation);
}

if (customObservation_ != null) {
size += 1 + pb::CodedOutputStream.ComputeMessageSize(CustomObservation);
}
size += compressedObservations_.CalculateSize(_repeated_compressedObservations_codec);
if (_unknownFields != null) {
size += _unknownFields.CalculateSize();
}

return;
}
stackedVectorObservation_.Add(other.stackedVectorObservation_);
visualObservations_.Add(other.visualObservations_);
if (other.TextObservation.Length != 0) {
TextObservation = other.TextObservation;
}

}
CustomObservation.MergeFrom(other.CustomObservation);
}
compressedObservations_.Add(other.compressedObservations_);
_unknownFields = pb::UnknownFieldSet.MergeFrom(_unknownFields, other._unknownFields);
}

stackedVectorObservation_.AddEntriesFrom(input, _repeated_stackedVectorObservation_codec);
break;
}
case 18: {
visualObservations_.AddEntriesFrom(input, _repeated_visualObservations_codec);
break;
}
case 26: {
TextObservation = input.ReadString();
break;

customObservation_ = new global::MLAgents.CommunicatorObjects.CustomObservationProto();
}
input.ReadMessage(customObservation_);
break;
}
case 106: {
compressedObservations_.AddEntriesFrom(input, _repeated_compressedObservations_codec);
break;
}
}

44
UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/BrainParameters.cs


string.Concat(
"CjltbGFnZW50cy9lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL2JyYWluX3Bh",
"cmFtZXRlcnMucHJvdG8SFGNvbW11bmljYXRvcl9vYmplY3RzGjNtbGFnZW50",
"cy9lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL3Jlc29sdXRpb24ucHJvdG8a",
"M21sYWdlbnRzL2VudnMvY29tbXVuaWNhdG9yX29iamVjdHMvc3BhY2VfdHlw",
"ZS5wcm90byLUAgoUQnJhaW5QYXJhbWV0ZXJzUHJvdG8SHwoXdmVjdG9yX29i",
"c2VydmF0aW9uX3NpemUYASABKAUSJwofbnVtX3N0YWNrZWRfdmVjdG9yX29i",
"c2VydmF0aW9ucxgCIAEoBRIaChJ2ZWN0b3JfYWN0aW9uX3NpemUYAyADKAUS",
"QQoSY2FtZXJhX3Jlc29sdXRpb25zGAQgAygLMiUuY29tbXVuaWNhdG9yX29i",
"amVjdHMuUmVzb2x1dGlvblByb3RvEiIKGnZlY3Rvcl9hY3Rpb25fZGVzY3Jp",
"cHRpb25zGAUgAygJEkYKGHZlY3Rvcl9hY3Rpb25fc3BhY2VfdHlwZRgGIAEo",
"DjIkLmNvbW11bmljYXRvcl9vYmplY3RzLlNwYWNlVHlwZVByb3RvEhIKCmJy",
"YWluX25hbWUYByABKAkSEwoLaXNfdHJhaW5pbmcYCCABKAhCH6oCHE1MQWdl",
"bnRzLkNvbW11bmljYXRvck9iamVjdHNiBnByb3RvMw=="));
"cy9lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL3NwYWNlX3R5cGUucHJvdG8i",
"lwIKFEJyYWluUGFyYW1ldGVyc1Byb3RvEh8KF3ZlY3Rvcl9vYnNlcnZhdGlv",
"bl9zaXplGAEgASgFEicKH251bV9zdGFja2VkX3ZlY3Rvcl9vYnNlcnZhdGlv",
"bnMYAiABKAUSGgoSdmVjdG9yX2FjdGlvbl9zaXplGAMgAygFEiIKGnZlY3Rv",
"cl9hY3Rpb25fZGVzY3JpcHRpb25zGAUgAygJEkYKGHZlY3Rvcl9hY3Rpb25f",
"c3BhY2VfdHlwZRgGIAEoDjIkLmNvbW11bmljYXRvcl9vYmplY3RzLlNwYWNl",
"VHlwZVByb3RvEhIKCmJyYWluX25hbWUYByABKAkSEwoLaXNfdHJhaW5pbmcY",
"CCABKAhKBAgEEAVCH6oCHE1MQWdlbnRzLkNvbW11bmljYXRvck9iamVjdHNi",
"BnByb3RvMw=="));
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.ResolutionReflection.Descriptor, global::MLAgents.CommunicatorObjects.SpaceTypeReflection.Descriptor, },
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.SpaceTypeReflection.Descriptor, },
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.BrainParametersProto), global::MLAgents.CommunicatorObjects.BrainParametersProto.Parser, new[]{ "VectorObservationSize", "NumStackedVectorObservations", "VectorActionSize", "CameraResolutions", "VectorActionDescriptions", "VectorActionSpaceType", "BrainName", "IsTraining" }, null, null, null)
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.BrainParametersProto), global::MLAgents.CommunicatorObjects.BrainParametersProto.Parser, new[]{ "VectorObservationSize", "NumStackedVectorObservations", "VectorActionSize", "VectorActionDescriptions", "VectorActionSpaceType", "BrainName", "IsTraining" }, null, null, null)
}));
}
#endregion

vectorObservationSize_ = other.vectorObservationSize_;
numStackedVectorObservations_ = other.numStackedVectorObservations_;
vectorActionSize_ = other.vectorActionSize_.Clone();
cameraResolutions_ = other.cameraResolutions_.Clone();
vectorActionDescriptions_ = other.vectorActionDescriptions_.Clone();
vectorActionSpaceType_ = other.vectorActionSpaceType_;
brainName_ = other.brainName_;

get { return vectorActionSize_; }
}
/// <summary>Field number for the "camera_resolutions" field.</summary>
public const int CameraResolutionsFieldNumber = 4;
private static readonly pb::FieldCodec<global::MLAgents.CommunicatorObjects.ResolutionProto> _repeated_cameraResolutions_codec
= pb::FieldCodec.ForMessage(34, global::MLAgents.CommunicatorObjects.ResolutionProto.Parser);
private readonly pbc::RepeatedField<global::MLAgents.CommunicatorObjects.ResolutionProto> cameraResolutions_ = new pbc::RepeatedField<global::MLAgents.CommunicatorObjects.ResolutionProto>();
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public pbc::RepeatedField<global::MLAgents.CommunicatorObjects.ResolutionProto> CameraResolutions {
get { return cameraResolutions_; }
}
/// <summary>Field number for the "vector_action_descriptions" field.</summary>
public const int VectorActionDescriptionsFieldNumber = 5;
private static readonly pb::FieldCodec<string> _repeated_vectorActionDescriptions_codec

if (VectorObservationSize != other.VectorObservationSize) return false;
if (NumStackedVectorObservations != other.NumStackedVectorObservations) return false;
if(!vectorActionSize_.Equals(other.vectorActionSize_)) return false;
if(!cameraResolutions_.Equals(other.cameraResolutions_)) return false;
if(!vectorActionDescriptions_.Equals(other.vectorActionDescriptions_)) return false;
if (VectorActionSpaceType != other.VectorActionSpaceType) return false;
if (BrainName != other.BrainName) return false;

if (VectorObservationSize != 0) hash ^= VectorObservationSize.GetHashCode();
if (NumStackedVectorObservations != 0) hash ^= NumStackedVectorObservations.GetHashCode();
hash ^= vectorActionSize_.GetHashCode();
hash ^= cameraResolutions_.GetHashCode();
hash ^= vectorActionDescriptions_.GetHashCode();
if (VectorActionSpaceType != 0) hash ^= VectorActionSpaceType.GetHashCode();
if (BrainName.Length != 0) hash ^= BrainName.GetHashCode();

output.WriteInt32(NumStackedVectorObservations);
}
vectorActionSize_.WriteTo(output, _repeated_vectorActionSize_codec);
cameraResolutions_.WriteTo(output, _repeated_cameraResolutions_codec);
vectorActionDescriptions_.WriteTo(output, _repeated_vectorActionDescriptions_codec);
if (VectorActionSpaceType != 0) {
output.WriteRawTag(48);

size += 1 + pb::CodedOutputStream.ComputeInt32Size(NumStackedVectorObservations);
}
size += vectorActionSize_.CalculateSize(_repeated_vectorActionSize_codec);
size += cameraResolutions_.CalculateSize(_repeated_cameraResolutions_codec);
size += vectorActionDescriptions_.CalculateSize(_repeated_vectorActionDescriptions_codec);
if (VectorActionSpaceType != 0) {
size += 1 + pb::CodedOutputStream.ComputeEnumSize((int) VectorActionSpaceType);

NumStackedVectorObservations = other.NumStackedVectorObservations;
}
vectorActionSize_.Add(other.vectorActionSize_);
cameraResolutions_.Add(other.cameraResolutions_);
vectorActionDescriptions_.Add(other.vectorActionDescriptions_);
if (other.VectorActionSpaceType != 0) {
VectorActionSpaceType = other.VectorActionSpaceType;

case 26:
case 24: {
vectorActionSize_.AddEntriesFrom(input, _repeated_vectorActionSize_codec);
break;
}
case 34: {
cameraResolutions_.AddEntriesFrom(input, _repeated_cameraResolutions_codec);
break;
}
case 42: {

9
UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/SpaceType.cs


byte[] descriptorData = global::System.Convert.FromBase64String(
string.Concat(
"CjNtbGFnZW50cy9lbnZzL2NvbW11bmljYXRvcl9vYmplY3RzL3NwYWNlX3R5",
"cGUucHJvdG8SFGNvbW11bmljYXRvcl9vYmplY3RzGjNtbGFnZW50cy9lbnZz",
"L2NvbW11bmljYXRvcl9vYmplY3RzL3Jlc29sdXRpb24ucHJvdG8qLgoOU3Bh",
"Y2VUeXBlUHJvdG8SDAoIZGlzY3JldGUQABIOCgpjb250aW51b3VzEAFCH6oC",
"HE1MQWdlbnRzLkNvbW11bmljYXRvck9iamVjdHNiBnByb3RvMw=="));
"cGUucHJvdG8SFGNvbW11bmljYXRvcl9vYmplY3RzKi4KDlNwYWNlVHlwZVBy",
"b3RvEgwKCGRpc2NyZXRlEAASDgoKY29udGludW91cxABQh+qAhxNTEFnZW50",
"cy5Db21tdW5pY2F0b3JPYmplY3RzYgZwcm90bzM="));
new pbr::FileDescriptor[] { global::MLAgents.CommunicatorObjects.ResolutionReflection.Descriptor, },
new pbr::FileDescriptor[] { },
new pbr::GeneratedClrTypeInfo(new[] {typeof(global::MLAgents.CommunicatorObjects.SpaceTypeProto), }, null));
}
#endregion

2
UnitySDK/Assets/ML-Agents/Scripts/Grpc/CommunicatorObjects/CompressedObservation.cs.meta


fileFormatVersion: 2
guid: f19fefeaaa2594e70b23e74adfaf0d5a
guid: 55ac40ee8d5b74b9e80d3def9d4ef6e0
MonoImporter:
externalObjects: {}
serializedVersion: 2

56
UnitySDK/Assets/ML-Agents/Scripts/Grpc/GrpcExtensions.cs


using Google.Protobuf;
using Google.Protobuf.Collections;
using MLAgents.CommunicatorObjects;
using MLAgents.Sensor;
using UnityEngine;
namespace MLAgents

agentInfoProto.ActionMask.AddRange(ai.actionMasks);
}
foreach (var obs in ai.visualObservations)
if (ai.compressedObservations != null)
using (TimerStack.Instance.Scoped("encodeVisualObs"))
foreach (var obs in ai.compressedObservations)
agentInfoProto.VisualObservations.Add(
ByteString.CopyFrom(obs.EncodeToPNG())
);
agentInfoProto.CompressedObservations.Add(obs.ToProto());
/// Converts a Brain into to a Protobuff BrainInfoProto so it can be sent
/// Converts a Brain into to a Protobuf BrainInfoProto so it can be sent
/// </summary>
/// <returns>The BrainInfoProto generated.</returns>
/// <param name="bp">The instance of BrainParameter to extend.</param>

IsTraining = isTraining
};
brainParametersProto.VectorActionDescriptions.AddRange(bp.vectorActionDescriptions);
foreach (var res in bp.cameraResolutions)
{
brainParametersProto.CameraResolutions.Add(
new ResolutionProto
{
Width = res.width,
Height = res.height,
GrayScale = res.blackAndWhite
});
}
return brainParametersProto;
}

}
/// <summary>
/// Converts Resolution protobuf array to C# Resolution array.
/// </summary>
private static Resolution[] ResolutionProtoToNative(IReadOnlyList<ResolutionProto> resolutionProtos)
{
var localCameraResolutions = new Resolution[resolutionProtos.Count];
for (var i = 0; i < resolutionProtos.Count; i++)
{
localCameraResolutions[i] = new Resolution
{
height = resolutionProtos[i].Height,
width = resolutionProtos[i].Width,
blackAndWhite = resolutionProtos[i].GrayScale
};
}
return localCameraResolutions;
}
/// <summary>
/// Convert a BrainParametersProto to a BrainParameters struct.
/// </summary>
/// <param name="bpp">An instance of a brain parameters protobuf object.</param>

var bp = new BrainParameters
{
vectorObservationSize = bpp.VectorObservationSize,
cameraResolutions = ResolutionProtoToNative(
bpp.CameraResolutions
),
numStackedVectorObservations = bpp.NumStackedVectorObservations,
vectorActionSize = bpp.VectorActionSize.ToArray(),
vectorActionDescriptions = bpp.VectorActionDescriptions.ToArray(),

agentActions.Add(ap.ToAgentAction());
}
return agentActions;
}
public static CompressedObservationProto ToProto(this CompressedObservation obs)
{
var obsProto = new CompressedObservationProto
{
Data = ByteString.CopyFrom(obs.Data),
CompressionType = (CompressionTypeProto) obs.CompressionType,
};
obsProto.Shape.AddRange(obs.Shape);
return obsProto;
}
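To illustrate the conversion above, a hedged sketch that builds a CompressedObservation by hand and converts it; `pngBytes`, the shape, and the PNG compression type are illustrative values, not data from this file.

// Hypothetical construction of a CompressedObservation before conversion.
var obs = new CompressedObservation
{
    Data = pngBytes,                       // assumed byte[] produced elsewhere
    Shape = new[] { 84, 84, 3 },           // height, width, channels (assumption)
    CompressionType = CompressionType.PNG  // assuming a PNG member on the enum
};
var proto = obs.ToProto();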
}
}

181
UnitySDK/Assets/ML-Agents/Scripts/Grpc/RpcCommunicator.cs


/// The default number of agents in the scene
private const int k_NumAgents = 32;
/// Keeps track of which brains have data to send on the current step
Dictionary<string, bool> m_HasData =
new Dictionary<string, bool>();
/// Keeps track of which brains queried the communicator on the current step
Dictionary<string, bool> m_HasQueried =
new Dictionary<string, bool>();
/// Keeps track of the agents of each brain on the current step
Dictionary<string, List<Agent>> m_CurrentAgents =
new Dictionary<string, List<Agent>>();

Dictionary<string, Dictionary<Agent, AgentAction>> m_LastActionsReceived =
new Dictionary<string, Dictionary<Agent, AgentAction>>();
private UnityRLInitializationOutputProto m_CurrentUnityRlInitializationOutput;
// Brains that we have sent over the communicator with agents.
HashSet<string> m_sentBrainKeys = new HashSet<string>();
Dictionary<string, BrainParameters> m_unsentBrainKeys = new Dictionary<string, BrainParameters>();
# if UNITY_EDITOR || UNITY_STANDALONE_WIN || UNITY_STANDALONE_OSX || UNITY_STANDALONE_LINUX

/// Adds the brain to the list of brains which will be sending information to External.
/// </summary>
/// <param name="brainKey">Brain key.</param>
/// <param name="brainParameters">Brain parameters needed to send to the trainer.</param>
m_HasQueried[brainKey] = false;
m_HasData[brainKey] = false;
if (m_CurrentAgents.ContainsKey(brainKey))
{
return;
}
new CommunicatorObjects.UnityRLOutputProto.Types.ListAgentInfoProto());
if (m_CurrentUnityRlInitializationOutput == null){
m_CurrentUnityRlInitializationOutput = new CommunicatorObjects.UnityRLInitializationOutputProto();
}
m_CurrentUnityRlInitializationOutput.BrainParameters.Add(brainParameters.ToProto(brainKey, true));
new UnityRLOutputProto.Types.ListAgentInfoProto()
);
CacheBrainParameters(brainKey, brainParameters);
}
void UpdateEnvironmentWithInput(UnityRLInputProto rlInput)

#region Destruction
/// <summary>
/// Ensure that when this object is destructed, the connection is closed.
/// </summary>
~RpcCommunicator()
{
Close();
}
/// <summary>
public void Close()
public void Dispose()
{
# if UNITY_EDITOR || UNITY_STANDALONE_WIN || UNITY_STANDALONE_OSX || UNITY_STANDALONE_LINUX
if (!m_IsOpen)

switch (command)
{
case CommandProto.Quit:
{
QuitCommandReceived?.Invoke();
return;
}
{
QuitCommandReceived?.Invoke();
return;
}
{
ResetCommandReceived?.Invoke(environmentParametersProto.ToEnvironmentResetParameters());
return;
}
{
ResetCommandReceived?.Invoke(environmentParametersProto.ToEnvironmentResetParameters());
return;
}
{
return;
}
{
return;
}
}
}

#region Sending and retrieving data
public void PutObservations(
string brainKey, IEnumerable<Agent> agents)
public void DecideBatch()
// The brain tried called GiveBrainInfo, update m_hasQueried
m_HasQueried[brainKey] = true;
// Populate the currentAgents dictionary
m_CurrentAgents[brainKey].Clear();
foreach (var agent in agents)
if (m_CurrentAgents.Values.All(l => l.Count == 0))
m_CurrentAgents[brainKey].Add(agent);
return;
// If at least one agent has data to send, then append data to
// the message and update hasSentState
if (m_CurrentAgents[brainKey].Count > 0)
foreach (var brainKey in m_CurrentAgents.Keys)
foreach (var agent in m_CurrentAgents[brainKey])
using (TimerStack.Instance.Scoped("AgentInfo.ToProto"))
var agentInfoProto = agent.Info.ToProto();
m_CurrentUnityRlOutput.AgentInfos[brainKey].Value.Add(agentInfoProto);
// Avoid visual obs memory leak. This should be called AFTER we are done with the visual obs.
// e.g. after recording them to demo and using them for inference.
agent.ClearVisualObservations();
if (m_CurrentAgents[brainKey].Count > 0)
{
foreach (var agent in m_CurrentAgents[brainKey])
{
// Update the sensor data on the AgentInfo
agent.GenerateSensorData();
var agentInfoProto = agent.Info.ToProto();
m_CurrentUnityRlOutput.AgentInfos[brainKey].Value.Add(agentInfoProto);
}
}
m_HasData[brainKey] = true;
// If any agent needs to send data, then the whole message
// must be sent
if (m_HasQueried.Values.All(x => x))
SendBatchedMessageHelper();
foreach (var brainKey in m_CurrentAgents.Keys)
if (m_HasData.Values.Any(x => x))
{
SendBatchedMessageHelper();
}
m_CurrentAgents[brainKey].Clear();
}
}
// The message was just sent so we must reset hasSentState and
// triedSendState
foreach (var k in m_CurrentAgents.Keys)
{
m_HasData[k] = false;
m_HasQueried[k] = false;
}
}
/// <summary>
/// Sends the observations of one Agent.
/// </summary>
/// <param name="key">Batch Key.</param>
/// <param name="agents">Agent info.</param>
public void PutObservations(string brainKey, Agent agent)
{
m_CurrentAgents[brainKey].Add(agent);
}
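As a sketch of the batched flow introduced here, a policy queues each agent with PutObservations and the batch is later flushed with DecideBatch; the variable names below are placeholders.

// Hypothetical step sequence; "communicator", agentA and agentB are illustrative.
communicator.PutObservations("Walker", agentA);
communicator.PutObservations("Walker", agentB);
// Once every policy has queued its agents for this step:
communicator.DecideBatch();   // sends one batched message and applies the returned actions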
/// <summary>

void SendBatchedMessageHelper()
{
var message = new CommunicatorObjects.UnityOutputProto
var message = new UnityOutputProto
if (m_CurrentUnityRlInitializationOutput != null)
var tempUnityRlInitializationOutput = GetTempUnityRlInitializationOutput();
if (tempUnityRlInitializationOutput != null)
message.RlInitializationOutput = m_CurrentUnityRlInitializationOutput;
message.RlInitializationOutput = tempUnityRlInitializationOutput;
m_CurrentUnityRlInitializationOutput = null;
UpdateSentBrainParameters(tempUnityRlInitializationOutput);
foreach (var k in m_CurrentUnityRlOutput.AgentInfos.Keys)
{

};
}
private void CacheBrainParameters(string brainKey, BrainParameters brainParameters)
{
if (m_sentBrainKeys.Contains(brainKey))
{
return;
}
// TODO We should check that if m_unsentBrainKeys has brainKey, it equals brainParameters
m_unsentBrainKeys[brainKey] = brainParameters;
}
private UnityRLInitializationOutputProto GetTempUnityRlInitializationOutput()
{
UnityRLInitializationOutputProto output = null;
foreach (var brainKey in m_unsentBrainKeys.Keys)
{
if (m_CurrentUnityRlOutput.AgentInfos.ContainsKey(brainKey))
{
if (output == null)
{
output = new UnityRLInitializationOutputProto();
}
var brainParameters = m_unsentBrainKeys[brainKey];
output.BrainParameters.Add(brainParameters.ToProto(brainKey, true));
}
}
return output;
}
private void UpdateSentBrainParameters(UnityRLInitializationOutputProto output)
{
if (output == null)
{
return;
}
foreach (var brainProto in output.BrainParameters)
{
m_sentBrainKeys.Add(brainProto.BrainName);
m_unsentBrainKeys.Remove(brainProto.BrainName);
}
}
#endregion
#if UNITY_EDITOR

// This method is run whenever the playmode state is changed.
if (state == PlayModeStateChange.ExitingPlayMode)
{
Close();
Dispose();
}
}

22
UnitySDK/Assets/ML-Agents/Scripts/ICommunicator.cs


UnityOutput and UnityInput can be extended to provide functionalities beyond RL
UnityRLOutput and UnityRLInput can be extended to provide new RL functionalities
*/
public interface ICommunicator
public interface ICommunicator : IBatchedDecisionMaker
{
/// <summary>
/// Quit was received by the communicator.

void SubscribeBrain(string name, BrainParameters brainParameters);
/// <summary>
/// Sends the observations. If at least one brain has an agent in need of
/// a decision or if the academy is done, the data is sent via
/// Communicator. Else, a new step is realized. The data can only be
/// sent once all the brains that were part of initialization have tried
/// to send information.
/// </summary>
/// <param name="key">Batch Key.</param>
/// <param name="agents">Agent info.</param>
void PutObservations(string key, IEnumerable<Agent> agents);
/// <summary>
}
/// <summary>
/// Close the communicator gracefully on both sides of the communication.
/// </summary>
void Close();
public interface IBatchedDecisionMaker : IDisposable
{
void PutObservations(string key, Agent agent);
void DecideBatch();
}
}
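To make the new split concrete, here is a minimal, hypothetical IBatchedDecisionMaker that only records queued agents and decides nothing; it illustrates the contract (queue per key, then one batched decision), not how RpcCommunicator or ModelRunner implement it, and it assumes the usual using directives (System, System.Collections.Generic).

// Hedged sketch of the IBatchedDecisionMaker contract; not a real policy backend.
public class NoOpBatchedDecisionMaker : IBatchedDecisionMaker
{
    readonly Dictionary<string, List<Agent>> m_Batches = new Dictionary<string, List<Agent>>();

    public void PutObservations(string key, Agent agent)
    {
        if (!m_Batches.ContainsKey(key))
        {
            m_Batches[key] = new List<Agent>();
        }
        m_Batches[key].Add(agent);
    }

    public void DecideBatch()
    {
        // A real implementation would run inference or exchange messages here.
        foreach (var batch in m_Batches.Values)
        {
            batch.Clear();
        }
    }

    public void Dispose() { }
}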

354
UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/BarracudaModelParamLoader.cs


Continuous
}
private const long k_ApiVersion = 2;
private readonly IWorker m_Engine;
private readonly Model m_Model;
private readonly BrainParameters m_BrainParameters;
private readonly List<string> m_FailedModelChecks = new List<string>();
/// Factory for the ModelParamLoader : Creates a ModelParamLoader and runs the checks
/// on it.
/// Generates the Tensor inputs that are expected to be present in the Model.
/// <param name="engine">
/// The Barracuda engine worker we get the parameters and the checks from
/// </param>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <returns></returns>
public static BarracudaModelParamLoader GetLoaderAndCheck(
IWorker engine, Model model, BrainParameters brainParameters)
{
var modelParamLoader = new BarracudaModelParamLoader(engine, model, brainParameters);
modelParamLoader.GenerateChecks();
return modelParamLoader;
}
private BarracudaModelParamLoader(
IWorker engine, Model model, BrainParameters brainParameters)
{
m_Engine = engine;
m_Model = model;
m_BrainParameters = brainParameters;
}
/// <summary>
/// Generates the Tensor inputs that are expected to be present in the Model.
/// </summary>
public IReadOnlyList<TensorProxy> GetInputTensors()
public static IReadOnlyList<TensorProxy> GetInputTensors(Model model)
if (m_Model == null)
if (model == null)
foreach (var input in m_Model.inputs)
foreach (var input in model.inputs)
{
tensors.Add(new TensorProxy
{

});
}
foreach (var mem in m_Model.memories)
foreach (var mem in model.memories)
//Debug.Log($"{mem.input}: {mem.shape} -> {BarracudaUtils.TensorShapeFromBarracuda(mem.shape).Length}");
tensors.Add(new TensorProxy
{
name = mem.input,

/// <summary>
/// Generates the Tensor outputs that are expected to be present in the Model.
/// </summary>
/// <param name="model">
/// The Barracuda engine model for loading static parameters
/// </param>
public string[] GetOutputNames()
public static string[] GetOutputNames(Model model)
if (m_Model == null)
if (model == null)
{
return names.ToArray();
}

var memory = GetIntScalar(TensorNames.MemorySize);
var memory = (int)model.GetTensorByName(TensorNames.MemorySize)[0];
foreach (var mem in m_Model.memories)
foreach (var mem in model.memories)
{
names.Add(mem.output);
}

}
/// <summary>
/// Queries the InferenceEngine for the value of a variable in the graph given its name.
/// Only works with int32 Tensors with zero dimensions containing a unique element.
/// If the node was not found or could not be retrieved, the value -1 will be returned.
/// </summary>
/// <param name="name">The name of the Tensor variable</param>
/// <returns>The value of the scalar variable in the model. (-1 if not found)</returns>
private int GetIntScalar(string name)
{
return (int)m_Model.GetTensorByName(name)[0];
}
/// <summary>
/// Retrieves an IEnumerable of string corresponding to the failed compatibility checks
/// between the InferenceEngine and the BrainParameters.
/// </summary>
public IEnumerable<string> GetChecks()
{
return m_FailedModelChecks;
}
/// <summary>
/// Generates the list of failed checks that failed when comparing the data from the Model
/// and from the BrainParameters
/// Factory for the ModelParamLoader : Creates a ModelParamLoader and runs the checks
/// on it.
private void GenerateChecks()
/// <param name="model">
/// The Barracuda engine model for loading static parameters
/// </param>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <returns>The list of error messages from the checks that failed</returns>
public static IEnumerable<string> CheckModel(Model model, BrainParameters brainParameters)
m_FailedModelChecks.Clear();
if (m_Engine == null)
List<string> failedModelChecks = new List<string>();
if (model == null)
m_FailedModelChecks.Add(
failedModelChecks.Add(
return;
return failedModelChecks;
var modelApiVersion = GetIntScalar(TensorNames.VersionNumber);
var memorySize = GetIntScalar(TensorNames.MemorySize);
var isContinuousInt = GetIntScalar(TensorNames.IsContinuousControl);
var modelApiVersion = (int)model.GetTensorByName(TensorNames.VersionNumber)[0];
var memorySize = (int)model.GetTensorByName(TensorNames.MemorySize)[0];
var isContinuousInt = (int)model.GetTensorByName(TensorNames.IsContinuousControl)[0];
var actionSize = GetIntScalar(TensorNames.ActionOutputShape);
var actionSize = (int)model.GetTensorByName(TensorNames.ActionOutputShape)[0];
m_FailedModelChecks.Add(
failedModelChecks.Add(
return;
return failedModelChecks;
m_FailedModelChecks.Add(
failedModelChecks.Add(
return;
return failedModelChecks;
CheckIntScalarPresenceHelper(new Dictionary<string, int>()
{
{TensorNames.MemorySize, memorySize},
{TensorNames.IsContinuousControl, isContinuousInt},
{TensorNames.ActionOutputShape, actionSize}
});
CheckInputTensorPresence(memorySize, isContinuous);
CheckOutputTensorPresence(memorySize);
CheckInputTensorShape();
CheckOutputTensorShape(isContinuous, actionSize);
failedModelChecks.AddRange(
CheckIntScalarPresenceHelper(new Dictionary<string, int>()
{
{TensorNames.MemorySize, memorySize},
{TensorNames.IsContinuousControl, isContinuousInt},
{TensorNames.ActionOutputShape, actionSize}
})
);
failedModelChecks.AddRange(
CheckInputTensorPresence(model, brainParameters, memorySize, isContinuous)
);
failedModelChecks.AddRange(
CheckOutputTensorPresence(model, memorySize))
;
failedModelChecks.AddRange(
CheckInputTensorShape(model, brainParameters)
);
failedModelChecks.AddRange(
CheckOutputTensorShape(model, brainParameters, isContinuous, actionSize)
);
return failedModelChecks;
}
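A brief, hypothetical example of consuming the static check API above: load the Barracuda model, run CheckModel, and surface any failures as warnings; how the model is loaded (ModelLoader.Load) and the nnModel/brainParameters variables are assumptions about the surrounding code.

// Hypothetical caller of the static validation entry point.
var barracudaModel = ModelLoader.Load(nnModel);   // assumed loading helper
foreach (var failure in BarracudaModelParamLoader.CheckModel(barracudaModel, brainParameters))
{
    Debug.LogWarning(failure);   // surface each incompatibility
}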
/// <summary>

/// invalid value of -1.
/// </summary>
/// <param name="requiredScalarFields"> Mapping from node names to int values</param>
private void CheckIntScalarPresenceHelper(Dictionary<string, int> requiredScalarFields)
/// <returns>The list of error messages from the checks that failed</returns>
private static IEnumerable<string> CheckIntScalarPresenceHelper(
Dictionary<string, int> requiredScalarFields)
var failedModelChecks = new List<string>();
m_FailedModelChecks.Add($"Missing node in the model provided : {field.Key}");
failedModelChecks.Add($"Missing node in the model provided : {field.Key}");
return failedModelChecks;
}
/// <summary>

/// <param name="model">
/// The Barracuda engine model for loading static parameters
/// </param>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <param name="memory">
/// The memory size that the model is expecting.
/// </param>

/// <returns>
/// A IEnumerable of string corresponding to the failed input presence checks.
/// </returns>
private void CheckInputTensorPresence(int memory, ModelActionType isContinuous)
private static IEnumerable<string> CheckInputTensorPresence(
Model model,
BrainParameters brainParameters,
int memory,
ModelActionType isContinuous)
var tensorsNames = GetInputTensors().Select(x => x.name).ToList();
var failedModelChecks = new List<string>();
var tensorsNames = GetInputTensors(model).Select(x => x.name).ToList();
if ((m_BrainParameters.vectorObservationSize != 0) &&
if ((brainParameters.vectorObservationSize != 0) &&
m_FailedModelChecks.Add(
failedModelChecks.Add(
// If there are not enough Visual Observation Input compared to what the
// Brain Parameters expect.
for (var visObsIndex = 0;
visObsIndex < m_BrainParameters.cameraResolutions.Length;
visObsIndex++)
{
if (!tensorsNames.Contains(
TensorNames.VisualObservationPlaceholderPrefix + visObsIndex))
{
m_FailedModelChecks.Add(
"The model does not contain a Visual Observation Placeholder Input " +
"for visual observation " + visObsIndex + ".");
}
}
// TODO re-enable checks that there are enough Visual Observation Placeholders in the model.
// If the model has a non-negative memory size but requires a recurrent input
if (memory > 0)

{
m_FailedModelChecks.Add(
failedModelChecks.Add(
"The model does not contain a Recurrent Input Node but has memory_size.");
}
}

{
if (!tensorsNames.Contains(TensorNames.ActionMaskPlaceholder))
{
m_FailedModelChecks.Add(
failedModelChecks.Add(
return failedModelChecks;
}
/// <summary>

/// <param name="model">
/// The Barracuda engine model for loading static parameters
/// </param>
private void CheckOutputTensorPresence(int memory)
private static IEnumerable<string> CheckOutputTensorPresence(Model model, int memory)
var failedModelChecks = new List<string>();
if (!m_Model.outputs.Contains(TensorNames.ActionOutput))
if (!model.outputs.Contains(TensorNames.ActionOutput))
m_FailedModelChecks.Add("The model does not contain an Action Output Node.");
failedModelChecks.Add("The model does not contain an Action Output Node.");
var memOutputs = m_Model.memories.Select(x => x.output).ToList();
var memOutputs = model.memories.Select(x => x.output).ToList();
m_FailedModelChecks.Add(
failedModelChecks.Add(
return failedModelChecks;
}
/// <summary>

private void CheckInputTensorShape()
/// <param name="model">
/// The Barracuda engine model for loading static parameters
/// </param>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <returns>The list of error messages from the checks that failed</returns>
private static IEnumerable<string> CheckInputTensorShape(
Model model, BrainParameters brainParameters)
var failedModelChecks = new List<string>();
new Dictionary<string, Func<TensorProxy, string>>()
new Dictionary<string, Func<BrainParameters, TensorProxy, string>>()
{TensorNames.RandomNormalEpsilonPlaceholder, ((tensor) => null)},
{TensorNames.ActionMaskPlaceholder, ((tensor) => null)},
{TensorNames.SequenceLengthPlaceholder, ((tensor) => null)},
{TensorNames.RecurrentInPlaceholder, ((tensor) => null)},
{TensorNames.RandomNormalEpsilonPlaceholder, ((bp, tensor) => null)},
{TensorNames.ActionMaskPlaceholder, ((bp, tensor) => null)},
{TensorNames.SequenceLengthPlaceholder, ((bp, tensor) => null)},
{TensorNames.RecurrentInPlaceholder, ((bp, tensor) => null)},
foreach (var mem in m_Model.memories)
foreach (var mem in model.memories)
tensorTester[mem.input] = ((tensor) => null);
tensorTester[mem.input] = ((bp, tensor) => null);
for (var obsIndex = 0; obsIndex < m_BrainParameters.cameraResolutions.Length; obsIndex++)
{
var index = obsIndex;
tensorTester[TensorNames.VisualObservationPlaceholderPrefix + obsIndex] =
(tensor) => CheckVisualObsShape(tensor, index);
}
// TODO re-enable checks on visual observation shapes.
foreach (var tensor in GetInputTensors())
foreach (var tensor in GetInputTensors(model))
m_FailedModelChecks.Add(
"Model requires an unknown input named : " + tensor.name);
if (!tensor.name.Contains("visual_observation"))
{
failedModelChecks.Add(
"Model requires an unknown input named : " + tensor.name);
}
var error = tester.Invoke(tensor);
var error = tester.Invoke(brainParameters, tensor);
m_FailedModelChecks.Add(error);
failedModelChecks.Add(error);
return failedModelChecks;
}
/// <summary>

/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
private string CheckVectorObsShape(TensorProxy tensorProxy)
private static string CheckVectorObsShape(
BrainParameters brainParameters, TensorProxy tensorProxy)
var vecObsSizeBp = m_BrainParameters.vectorObservationSize;
var numStackedVector = m_BrainParameters.numStackedVectorObservations;
var vecObsSizeBp = brainParameters.vectorObservationSize;
var numStackedVector = brainParameters.numStackedVectorObservations;
var totalVecObsSizeT = tensorProxy.shape[tensorProxy.shape.Length - 1];
if (vecObsSizeBp * numStackedVector != totalVecObsSizeT)
{

/// Checks that the shape of the Previous Vector Action input placeholder is the same in the
/// model and in the Brain Parameters.
/// </summary>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
private string CheckPreviousActionShape(TensorProxy tensorProxy)
private static string CheckPreviousActionShape(
BrainParameters brainParameters, TensorProxy tensorProxy)
var numberActionsBp = m_BrainParameters.vectorActionSize.Length;
var numberActionsBp = brainParameters.vectorActionSize.Length;
var numberActionsT = tensorProxy.shape[tensorProxy.shape.Length - 1];
if (numberActionsBp != numberActionsT)
{

}
/// <summary>
/// Checks that the shape of the visual observation input placeholder is the same in the
/// model and in the Brain Parameters.
/// </summary>
/// <param name="tensorProxy">The tensor that is expected by the model</param>
/// <param name="visObsIndex">The index of the visual observation.</param>
/// <returns>
/// If the Check failed, returns a string containing information about why the
/// check failed. If the check passed, returns null.
/// </returns>
private string CheckVisualObsShape(TensorProxy tensorProxy, int visObsIndex)
{
var resolutionBp = m_BrainParameters.cameraResolutions[visObsIndex];
var widthBp = resolutionBp.width;
var heightBp = resolutionBp.height;
var pixelBp = resolutionBp.blackAndWhite ? 1 : 3;
var heightT = tensorProxy.shape[1];
var widthT = tensorProxy.shape[2];
var pixelT = tensorProxy.shape[3];
if ((widthBp != widthT) || (heightBp != heightT) || (pixelBp != pixelT))
{
return $"The visual Observation {visObsIndex} of the model does not match. " +
$"Received TensorProxy of shape [?x{widthBp}x{heightBp}x{pixelBp}] but " +
$"was expecting [?x{widthT}x{heightT}x{pixelT}].";
}
return null;
}
/// <summary>
/// <param name="model">
/// The Barracuda engine model for loading static parameters
/// </param>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <param name="isContinuous">
/// Whether the model is expecting continuous or discrete control.
/// </param>

/// An IEnumerable of string corresponding to the incompatible shapes between model
/// and BrainParameters.
/// </returns>
private void CheckOutputTensorShape(ModelActionType isContinuous, int modelActionSize)
private static IEnumerable<string> CheckOutputTensorShape(
Model model,
BrainParameters brainParameters,
ModelActionType isContinuous,
int modelActionSize)
var failedModelChecks = new List<string>();
m_FailedModelChecks.Add("Cannot infer type of Control from the provided model.");
return;
failedModelChecks.Add("Cannot infer type of Control from the provided model.");
return failedModelChecks;
m_BrainParameters.vectorActionSpaceType != SpaceType.Continuous)
brainParameters.vectorActionSpaceType != SpaceType.Continuous)
m_FailedModelChecks.Add(
failedModelChecks.Add(
return;
return failedModelChecks;
m_BrainParameters.vectorActionSpaceType != SpaceType.Discrete)
brainParameters.vectorActionSpaceType != SpaceType.Discrete)
m_FailedModelChecks.Add(
failedModelChecks.Add(
return;
return failedModelChecks;
var tensorTester = new Dictionary<string, Func<TensorShape, int, string>>();
if (m_BrainParameters.vectorActionSpaceType == SpaceType.Continuous)
var tensorTester = new Dictionary<string, Func<BrainParameters, TensorShape, int, string>>();
if (brainParameters.vectorActionSpaceType == SpaceType.Continuous)
{
tensorTester[TensorNames.ActionOutput] = CheckContinuousActionOutputShape;
}

}
// If the model expects an output but it is not in this list
foreach (var name in m_Model.outputs)
foreach (var name in model.outputs)
var error = tester.Invoke(m_Model.GetShapeByName(name), modelActionSize);
var error = tester.Invoke(brainParameters, model.GetShapeByName(name), modelActionSize);
m_FailedModelChecks.Add(error);
failedModelChecks.Add(error);
return failedModelChecks;
}
/// <summary>

/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <param name="shape"> The tensor shape that is expected by the model</param>
/// <param name="modelActionSize">
/// The size of the action output that is expected by the model.

/// check failed. If the check passed, returns null.
/// </returns>
private string CheckDiscreteActionOutputShape(TensorShape shape, int modelActionSize)
private static string CheckDiscreteActionOutputShape(
BrainParameters brainParameters, TensorShape shape, int modelActionSize)
var bpActionSize = m_BrainParameters.vectorActionSize.Sum();
var bpActionSize = brainParameters.vectorActionSize.Sum();
if (modelActionSize != bpActionSize)
{
return "Action Size of the model does not match. The BrainParameters expect " +

/// Checks that the shape of the continuous action output is the same in the
/// model and in the Brain Parameters.
/// </summary>
/// <param name="brainParameters">
/// The BrainParameters that are used to verify the compatibility with the InferenceEngine
/// </param>
/// <param name="shape"> The tensor shape that is expected by the model</param>
/// <param name="modelActionSize">
/// The size of the action output that is expected by the model.

private string CheckContinuousActionOutputShape(TensorShape shape, int modelActionSize)
private static string CheckContinuousActionOutputShape(
BrainParameters brainParameters, TensorShape shape, int modelActionSize)
var bpActionSize = m_BrainParameters.vectorActionSize[0];
var bpActionSize = brainParameters.vectorActionSize[0];
if (modelActionSize != bpActionSize)
{
return "Action Size of the model does not match. The BrainParameters expect " +

16
UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/GeneratorImpl.cs


{
TensorUtils.ResizeTensor(tensorProxy, batchSize, m_Allocator);
var vecObsSizeT = tensorProxy.shape[tensorProxy.shape.Length - 1];
var agentIndex = 0;
foreach (var agent in agents)
{

private readonly ITensorAllocator m_Allocator;
public VisualObservationInputGenerator(
int index, bool grayScale, ITensorAllocator allocator)
int index, ITensorAllocator allocator)
m_GrayScale = grayScale;
m_Allocator = allocator;
}

var textures = agents.Select(
agent => agent.Info.visualObservations[m_Index]).ToList();
Utilities.TextureToTensorProxy(textures, tensorProxy, m_GrayScale);
var agentIndex = 0;
foreach (var agent in agents)
{
// TODO direct access to sensors list here - should we do it differently?
// TODO m_Index here is the visual observation index. Will work for now but not if we add more sensor types.
agent.m_Sensors[m_Index].WriteToTensor(tensorProxy, agentIndex);
agentIndex++;
}
}
}
}

23
UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/TensorGenerator.cs


new ActionMaskInputGenerator(allocator);
m_Dict[TensorNames.RandomNormalEpsilonPlaceholder] =
new RandomNormalInputGenerator(seed, allocator);
if (bp.cameraResolutions != null)
{
for (var visIndex = 0;
visIndex < bp.cameraResolutions.Length;
visIndex++)
{
var index = visIndex;
var bw = bp.cameraResolutions[visIndex].blackAndWhite;
m_Dict[TensorNames.VisualObservationPlaceholderPrefix + visIndex] =
new VisualObservationInputGenerator(index, bw, allocator);
}
}
}
public void InitializeVisualObservations(Agent agent, ITensorAllocator allocator)
{
for (var visIndex = 0; visIndex < agent.m_Sensors.Count; visIndex++)
{
// TODO handle non-visual sensors too - need to index better
m_Dict[TensorNames.VisualObservationPlaceholderPrefix + visIndex] =
new VisualObservationInputGenerator(visIndex, allocator);
}
}
/// <summary>

35
UnitySDK/Assets/ML-Agents/Scripts/Timer.cs


// Compile with: csc CRefTest.cs -doc:Results.xml
using System.Collections.ObjectModel;
using System.IO;
using UnityEngine.Profiling;
using System.Runtime.Serialization;

/// <summary>
/// Number of total ticks elapsed for this node.
/// </summary>
long m_TotalTicks = 0;
long m_TotalTicks;
long m_TickStart = 0;
long m_TickStart;
int m_NumCalls = 0;
int m_NumCalls;
/// <summary>
/// The total recorded ticks for the timer node, plus the currently elapsed ticks

{
get
{
long currentTicks = m_TotalTicks;
var currentTicks = m_TotalTicks;
if (m_TickStart != 0)
{
currentTicks += (System.DateTime.Now.Ticks - m_TickStart);

/// <returns></returns>
public string DebugGetTimerString(string parentName = "", int level = 0)
{
string indent = new string(' ', 2 * level); // TODO generalize
string shortName = (level == 0) ? m_FullName : m_FullName.Replace(parentName + s_Separator, "");
string timerString = "";
var indent = new string(' ', 2 * level); // TODO generalize
var shortName = (level == 0) ? m_FullName : m_FullName.Replace(parentName + s_Separator, "");
string timerString;
if (level == 0)
{
timerString = $"{shortName}(root)\n";

timerString = $"{indent}{shortName}\t\traw={TotalSeconds} rawCount={m_NumCalls}\n";
}
// TODO use stringbuilder? might be overkill since this is only debugging code?
// TODO use StringBuilder? might be overkill since this is only debugging code?
foreach (TimerNode c in m_Children.Values)
foreach (var c in m_Children.Values)
{
timerString += c.DebugGetTimerString(m_FullName, level + 1);
}

/// <summary>
/// A "stack" of timers that allows for lightweight hierarchical profiling of long-running processes.
/// <example>
///
/// <code>
/// for (int i=0; i<5; i++)
/// for (int i=0; i&lt;5; i++)
/// {
/// using(myTimer.Scoped("bar"))
/// {

/// }
/// </code>
/// </example>
/// </summary>
/// <remarks>
/// This implements the Singleton pattern (solution 4) as described in

{
private static readonly TimerStack instance = new TimerStack();
private static readonly TimerStack k_Instance = new TimerStack();
Stack<TimerNode> m_Stack;
TimerNode m_RootNode;

public static TimerStack Instance
{
get { return instance; }
get { return k_Instance; }
}
public TimerNode RootNode

private void Push(string name)
{
TimerNode current = m_Stack.Peek();
TimerNode next = current.GetChild(name);
var current = m_Stack.Peek();
var next = current.GetChild(name);
m_Stack.Push(next);
next.Begin();
}
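// Hypothetical usage sketch for the timer stack above, assuming the Scoped(name)
// helper shown in the <example> block returns an IDisposable that pushes a child
// node on entry and pops it on dispose:
using (TimerStack.Instance.Scoped("ExpensiveStep"))
{
    DoExpensiveWork(); // placeholder for the code being profiled
}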

69
UnitySDK/Assets/ML-Agents/Scripts/Utilities.cs


var numTextures = textures.Count;
var width = textures[0].width;
var height = textures[0].height;
for (var t = 0; t < numTextures; t++)
{
var texture = textures[t];
Debug.Assert(width == texture.width, "All Textures must have the same dimension");
Debug.Assert(height == texture.height, "All Textures must have the same dimension");
TextureToTensorProxy(texture, tensorProxy, grayScale, t);
}
}
/// <summary>
/// Puts a Texture2D into a TensorProxy.
/// </summary>
/// <param name="texture">
/// The texture to be put into the tensor.
/// </param>
/// <param name="tensorProxy">
/// TensorProxy to fill with Texture data.
/// </param>
/// <param name="grayScale">
/// If set to <c>true</c> the textures will be converted to grayscale before
/// being stored in the tensor.
/// </param>
/// <param name="textureOffset">
/// Index of the texture being written.
/// </param>
public static void TextureToTensorProxy(
Texture2D texture,
TensorProxy tensorProxy,
bool grayScale,
int textureOffset = 0)
{
var width = texture.width;
var height = texture.height;
for (var t = 0; t < numTextures; t++)
var t = textureOffset;
var texturePixels = texture.GetPixels32();
// During training, we convert from Texture to PNG before sending to the trainer, which has the
// effect of flipping the image. We need another flip here at inference time to match this.
for (var h = height - 1; h >= 0; h--)
var texturePixels = textures[t].GetPixels32();
for (var h = height - 1; h >= 0; h--)
for (var w = 0; w < width; w++)
for (var w = 0; w < width; w++)
var currentPixel = texturePixels[(height - h - 1) * width + w];
if (grayScale)
var currentPixel = texturePixels[(height - h - 1) * width + w];
if (grayScale)
{
data[t, h, w, 0] =
(currentPixel.r + currentPixel.g + currentPixel.b) / 3f / 255.0f;
}
else
{
// For Color32, the r, g and b values are between 0 and 255.
data[t, h, w, 0] = currentPixel.r / 255.0f;
data[t, h, w, 1] = currentPixel.g / 255.0f;
data[t, h, w, 2] = currentPixel.b / 255.0f;
}
data[t, h, w, 0] =
(currentPixel.r + currentPixel.g + currentPixel.b) / 3f / 255.0f;
}
else
{
// For Color32, the r, g and b values are between 0 and 255.
data[t, h, w, 0] = currentPixel.r / 255.0f;
data[t, h, w, 1] = currentPixel.g / 255.0f;
data[t, h, w, 2] = currentPixel.b / 255.0f;
}
/// <summary>

4
UnitySDK/README.md


# Unity ML-Agents SDK
Contains the ML-Agents Unity Project, including
both the core plugin (in `Scripts`), as well as a set
Contains the ML-Agents Unity Project, including
both the core plugin (in `Scripts`), as well as a set
of example environments (in `Examples`).

8
config/gail_config.yaml


strength: 1.0
gamma: 0.99
PyramidsLearning:
Pyramids:
summary_freq: 2000
time_horizon: 128
batch_size: 128

encoding_size: 128
demo_path: demos/ExpertPyramid.demo
CrawlerStaticLearning:
CrawlerStatic:
normalize: true
num_epoch: 3
time_horizon: 1000

encoding_size: 128
demo_path: demos/ExpertCrawlerSta.demo
PushBlockLearning:
PushBlock:
max_steps: 5.0e4
batch_size: 128
buffer_size: 2048

encoding_size: 128
demo_path: demos/ExpertPush.demo
HallwayLearning:
Hallway:
use_recurrent: true
sequence_length: 64
num_layers: 2

2
config/offline_bc_config.yaml


memory_size: 256
demo_path: ./UnitySDK/Assets/Demonstrations/<Your_Demo_File>.demo
HallwayLearning:
Hallway:
trainer: offline_bc
max_steps: 5.0e5
num_epoch: 5

53
config/sac_trainer_config.yaml


strength: 1.0
gamma: 0.99
FoodCollectorLearning:
FoodCollector:
normalize: false
batch_size: 256
buffer_size: 500000

BouncerLearning:
Bouncer:
beta: 0.0
PushBlockLearning:
PushBlock:
beta: 1.0e-2
SmallWallJumpLearning:
SmallWallJump:
max_steps: 1.0e6
hidden_units: 256
summary_freq: 2000

normalize: false
BigWallJumpLearning:
BigWallJump:
max_steps: 1.0e6
hidden_units: 256
summary_freq: 2000

normalize: false
StrikerLearning:
Striker:
beta: 1.0e-2
hidden_units: 256
summary_freq: 2000
time_horizon: 128

GoalieLearning:
Goalie:
beta: 1.0e-2
hidden_units: 256
summary_freq: 2000
time_horizon: 128

PyramidsLearning:
Pyramids:
summary_freq: 2000
time_horizon: 128
batch_size: 128

use_actions: true
demo_path: demos/ExpertPyramid.demo
VisualPyramidsLearning:
VisualPyramids:
beta: 1.0e-2
max_steps: 5.0e5
buffer_size: 500000
init_entcoef: 0.01

use_actions: true
demo_path: demos/ExpertPyramid.demo
3DBallLearning:
3DBall:
normalize: true
batch_size: 64
buffer_size: 12000

init_entcoef: 0.5
3DBallHardLearning:
3DBallHard:
TennisLearning:
Tennis:
CrawlerStaticLearning:
CrawlerStatic:
normalize: true
time_horizon: 1000
batch_size: 256

strength: 1.0
gamma: 0.995
CrawlerDynamicLearning:
CrawlerDynamic:
normalize: true
time_horizon: 1000
batch_size: 256

strength: 1.0
gamma: 0.995
WalkerLearning:
Walker:
normalize: true
time_horizon: 1000
batch_size: 256

strength: 1.0
gamma: 0.995
ReacherLearning:
Reacher:
normalize: true
time_horizon: 1000
batch_size: 128

HallwayLearning:
Hallway:
beta: 0.0
init_entcoef: 0.1
max_steps: 5.0e5
summary_freq: 1000

VisualHallwayLearning:
VisualHallway:
beta: 1.0e-2
gamma: 0.99
batch_size: 64
max_steps: 5.0e5

VisualPushBlockLearning:
VisualPushBlock:
beta: 1.0e-2
gamma: 0.99
buffer_size: 1024
batch_size: 64

GridWorldLearning:
GridWorld:
init_entcoef: 0.01
init_entcoef: 0.5
buffer_init_steps: 1000
buffer_size: 50000
max_steps: 5.0e5
summary_freq: 2000

strength: 1.0
gamma: 0.9
BasicLearning:
Basic:
batch_size: 64
normalize: false
num_layers: 2

42
config/trainer_config.yaml


strength: 1.0
gamma: 0.99
FoodCollectorLearning:
FoodCollector:
normalize: false
beta: 5.0e-3
batch_size: 1024

BouncerLearning:
Bouncer:
PushBlockLearning:
PushBlock:
max_steps: 5.0e4
batch_size: 128
buffer_size: 2048

time_horizon: 64
num_layers: 2
SmallWallJumpLearning:
SmallWallJump:
max_steps: 1.0e6
batch_size: 128
buffer_size: 2048

num_layers: 2
normalize: false
BigWallJumpLearning:
BigWallJump:
max_steps: 1.0e6
batch_size: 128
buffer_size: 2048

num_layers: 2
normalize: false
StrikerLearning:
Striker:
max_steps: 5.0e5
learning_rate: 1e-3
batch_size: 128

num_layers: 2
normalize: false
GoalieLearning:
Goalie:
max_steps: 5.0e5
learning_rate: 1e-3
batch_size: 320

num_layers: 2
normalize: false
PyramidsLearning:
Pyramids:
summary_freq: 2000
time_horizon: 128
batch_size: 128

gamma: 0.99
encoding_size: 256
VisualPyramidsLearning:
VisualPyramids:
time_horizon: 128
batch_size: 64
buffer_size: 2024

gamma: 0.99
encoding_size: 256
3DBallLearning:
3DBall:
normalize: true
batch_size: 64
buffer_size: 12000

beta: 0.001
3DBallHardLearning:
3DBallHard:
normalize: true
batch_size: 1200
buffer_size: 12000

strength: 1.0
gamma: 0.995
TennisLearning:
Tennis:
CrawlerStaticLearning:
CrawlerStatic:
normalize: true
num_epoch: 3
time_horizon: 1000

strength: 1.0
gamma: 0.995
CrawlerDynamicLearning:
CrawlerDynamic:
normalize: true
num_epoch: 3
time_horizon: 1000

strength: 1.0
gamma: 0.995
WalkerLearning:
Walker:
normalize: true
num_epoch: 3
time_horizon: 1000

strength: 1.0
gamma: 0.995
ReacherLearning:
Reacher:
normalize: true
num_epoch: 3
time_horizon: 1000

strength: 1.0
gamma: 0.995
HallwayLearning:
Hallway:
use_recurrent: true
sequence_length: 64
num_layers: 2

summary_freq: 1000
time_horizon: 64
VisualHallwayLearning:
VisualHallway:
use_recurrent: true
sequence_length: 64
num_layers: 1

summary_freq: 1000
time_horizon: 64
VisualPushBlockLearning:
VisualPushBlock:
use_recurrent: true
sequence_length: 32
num_layers: 1

summary_freq: 1000
time_horizon: 64
GridWorldLearning:
GridWorld:
batch_size: 32
normalize: false
num_layers: 1

strength: 1.0
gamma: 0.9
BasicLearning:
Basic:
batch_size: 32
normalize: false
num_layers: 1

2
docs/Background-TensorFlow.md


deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
train the behavior of an agent, the output is a TensorFlow model (.nn) file
that you can then embed within a Learning Brain. Unless you implement a new
that you can then associate with an Agent. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.

75
docs/Basic-Guide.md


## Setting up the ML-Agents Toolkit within Unity
In order to use the ML-Agents toolkit within Unity, you first need to change a few
Unity settings.
1. Launch Unity
2. On the Projects dialog, choose the **Open** option at the top of the window.

## Running a Pre-trained Model
We include pre-trained models for our agents (`.nn` files) and we use the
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models
inside Unity. In this section, we will use the pre-trained model for the
2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window.
**Note**: The platforms in the `3DBall` scene were created using the `Platform` prefab. Instead of updating all 12 platforms individually, you can update the `Platform` prefab instead.
2. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder.
Expand `3DBall` and click on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** window.
**Note**: The platforms in the `3DBall` scene were created using the `3DBall` prefab. Instead of updating all 12 platforms individually, you can update the `3DBall` prefab instead.
3. In the **Project** window, drag the **3DBallLearning** Brain located in
`Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
3. In the **Project** window, drag the **3DBallLearning** Model located in
`Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
4. You should notice that each `Platform` under each `Game` in the **Hierarchy** windows now contains **3DBallLearning** as `Brain`. __Note__ : You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
5. In the **Project** window, click on the **3DBallLearning** Brain located in
`Assets/ML-Agents/Examples/3DBall/Brains`. You should see the properties in the **Inspector** window.
6. In the **Project** window, open the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder.
7. Drag the `3DBallLearning` model file from the `Assets/ML-Agents/Examples/3DBall/TFModels`
folder to the **Model** field of the **3DBallLearning** Brain in the **Inspector** window. __Note__ : All of the brains should now have `3DBallLearning` as the TensorFlow model in the `Model` property
8. Select the **InferenceDevice** to use for this model (CPU or GPU).
4. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** window now contains **3DBallLearning** as `Model`. __Note__ : You can modify multiple game objects in a scene by selecting them all at
once using the search bar in the Scene Hierarchy.
8. Select the **InferenceDevice** to use for this model (CPU or GPU) on the Agent.
_Note: CPU is faster for the majority of ML-Agents toolkit generated models_
9. Click the **Play** button and you will see the platforms balance the balls
using the pre-trained model.

More information and documentation is provided in the
[Python API](Python-API.md) page.
## Training the Brain with Reinforcement Learning
## Training the Model with Reinforcement Learning
To set up the environment for training, you will need to specify which agents are contributing
to the training and which Brain is being trained. You can only perform training with
a `Learning Brain`.
Each platform agent needs an assigned `Learning Brain`. In this example, each platform agent was created using a prefab. To update all of the brains in each platform agent at once, you only need to update the platform agent prefab. In the **Project** window, go to the `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `Game` and click on the `Platform` prefab. You should see the `Platform` prefab in the **Inspector** window. In the **Project** window, drag the **3DBallLearning** Brain located in `Assets/ML-Agents/Examples/3DBall/Brains` into the `Brain` property under `Ball 3D Agent (Script)` component in the **Inspector** window.
**Note**: The Unity prefab system will modify all instances of the agent properties in your scene. If the agent does not synchronize automatically with the prefab, you can hit the Revert button in the top of the **Inspector** window.
**Note:** Assigning a Brain to an agent (dragging a Brain into the `Brain` property of
the agent) means that the Brain will be making decision for that agent. If the Agent uses a
LearningBrain either Python controls the Brain or the model on the Brain does.
In order to set up the Agents for training, you will need to edit the
`Behavior Name` under `BehaviorParameters` in the Agent Inspector window.
The `Behavior Name` is used to group agents per behavior. Note that Agents
sharing the same `Behavior Name` must be agents of the same type using the
same `Behavior Parameters`. You can make sure all your agents have the same
`Behavior Parameters` using Prefabs.
The `Behavior Name` corresponds to the name of the model that will be
generated by the training process and is used to select the hyperparameters
from the training configuration file.
### Training the environment

### After training
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<brain_name>.nn` where
`<brain_name>` is the name of the Brain corresponding to the model.
`models/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding to the model.
model into your Learning Brain by following the steps below, which is similar to
model into your Agents by following the steps below, which is similar to
the steps described
[above](#running-a-pre-trained-model).

3. Select the **3DBallLearning** Learning Brain from the Scene hierarchy.
4. Drag the `<brain_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **3DBallLearning**
3. Select the **3DBall** prefab Agent object.
4. Drag the `<behavior_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **Ball3DAgent**
inspector window.
5. Press the :arrow_forward: button at the top of the Editor.

20
docs/Creating-Custom-Protobuf-Messages.md


# Creating Custom Protobuf Messages
Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.
## Implementing a Custom Message

By default, the Python API sends actions to Unity in the form of a floating point list and an optional string-valued text action for each agent.
You can define a custom action type, to either replace or augment the default, by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.
Instances of custom actions are set via the `custom_action` parameter of the `env.step`. An agent receives a custom action by defining a method with the signature:

Below is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.
The `custom_action.proto` file looks like:

EAST=2;
WEST=3;
}
float walkAmount = 1;
Direction direction = 2;
}
```
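As a rough sketch of the consuming side (the exact override that delivers the message is not shown in this excerpt), an agent could read the generated C# fields roughly as follows; the method name and movement logic are illustrative:

```csharp
// Hypothetical sketch; `customAction` is assumed to be the deserialized
// CustomAction message generated from the .proto above (C# properties use
// the generated PascalCase names).
void ApplyCustomAction(CommunicatorObjects.CustomAction customAction)
{
    var step = customAction.WalkAmount;
    // Move according to the chosen cardinal direction, e.g. EAST:
    if ((int)customAction.Direction == 2) // EAST = 2 in the .proto above
    {
        transform.Translate(Vector3.right * step);
    }
    // ... handle the remaining directions the same way.
}
```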

### Custom Reset Parameters
By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.
You can also configure the environment reset using a custom protobuf message. To do this, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.

### Custom Observations
By default, Unity returns observations to Python in the form of a floating-point vector.
You can define a custom observation message to supplement that. To do so, add fields to the `CustomObservation` protobuf message in `custom_observation.proto`.
Then in your agent, create an instance of a custom observation via `new CommunicatorObjects.CustomObservation`. Then in `CollectObservations`, call `SetCustomObservation` with the custom observation instance as the parameter.

var obs = new CustomObservation();
obs.CustomField = 1.0;
SetCustomObservation(obs);
}
}
}
```
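For context, a fuller (hypothetical) version of the snippet above might look like the following; the class name is illustrative and the assignment assumes `customField` is declared as a float in the .proto:

```csharp
using MLAgents;

public class MyAgent : Agent  // illustrative class name
{
    public override void CollectObservations()
    {
        var obs = new CommunicatorObjects.CustomObservation();
        obs.CustomField = 1.0f;  // assumes customField is a float field in the .proto
        SetCustomObservation(obs);
    }
}
```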

...
result = env.step(...)
result[brain_name].custom_observations[0].customField
result[behavior_name].custom_observations[0].customField
where `brain_name` is the name of the brain attached to the agent.
where `behavior_name` is the `Behavior Name` property of the Agent.

2
docs/FAQ.md


There may be a number of possible causes:
* _Cause_: There may be no agent in the scene with a LearningBrain
* _Cause_: There may be no agent in the scene
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of
exceptions on the firewall by following

4
docs/Feature-Memory.md


will be able to store a vector of floats to be used next time they need to make
a decision.
![Brain Inspector](images/ml-agents-LSTM.png)
![Inspector](images/ml-agents-LSTM.png)
Deciding what the agents should remember in order to solve a task is not easy to
do by hand, but our training algorithms can learn to keep track of what is

## How to use
When configuring the trainer parameters in the `config/trainer_config.yaml`
file, add the following parameters to the Brain you want to use.
file, add the following parameters to the Behavior you want to use.
```json
use_recurrent: true

118
docs/Getting-Started-with-Balance-Ball.md


An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing an
Academy and one or more Brain and Agent objects, and, of course, the other
Academy and one or more Agent objects, and, of course, the other
entities that an agent interacts with.
![Unity Editor](images/mlagents-3DBallHierarchy.png)

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several agent cubes. Each agent cube in the scene is an
independent agent, but they all share the same Brain. 3D Balance Ball does this
independent agent, but they all share the same Behavior. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy

itself when needed — but many environments do use these functions to control the
environment around the Agents.
### Brain
As of v0.6, a Brain is a Unity asset and exists within the `UnitySDK` folder. These brains (ex. **3DBallLearning.asset**) are loaded into each Agent object (ex. **Ball3DAgents**). A Brain doesn't store any information about an Agent, it just
routes the Agent's collected observations to the decision making process and
returns the chosen action to the Agent. All Agents can share the same
Brain, but would act independently. The Brain settings tell you quite a bit about how
an Agent works.
You can create new Brain assets by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 types of Brains.
The **Learning Brain** is a Brain that uses a trained neural network to make decisions.
When Unity is connected to Python, the external process will be controlling the Brain.
The external process that is training the neural network will take over decision making for the agents
and ultimately generate a trained neural network. You can also use the
**Learning Brain** with a pre-trained model.
The **Heuristic** Brain allows you to hand-code the Agent logic by extending
the Decision class.
Finally, the **Player** Brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. You can also implement your own type of Brain.
In this tutorial, you will use the **Learning Brain** for training.
#### Vector Observation Space
Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations()` function.)
#### Vector Action Space
An Agent is given instructions from the Brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.
The 3D Balance Ball example is programmed to use both types of vector action
space. You can try training with both settings to observe whether there is a
difference. (Set the `Vector Action Space Size` to 4 when using the discrete
action space and 2 when using continuous.)
### Agent
The Agent is the actor that observes and takes actions in the environment. In

* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
makes decisions. All the Agents in the 3D Balance Ball scene share the same
Brain.
* **Visual Observations** — Defines any Camera objects used by the Agent to
observe its environment. 3D Balance Ball does not use camera observations.
* **Behavior Parameters** — Every Agent must have a Behavior. The Behavior
determines how an Agent makes decisions. More on Behavior Parameters in
the next section.
* **Max Step** — Defines how many simulation steps can occur before the Agent
decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
* **Reset On Done** — Defines whether an Agent starts over when it is finished.

training generalizes to more than a specific starting position and agent cube
attitude.
* agent.CollectObservations() — Called every simulation step. Responsible for
collecting the Agent's observations of the environment. Since the Brain
instance assigned to the Agent is set to the continuous vector observation
collecting the Agent's observations of the environment. Since the Behavior
Parameters of the Agent are set with vector observation
`AddVectorObs` such that vector size adds up to 8.
by the Brain. The Ball3DAgent example handles both the continuous and the
discrete action space types. There isn't actually much difference between the
two state types in this environment — both vector action spaces result in a
by the Agent. The vector action spaces result in a
small change in the agent cube's rotation at each step. The `AgentAction()` function
assigns a reward to the Agent; in this example, an Agent receives a small
positive reward for each step it keeps the ball on the agent cube's head and a larger,

* agent.Heuristic() - When the `Use Heuristic` checkbox is checked in the Behavior
Parameters of the Agent, the Agent will use the `Heuristic()` method to generate
the actions of the Agent. As such, the `Heuristic()` method returns an array of
floats. In the case of the Ball 3D Agent, the `Heuristic()` method converts the
keyboard inputs into actions.
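A rough sketch of such a `Heuristic()` (the exact key-to-action mapping in the example project may differ):

```csharp
public override float[] Heuristic()
{
    // Two continuous actions driven by the keyboard axes.
    var action = new float[2];
    action[0] = Input.GetAxis("Horizontal");
    action[1] = Input.GetAxis("Vertical");
    return action;
}
```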
## Training the Brain with Reinforcement Learning
#### Behavior Parameters : Vector Observation Space
Before making a decision, an agent collects its observation about its state in
the world. The vector observation is a vector of floating point numbers which
contain relevant information for the agent to make decisions.
The Behavior Parameters of the 3D Balance Ball example uses a **Space Size** of 8.
This means that the feature
vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the agent cube's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
defined in the Agent's `CollectObservations()` function.)
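For illustration, a `CollectObservations()` producing those eight values could look roughly like this (field names such as `ball` and `m_BallRb` are placeholders for the example's own references):

```csharp
public override void CollectObservations()
{
    AddVectorObs(gameObject.transform.rotation.z);              // 1 value
    AddVectorObs(gameObject.transform.rotation.x);              // 1 value
    AddVectorObs(ball.transform.position - transform.position); // 3 values
    AddVectorObs(m_BallRb.velocity);                            // 3 values
}
```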
#### Behavior Parameters : Vector Action Space
An Agent is given instructions in the form of a float array of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
element of the vector means is defined by the Agent logic (the training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
given to the Agent is an array of indices into tables.
The 3D Balance Ball example is programmed to use continuous action
space with `Space Size` of 2.
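A minimal sketch of how those two continuous actions might be applied (the override signature and the clamping shown here are illustrative; see the Ball3DAgent script for the actual implementation):

```csharp
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Interpret the two continuous actions as small tilts of the platform.
    var actionZ = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
    var actionX = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
    gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
    gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
}
```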
## Training with Reinforcement Learning
Now that we have an environment, we can perform the training.

![Example TensorBoard Run](images/mlagents-TensorBoard.png)
## Embedding the Trained Brain into the Unity Environment (Experimental)
## Embedding the Model into the Unity Environment
use it with Agents having a **Learning Brain**.
use it with compatible Agents (the Agents that generated the model).
__Note:__ Do not just close the Unity Window once the `Saved Model` message appears.
Either wait for the training process to close the window or press Ctrl+C at the
command-line prompt. If you close the window manually, the `.nn` file

To embed the trained model into Unity, follow the later part of [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
Model with Reinforcement
Learning](Basic-Guide.md#training-the-model-with-reinforcement-learning) section
of the Basic Guide page.

12
docs/Glossary.md


environment.
* **Agent** - Unity Component which produces observations and takes actions in
the environment. Agents actions are determined by decisions produced by a
linked Brain.
* **Brain** - Unity Asset which makes decisions for the agents linked to it.
* **Decision** - The specification produced by a Brain for an action to be
Policy.
* **Policy** - The decision making mechanism, typically a neural network model.
* **Decision** - The specification produced by a Policy for an action to be
* **Environment** - The Unity scene which contains Agents, Academy, and Brains.
* **Environment** - The Unity scene which contains Agents and the Academy.
* **FixedUpdate** - Unity method called each time the game engine is
stepped. ML-Agents logic should be placed here.
* **Frame** - An instance of rendering the main camera for the display.

logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
* **Trainer** - Python class which is responsible for training a given
Brain. Contains TensorFlow graph which makes decisions for Learning Brain.
* **Trainer** - Python class which is responsible for training a given
group of Agents.

16
docs/Installation.md


## Environment Setup
We now support a single mechanism for installing ML-Agents on Mac/Windows/Linux using Virtual
Environments. For more information on Virtual Environments and installation instructions,
follow this [guide](Using-Virtual-Environment.md).
### Clone the ML-Agents Toolkit Repository

It also contains many [example environments](Learning-Environment-Examples.md)
to help you get started.
The `ml-agents` subdirectory contains a Python package which provides deep reinforcement
the `ml-agents` package depends on.
In order to use ML-Agents toolkit, you need Python 3.6.1 or higher.
[Download](https://www.python.org/downloads/) and install the latest version of Python if you do not already have it.
If your Python environment doesn't include `pip3`, see these

pip3 install mlagents
```
Note that this will install `ml-agents` from PyPi, _not_ from the cloned repo.
parameters you can use with `mlagents-learn`.
By installing the `mlagents` package, the dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
Some of the primary dependencies include:

### Installing for Development
If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you should install
the packages from the cloned repo rather than from PyPi. To do this, you will need to install
`ml-agents` and `ml-agents-envs` separately. From the repo's root directory, run:

Running pip with the `-e` flag will let you make changes to the Python files directly and have those
reflected when you run `mlagents-learn`. It is important to install these packages in this order as the
`mlagents` package depends on `mlagents_envs`, and installing it in the other
order will download `mlagents_envs` from PyPi.
## Next Steps

7
docs/Learning-Environment-Best-Practices.md


lessons which progressively increase in difficulty are presented to the agent
([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by
using a Player Brain to control the agent.
* It is often helpful to make many copies of the agent, and attach the Brain to
be trained to all of these agents. In this way the Brain can get more feedback
using a heuristic to control the agent. To do so, check the `Use Heuristic`
checkbox on the Agent and implement the `Heuristic()` method on the Agent.
* It is often helpful to make many copies of the agent, and give them the same
`Behavior Name`. In this way the learning process can get more feedback
information from all of these agents, which helps it train faster.
## Rewards

212
docs/Learning-Environment-Create-New.md


containing the environment. Your Academy class can implement a few optional
methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
3. Create one or more Brain assets by clicking **Assets** > **Create** >
**ML-Agents** > **Brain**, and naming them appropriately.
4. Implement your Agent subclasses. An Agent subclass defines the code an Agent
3. Implement your Agent subclasses. An Agent subclass defines the code an Agent
5. Add your Agent subclasses to appropriate GameObjects, typically, the object
4. Add your Agent subclasses to appropriate GameObjects, typically, the object
in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.

importing the ML-Agents assets into it:
1. Launch the Unity Editor and create a new project named "RollerBall".
2. Make sure that the Scripting Runtime Version for the project is set to use
**.NET 4.x Equivalent** (This is an experimental option in Unity 2017,
4. Drag the `ML-Agents` and `Gizmos` folders from `UnitySDK/Assets` to the Unity
4. Drag the `ML-Agents` folder from `UnitySDK/Assets` to the Unity
Editor Project window.
Your Unity **Project** window should contain the following assets:

1. In the Unity Project window, double-click the `RollerAcademy` script to open
it in your code editor. (By default new scripts are placed directly in the
**Assets** folder.)
2. In the code editor, add the statement, `using MLAgents;`.
3. Change the base class from `MonoBehaviour` to `Academy`.
4. Delete the `Start()` and `Update()` methods that were added by default.

The default settings for the Academy properties are also fine for this
environment, so we don't need to change anything for the RollerAcademy component
in the Inspector window.
## Add Brain Assets
The Brain object encapsulates the decision making process. An Agent sends its
observations to its Brain and expects a decision in return. The type of the Brain
(Learning, Heuristic or Player) determines how the Brain makes decisions.
To create the Brain:
1. Go to **Assets** > **Create** > **ML-Agents** and select the type of Brain asset
you want to create. For this tutorial, create a **Learning Brain** and
a **Player Brain**.
2. Name them `RollerBallBrain` and `RollerBallPlayer` respectively.
![Creating a Brain Asset](images/mlagents-NewTutBrain.png)
We will come back to the Brain properties later, but leave the Model property
of the `RollerBallBrain` as `None` for now. We will need to first train a
model before we can add it to the **Learning Brain**.
## Implement an Agent
To create the Agent:

1. In the Unity Project window, double-click the `RollerAgent` script to open it
in your code editor.
2. In the editor, add the `using MLAgents;` statement and then change the base
class from `MonoBehaviour` to `Agent`.
3. Delete the `Update()` method, but we will use the `Start()` function, so
leave it alone for now.

this reference, add a public field of type `Transform` to the RollerAgent class.
Public fields of a component in Unity get displayed in the Inspector window,
allowing you to choose which GameObject to use as the target in the Unity
Editor.
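For example, the field could be declared like this (a minimal sketch; `Target` is the name used later in this tutorial):

```csharp
public class RollerAgent : Agent
{
    // Public fields appear in the Inspector, so the Target GameObject can be dragged onto it.
    public Transform Target;
}
```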
To reset the Agent's velocity (and later to apply force to move the
agent) we need a reference to the Rigidbody component. A

In our case, the information our Agent collects includes:
* Position of the target.
* Position of the Agent itself.
```csharp
AddVectorObs(this.transform.position);

### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
function. The learning algorithm uses the rewards assigned to the Agent during
assigned task. In this case, the Agent is given a reward of 1.0 for reaching the
reward of 1.0 and marks the agent as finished by calling the `Done()` method
on the Agent.
```csharp

## Final Editor Setup
Now, that all the GameObjects and ML-Agent components are in place, it is time
to connect everything together in the Unity Editor. This involves assigning the
Brain asset to the Agent, changing some of the Agent Component's properties, and
setting the Brain properties so that they are compatible with our Agent code.
to connect everything together in the Unity Editor. This involves
changing some of the Agent Component's properties so that they are compatible
with our Agent code.
2. Drag the Brain **RollerBallPlayer** from the Project window to the
RollerAgent **Brain** field.
3. Change **Decision Interval** from `1` to `10`.
4. Drag the Target GameObject from the Hierarchy window to the RollerAgent
2. Change **Decision Interval** from `1` to `10`.
3. Drag the Target GameObject from the Hierarchy window to the RollerAgent
![Assign the Brain to the RollerAgent](images/mlagents-NewTutAssignBrain.png)
Finally, select the **RollerBallBrain** Asset in the **Project** window so that you can
see its properties in the Inspector window. Set the following properties:
* `Vector Observation` `Space Size` = 8
* `Vector Action` `Space Type` = **Continuous**
* `Vector Action` `Space Size` = 2
Select the **RollerBallPlayer** Asset in the **Project** window and set the same
property values.
4. Modify the Behavior Parameters of the Agent :
* `Behavior Name` to *RollerBallBrain*
* `Vector Observation` `Space Size` = 8
* `Vector Action` `Space Type` = **Continuous**
* `Vector Action` `Space Size` = 2
Now you are ready to test the environment before training.

an extended training run. The reason we have created the `RollerBallPlayer` Brain
is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard to action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify
positive values and one to specify negative values for each action, for a total
of four keys.
an extended training run. To do so, you will need to implement the `Heuristic()`
method on the RollerAgent class. This will allow you to control the Agent using
direct keyboard control.
1. Select the `RollerBallPlayer` Asset to view its properties in the Inspector.
2. Expand the **Key Continuous Player Actions** dictionary (only visible when using
a **PlayerBrain**).
3. Set **Size** to 4.
4. Set the following mappings:
The `Heuristic()` method will look like this:
| Element | Key | Index | Value |
| :------------ | :---: | :------: | :------: |
| Element 0 | D | 0 | 1 |
| Element 1 | A | 0 | -1 |
| Element 2 | W | 1 | 1 |
| Element 3 | S | 1 | -1 |
```csharp
public override float[] Heuristic()
{
var action = new float[2];
action[0] = Input.GetAxis("Horizontal");
action[1] = Input.GetAxis("Vertical");
return action;
}
```
The **Index** value corresponds to the index of the action array passed to
`AgentAction()` function. **Value** is assigned to action[Index] when **Key** is
pressed.
What this code means is that the heuristic will generate an action corresponding
to the values of the "Horizontal" and "Vertical" input axis (which correspond to
the keyboard arrow keys).
In order for the Agent to use the Heuristic, you will need to check the `Use Heuristic`
checkbox in the `Behavior Parameters` of the RollerAgent.
Press **Play** to run the scene and use the WASD keys to move the Agent around
Press **Play** to run the scene and use the arrow keys to move the Agent around
the platform. Make sure that there are no errors displayed in the Unity editor
Console window and that the Agent resets when it reaches its target or falls
from the platform. Note that for more involved debugging, the ML-Agents SDK

## Training the Environment
Now you can train the Agent. To get ready for training, you must first drag the
`RollerBallBrain` asset to the **RollerAgent** GameObject `Brain` field to change
to the LearningBrain. From there, the process is
the same as described in [Training ML-Agents](Training-ML-Agents.md). Note that the
The process is
the same as described in [Training ML-Agents](Training-ML-Agents.md). Note that the
pass to the `mlagents-learn` program. Using the default settings specified
RollerAgent takes about 300,000 steps to train. However, you can change the
Since this example creates a very simple training environment with only a few inputs
and outputs, using small batch and buffer sizes speeds up the training considerably.
However, if you add more complexity to the environment or change the reward or
observation functions, you might also find that training performs better with different
**Note:** In addition to setting these hyperparameter values, the Agent
in this simple environment, speeds up training.
To train in the editor, run the following Python command from a Terminal or Console
(where `config.yaml` is a copy of `trainer_config.yaml` that you have edited
to change the `batch_size` and `buffer_size` hyperparameters for your brain.)
(where `config.yaml` is a copy of `trainer_config.yaml` that you have edited
to change the `batch_size` and `buffer_size` hyperparameters for your trainer.)
**Note:** If you get a `command not found` error when running this command, make sure
that you have followed the *Install Python and mlagents Package* section of the
To monitor the statistics of Agent performance during training, use
[TensorBoard](Using-Tensorboard.md).
In particular, the *cumulative_reward* and *value_estimate* statistics show how
well the Agent is achieving the task. In this example, the maximum reward an
**Note:** If you use TensorBoard, always increment or change the `run-id`
you pass to the `mlagents-learn` command for each training run. If you use
the same id value, the statistics for multiple runs are combined and become
In many of the [example environments](Learning-Environment-Examples.md), many copies of
simply by instantiating many Agents which share the same Brain. Use the following steps to
parallelize your RollerBall environment.
simply by instantiating many Agents which share the same `Behavior Parameters`. Use the following steps to
parallelize your RollerBall environment.
1. Right-click on your Project Hierarchy and create a new empty GameObject.
Name it TrainingArea.
2. Reset the TrainingArea’s Transform so that it is at (0,0,0) with Rotation (0,0,0)
and Scale (1,1,1).
3. Drag the Floor, Target, and RollerAgent GameObjects in the Hierarchy into the
TrainingArea GameObject.
4. Drag the TrainingArea GameObject, along with its attached GameObjects, into your
5. You can now instantiate copies of the TrainingArea prefab. Drag them into your scene,
positioning them so that they do not overlap.
### Editing the Scripts
You will notice that in the previous section, we wrote our scripts assuming that our
TrainingArea was at (0,0,0), performing checks such as `this.transform.position.y < 0`
to determine whether our agent has fallen off the platform. We will need to change
this if we are to use multiple TrainingAreas throughout the scene.
A quick way to adapt our current code is to use
localPosition rather than position, so that our positions are relative to the
TrainingArea prefab's location instead of to global coordinates.
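For example, a minimal sketch of the adjusted check (assuming the same `Done()` call and
`y < 0` threshold used earlier in this tutorial):

```csharp
// Check the agent's position relative to its TrainingArea parent rather than
// in world space, so every copy of the prefab behaves identically.
if (this.transform.localPosition.y < 0)
{
    // The agent fell off the platform; end the episode.
    Done();
}
```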
This is only one way to achieve this objective. Refer to the
[example environments](Learning-Environment-Examples.md) for other ways we can achieve relative positioning.
## Review: Scene Layout

There are two kinds of game objects you need to include in your scene in order
to use Unity ML-Agents: an Academy and one or more Agents.
* If you are using multiple training areas, make sure all the Agents have the same `Behavior Name`
and `Behavior Parameters`.

2
docs/Learning-Environment-Design-Academy.md


# Creating an Academy
An Academy orchestrates all the Agent objects in a Unity scene. Every
scene containing Agents must contain a single Academy. To use an Academy, you
must create your own subclass. However, all the methods you can override are
optional.
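Because every overridable method is optional, a minimal Academy can be an empty subclass.
A sketch (the class name `RollerAcademy` is illustrative):

```csharp
using MLAgents;

// An empty Academy subclass is enough to get started. Override methods such as
// AcademyReset() or AcademyStep() only if your environment needs them.
public class RollerAcademy : Academy { }
```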

206
docs/Learning-Environment-Design-Agents.md


reinforcement learning and the reward you assign to estimate the value of the
agent's current state toward accomplishing its tasks.
An Agent passes its observations to its Policy. The Policy, then, makes a decision
and passes the chosen action back to the agent. Your agent code must execute the
action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),

The Policy class abstracts out the decision making logic from the Agent itself so
that you can use the same Policy in multiple Agents. How a Policy makes its
decisions depends on the kind of Policy it is. You can change the Policy of an
Agent by changing its `Behavior Parameters`. If you check `Use Heuristic`, the
Agent will use its `Heuristic()` method to make decisions, which allows you to
control the Agent manually or write your own Policy. If the Agent has a `Model`
file, its Policy will use the neural network `Model` to make decisions.
## Decisions
The observation-decision-action-reward cycle repeats after a configurable number

agent in a robotic simulator that must provide fine-control of joint torques
should make its decisions every step of the simulation. On the other hand, an
agent that only needs to make decisions when certain game or simulation events
occur, should use on-demand decision making.
using the same Model can use a different frequency. During simulation
On demand decision making allows Agents to request decisions from their Policies
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the agents commit to an action for a variable number of steps or
when the agents cannot make decisions at the same time. This is typically the case

When you turn on **On Demand Decisions** for an Agent, your agent code must call
the `Agent.RequestDecision()` function. This function call starts one iteration
of the observation-decision-action-reward cycle. The Agent's
`CollectObservations()` method is called, the Policy makes a decision and
returns it by calling the `AgentAction()` method. The Policy waits for the
Agent to request the next
decision before starting another iteration.
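As a sketch of this pattern (the `ground` tag is illustrative), an Agent with
**On Demand Decisions** enabled might request a decision from a collision callback:

```csharp
// With On Demand Decisions enabled, the Agent only asks its Policy for a new
// action when a relevant game event occurs, such as touching the ground.
void OnCollisionEnter(Collision collision)
{
    if (collision.gameObject.CompareTag("ground"))
    {
        RequestDecision(); // starts one observation-decision-action-reward cycle
    }
}
```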
## Observations

When you use vector observations for an Agent, implement the
`Agent.CollectObservations()` method to create the feature vector. When you use
**Visual Observations**, you only need to identify which Unity Camera objects
or RenderTextures will provide images and the base Agent class handles the rest.
You do not need to implement the `CollectObservations()` method when your Agent
represent the agent's observation at each step of the simulation. The Policy
class calls the `CollectObservations()` method of each Agent. Your
implementation of this function must call `AddVectorObs` to add vector
observations.
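A minimal sketch (the `target` and `rBody` fields are illustrative and would be
assigned in the Inspector or in `InitializeAgent()`):

```csharp
public Transform target;   // illustrative: the object the agent should reach
Rigidbody rBody;           // illustrative: the agent's Rigidbody

public override void CollectObservations()
{
    AddVectorObs(target.position - transform.position); // relative target position (3 values)
    AddVectorObs(rBody.velocity.x);                      // agent velocity (1 value)
    AddVectorObs(rBody.velocity.z);                      // agent velocity (1 value)
}
```

With these calls, the `Space Size` under `Vector Observation` in the `Behavior Parameters` would be 5.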

an agent's observations to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
When you set up an Agent's `Behavior Parameters` in the Unity Editor, set the following
properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.

### Multiple Visual Observations
Visual observations use rendered textures directly or from one or more
cameras in a scene. The Policy vectorizes the textures into a 3D Tensor which
can be fed into a convolutional neural network (CNN). For more information on
CNNs, see [this guide](http://cs231n.github.io/convolutional-networks/). You
Agents using visual observations can capture state of arbitrary complexity and
are useful when the state is difficult to describe numerically. However, they
are also typically less efficient and slower to train, and sometimes don't
Visual observations can be derived from Cameras or RenderTextures within your scene.
To add a visual observation to an Agent, add either a Camera Sensor Component
or a RenderTexture Sensor Component to the Agent. Then drag the camera or
render texture you want to add to the `Camera` or `RenderTexture` field.
You can have more than one camera or render texture and even use a combination
of both attached to an Agent. For each visual observation, set the width and height
of the image (in pixels) and whether or not the observation is color or grayscale.
![Agent Camera](images/visual-observation.png)

Each Agent that uses the same Policy must have the same number of visual observations,
and they must all have the same resolutions (including whether or not they are grayscale).
Additionally, each Sensor Component on an Agent must have a unique name so that they can
be sorted deterministically (the name must be unique for that Agent, but multiple Agents can
have a Sensor Component with the same name).
During runtime, if a combination of `Cameras` and `RenderTextures` is used, all
cameras are captured first, then all render textures will be added, in the
order they appear in the editor.
![Agent Camera and RenderTexture combination](images/visual-observation-combination.png)
RenderTexture observations will throw an `Exception` if the width/height doesn't
match the resolution specified on the Sensor Component.
When using `RenderTexture` visual observations, a handy feature for debugging is
adding a `Canvas`, then adding a `Raw Image` with its texture set to the Agent's
`RenderTexture`. This will render the agent observation on the game screen.
The [GridWorld environment](Learning-Environment-Examples.md#gridworld)
is an example on how to use a RenderTexture for both debugging and observation. Note
that in this example, a Camera is rendered to a RenderTexture, which is then used for
observations and debugging. To update the RenderTexture, the Camera must be asked to
render every time a decision is requested within the game code. When using Cameras
as observations directly, this is done automatically by the Agent.
![Agent RenderTexture Debug](images/gridworld.png)

An action is an instruction from the Policy that the agent carries out. The
action is passed to the Agent as a parameter when the Academy invokes the
agent's `AgentAction()` function. When you specify that the vector action space
is **Continuous**, the action parameter passed to the Agent is an array of

is an array of indices. The number of indices in the array is determined by the
number of branches defined in the `Branches Size` property. Each branch
corresponds to an action table; you can specify the size of each table by
modifying the `Branches` property.
Neither the Policy nor the training algorithm know anything about what the action
values themselves mean. The training algorithm simply tries different values for
the action list and observes the effect on the accumulated rewards over time and
many training episodes. Thus, the only place actions are defined for an Agent is

For example, if you designed an agent to move in two dimensions, you could use
either continuous or discrete vector actions. In the continuous case, you
would set the vector action size to two (one for each dimension), and the
agent's Policy would create an action with two floating point values. In the
direction), and the Policy would create an action array containing a single
movement), and the Policy would create an action array containing two elements
test your action logic using the `Heuristic()` method of the Agent, which lets
you map keyboard commands to actions.
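A sketch of such a mapping, assuming a continuous action space of size 2 and a
`Heuristic()` override that returns a `float[]` (the exact signature depends on
your ML-Agents version):

```csharp
public override float[] Heuristic()
{
    // Map the keyboard axes onto the two continuous actions.
    var action = new float[2];
    action[0] = Input.GetAxis("Horizontal"); // movement along x
    action[1] = Input.GetAxis("Vertical");   // movement along z
    return action;
}
```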
The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and
[Area](Learning-Environment-Examples.md#push-block) example environments are set

When an Agent uses a Policy set to the **Continuous** vector action space, the
length equal to the `Vector Action Space Size` property value.
example, the training process learns to control the speed of the Agent through
this parameter.
The [Reacher example](Learning-Environment-Examples.md#reacher) defines a

### Discrete Action Space
When an Agent uses a **Discrete** vector action space, the
action parameter passed to the Agent's `AgentAction()` function is an array
containing indices. With the discrete vector action space, `Branches` is an
array of integers, each value corresponds to the number of possibilities for

#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are
impossible for the next decision. When the Agent is controlled by a
neural network, the Agent will be unable to perform the specified action. Note
that when the Agent is controlled by its Heuristic, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, call the method `SetActionMask` within the `CollectObservations()` method:
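For example, a sketch (the branch and action indices are illustrative):

```csharp
// Prevent the Agent from choosing action 2 of branch 0 at the next decision.
SetActionMask(0, 2);
```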

reward over time. The better your reward mechanism, the better your agent will
learn.
**Note:** Rewards are not used during inference by an Agent using a
trained model, nor are they used during imitation learning.
to display the cumulative reward received by an Agent. You can even use the
Agent's Heuristic to control the Agent while watching how it accumulates rewards.
Allocate rewards to an Agent by calling the `AddReward()` method in the
`AgentAction()` function. The reward assigned between each decision

platform.
Note that all of these environments make use of the `Done()` method, which manually
terminates an episode when a termination condition is reached. This can be
![Agent Inspector](images/3dball_learning_brain.png)
* `Behavior Parameters` - The parameters dictating what Policy the Agent will
receive.
* `Vector Observation`
* `Space Size` - Length of vector observation for the Agent.
* `Stacked Vectors` - The number of previous vector observations that will
be stacked and used collectively for decision making. This results in the
effective size of the vector observation being passed to the Policy being:
_Space Size_ x _Stacked Vectors_.
* `Vector Action`
* `Space Type` - Corresponds to whether action vector contains a single
integer (Discrete) or a series of real-valued floats (Continuous).
* `Space Size` (Continuous) - Length of action vector.
* `Branches` (Discrete) - An array of integers, defines multiple concurrent
discrete actions. The values in the `Branches` array correspond to the
number of possible discrete values for each action branch.
* `Model` - The neural network model used for inference (obtained after
training)
* `Inference Device` - Whether to use CPU or GPU to run the model during inference
* `Use Heuristic` - If checked, the Agent will use its `Heuristic()` method for
decisions.
* `Max Step` - The per-agent maximum number of steps. Once this number is
reached, the Agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called

Frequency` steps and perform an action every step. In the example above,
`CollectObservations()` will be called every 5 steps and `AgentAction()`
will be called at every step. This means that the Agent will reuse the
decision the Policy has given it.
causes the Agent to collect its observations and ask the Policy for a
decision at the next step of the simulation. Note that when an Agent
requests a decision, it also requests an action. This is to ensure that
all decisions lead to an action during training.

throughout the training process, we imagine it can be more broadly useful. You
can learn more [here](Feature-Monitor.md).
## Instantiating an Agent at Runtime
To add an Agent to an environment at runtime, use the Unity
`GameObject.Instantiate()` function. It is typically easiest to instantiate an
agent from a [Prefab](https://docs.unity3d.com/Manual/Prefabs.html) (otherwise,
you have to instantiate every GameObject and Component that make up your Agent
individually). In addition, you must assign a Brain instance to the new Agent
and initialize it by calling its `AgentReset()` method. For example, the
following function creates a new Agent given a Prefab, Brain instance, location,
and orientation:
```csharp
private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
{
    // Instantiate the prefab, attach the Brain, and reset the new Agent.
    GameObject agentObj = Instantiate(agentPrefab, position, orientation);
    Agent agent = agentObj.GetComponent<Agent>();
    agent.GiveBrain(brain);
    agent.AgentReset();
}
```
the next step in the simulation) so that the Policy knows that this Agent is no
longer active. Thus, the best place to destroy an Agent is in the
`Agent.AgentOnDone()` function:
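A sketch of such a handler (assuming the Agent should simply be removed from the
scene once it is done and `Reset On Done` is unchecked):

```csharp
public override void AgentOnDone()
{
    // Remove the Agent's GameObject once its episode has finished.
    Destroy(gameObject);
}
```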

66
docs/Learning-Environment-Design.md


During training, the external Python training process communicates with the
Academy to run a series of episodes while it collects data and optimizes its
neural network model. When training is completed
successfully, you can add the trained model file to your Unity project for later
use.

2. Calls the `AgentReset()` function for each Agent in the scene.
3. Calls the `CollectObservations()` function for each Agent in the scene.
4. Uses each Agent's Policy to decide on the Agent's next action.
the action chosen by the Agent's Policy. (This function is not called if the
Agent is done.)
7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
Step` count or has otherwise marked itself as `done`. Optionally, you can set

To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy subclass and as many Agent subclasses
as you need.
Every Agent must have appropriate `Behavior Parameters`, and multiple Agents can
share the same `Behavior Name`. Each Agent will make its own observations and act
independently, but Agents with the same `Behavior Name` use the same
decision-making logic and, once trained, the same TensorFlow model.
### Academy

See [Academy](Learning-Environment-Design-Academy.md) for a complete list of
the Academy properties and their uses.
in a football game or a car object in a vehicle simulation. Every Agent must
have appropriate `Behavior Parameters`.
* `AgentAction()` — Carries out the action chosen by the Agent's Policy and
Your implementations of these functions determine how the Behavior Parameters
assigned to this Agent must be set.
has finished (or irrevocably failed) its task by calling the `Done()` function.
You can also set the Agent's `Max Steps` property to a positive value and the
Agent will consider itself done after it has taken that many steps. If you
set an Agent's `ResetOnDone` property to true, then the Agent can attempt its
task several times in one episode. (Use the `Agent.AgentReset()` function to
prepare the Agent to start again.)
See [Agents](Learning-Environment-Design-Agents.md) for detailed information

properties that can be set differently for a training scene versus a regular
scene. The Academy's **Configuration** properties control rendering and time
scale. You can set the **Training Configuration** to minimize the time Unity
spends rendering graphics in order to speed up training.
When you create a training environment in Unity, you must set up the scene so
that it can be controlled by the external training process. Considerations
include:

76
docs/Learning-Environment-Examples.md


* Set-up: A linear movement task where the agent must move left or right to
rewarding states.
* Goal: Move to the most rewarding state.
* Agents: The environment contains one agent.
* Behavior Parameters:
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).

* Set-up: A balance-ball task, where the agent balances the ball on its head.
* Goal: The agent must balance the ball on its head for as long as possible.
* Agents: The environment contains 12 agents of the same kind, all using the
same Behavior Parameters.
* Behavior Parameters:
* Vector Observation space: 8 variables corresponding to rotation of the agent cube,
and position and velocity of ball.
* Vector Observation space (Hard Version): 5 variables corresponding to

* Default: 1
* Recommended Minimum: 0.2
* Recommended Maximum: 5
* gravity: Magnitude of gravity
* Default: 9.81
* Recommended Minimum: 4
* Recommended Maximum: 105

and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the
obstacles.
* Agents: The environment contains nine agents with the same Behavior Parameters.
* Behavior Parameters:
* Vector Observation space: None
* Vector Action space: (Discrete) Size of 4, corresponding to movement in
cardinal directions. Note that for this environment,

net.
* Goal: The agents must bounce the ball between one another while not dropping or
sending the ball out of bounds.
* Agents: The environment contains two agents with the same Behavior Parameters.
After training you can check the `Use Heuristic` checkbox on one of the Agents
to play against your trained model.
* Behavior Parameters:
* Vector Observation space: 8 variables corresponding to position and velocity
of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement

* angle: Angle of the racket from the vertical (Y) axis.
* Default: 55
* Recommended Minimum: 35
* Recommended Maximum: 65
* gravity: Magnitude of gravity
* Default: 9.81

* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
* Agents: The environment contains one agent.
* Behavior Parameters:
* Vector Observation space: (Continuous) 70 variables corresponding to 14
ray-casts each detecting one of three possible objects (wall, goal, or
block).

* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.
* Agents: The environment contains one agent linked to two different
Models. The Policy the agent is linked to changes depending on the
height of the wall. The change of Policy is done in the WallJumpAgent class.
* Behavior Parameters:
* Vector Observation space: Size of 74, corresponding to 14 ray casts each
detecting 4 possible objects, plus the global position of the agent and
whether or not the agent is grounded.

* Jump (2 possible actions: Jump, No Action)
* Visual Observations: None
* Reset Parameters: Four
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8
* Benchmark Mean Reward (Big & Small Wall): 0.8
## [Reacher](https://youtu.be/2N9EoF6pQyE)

* Goal: The agents must move their hands to the goal location, and keep them there.
* Agents: The environment contains 10 agents with the same Behavior Parameters.
* Behavior Parameters:
* Vector Observation space: 26 variables corresponding to position, rotation,
velocity, and angular velocities of the two arm Rigidbodies.
* Vector Action space: (Continuous) Size of 4, corresponding to torque

* Goal: The agents must move their bodies toward the goal direction without falling.
* `CrawlerStaticTarget` - Goal direction is always forward.
* `CrawlerDynamicTarget`- Goal direction is randomized.
* Agents: The environment contains 3 agents with the same Behavior Parameters.
* Behavior Parameters:
* Vector Observation space: 117 variables corresponding to position, rotation,
velocity, and angular velocities of each limb plus the acceleration and
angular acceleration of the body.

* Set-up: A multi-agent environment where agents compete to collect food.
* Goal: The agents must learn to collect as many green food spheres as possible
while avoiding red spheres.
* Agents: The environment contains 5 agents with the same Behavior Parameters.
* Behavior Parameters:
* Vector Observation space: 53 corresponding to velocity of agent (2), whether
agent is frozen and/or shot its laser (2), plus ray-based perception of
objects around agent's forward direction (49; 7 raycast angles with 7

remember it, and use it to move to the correct goal.
* Goal: Move to the goal which corresponds to the color of the block in the
room.
* Agents: The environment contains one agent.
* Behavior Parameters:
* Vector Observation space: 30 corresponding to local ray-casts detecting
objects, goals, and walls.
* Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent

* Set-up: Environment where the agent needs on-demand decision making. The agent
must decide how to perform its next bounce only when it touches the ground.
* Goal: Catch the floating green cube. Only has a limited number of jumps.
* Agents: The environment contains one agent.
* Behavior Parameters:
* Vector Observation space: 6 corresponding to local position of agent and
green cube.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for

* Goal:
* Striker: Get the ball into the opponent's goal.
* Goalie: Prevent the ball from entering its own goal.
* Agents: The environment contains four agents, with two different sets of
Behavior Parameters: Striker and Goalie.
* Agent Reward Function (dependent):
* Striker:
* +1 When ball enters opponent's goal.

* -1 When ball enters team's goal.
* +0.1 When ball enters opponent's goal.
* +0.001 Existential bonus.
* Behavior Parameters:
* Vector Observation space: 112 corresponding to local 14 ray casts, each
detecting 7 possible object types, along with the object's distance.
Perception is in 180 degree view from front of agent.

* Default: 9.81
* Recommended minimum: 6
* Recommended maximum: 20
* Benchmark Mean Reward (Striker & Goalie Brain): 0 (the means will be inverse
* Benchmark Mean Reward (Striker & Goalie): 0 (the means will be inverse
of each other and criss-cross during training) __Note that our trainer is currently unable to consistently train this environment__
## Walker

head, thighs, shins, feet, arms, forearms and hands.
* Goal: The agents must move their bodies toward the goal direction as quickly as
possible without falling.
* Agents: The environment contains 11 independent agents with the same Behavior Parameters.
* Behavior Parameters:
* Vector Observation space: 215 variables corresponding to position, rotation,
velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target

pyramid, then navigate to the pyramid, knock it over, and move to the gold
brick at the top.
* Goal: Move to the golden brick on top of the spawned pyramid.
* Agents: The environment contains one agent.
* Behavior Parameters:
* Vector Observation space: 148 corresponding to local ray-casts detecting
switch, bricks, golden brick, and walls, plus variable indicating switch
state.

17
docs/Learning-Environment-Executable.md


![3DBall Scene](images/mlagents-Open3DBall.png)
Next, we want to set up the scene to play correctly when the training process
launches our environment executable. This means:

```
You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<brain_name>.nn`, which corresponds
`models/<run-identifier>/<behavior_name>.nn`, which corresponds
into your Agent by following the steps below:
3. Select the **3DBall** prefab from the Project window and select **Agent**.
5. Drag the `<behavior_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **Ball3DAgent**
inspector window.
6. Press the Play button at the top of the editor.

97
docs/ML-Agents-Overview.md


_Simplified block diagram of ML-Agents._
The Learning Environment contains two additional components that help
Agent is linked to a Policy.
- **Academy** - which orchestrates the observation and decision making process.
Within the Academy, several environment-wide parameters such as the rendering
quality and the speed at which the environment is run can be specified. The

every character in the scene. While each Agent must be linked to a Policy, it is
possible for Agents that have similar observations and actions to have
the same Policy type. In our sample game, we have two teams each with their own medic.
but both of these medics can have the same Policy. Note that these two
medics have the same Policy because their _space_ of observations and
identical observation and action _values_. In other words, the Policy defines the
attached to those characters cannot share a Policy with the Agent linked to the
medics (medics and drivers have different actions).
<p align="center">

We have yet to discuss how the ML-Agents toolkit trains behaviors, and what role
the Python API and External Communicator play. Before we dive into those
details, let's summarize the earlier components. Each character is attached to
an Agent, and each Agent has a Policy. The Policy receives observations
Agents are in sync in addition to controlling environment-wide
settings.
<p align="center">
<img src="images/learning_environment.png"
alt="ML-Agents Scene Block Diagram"
border="10" />
</p>
_An example of how a scene containing multiple Agents might be
configured._
## Training Modes

As mentioned previously, the ML-Agents toolkit ships with several
implementations of state-of-the-art algorithms for training intelligent agents.
More specifically, during training, all the medics in the
Communicator. The Python API
during the inference phase, we use the
phase, the medics still continue to generate their observations, but instead of
being sent to the Python API, they will be fed into their (internal, embedded)
model to generate the _optimal_ action for each medic to take at every point in
time.

model. This model is then embedded within the Agent during inference.
The
[Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)

In the previous mode, the Agents were used for training to generate
a TensorFlow model that the Agents can later use. However,
training. In this case, the behaviors of all the Agents in the scene
will be controlled within Python.
You can even turn your environment into a [gym.](../gym-unity/README.md)
We do not currently have a tutorial highlighting this mode, but you can

to perform, rather than attempting to have it learn via trial-and-error methods.
For example, instead of training the medic by setting up its reward function,
this mode allows providing real examples from a game controller on how the medic
should behave. More specifically, in this mode, the Agent must use its heuristic
to generate actions, and all the actions performed with the controller (in addition
to the agent observations) will be recorded. The
imitation learning algorithm will then use these pairs of observations and
actions from the human player to learn a policy. [Video

training intelligent agents, below are a few examples that can serve as
inspiration:
- Single-Agent. A single agent, with its own reward
signals with the same `Behavior Parameters`. A parallelized version of the traditional
- Adversarial Self-Play. Two interacting agents with inverse reward signals.
In two-player games, adversarial self-play can allow
signal with the same or different `Behavior Parameters`. In this
signals with the same or different `Behavior Parameters`. In this
- Ecosystem. Multiple interacting agents with independent reward signals with the
same or different `Behavior Parameters`. This scenario can be thought
of as creating a small world in which animals with different goals all
interact, such as a savanna in which there might be zebras, elephants and
giraffes, or an autonomous driving simulation within an urban environment.

9
docs/Migrating.md


### Important Changes
* The definition of the gRPC service has changed.
* The online BC training feature has been removed.
* The Brain ScriptableObjects have been deprecated. The Brain Parameters are now on the Agent and are referred to as Behavior Parameters. Make sure the Behavior Parameters is attached to the Agent GameObject.
* Several changes were made to the setup for visual observations (i.e. using Cameras or RenderTextures):
* Camera resolutions are no longer stored in the Brain Parameters.
* AgentParameters no longer stores lists of Cameras and RenderTextures
* To add visual observations to an Agent, you must now attach a CameraSensorComponent or RenderTextureSensorComponent to the agent. The corresponding Camera or RenderTexture can be added to these in the editor, and the resolution and color/grayscale are configured on the component itself.
* If your Agents used visual observations, you must add a CameraSensorComponent corresponding to each old Camera in the Agent's camera list (and similarly for RenderTextures).
* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject. You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.
## Migrating from ML-Agents toolkit v0.9 to v0.10

6
docs/Readme.md


* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Agents](Learning-Environment-Design-Agents.md)
* [Academy](Learning-Environment-Design-Academy.md)
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
### Advanced Usage

* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
### Cloud Training (Deprecated)
Here are the cloud training set-up guides for Azure and AWS. We no longer use them ourselves and
so they may not work correctly. We've decided to keep them up just in case they are helpful to
you.

2
docs/Reward-Signals.md


Reward signals, like other hyperparameters, are defined in the trainer config `.yaml` file. An
example is provided in `config/trainer_config.yaml` and `config/gail_config.yaml`. To enable a reward signal, add it to the
`reward_signals:` section under the behavior name. For instance, to enable the extrinsic signal
in addition to a small curiosity reward and a GAIL reward signal, you would define your `reward_signals` as follows:
```yaml
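# A sketch of a reward_signals section; the values and the demo path are
# illustrative, not the exact contents of config/gail_config.yaml. This block
# sits under the behavior name in the trainer config.
reward_signals:
    extrinsic:
        strength: 1.0
        gamma: 0.99
    curiosity:
        strength: 0.02
        gamma: 0.99
        encoding_size: 256
    gail:
        strength: 0.01
        gamma: 0.99
        demo_path: demos/ExpertDemo.demo
```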

35
docs/Training-Behavioral-Cloning.md


# Training with Behavioral Cloning
There are a variety of possible imitation learning algorithms which can
be used; the simplest of them is Behavioral Cloning. It works by collecting
demonstrations from a teacher, and then simply uses them to directly learn a
policy, in the same way that supervised learning for image classification
With offline behavioral cloning, we can use demonstrations (`.demo` files)
1. Choose an agent you would like to learn to imitate some set of demonstrations.
2. Record a set of demonstration using the `Demonstration Recorder` (see [here](Training-Imitation-Learning.md)).
For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene (make sure the Agent is not using its heuristic).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
demonstration file recorded in step 2. In our case this is:
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
as the config parameter, and include the `--run-id` and `--train` as usual.
Provide your environment as the `--env` parameter if it has been compiled
This will use the demonstration file to train a neural network driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.

14
docs/Training-Curriculum-Learning.md


## How-To
Each group of Agents under the same `Behavior Name` in an environment can have
a corresponding curriculum. These
different groups of Agents to follow different curriculums within the same environment.
### Specifying a Metacurriculum

points in the training process our wall height will change, either based on the
percentage of training steps which have taken place, or what the average reward
the agent has received in the recent past is. Below is an example curriculum for
the BigWallBehavior in the Wall Jump environment.
```json
{

for an example.
We will save this file into our metacurriculum folder with the name of its
corresponding `Behavior Name`. For example, in the Wall Jump environment, there are two
different `Behavior Names` set via script in `WallJumpAgent.cs`:
BigWallBrainLearning and SmallWallBrainLearning. If we want to define a curriculum for
the BigWallBrainLearning, we will save `BigWallBrainLearning.json` into
`config/curricula/wall-jump/`.
### Training with a Curriculum

70
docs/Training-Generalized-Reinforcement-Learning-Agents.md


agents are unable to generalize to any tweaks or variations in the environment.
This is analogous to a model being trained and tested on an identical dataset
in supervised learning. This becomes problematic in cases where environments
are randomly instantiated with varying objects or properties.
To make agents robust and generalizable to different environments, the agent
should be trained over multiple variations of the environment. Using this approach

## How to Enable Generalization Using Reset Parameters
We first need to provide a way to modify the environment by supplying a set of `Reset Parameters`
and vary them over time. This provision can be done either deterministically or randomly.
This is done by assigning each `Reset Parameter` a `sampler-type` (such as a uniform sampler),
`Reset Parameter`, the parameter maintains the default value throughout the
training procedure, remaining unchanged. The samplers for all the `Reset Parameters`
are handled by a **Sampler Manager**, which also handles the generation of new
values for the reset parameters when needed.
To set up the Sampler Manager, we create a YAML file that specifies how we wish to
generate new samples for each `Reset Parameter`. In this file, we specify the samplers and the
`resampling-interval` (the number of simulation steps after which reset parameters are
resampled). Below is an example of a sampler file for the 3D ball environment.
```yaml
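# A sketch of a sampler file for the 3D Ball environment. The Reset Parameter
# names (mass, gravity, scale) come from the descriptions below; the ranges and
# the uniform sampler's min_value/max_value keys are illustrative assumptions.
resampling-interval: 5000

mass:
    sampler-type: "uniform"
    min_value: 0.5
    max_value: 10

gravity:
    sampler-type: "multirange_uniform"
    intervals: [[7, 12], [18, 22]]

scale:
    sampler-type: "uniform"
    min_value: 0.75
    max_value: 3
```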

Below is the explanation of the fields in the above example.
* `resampling-interval` - Specifies the number of steps for the agent to
train under a particular environment configuration before resetting the
* `Reset Parameter` - Name of the `Reset Parameter` like `mass`, `gravity` and `scale`. This should match the name
specified in the academy of the intended environment for which the agent is
being trained. If a parameter specified in the file doesn't exist in the
* `sampler-type` - Specify the sampler type to use for the `Reset Parameter`.
This is a string that should exist in the `Sampler Factory` (explained
* `sampler-type-sub-arguments` - Specify the sub-arguments depending on the `sampler-type`.
In the example above, this would correspond to the `intervals`
under the `sampler-type` `"multirange_uniform"` for the `Reset Parameter` called `gravity`.
The key name should match the name of the corresponding argument in the sampler definition.
(See below)
The Sampler Manager allocates a sampler type for each `Reset Parameter` by using the *Sampler Factory*,
which maintains a dictionary mapping of string keys to sampler objects.

Below is a list of the `sampler-type` options included as part of the toolkit.
* `uniform` - Uniform sampler
  * Uniformly samples a single float value between defined endpoints.
    The sub-arguments for this sampler to specify the interval
    endpoints are as below. The sampling is done in the range of
    [`min_value`, `max_value`).
  * **sub-arguments** - `min_value`, `max_value`
* `gaussian` - Gaussian sampler
  * Samples a single float value from a Gaussian distribution characterized by
    the mean and standard deviation. The sub-arguments specify the mean and
    standard deviation of the distribution.
* `multirange_uniform` - Multirange uniform sampler
  * Uniformly samples a single float value between the specified intervals.
    Samples by first performing a weighted pick of an interval from the list
    of intervals (weighted based on interval width) and then samples uniformly
    from the selected interval (half-closed interval, same as the uniform
    sampler). This sampler can take an arbitrary number of intervals in a
    list in the following format:
    [[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
  * **sub-arguments** - `intervals`
The implementation of the samplers can be found at `ml-agents-envs/mlagents/envs/sampler_class.py`.

If you want to define your own sampler type, you must first inherit the *Sampler*
base class (included in the `sampler_class` file) and preserve the interface.
Once the class with the required methods is implemented, it must be registered in the Sampler Factory.
This can be done by subscribing to the *register_sampler* method of the SamplerFactory. The command
is as follows:
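A hedged sketch of the registration (the import path follows the `sampler_class.py` location noted below; the string key, class name, and the `sample_all` method name are illustrative assumptions rather than the toolkit's verbatim API):

```python
import numpy as np
from mlagents.envs.sampler_class import Sampler, SamplerFactory

class CustomSampler(Sampler):
    """Hypothetical sampler that picks one of three provided values."""
    def __init__(self, argA, argB, argC):
        self.possible_vals = [argA, argB, argC]

    def sample_all(self):
        # Return a single float each time a new value is requested.
        return float(np.random.choice(self.possible_vals))

# Register the new sampler under a string key; that key can then be used as a
# `sampler-type` in the sampler YAML file.
SamplerFactory.register_sampler("custom-sampler", CustomSampler)
```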

To train with the sampler file, we launch `mlagents-learn` and specify the file with the
`--sampler` flag. For example, to train the 3D Ball agent with generalization using the
`config/3dball_generalize.yaml` sampling setup, we would run
```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml
--run-id=3D-Ball-generalization --train
```

26
docs/Training-ML-Agents.md


And then opening the URL: [localhost:6006](http://localhost:6006).
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006, a new session can be launched on an open port using the `--port`
option.
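A hedged example of launching TensorBoard on an alternate port when 6006 is already in use (the summaries directory name is an assumption about where your run writes its TensorBoard logs):

```sh
tensorboard --logdir=summaries --port=6007
```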
When training is finished, you can find the saved model in the `models` folder

Default is set to 1. Set to higher values when benchmarking performance and
multiple training sessions are desired. Training sessions are independent, and
do not improve learning performance.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--run-id=<path>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and
summary statistics are saved.

All arguments after this flag will be passed to the executable. For example, setting
`mlagents-learn config/trainer_config.yaml --env-args --num-orcs 42` would result in
`--num-orcs 42` being passed to the executable.
* `--base-port`: Specifies the starting port. Each concurrent Unity environment instance
will get assigned a port sequentially, starting from the `base-port`. Each instance
will use the port `(base_port + worker_id)`, where `worker_id` is a sequential ID
given to each instance, from 0 to `num_envs - 1`. Default is 5005. __Note:__ When
training using the Editor rather than an executable, the base port will be ignored.
* `--slow`: Specify this option to run the Unity environment at normal, game speed.

details.
* `--debug`: Specify this option to enable debug-level logging for some parts of the code.
* `--multi-gpu`: Setting this flag enables the use of multiple GPUs (if available) during training.
* `--cpu`: Forces training using CPU only. (A combined example using several of these options follows this list.)
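As a combined illustration of several options above (the environment executable name and run-id are illustrative assumptions):

```sh
mlagents-learn config/trainer_config.yaml --env=3DBall \
    --num-envs=4 --base-port=5005 --run-id=3DBall-parallel --train
```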
the hyperparameters, and a few additional values to use when training with Proximal Policy
Optimization (PPO), Soft Actor-Critic (SAC), GAIL (Generative Adversarial Imitation Learning)
with PPO, and online and offline Behavioral Cloning (BC)/Imitation. These files are divided
into sections.
The **default** section defines the default values for all the available settings. You can
also add new sections to override these defaults to train specific Behaviors. Name each of these
override sections after the appropriate `Behavior Name`. Sections for the
example environments are included in the provided config file.
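A minimal sketch of such a file, assuming a Behavior named `3DBall` (the specific hyperparameter values here are illustrative, not recommendations):

```yaml
default:
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    learning_rate: 3.0e-4
    max_steps: 5.0e5

# Overrides applied only to the Behavior named "3DBall"
3DBall:
    normalize: true
    batch_size: 64
    buffer_size: 12000
```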
| **Setting** | **Description** | **Applies To Trainer\*** |

2
docs/Training-PPO.md


### Entropy
This corresponds to how random the decisions are. This should
consistently decrease during training. If it decreases too soon or not at all,
`beta` should be adjusted (when using discrete action space).

2
docs/Training-SAC.md


### Entropy
This corresponds to how random the decisions are. This should
initially increase during training, reach a peak, and should decline along
with the Entropy Coefficient. This is because in the beginning, the agent is
incentivized to be more random for exploration due to a high entropy coefficient.

2
docs/Training-Using-Concurrent-Unity-Instances.md


# Training Using Concurrent Unity Instances
As part of release v0.8, we enabled developers to run concurrent, parallel instances of the Unity executable during training. For certain scenarios, this should speed up the training.
## How to Run Concurrent Unity Instances During Training
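A hedged sketch of launching training against several concurrent instances of a built executable (the executable name and instance count are illustrative):

```sh
mlagents-learn config/trainer_config.yaml --env=3DBall --num-envs=8 --train
```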

5
docs/Training-on-Amazon-Web-Service.md


# Training on Amazon Web Service
Note: We no longer use this guide ourselves and so it may not work correctly. We've
decided to keep it up just in case it is helpful to you.
This page contains instructions for setting up an EC2 instance on Amazon Web
Service for use in training ML-Agents environments.

### Unity Environment not responding
If you didn't set up X Server or haven't launched it properly, or your environment somehow crashes, or you haven't run `chmod +x` on your Unity Environment, any of these will cause the connection between Unity and Python to fail. Then you will see something like this:
```console
Logging to /home/ubuntu/.config/unity3d/<Some_Path>/Player.log

File "/home/ubuntu/ml-agents/ml-agents/mlagents/envs/rpc_communicator.py", line 60, in initialize
mlagents.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
The environment does not need user interaction to launch
The Academy and the External Brain(s) are attached to objects in the Scene
The environment and the Python interface have compatible versions.
```
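One common remediation, sketched here with an illustrative build path, is to make the Linux build executable before launching training again:

```sh
chmod +x ./3DBall.x86_64
```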

Some files were not shown because too many files changed in this diff.
