
resolving conflicts

/develop-gpu-test
Anupam Bhatnagar, 5 years ago
Current commit
b733b34c
92 files changed, with 774 additions and 5576 deletions
  1. 45
      UnitySDK/Assets/ML-Agents/Editor/BroadcastHubDrawer.cs
  2. 6
      UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBall.unity
  3. 6
      UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBallHard.unity
  4. 4
      UnitySDK/Assets/ML-Agents/Examples/Basic/Scenes/Basic.unity
  5. 4
      UnitySDK/Assets/ML-Agents/Examples/Bouncer/Scenes/Bouncer.unity
  6. 4
      UnitySDK/Assets/ML-Agents/Examples/Crawler/Scenes/CrawlerDynamicTarget.unity
  7. 4
      UnitySDK/Assets/ML-Agents/Examples/Crawler/Scenes/CrawlerStaticTarget.unity
  8. 6
      UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/FoodCollector.unity
  9. 4
      UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/VisualFoodCollector.unity
  10. 6
      UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scenes/GridWorld.unity
  11. 4
      UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/Hallway.unity
  12. 3
      UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/VisualHallway.unity
  13. 7
      UnitySDK/Assets/ML-Agents/Examples/PushBlock/Brains/PushBlockLearning.asset
  14. 4
      UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/PushBlock.unity
  15. 3
      UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/VisualPushBlock.unity
  16. 2
      UnitySDK/Assets/ML-Agents/Examples/Pyramids/Prefabs/VisualAreaPyramids.prefab
  17. 4
      UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/Pyramids.unity
  18. 5
      UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/VisualPyramids.unity
  19. 4
      UnitySDK/Assets/ML-Agents/Examples/Reacher/Scenes/Reacher.unity
  20. 6
      UnitySDK/Assets/ML-Agents/Examples/Soccer/Scenes/SoccerTwos.unity
  21. 4
      UnitySDK/Assets/ML-Agents/Examples/Tennis/Scenes/Tennis.unity
  22. 4
      UnitySDK/Assets/ML-Agents/Examples/Walker/Scenes/Walker.unity
  23. 8
      UnitySDK/Assets/ML-Agents/Examples/WallJump/Scenes/WallJump.unity
  24. 88
      UnitySDK/Assets/ML-Agents/Scripts/Academy.cs
  25. 7
      UnitySDK/Assets/ML-Agents/Scripts/Agent.cs
  26. 25
      UnitySDK/Assets/ML-Agents/Scripts/Brain.cs
  27. 36
      UnitySDK/Assets/ML-Agents/Scripts/BroadcastHub.cs
  28. 13
      UnitySDK/Assets/ML-Agents/Scripts/Grpc/GrpcExtensions.cs
  29. 18
      UnitySDK/Assets/ML-Agents/Scripts/LearningBrain.cs
  30. 9
      docs/Basic-Guide.md
  31. 6
      docs/FAQ.md
  32. 5
      docs/Getting-Started-with-Balance-Ball.md
  33. 4
      docs/Learning-Environment-Create-New.md
  34. 3
      docs/Learning-Environment-Design-Academy.md
  35. 3
      docs/Learning-Environment-Design-Agents.md
  36. 43
      docs/Learning-Environment-Design-Brains.md
  37. 12
      docs/Learning-Environment-Design-Learning-Brains.md
  38. 4
      docs/Learning-Environment-Design-Player-Brains.md
  39. 10
      docs/Learning-Environment-Design.md
  40. 7
      docs/Learning-Environment-Examples.md
  41. 2
      docs/Learning-Environment-Executable.md
  42. 10
      docs/ML-Agents-Overview.md
  43. 3
      docs/Migrating.md
  44. 10
      docs/Python-API.md
  45. 63
      docs/Training-Behavioral-Cloning.md
  46. 5
      docs/Training-Imitation-Learning.md
  47. 24
      docs/Training-ML-Agents.md
  48. 2
      docs/Unity-Inference-Engine.md
  49. 10
      gym-unity/gym_unity/envs/__init__.py
  50. 23
      ml-agents-envs/mlagents/envs/brain.py
  51. 32
      ml-agents-envs/mlagents/envs/environment.py
  52. 2
      ml-agents-envs/mlagents/envs/tests/test_envs.py
  53. 2
      ml-agents/mlagents/trainers/bc/models.py
  54. 9
      ml-agents/mlagents/trainers/bc/trainer.py
  55. 18
      ml-agents/mlagents/trainers/models.py
  56. 11
      ml-agents/mlagents/trainers/ppo/trainer.py
  57. 16
      ml-agents/mlagents/trainers/tests/mock_brain.py
  58. 29
      ml-agents/mlagents/trainers/tests/test_barracuda_converter.py
  59. 22
      ml-agents/mlagents/trainers/tests/test_bc.py
  60. 2
      ml-agents/mlagents/trainers/tests/test_bcmodule.py
  61. 85
      ml-agents/mlagents/trainers/tests/test_ppo.py
  62. 6
      ml-agents/mlagents/trainers/tests/test_reward_signals.py
  63. 10
      ml-agents/mlagents/trainers/tests/test_sac.py
  64. 76
      ml-agents/mlagents/trainers/tests/test_trainer_util.py
  65. 10
      ml-agents/mlagents/trainers/trainer_util.py
  66. 2
      notebooks/getting-started.ipynb
  67. 37
      UnitySDK/Assets/ML-Agents/Editor/Tests/TimerTest.cs
  68. 3
      UnitySDK/Assets/ML-Agents/Editor/Tests/TimerTest.cs.meta
  69. 10
      UnitySDK/Assets/ML-Agents/Plugins/ProtoBuffer/link.xml
  70. 7
      UnitySDK/Assets/ML-Agents/Plugins/ProtoBuffer/link.xml.meta
  71. 344
      UnitySDK/Assets/ML-Agents/Scripts/Timer.cs
  72. 11
      UnitySDK/Assets/ML-Agents/Scripts/Timer.cs.meta
  73. 3
      UnitySDK/Assets/ML-Agents/Editor/Builder.cs.meta
  74. 3
      UnitySDK/Assets/ML-Agents/Editor/BuilderUtils.cs.meta
  75. 14
      UnitySDK/Assets/ML-Agents/Editor/Builder.cs
  76. 44
      UnitySDK/Assets/ML-Agents/Editor/BuilderUtils.cs
  77. 7
      UnitySDK/Assets/ML-Agents/Examples/Bouncer/Scenes/BouncerIL.unity.meta
  78. 1001
      UnitySDK/Assets/ML-Agents/Examples/Bouncer/Scenes/BouncerIL.unity
  79. 880
      UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/FoodCollectorIL.unity
  80. 9
      UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/FoodCollectorIL.unity.meta
  81. 7
      UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/HallwayIL.unity.meta
  82. 653
      UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/HallwayIL.unity
  83. 7
      UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/PushBlockIL.unity.meta
  84. 714
      UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/PushBlockIL.unity
  85. 7
      UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/PyramidsIL.unity.meta
  86. 566
      UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/PyramidsIL.unity
  87. 7
      UnitySDK/Assets/ML-Agents/Examples/Tennis/Scenes/TennisIL.unity.meta
  88. 763
      UnitySDK/Assets/ML-Agents/Examples/Tennis/Scenes/TennisIL.unity
  89. 11
      UnitySDK/Assets/ML-Agents/Scripts/BCTeacherHelper.cs.meta
  90. 59
      UnitySDK/Assets/ML-Agents/Scripts/BCTeacherHelper.cs
  91. 110
      config/online_bc_config.yaml
  92. 149
      ml-agents/mlagents/trainers/bc/online_trainer.py

45
UnitySDK/Assets/ML-Agents/Editor/BroadcastHubDrawer.cs


private const float k_LineHeight = 17f;
// The vertical space left below the BroadcastHub UI.
private const float k_ExtraSpaceBelow = 10f;
// The horizontal size of the Control checkbox
private const int k_ControlSize = 80;
/// <summary>
/// Computes the height of the Drawer depending on the property it is showing

position.y += k_LineHeight;
// This is the labels for each columns
var brainWidth = position.width - k_ControlSize;
var brainWidth = position.width;
var controlRect = new Rect(
position.x + brainWidth, position.y, k_ControlSize, position.height);
EditorGUI.LabelField(controlRect, "Control");
controlRect.y += k_LineHeight;
controlRect.x += 15;
DrawBrains(brainRect, controlRect);
DrawBrains(brainRect);
EditorGUI.indentLevel--;
EditorGUI.EndProperty();
}

}
/// <summary>
/// Draws the Brain and Control checkbox for the brains contained in the BroadCastHub.
/// Draws the Brain contained in the BroadcastHub.
/// <param name="controlRect">The Rect to draw the control checkbox.</param>
private void DrawBrains(Rect brainRect, Rect controlRect)
private void DrawBrains(Rect brainRect)
var exposedBrains = m_Hub.broadcastingBrains;
var brain = exposedBrains[index];
var controlledBrains = m_Hub.brainsToControl;
var brain = controlledBrains[index];
brainRect, brain, typeof(Brain), true) as Brain;
brainRect, brain, typeof(LearningBrain), true) as LearningBrain;
m_Hub.broadcastingBrains.RemoveAt(index);
var brainToInsert = exposedBrains.Contains(newBrain) ? null : newBrain;
exposedBrains.Insert(index, brainToInsert);
m_Hub.brainsToControl.RemoveAt(index);
var brainToInsert = controlledBrains.Contains(newBrain) ? null : newBrain;
controlledBrains.Insert(index, brainToInsert);
}
// This is the Rectangle for the control checkbox
EditorGUI.BeginChangeCheck();
if (brain is LearningBrain)
{
var isTraining = m_Hub.IsControlled(brain);
isTraining = EditorGUI.Toggle(controlRect, isTraining);
m_Hub.SetControlled(brain, isTraining);
}
controlRect.y += k_LineHeight;
if (EditorGUI.EndChangeCheck())
{
MarkSceneAsDirty();
}
}
}

{
if (m_Hub.Count > 0)
{
m_Hub.broadcastingBrains.RemoveAt(m_Hub.broadcastingBrains.Count - 1);
m_Hub.brainsToControl.RemoveAt(m_Hub.brainsToControl.Count - 1);
}
}

private void AddBrain()
{
m_Hub.broadcastingBrains.Add(null);
m_Hub.brainsToControl.Add(null);
}
}
}

6
UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBall.unity


m_ReflectionIntensity: 1
m_CustomReflection: {fileID: 0}
m_Sun: {fileID: 0}
m_IndirectSpecularColor: {r: 0.4497121, g: 0.4997778, b: 0.5756369, a: 1}
m_IndirectSpecularColor: {r: 0.44971162, g: 0.49977726, b: 0.5756362, a: 1}
--- !u!157 &3
LightmapSettings:
m_ObjectHideFlags: 0

m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 300
height: 200

6
UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBallHard.unity


m_ReflectionIntensity: 1
m_CustomReflection: {fileID: 0}
m_Sun: {fileID: 0}
m_IndirectSpecularColor: {r: 0.4497121, g: 0.4997778, b: 0.5756369, a: 1}
m_IndirectSpecularColor: {r: 0.44971162, g: 0.49977726, b: 0.5756362, a: 1}
--- !u!157 &3
LightmapSettings:
m_ObjectHideFlags: 0

m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 300
height: 200

4
UnitySDK/Assets/ML-Agents/Examples/Basic/Scenes/Basic.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

4
UnitySDK/Assets/ML-Agents/Examples/Bouncer/Scenes/Bouncer.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

4
UnitySDK/Assets/ML-Agents/Examples/Crawler/Scenes/CrawlerDynamicTarget.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

4
UnitySDK/Assets/ML-Agents/Examples/Crawler/Scenes/CrawlerStaticTarget.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 1280
height: 720

6
UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/FoodCollector.unity


m_ReflectionIntensity: 1
m_CustomReflection: {fileID: 0}
m_Sun: {fileID: 0}
m_IndirectSpecularColor: {r: 0.4497121, g: 0.4997778, b: 0.5756369, a: 1}
m_IndirectSpecularColor: {r: 0.44971162, g: 0.49977726, b: 0.5756362, a: 1}
--- !u!157 &3
LightmapSettings:
m_ObjectHideFlags: 0

m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 1500
m_TrainingConfiguration:
width: 500
height: 500

4
UnitySDK/Assets/ML-Agents/Examples/FoodCollector/Scenes/VisualFoodCollector.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 500
height: 500

6
UnitySDK/Assets/ML-Agents/Examples/GridWorld/Scenes/GridWorld.unity


m_ReflectionIntensity: 1
m_CustomReflection: {fileID: 0}
m_Sun: {fileID: 0}
m_IndirectSpecularColor: {r: 0.4497121, g: 0.4997778, b: 0.5756369, a: 1}
m_IndirectSpecularColor: {r: 0.44971162, g: 0.49977726, b: 0.5756362, a: 1}
--- !u!157 &3
LightmapSettings:
m_ObjectHideFlags: 0

m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 84
height: 84

4
UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/Hallway.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 128
height: 128

3
UnitySDK/Assets/ML-Agents/Examples/Hallway/Scenes/VisualHallway.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_TrainingConfiguration:
width: 128
height: 128

7
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Brains/PushBlockLearning.asset


m_Name: PushBlockLearning
m_EditorClassIdentifier:
brainParameters:
vectorObservationSize: 0
vectorObservationSize: 70
cameraResolutions:
- width: 84
height: 84
blackAndWhite: 0
cameraResolutions: []
vectorActionDescriptions:
-
vectorActionSpaceType: 0

4
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/PushBlock.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 400
height: 300

3
UnitySDK/Assets/ML-Agents/Examples/PushBlock/Scenes/VisualPushBlock.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_TrainingConfiguration:
width: 1280
height: 720

2
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Prefabs/VisualAreaPyramids.prefab


m_Script: {fileID: 11500000, guid: b8db44472779248d3be46895c4d562d5, type: 3}
m_Name:
m_EditorClassIdentifier:
brain: {fileID: 11400000, guid: 59a04e208fb8a423586adf25bf1fecd0, type: 2}
brain: {fileID: 11400000, guid: 60f0ffcd08c3b43a6bdc746cfc0c4059, type: 2}
agentParameters:
agentCameras:
- {fileID: 20712684238256298}

4
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/Pyramids.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

5
UnitySDK/Assets/ML-Agents/Examples/Pyramids/Scenes/VisualPyramids.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl:
- {fileID: 11400000, guid: 60f0ffcd08c3b43a6bdc746cfc0c4059, type: 2}
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

4
UnitySDK/Assets/ML-Agents/Examples/Reacher/Scenes/Reacher.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

6
UnitySDK/Assets/ML-Agents/Examples/Soccer/Scenes/SoccerTwos.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
- {fileID: 11400000, guid: 29ed78b3e8fef4340b3a1f6954b88f18, type: 2}
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
- {fileID: 11400000, guid: 29ed78b3e8fef4340b3a1f6954b88f18, type: 2}
m_TrainingConfiguration:
width: 800
height: 500

4
UnitySDK/Assets/ML-Agents/Examples/Tennis/Scenes/Tennis.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 25000
m_TrainingConfiguration:
width: 300
height: 200

4
UnitySDK/Assets/ML-Agents/Examples/Walker/Scenes/Walker.unity


m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
m_TrainingConfiguration:
width: 80
height: 80

8
UnitySDK/Assets/ML-Agents/Examples/WallJump/Scenes/WallJump.unity


m_ReflectionIntensity: 1
m_CustomReflection: {fileID: 0}
m_Sun: {fileID: 0}
m_IndirectSpecularColor: {r: 0.44971484, g: 0.49977952, b: 0.57563835, a: 1}
m_IndirectSpecularColor: {r: 0.44971442, g: 0.499779, b: 0.5756377, a: 1}
--- !u!157 &3
LightmapSettings:
m_ObjectHideFlags: 0

m_Name:
m_EditorClassIdentifier:
broadcastHub:
broadcastingBrains:
- {fileID: 11400000, guid: b5f530c5bf8d64bf8a18df92e283bb9c, type: 2}
brainsToControl:
m_BrainsToControl: []
m_MaxSteps: 0
- {fileID: 11400000, guid: b5f530c5bf8d64bf8a18df92e283bb9c, type: 2}
m_TrainingConfiguration:
width: 80
height: 80

88
UnitySDK/Assets/ML-Agents/Scripts/Academy.cs


using UnityEngine;
using System.IO;
using System.Linq;
using UnityEngine.Serialization;
#if UNITY_EDITOR

InitializeAcademy();
ICommunicator communicator;
var exposedBrains = broadcastHub.broadcastingBrains.Where(x => x != null).ToList();
var controlledBrains = broadcastHub.broadcastingBrains.Where(
x => x != null && x is LearningBrain && broadcastHub.IsControlled(x));
foreach (var brain1 in controlledBrains)
{
var brain = (LearningBrain)brain1;
brain.SetToControlledExternally();
}
var controlledBrains = broadcastHub.brainsToControl.Where(x => x != null).ToList();
// Try to launch the communicator by usig the arguments passed at launch
// Try to launch the communicator by using the arguments passed at launch
try
{
communicator = new RpcCommunicator(

});
}
// and if Unity is in Editor mode
// If there arn't, there is no need for a communicator and it is set
// If there are not, there is no need for a communicator and it is set
#if UNITY_EDITOR
if (controlledBrains.ToList().Count > 0)
{
communicator = new RpcCommunicator(

});
}
#endif
foreach (var trainingBrain in exposedBrains)
if (communicator != null)
trainingBrain.SetBatcher(m_BrainBatcher);
}
if (communicator != null)
{
foreach (var trainingBrain in controlledBrains)
{
trainingBrain.SetBatcher(m_BrainBatcher);
}
m_IsCommunicatorOn = true;
var academyParameters =

foreach (var brain in exposedBrains)
foreach (var brain in controlledBrains)
bp.ToProto(brain.name, broadcastHub.IsControlled(brain)));
bp.ToProto(brain.name, true));
academyParameters.EnvironmentParameters =
new CommunicatorObjects.EnvironmentParametersProto();
foreach (var key in resetParameters.Keys)

);
}
var pythonParameters = m_BrainBatcher.SendAcademyParameters(academyParameters);
Random.InitState(pythonParameters.Seed);
// We try to exchange the first message with Python. If this fails, it means
// no Python Process is ready to train the environment. In this case, the
//environment must use Inference.
try
{
var pythonParameters = m_BrainBatcher.SendAcademyParameters(academyParameters);
Random.InitState(pythonParameters.Seed);
}
catch
{
communicator = null;
m_BrainBatcher = new Batcher(null);
m_IsCommunicatorOn = false;
foreach (var trainingBrain in controlledBrains)
{
trainingBrain.SetBatcher(null);
}
}
}
// If a communicator is enabled/provided, then we assume we are in

AgentAct += () => { };
AgentForceReset += () => { };
// Configure the environment using the configurations provided by
// the developer in the Editor.
SetIsInference(!m_BrainBatcher.GetIsTraining());

private void UpdateResetParameters()
{
var newResetParameters = m_BrainBatcher.GetEnvironmentParameters();
var newResetParameters = m_BrainBatcher?.GetEnvironmentParameters();
if (newResetParameters != null)
{
foreach (var kv in newResetParameters.FloatParameters)

AgentSetStatus(m_StepCount);
AgentResetIfDone();
using (TimerStack.Instance.Scoped("AgentResetIfDone"))
{
AgentResetIfDone();
}
AgentSendState();
using (TimerStack.Instance.Scoped("AgentSendState"))
{
AgentSendState();
}
BrainDecideAction();
using (TimerStack.Instance.Scoped("BrainDecideAction"))
{
BrainDecideAction();
}
AcademyStep();
using (TimerStack.Instance.Scoped("AcademyStep"))
{
AcademyStep();
}
AgentAct();
using (TimerStack.Instance.Scoped("AgentAct"))
{
AgentAct();
}
m_StepCount += 1;
m_TotalStepCount += 1;

// Signal to listeners that the academy is being destroyed now
DestroyAction();
// TODO - Pass worker ID or some other identifier,
// so that multiple envs won't overwrite each others stats.
TimerStack.Instance.SaveJsonTimers();
}
}
}

7
UnitySDK/Assets/ML-Agents/Scripts/Agent.cs


m_Info.storedTextActions = m_Action.textActions;
m_Info.vectorObservation.Clear();
m_ActionMasker.ResetMask();
CollectObservations();
using (TimerStack.Instance.Scoped("CollectObservations"))
{
CollectObservations();
}
m_Info.actionMasks = m_ActionMasker.GetMask();
var param = brain.brainParameters;

}
/// <summary>
/// Sets the status of the agent. Will request decisions or actions according
/// Sets the status of the agent. Will request decisions or actions according
/// to the Academy's stepcount.
/// </summary>
/// <param name="academyStepCounter">Number of current steps in episode</param>

25
UnitySDK/Assets/ML-Agents/Scripts/Brain.cs


protected Dictionary<Agent, AgentInfo> m_AgentInfos =
new Dictionary<Agent, AgentInfo>(1024);
protected Batcher m_BrainBatcher;
/// <summary>
/// Sets the Batcher of the Brain. The brain will call the batcher at every step and give
/// it the agent's data using SendBrainInfo at each DecideAction call.
/// </summary>
/// <param name="batcher"> The Batcher the brain will use for the current session</param>
public void SetBatcher(Batcher batcher)
{
if (batcher == null)
{
m_BrainBatcher = null;
}
else
{
m_BrainBatcher = batcher;
m_BrainBatcher.SubscribeBrain(name);
}
LazyInitialize();
}
/// <summary>
/// Adds the data of an agent to the current batch so it will be processed in DecideAction.

if (m_IsInitialized)
{
m_AgentInfos.Clear();
m_IsInitialized = false;
}
}

/// </summary>
private void BrainDecideAction()
{
m_BrainBatcher?.SendBrainInfo(name, m_AgentInfos);
/// Is called only once at the begening of the training or inference session.
/// Is called only once at the beginning of the training or inference session.
/// </summary>
protected abstract void Initialize();

36
UnitySDK/Assets/ML-Agents/Scripts/BroadcastHub.cs


[System.Serializable]
public class BroadcastHub
{
[SerializeField]
public List<Brain> broadcastingBrains = new List<Brain>();
private List<Brain> m_BrainsToControl = new List<Brain>();
public List<LearningBrain> brainsToControl = new List<LearningBrain>();
/// <summary>
/// The number of Brains inside the BroadcastingHub.

get { return broadcastingBrains.Count; }
}
/// <summary>
/// Checks that a given Brain is set to be remote controlled.
/// </summary>
/// <param name="brain"> The Brain that is beeing checked</param>
/// <returns>true if the Brain is set to Controlled and false otherwise. Will return
/// false if the Brain is not present in the Hub.</returns>
public bool IsControlled(Brain brain)
{
return m_BrainsToControl.Contains(brain);
get { return brainsToControl.Count; }
}
/// <summary>

/// <param name="controlled"> if true, the Brain will be set to remote controlled. Otherwise
/// the brain will be set to broadcast only.</param>
public void SetControlled(Brain brain, bool controlled)
public void SetControlled(LearningBrain brain)
if (broadcastingBrains.Contains(brain))
if (!brainsToControl.Contains(brain))
if (controlled && !m_BrainsToControl.Contains(brain))
{
m_BrainsToControl.Add(brain);
}
if (!controlled && m_BrainsToControl.Contains(brain))
{
m_BrainsToControl.Remove(brain);
}
brainsToControl.Add(brain);
}
}

public void Clear()
{
broadcastingBrains.Clear();
m_BrainsToControl.Clear();
brainsToControl.Clear();
}
}
}

13
UnitySDK/Assets/ML-Agents/Scripts/Grpc/GrpcExtensions.cs


foreach (var obs in ai.visualObservations)
{
agentInfoProto.VisualObservations.Add(
ByteString.CopyFrom(obs.EncodeToPNG())
);
using (TimerStack.Instance.Scoped("encodeVisualObs"))
{
agentInfoProto.VisualObservations.Add(
ByteString.CopyFrom(obs.EncodeToPNG())
);
}
/// <summary>
/// Converts a Brain into to a Protobuff BrainInfoProto so it can be sent
/// </summary>

};
return demoProto;
}
/// <summary>
/// Initialize metadata values based on proto object.
/// </summary>

18
UnitySDK/Assets/ML-Agents/Scripts/LearningBrain.cs


/// <summary>
/// The Learning Brain works differently if you are training it or not.
/// When training your Agents, drag the Learning Brain to the Academy's BroadcastHub and check
/// the checkbox Control. When using a pretrained model, just drag the Model file into the
/// When training your Agents, drag the Learning Brain to the Academy's BroadcastHub.
/// When using a pretrained model, just drag the Model file into the
/// The training will start automatically if Python is ready to train and there is at
/// least one LearningBrain in the BroadcastHub.
/// The property model corresponds to the Model currently attached to the Brain. Before
/// being used, a call to ReloadModel is required.
/// When the Learning Brain is not training, it uses a TensorFlow model to make decisions.

[CreateAssetMenu(fileName = "NewLearningBrain", menuName = "ML-Agents/Learning Brain")]
public class LearningBrain : Brain
{
private Batcher m_Batcher;
private ITensorAllocator m_TensorAllocator;
private TensorGenerator m_TensorGenerator;
private TensorApplier m_TensorApplier;

private IReadOnlyList<TensorProxy> m_InferenceInputs;
private IReadOnlyList<TensorProxy> m_InferenceOutputs;
[NonSerialized]
private bool m_IsControlled;
public void SetToControlledExternally()
public void SetBatcher(Batcher batcher)
m_IsControlled = true;
m_Batcher = batcher;
m_Batcher?.SubscribeBrain(name);
}
/// <inheritdoc />

/// <inheritdoc />
protected override void DecideAction()
{
if (m_IsControlled)
m_Batcher?.SendBrainInfo(name, m_AgentInfos);
if (m_Batcher != null)
{
m_AgentInfos.Clear();
return;

9
docs/Basic-Guide.md


if you want to [use an executable](Learning-Environment-Executable.md) or to
`None` if you want to interact with the current scene in the Unity Editor.
Before building the environment or interacting with it in the editor, select `Ball3DAcademy` in the **Hierarchy** window of the Unity editor and make sure `Control` checkbox is checked under `Ball 3D Academy` component.
Before building the environment or interacting with it in the editor, select `Ball3DAcademy` in the **Hierarchy** window of the Unity editor and make sure the `3DBallLearningBrain` is in the Broadcast Hub of the `Ball3DAcademy` component.
More information and documentation is provided in the
[Python API](Python-API.md) page.

**Note**: The Unity prefab system will modify all instances of the agent properties in your scene. If the agent does not synchronize automatically with the prefab, you can hit the Revert button in the top of the **Inspector** window.
2. In the **Hierarchy** window, select `Ball3DAcademy`.
3. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall/Brains` folder and drag the **3DBallLearning** Brain to the `Brains` property under `Braodcast Hub` in the `Ball3DAcademy` object in the **Inspector** window. In order to train, make sure the `Control` checkbox is selected.
3. In the **Project** window, go to `Assets/ML-Agents/Examples/3DBall/Brains` folder and drag the **3DBallLearning** Brain to the `Brains` property under `Braodcast Hub` in the `Ball3DAcademy` object in the **Inspector** window.
The `Control` checkbox means that in addition to being exposed to Python, the Brain will
be controlled by the Python process (required for training).
![Set Brain to External](images/mlagents-SetBrainToTrain.png)

4. Drag the `<brain_name>.nn` file from the Project window of
the Editor to the **Model** placeholder in the **3DBallLearning**
inspector window.
5. Select Ball3DAcademy in the scene and toggle off Control, each platform's brain now regains control.
6. Press the :arrow_forward: button at the top of the Editor.
5. Press the :arrow_forward: button at the top of the Editor.
## Next Steps

6
docs/FAQ.md


There may be a number of possible causes:
* _Cause_: There may be no LearningBrain with `Control` option checked in the
* _Cause_: There may be no LearningBrain in the
`Broadcast Hub`, and drag your LearningBrain asset into the `Brains` field,
and check the `Control` toggle. Also you need to assign this LearningBrain
`Broadcast Hub`, and drag your LearningBrain asset into the `Brains` field.
Also you need to assign this LearningBrain
asset to all of the Agents you wish to do training on.
* _Cause_: On OSX, the firewall may be preventing communication with the
environment. _Solution_: Add the built environment binary to the list of

5
docs/Getting-Started-with-Balance-Ball.md


properties that control how the environment works.
The **Broadcast Hub** keeps track of which Brains will send data during training.
If a Brain is added to the hub, the data from this Brain will be sent to the external training
process. If the `Control` checkbox is checked, the training process will be able to
control and train the agents linked to the Brain.
process.
The **Training Configuration** and **Inference Configuration** properties
set the graphics and timescale properties for the Unity application.
The Academy uses the **Training Configuration** during training and the

You can create new Brain assets by selecting `Assets ->
Create -> ML-Agents -> Brain`. There are 3 types of Brains.
The **Learning Brain** is a Brain that uses a trained neural network to make decisions.
When the `Control` box is checked in the Brains property under the **Broadcast Hub** in the Academy, the external process that is training the neural network will take over decision making for the agents
When the **Learning Brain** is dragged into the **Broadcast Hub** in the Academy, the external process that is training the neural network will take over decision making for the agents
and ultimately generate a trained neural network. You can also use the
**Learning Brain** with a pre-trained model.
The **Heuristic** Brain allows you to hand-code the Agent logic by extending

4
docs/Learning-Environment-Create-New.md


5. Add your Agent subclasses to appropriate GameObjects, typically, the object
in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.
6. If training, check the `Control` checkbox in the BroadcastHub of the Academy.
6. If training, drag the Brain in the BroadcastHub of the Academy.
[run the training process](Training-ML-Agents.md).
**Note:** If you are unfamiliar with Unity, refer to

Now you can train the Agent. To get ready for training, you must first drag the
`RollerBallBrain` asset to the **RollerAgent** GameObject `Brain` field to change to the learning brain.
Then, select the Academy GameObject and check the `Control` checkbox for
Then, select the Academy GameObject and drag
the RollerBallBrain item in the **Broadcast Hub** list. From there, the process is
the same as described in [Training ML-Agents](Training-ML-Agents.md). Note that the
models will be created in the original ml-agents project folder, `ml-agents/models`.

3
docs/Learning-Environment-Design-Academy.md


![Academy Inspector](images/academy.png)
* `Broadcast Hub` - Gathers the Brains that will communicate with the external
process. Any Brain added to the Broadcast Hub will be visible from the external
process. In addition, if the checkbox `Control` is checked, the Brain will be
controllable from the external process and will thus be trainable.
process and controllable from the external process and will thus be trainable.
* `Configuration` - The engine-level settings which correspond to rendering
quality and engine speed.
* `Width` - Width of the environment window in pixels.

3
docs/Learning-Environment-Design-Agents.md


action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),
your agent must calculate a reward value at each action. The reward is used to
discover the optimal decision-making policy. (A reward is not used by already
trained agents or for imitation learning.)
discover the optimal decision-making policy.
The Brain class abstracts out the decision making logic from the Agent itself so
that you can use the same Brain in multiple Agents. How a Brain makes its

43
docs/Learning-Environment-Design-Brains.md


useful to test your Agent code.
During training, use a **Learning Brain**
and drag it into the Academy's `Broadcast Hub` with the `Control` checkbox checked.
and drag it into the Academy's `Broadcast Hub`.
project, add it to the **Model** property of the **Learning Brain** and uncheck
the `Control` checkbox of the `Broadcast Hub`.
project, add it to the **Model** property of the **Learning Brain**.
Brain assets has several important properties that you can set using the
Inspector window. These properties must be appropriate for the Agents using the

actions for the Brain.
The other properties of the Brain depend on the type of Brain you are using.
## Using the Broadcast Feature
The Player, Heuristic and Learning Brains can support
broadcast to an external process. The broadcast feature allows you to collect data
from your Agents using a Python program without controlling them.
### How to use: Unity
To turn it on in Unity, drag the Brain into the Academy's Broadcast Hub but leave
the `Control` checkbox unchecked when present. This will expose the Brain's data
without letting the external process control it.
![Broadcast](images/broadcast.png)
### How to use: Python
When you launch your Unity Environment from a Python program, you can see what
the Agents connected to Brains present in the `Broadcast Hub` are doing.
When calling `step` or
`reset` on your environment, you retrieve a dictionary mapping Brain names to
`BrainInfo` objects. The dictionary contains a `BrainInfo` object for each
Brain in the `Broadcast Hub`.
Just like with a Learning Brain, the `BrainInfo` object contains the fields for
`visual_observations`, `vector_observations`, `text_observations`,
`memories`,`rewards`, `local_done`, `max_reached`, `agents` and
`previous_actions`. Note that `previous_actions` corresponds to the actions that
were taken by the Agents at the previous step, not the current one.
Note that when you do a `step` on the environment, you can only provide actions
for the Brains in the `Broadcast Hub` with the `Control` checkbox checked. If there
are no Brains in the `Broadcast Hub` with the
`Control` checkbox checked, simply call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player,
Heuristics or Learning Brains game sessions. You can then use this data to train
an agent in a supervised context.
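
The removed passage above describes the broadcast-era Python view of the environment. A minimal sketch of that access pattern, assuming an executable named `3DBall` and a broadcasting Brain named `3DBallPlayer` (neither name comes from this commit):

```python
from mlagents.envs import UnityEnvironment

# Sketch only: reading data broadcast by a non-controlled Brain.
env = UnityEnvironment(file_name="3DBall")     # assumed executable name
brain_infos = env.reset(train_mode=False)      # dict: Brain name -> BrainInfo
info = brain_infos["3DBallPlayer"]             # assumed broadcasting Brain name
print(info.vector_observations.shape)          # observations collected this step
print(info.previous_actions)                   # actions taken at the previous step
env.close()
```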

12
docs/Learning-Environment-Design-Learning-Brains.md


# Learning Brains
The **Learning Brain** works differently if you are training it or not.
When training your Agents, drag the **Learning Brain** to the
Academy's `Broadcast Hub` and check the checkbox `Control`. When using a pre-trained
model, just drag the Model file into the `Model` property of the **Learning Brain**.
When used in an environment connected to Python, the Python process will train
the Brain. If no Python Process exists, the **Learning Brain** will use its
pre-trained model.
one Brain asset must be in the Academy's `Broadcast Hub` with the checkbox `Control`
checked. This allows the training process to collect the observations of Agents
using that Brain and give the Agents their actions.
one Brain asset must be in the Academy's `Broadcast Hub`. This allows the training
process to collect the observations of Agents using that Brain and give the Agents
their actions.
In addition to using a **Learning Brain** for training using the ML-Agents learning
algorithms, you can use a **Learning Brain** to control Agents in a Unity

4
docs/Learning-Environment-Design-Player-Brains.md


# Player Brain
The **Player Brain** allows you to control an Agent using keyboard
commands. You can use Player Brains to control a "teacher" Agent that trains
other Agents during [imitation learning](Training-Imitation-Learning.md). You
commands. You can use Player Brains to record demonstrations in order to train
other Agents with [imitation learning](Training-Imitation-Learning.md). You
can also use Player Brains to test your Agents and environment before replacing them by **Learning Brains** and running the training process.
## Player Brain properties

10
docs/Learning-Environment-Design.md


To Create a Brain, go to `Assets -> Create -> Ml-Agents` and select the
type of Brain you want to use. During training, use a **Learning Brain**
and drag it into the Academy's `Broadcast Hub` with the `Control` checkbox checked.
and drag it into the Academy's `Broadcast Hub`.
project, add it to the **Model** property of the **Learning Brain** and uncheck
the `Control` checkbox of the `Broadcast Hub`. See
project, add it to the **Model** property of the **Learning Brain**.
If the Python process is not active, the **Learning Brain** will not train but
use its model. See
[Brains](Learning-Environment-Design-Brains.md) for details on using the
different types of Brains. You can create new kinds of Brains if the three
built-in don't do what you need.

* The training scene must start automatically when your Unity application is
launched by the training process.
* The scene must include an Academy with at least one Brain in the `Broadcast Hub`
with the `Control` checkbox checked.
* The scene must include an Academy with at least one Brain in the `Broadcast Hub`.
* The Academy must reset the scene to a valid starting point for each episode of
training.
* A training episode must have a definite end — either using `Max Steps` or by

7
docs/Learning-Environment-Examples.md


# Example Learning Environments
# Example Learning Environments
The Unity ML-Agents toolkit contains an expanding set of example environments
which demonstrate various features of the platform. Environments are located in

* Recommended Minimum: 0.2
* Recommended Maximum: 5
* Benchmark Mean Reward: 2.5
* Optional Imitation Learning scene: `TennisIL`.
## [Push Block](https://youtu.be/jKdw216ZgoE)

* Recommended Minimum: 0
* Recommended Maximum: 2000
* Benchmark Mean Reward: 4.5
* Optional Imitation Learning scene: `PushBlockIL`.
## [Wall Jump](https://youtu.be/NITLug2DIWQ)

* Recommended Minimum: 0.5
* Recommended Maximum: 5
* Benchmark Mean Reward: 10
* Optional Imitation Learning scene: `FoodCollectorIL`.
## [Hallway](https://youtu.be/53GyfpPQRUQ)

* Reset Parameters: None
* Benchmark Mean Reward: 0.7
* To speed up training, you can enable curiosity by adding `use_curiosity: true` in `config/trainer_config.yaml`
* Optional Imitation Learning scene: `HallwayIL`.
## [Bouncer](https://youtu.be/Tkv-c-b1b2I)

this environment does not train with the provided default
training parameters.__
* Reset Parameters: None
* Optional Imitation Learning scene: `PyramidsIL`.
* Benchmark Mean Reward: 1.75

2
docs/Learning-Environment-Executable.md


Make sure the Brains in the scene have the right type. For example, if you want
to be able to control your agents from Python, you will need to put the Brain
controlling the Agents to be a **Learning Brain** and drag it into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. In the 3DBall
Academy's `Broadcast Hub`. In the 3DBall
scene, this can be done in the Platform GameObject within the Game prefab in
`Assets/ML-Agents/Examples/3DBall/Prefabs/`, or in each instance of the
Platform in the Scene.

10
docs/ML-Agents-Overview.md


[TensorFlow](Background-TensorFlow.md) model. The embedded TensorFlow model
represents a learned policy and the Brain directly uses this model to
determine the action for each Agent. You can train a **Learning Brain**
by dragging it into the Academy's `Broadcast Hub` with the `Control`
checkbox checked.
by dragging it into the Academy's `Broadcast Hub` and launching the game with
the Python training process.
- **Player** - where decisions are made using real input from a keyboard or
controller. Here, a human player is controlling the Agent and the observations
and rewards collected by the Brain are not used to control the Agent.

a TensorFlow model that the Learning Brain can later use. However,
any user of the ML-Agents toolkit can leverage their own algorithms for
training. In this case, the Brain type would be set to Learning and be linked
to the BroadcastHub (with checked `Control` checkbox)
to the BroadcastHub
and the behaviors of all the Agents in the scene will be controlled within Python.
You can even turn your environment into a [gym.](../gym-unity/README.md)

this mode allows providing real examples from a game controller on how the medic
should behave. More specifically, in this mode, the Brain type during training
is set to Player and all the actions performed with the controller (in addition
to the agent observations) will be recorded and sent to the Python API. The
to the agent observations) will be recorded. The
to help speed up reward-based training (RL). We include two algorithms called
to help speed up reward-based training (RL). We include two algorithms called
Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial covers these
features in more depth.

3
docs/Migrating.md


### Important Changes
* The definition of the gRPC service has changed.
* The online BC training feature has been removed.
* The BroadcastHub of the Academy no longer has a `Control` checkbox. All Learning Brains in the BroadcastHub will be considered as trainable (although the training will only be launched if the Python Process is ready and will use inference otherwise)
* The broadcast feature has been deprecated. Only LearningBrains can communicate with Python.
#### Steps to Migrate
* In order to be able to train, make sure both your ML-Agents Python package and UnitySDK code come from the v0.11 release. Training will not work, for example, if you update the ML-Agents Python package, and only update the API Version in UnitySDK.

10
docs/Python-API.md


the ML-Agents SDK.
To communicate with an Agent in a Unity environment from a Python program, the
Agent must either use a Brain present in the Academy's `Broadcast Hub`.
Agent must use a LearningBrain present in the Academy's `Broadcast Hub`.
actions for Agents with Brains with the `Control` checkbox of the
Academy's `Broadcast Hub` checked, but can only observe broadcasting
Brains (the information you receive for an Agent is the same in both cases).
actions for Agents with Brains in the
Academy's `Broadcast Hub`.
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network

observations = brainInfo.vector_observations
```
Note that if you have more than one Brain in the Academy's `Broadcast Hub` with
the `Control` checkbox checked, you
Note that if you have more than one Brain in the Academy's `Broadcast Hub`, you
must provide dictionaries from Brain names to arrays for `action`, `memory`
and `value`. For example: If you have two Learning Brains named `brain1` and
`brain2` each with one Agent taking two continuous actions, then you can
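
The hunk ends mid-sentence; the sketch below illustrates the per-Brain action dictionary it is describing (the action values are arbitrary, and `MyEnv` is an assumed executable name):

```python
from mlagents.envs import UnityEnvironment

# Two Learning Brains named "brain1" and "brain2", one Agent each,
# two continuous actions per Agent (as in the example above).
env = UnityEnvironment(file_name="MyEnv")
env.reset(train_mode=True)
action = {"brain1": [1.0, 2.0], "brain2": [3.0, 4.0]}
brain_infos = env.step(action)                 # again a dict: Brain name -> BrainInfo
env.close()
```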

63
docs/Training-Behavioral-Cloning.md


1. Choose an agent you would like to learn to imitate some set of demonstrations.
2. Record a set of demonstration using the `Demonstration Recorder` (see [here](Training-Imitation-Learning.md)).
For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
Control in the Broadcast Hub. For more information on Brains, see
3. Build the scene, assigning the agent a Learning Brain, and dragging it in the Broadcast Hub. For more information on Brains, see
[here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the

This will use the demonstration file to train a neural network driven agent
to directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.
## Online Training
It is also possible to provide demonstrations in realtime during training,
without pre-recording a demonstration file. The steps to do this are as follows:
1. First create two Brains, one which will be the "Teacher," and the other which
will be the "Student." We will assume that the names of the Brain
Assets are "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**. You must properly
configure the inputs to map to the corresponding actions.
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
compatible with the agent.
5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at least
one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
the `trainer` parameter of this entry to `online_bc`, and the
`brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
Additionally, set `batches_per_epoch`, which controls how much training to do
each moment. Increase the `max_steps` option if you'd like to keep training
the Agents for a longer period of time.
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
--train --slow`, and press the :arrow_forward: button in Unity when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen
9. From the Unity window, control the Agent with the Teacher Brain by providing
"teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the student Brain attached begin to behave
similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the training
process with `CTL+C` from the command line.
12. Move the resulting `*.nn` file into the `TFModels` subdirectory of the
Assets folder (or a subdirectory within Assets of your choosing) , and use
with `Learning` Brain.
**BC Teacher Helper**
We provide a convenience utility, `BC Teacher Helper` component that you can add
to the Teacher Agent.
<p align="center">
<img src="images/bc_teacher_helper.png"
alt="BC Teacher Helper"
width="375" border="10" />
</p>
This utility enables you to use keyboard shortcuts to do the following:
1. To start and stop recording experiences. This is useful in case you'd like to
interact with the game _but not have the agents learn from these
interactions_. The default command to toggle this is to press `R` on the
keyboard.
2. Reset the training buffer. This enables you to instruct the agents to forget
their buffer of recent experiences. This is useful if you'd like to get them
to quickly learn a new behavior. The default command to reset the buffer is
to press `C` on the keyboard.

5
docs/Training-Imitation-Learning.md


on the PPO trainer, in addition to using a small GAIL reward signal.
* To train an agent to exactly mimic demonstrations, you can use the
[Behavioral Cloning](Training-Behavioral-Cloning.md) trainer. Behavioral Cloning can be
used offline and online (in-editor), and learns very quickly. However, it usually is ineffective
used with demonstrations (in-editor), and learns very quickly. However, it usually is ineffective
on more complex environments without a large number of demonstrations.
### How to Choose

if you have few (<10) episodes of demonstrations. An example of this is provided for the Crawler example
environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
If you have plenty of demonstrations and/or a very simple environment, Behavioral Cloning
(online and offline) can be effective and quick. However, it cannot be combined with RL.
If you have plenty of demonstrations and/or a very simple environment, Offline Behavioral Cloning can be effective and quick. However, it cannot be combined with RL.
## Recording Demonstrations

24
docs/Training-ML-Agents.md


### Training Config File
The training config files `config/trainer_config.yaml`, `config/sac_trainer_config.yaml`,
`config/gail_config.yaml`, `config/online_bc_config.yaml` and `config/offline_bc_config.yaml`
specifies the training method, the hyperparameters, and a few additional values to use when
training with Proximal Policy Optimization(PPO), Soft Actor-Critic(SAC), GAIL (Generative Adversarial
Imitation Learning) with PPO, and online and offline Behavioral Cloning(BC)/Imitation. These files are
divided into sections. The **default** section defines the default values for all the available
settings. You can also add new sections to override these defaults to train
specific Brains. Name each of these override sections after the GameObject
containing the Brain component that should use these settings. (This GameObject
will be a child of the Academy in your scene.) Sections for the example
environments are included in the provided config file.
`config/gail_config.yaml` and `config/offline_bc_config.yaml` specifies the training method,
the hyperparameters, and a few additional values to use when training with Proximal Policy
Optimization(PPO), Soft Actor-Critic(SAC), GAIL (Generative Adversarial Imitation Learning)
with PPO, and online and offline Behavioral Cloning(BC)/Imitation. These files are divided
into sections. The **default** section defines the default values for all the available
training with PPO, SAC, GAIL (with PPO), and offline BC. These files are divided into sections.
The **default** section defines the default values for all the available settings. You can
also add new sections to override these defaults to train specific Brains. Name each of these
override sections after the GameObject containing the Brain component that should use these
settings. (This GameObject will be a child of the Academy in your scene.) Sections for the
example environments are included in the provided config file.
| **Setting** | **Description** | **Applies To Trainer\*** |
| :------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------- |

| brain\_to\_imitate | For online imitation learning, the name of the GameObject containing the Brain component to imitate. | (online)BC |
| demo_path | For offline imitation learning, the file path of the recorded demonstration file | (offline)BC |
| buffer_size | The number of experiences to collect before updating the policy model. In SAC, the max size of the experience buffer. | PPO, SAC |
| buffer_init_steps | The number of experiences to collect into the buffer before updating the policy model. | SAC |

| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
| normalize | Whether to automatically normalize observations. | PPO, SAC |
| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
<<<<<<< HEAD
| num_layers | The number of hidden layers in the neural network. | PPO, SAC, BC |
| pretraining | Use demonstrations to bootstrap the policy neural network. See [Pretraining Using Demonstrations](Training-PPO.md#optional-pretraining-using-demonstrations). | PPO, SAC |
| reward_signals | The reward signals used to train the policy. Enable Curiosity and GAIL here. See [Reward Signals](Reward-Signals.md) for configuration options. | PPO, SAC, BC |

| num_update | Number of mini-batches to update the agent with during each update. | SAC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral Cloning (Imitation)
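
As a rough illustration of the "default section plus per-Brain override sections" layout described above (this is not the actual trainer_util resolution code, and the Brain/section name is assumed):

```python
import yaml

# Sketch of how a per-Brain section overrides the default section.
with open("config/trainer_config.yaml") as f:
    config = yaml.safe_load(f)

brain_name = "3DBallLearning"                  # assumed section/Brain name
settings = dict(config["default"])             # start from the default section
settings.update(config.get(brain_name, {}))    # per-Brain keys win over defaults
print(settings["buffer_size"], settings["num_layers"])
```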
For specific advice on setting hyperparameters based on the type of training you
are conducting, see:

2
docs/Unity-Inference-Engine.md


When using a **Learning Brain**, drag the `.nn` file into the **Model** field
in the Inspector.
Uncheck the `Control` checkbox for the corresponding **Brain** in the
**BroadcastHub** of the Academy.
Select the **Inference Device** : CPU or GPU you want to use for Inference.
**Note:** For most of the models generated with the ML-Agents toolkit, CPU will be faster than GPU.

10
gym-unity/gym_unity/envs/__init__.py


high = np.array([np.inf] * brain.vector_observation_space_size)
self.action_meanings = brain.vector_action_descriptions
if self.use_visual:
if brain.camera_resolutions[0]["blackAndWhite"]:
depth = 1
else:
depth = 3
brain.camera_resolutions[0]["height"],
brain.camera_resolutions[0]["width"],
depth,
brain.camera_resolutions.height,
brain.camera_resolutions.width,
brain.camera_resolutions.num_channels,
),
)
else:

23
ml-agents-envs/mlagents/envs/brain.py


from mlagents.envs.communicator_objects.agent_info_pb2 import AgentInfoProto
from mlagents.envs.communicator_objects.brain_parameters_pb2 import BrainParametersProto
from mlagents.envs.timers import hierarchical_timer, timed
from typing import Dict, List, Optional
from typing import Dict, List, NamedTuple, Optional
class CameraResolution(NamedTuple):
height: int
width: int
gray_scale: bool
@property
def num_channels(self) -> int:
return 1 if self.gray_scale else 3
@staticmethod
def from_proto(p):
return CameraResolution(height=p.height, width=p.width, gray_scale=p.gray_scale)
class BrainParameters:
def __init__(
self,

camera_resolutions: List[Dict],
camera_resolutions: List[CameraResolution],
vector_action_space_size: List[int],
vector_action_descriptions: List[str],
vector_action_space_type: int,

:return: BrainParameter object.
"""
resolution = [
{"height": x.height, "width": x.width, "blackAndWhite": x.gray_scale}
for x in brain_param_proto.camera_resolutions
CameraResolution.from_proto(x) for x in brain_param_proto.camera_resolutions
]
brain_params = BrainParameters(
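
The `CameraResolution` NamedTuple introduced in this hunk replaces the old resolution dictionaries; a small illustrative sketch of the attribute-style access that downstream consumers such as the gym wrapper now use:

```python
from mlagents.envs.brain import CameraResolution

# Fields and the num_channels property come straight from the class above.
res = CameraResolution(height=84, width=84, gray_scale=False)
obs_shape = (res.height, res.width, res.num_channels)   # -> (84, 84, 3)
print(obs_shape)
```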