
Merge branch 'main' into develop-agentprocessor-teammanager

/develop/coma2/samenet
Ervin Teng, 3 years ago
Current commit
61781a1a
73 files changed, with 3,620 insertions and 1,753 deletions
  1. Project/Assets/ML-Agents/Examples/Match3/Prefabs/Match3VisualObs.prefab (2)
  2. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VectorObs.onnx (1001)
  3. README.md (17)
  4. com.unity.ml-agents.extensions/Documentation~/Grid-Sensor.md (16)
  5. com.unity.ml-agents.extensions/Documentation~/Match3.md (2)
  6. com.unity.ml-agents.extensions/Documentation~/com.unity.ml-agents.extensions.md (9)
  7. com.unity.ml-agents.extensions/Runtime/Input/InputActionActuator.cs (12)
  8. com.unity.ml-agents.extensions/Runtime/Input/InputActuatorComponent.cs (6)
  9. com.unity.ml-agents.extensions/Runtime/Match3/Match3Actuator.cs (9)
  10. com.unity.ml-agents.extensions/Runtime/Match3/Match3ActuatorComponent.cs (1)
  11. com.unity.ml-agents.extensions/Runtime/Match3/Match3SensorComponent.cs (2)
  12. com.unity.ml-agents.extensions/Runtime/Sensors/GridSensor.cs (2)
  13. com.unity.ml-agents.extensions/Tests/Runtime/Input/Unity.ML-Agents.Extensions.Input.Tests.Runtime.asmdef (2)
  14. com.unity.ml-agents.extensions/package.json (2)
  15. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (4)
  16. com.unity.ml-agents/Runtime/Academy.cs (4)
  17. com.unity.ml-agents/Runtime/Actuators/ActionSpec.cs (1)
  18. com.unity.ml-agents/Runtime/Actuators/IActionReceiver.cs (2)
  19. com.unity.ml-agents/Runtime/Actuators/IDiscreteActionMask.cs (2)
  20. com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs (8)
  21. com.unity.ml-agents/Runtime/Agent.cs (47)
  22. com.unity.ml-agents/Runtime/Analytics/Events.cs (31)
  23. com.unity.ml-agents/Runtime/Analytics/InferenceAnalytics.cs (18)
  24. com.unity.ml-agents/Runtime/Analytics/TrainingAnalytics.cs (17)
  25. com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs (9)
  26. com.unity.ml-agents/Runtime/Constants.cs (3)
  27. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (2)
  28. com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (2)
  29. com.unity.ml-agents/Runtime/Policies/BarracudaPolicy.cs (10)
  30. com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs (6)
  31. com.unity.ml-agents/Runtime/Policies/RemotePolicy.cs (10)
  32. com.unity.ml-agents/Runtime/Sensors/IBuiltInSensor.cs (38)
  33. com.unity.ml-agents/Runtime/Sensors/ObservationWriter.cs (6)
  34. com.unity.ml-agents/Runtime/SideChannels/TrainingAnalyticsSideChannel.cs (5)
  35. com.unity.ml-agents/Tests/Editor/Analytics/InferenceAnalyticsTests.cs (18)
  36. com.unity.ml-agents/Tests/Editor/Analytics/TrainingAnalyticsTest.cs (38)
  37. config/ppo/Match3.yaml (86)
  38. config/ppo/PyramidsRND.yaml (1)
  39. docs/Installation-Anaconda-Windows.md (4)
  40. docs/Installation.md (6)
  41. docs/Learning-Environment-Examples.md (4)
  42. docs/ML-Agents-Overview.md (25)
  43. docs/Training-on-Amazon-Web-Service.md (2)
  44. docs/Unity-Inference-Engine.md (4)
  45. ml-agents/mlagents/trainers/action_info.py (12)
  46. ml-agents/mlagents/trainers/agent_processor.py (2)
  47. ml-agents/mlagents/trainers/buffer.py (1)
  48. ml-agents/mlagents/trainers/optimizer/torch_optimizer.py (140)
  49. ml-agents/mlagents/trainers/policy/policy.py (15)
  50. ml-agents/mlagents/trainers/policy/torch_policy.py (62)
  51. ml-agents/mlagents/trainers/ppo/optimizer_torch.py (39)
  52. ml-agents/mlagents/trainers/ppo/trainer.py (4)
  53. ml-agents/mlagents/trainers/sac/optimizer_torch.py (96)
  54. ml-agents/mlagents/trainers/sac/trainer.py (9)
  55. ml-agents/mlagents/trainers/tests/test_agent_processor.py (7)
  56. ml-agents/mlagents/trainers/tests/torch/saver/test_saver.py (4)
  57. ml-agents/mlagents/trainers/tests/torch/test_networks.py (23)
  58. ml-agents/mlagents/trainers/tests/torch/test_policy.py (4)
  59. ml-agents/mlagents/trainers/tests/torch/test_ppo.py (12)
  60. ml-agents/mlagents/trainers/tests/torch/test_sac.py (4)
  61. ml-agents/mlagents/trainers/tests/torch/test_simple_rl.py (6)
  62. ml-agents/mlagents/trainers/torch/components/bc/module.py (2)
  63. ml-agents/mlagents/trainers/torch/model_serialization.py (2)
  64. ml-agents/mlagents/trainers/torch/networks.py (282)
  65. utils/make_readme_table.py (1)
  66. utils/validate_release_links.py (69)
  67. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx (1001)
  68. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx.meta (15)
  69. com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs (49)
  70. com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs.meta (3)
  71. docs/images/variable-length-observation-illustrated.png (1001)
  72. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn (1001)
  73. Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn.meta (11)

2
Project/Assets/ML-Agents/Examples/Match3/Prefabs/Match3VisualObs.prefab


VectorActionDescriptions: []
VectorActionSpaceType: 0
hasUpgradedBrainParametersWithActionSpec: 1
m_Model: {fileID: 11400000, guid: 48d14da88fea74d0693c691c6e3f2e34, type: 3}
m_Model: {fileID: 11400000, guid: 28ccdfd7cb3d941ce8af0ab89e06130a, type: 3}
m_InferenceDevice: 2
m_BehaviorType: 0
m_BehaviorName: Match3VisualObs

1001
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VectorObs.onnx
File diff is too large to display.

17
README.md


# Unity ML-Agents Toolkit
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/)
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/)
[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)

## Features
- 15+ [example Unity environments](docs/Learning-Environment-Examples.md)
- 18+ [example Unity environments](docs/Learning-Environment-Examples.md)
- Built-in support for Imitation Learning through Behavioral Cloning or
Generative Adversarial Imitation Learning
- Built-in support for Imitation Learning through Behavioral Cloning (BC) or
Generative Adversarial Imitation Learning (GAIL)
- Self-play mechanism for training agents in adversarial scenarios
- Easily definable Curriculum Learning scenarios for complex tasks
- Train robust agents using environment randomization

## Releases & Documentation
**Our latest, stable release is `Release 12`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Readme.md)
**Our latest, stable release is `Release 13`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Readme.md)
to get started with the latest release of ML-Agents.**
The table below lists all our releases, including our `master` branch which is

| **Version** | **Release Date** | **Source** | **Documentation** | **Download** | **Python Package** | **Unity Package** |
|:-------:|:------:|:-------------:|:-------:|:------------:|:------------:|:------------:|
| **master (unstable)** | -- | [source](https://github.com/Unity-Technologies/ml-agents/tree/master) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/master/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/master.zip) | -- | -- |
| **Release 12** | **December 22, 2020** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/release_12)** | **[docs](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/release_12.zip)** | **[0.23.0](https://pypi.org/project/mlagents/0.23.0/)** | **[1.7.2](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.7/manual/index.html)** |
| **Release 13** | **February 17, 2021** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/release_13)** | **[docs](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/release_13.zip)** | **[0.24.0](https://pypi.org/project/mlagents/0.24.0/)** | **[1.8.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.8/manual/index.html)** |
| **Release 12** | December 22, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_12) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_12.zip) | [0.23.0](https://pypi.org/project/mlagents/0.23.0/) | [1.7.2](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.7/manual/index.html) |
| **Release 11** | December 21, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_11) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_11_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_11.zip) | [0.23.0](https://pypi.org/project/mlagents/0.23.0/) | [1.7.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.7/manual/index.html) |
| **Release 10** | November 18, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_10) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_10_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_10.zip) | [0.22.0](https://pypi.org/project/mlagents/0.22.0/) | [1.6.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.6/manual/index.html) |
| **Verified Package 1.0.6** | **November 16, 2020** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/com.unity.ml-agents_1.0.6)** | **[docs](https://github.com/Unity-Technologies/ml-agents/blob/release_2_verified_docs/docs/Readme.md)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/com.unity.ml-agents_1.0.6.zip)** | **[0.16.1](https://pypi.org/project/mlagents/0.16.1/)** | **[1.0.6](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.0/manual/index.html)** |

| **Release 7** | September 16, 2020 | [source](https://github.com/Unity-Technologies/ml-agents/tree/release_7) | [docs](https://github.com/Unity-Technologies/ml-agents/tree/release_7_docs/docs/Readme.md) | [download](https://github.com/Unity-Technologies/ml-agents/archive/release_7.zip) | [0.20.0](https://pypi.org/project/mlagents/0.20.0/) | [1.4.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@1.4/manual/index.html) |
If you are a researcher interested in a discussion of Unity as an AI platform,
see a pre-print of our

16
com.unity.ml-agents.extensions/Documentation~/Grid-Sensor.md


# Contribution
An image can be thought of as a matrix of a predefined width (W) and a height (H) and each pixel can be thought of as simply an array of length 3 (in the case of RGB), `[Red, Green, Blue]` holding the different channel information of the color (channel) intensities at that pixel location. Thus an image is just a 3 dimensional matrix of size WxHx3. A Grid Observation can be thought of as a generalization of this setup where in place of a pixel there is a "cell" which is an array of length N representing different channel intensities at that cell position. From a Convolutional Neural Network point of view, the introduction of multiple channels in an "image" isn't a new concept. One such example is using an RGB-Depth image which is used in several robotics applications. The distinction of Grid Observations is what the data within the channels represents. Instead of limiting the channels to color intensities, the channels within a cell of a Grid Observation generalize to any data that can be represented by a single number (float or int).
Before jumping into the details of the Grid Sensor, an important thing to note is the agent's performance and qualitatively different behavior compared to raycasts. Unity ML-Agents comes with a suite of example environments. One in particular, the [Food Collector](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Examples.md#food-collector), has been the focus of the Grid Sensor development.
The Food Collector environment can be described as:
* Set-up: A multi-agent environment where agents compete to collect food.
* Goal: The agents must learn to collect as many green food spheres as possible while avoiding red spheres.
* Agents: The environment contains 5 agents with the same Behavior Parameters.
When applying the Grid Sensor to this environment, in place of the Raycast Vector Sensor or the Camera Sensor, a Mean Reward of 40-50 is observed. This performance is on par with what is seen by agents trained with RayCasts, but the side-by-side comparison of trained agents shows a qualitative difference in behavior. A deeper study and interpretation of the qualitative differences between agents trained with Raycasts and Vector Sensors versus Grid Sensors is left to future studies.
<img src="images/gridobs-vs-vectorobs.gif" align="middle" width="3000"/>
## Overview
There are three main phases to the observation process of the Grid Sensor:

### Channel Based
The Channel Based Grid Observations represent observations in a normalized form between 0 and 1. To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the `channelValues` array could take. If one sets ChannelDepth[i] to be 1, it is assumed that the value of `channelValues[i]` is already normalized. Else ChannelDepth[i] represents the total number of possible values that `channelValues[i]` can take and will be used for normalization.
The Channel Based Grid Observations is perhaps the simplest in terms of usability and similarity with other machine learning applications. Each grid is of size WxHxC where C is the number of channels. To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the `channelValues` array could take. If one sets ChannelDepth[i] to be 1, it is assumed that the value of `channelValues[i]` is already normalized. Else ChannelDepth[i] represents the total number of possible values that `channelValues[i]` can take.
As the "enemy" is in the second position of the observed tags, its value can be normalized by:
For ObjectType, "weapon", "enemy" will be represented respectively as:
```
weapon = DetectableObjects.IndexOfTag("weapon")/ChannelDepth[0] = 1/2 = 0.5;
```
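For illustration, here is a small standalone Python sketch of the normalization arithmetic described above. The tag list, the `ChannelDepth` value, and the 1-based `index_of_tag` helper are illustrative assumptions, not part of the package.

```python
# Sketch of the ChannelDepth normalization described above (illustrative only).
# index_of_tag is assumed to be 1-based, matching the "weapon = 1/2 = 0.5" example.

detectable_objects = ["weapon", "enemy"]  # observed tags for this channel
channel_depth = [2]                       # channel 0 can take 2 distinct values

def index_of_tag(tag: str) -> int:
    """Return the 1-based index of a tag in the detectable objects list."""
    return detectable_objects.index(tag) + 1

def normalized_value(tag: str, channel: int = 0) -> float:
    """Normalize a categorical tag into the [0, 1] range using ChannelDepth."""
    return index_of_tag(tag) / channel_depth[channel]

print(normalized_value("weapon"))  # 1/2 = 0.5
print(normalized_value("enemy"))   # 2/2 = 1.0
```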

2
com.unity.ml-agents.extensions/Documentation~/Match3.md


This implementation includes:
* C# implementation catered toward a Match-3 setup including concepts around encoding for moves based on [Human Like Playtesting with Deep Learning](https://www.researchgate.net/publication/328307928_Human-Like_Playtesting_with_Deep_Learning)
* An example Match-3 scene with ML-Agents implemented (located under /Project/Assets/ML-Agents/Examples/Match3). More information on the Match-3 example [here](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/docs/Learning-Environment-Examples.md#match-3).
* An example Match-3 scene with ML-Agents implemented (located under /Project/Assets/ML-Agents/Examples/Match3). More information on the Match-3 example [here](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/docs/Learning-Environment-Examples.md#match-3).
### Feedback
If you are a Match-3 developer and are trying to leverage ML-Agents for this scenario, [we want to hear from you](https://forms.gle/TBsB9jc8WshgzViU9). Additionally, we are also looking for interested Match-3 teams to speak with us for 45 minutes. If you are interested, please indicate that in the [form](https://forms.gle/TBsB9jc8WshgzViU9). If selected, we will provide gift cards as a token of appreciation.

9
com.unity.ml-agents.extensions/Documentation~/com.unity.ml-agents.extensions.md


recommended ways to install the package:
### Local Installation
[Clone the repository](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Installation.md#clone-the-ml-agents-toolkit-repository-optional) and follow the
[Local Installation for Development](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/Installation.md#advanced-local-installation-for-development-1)
[Clone the repository](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Installation.md#clone-the-ml-agents-toolkit-repository-optional) and follow the
[Local Installation for Development](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/Installation.md#advanced-local-installation-for-development-1)
![Package Manager git URL](https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/images/unity_package_manager_git_url.png)
![Package Manager git URL](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/images/unity_package_manager_git_url.png)
In the dialog that appears, enter
```
git+https://github.com/Unity-Technologies/ml-agents.git?path=com.unity.ml-agents.extensions
```

- No way to customize the action space of the `InputActuatorComponent`
## Need Help?
The main [README](https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/README.md) contains links for contacting the team or getting support.
The main [README](https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/README.md) contains links for contacting the team or getting support.

12
com.unity.ml-agents.extensions/Runtime/Input/InputActionActuator.cs


/// <see cref="Agent"/>'s <see cref="BehaviorParameters"/> indicate that the Agent is running in Heuristic Mode,
/// this Actuator will write actions from the <see cref="InputSystem"/> to the <see cref="ActionBuffers"/> object.
/// </summary>
public class InputActionActuator : IActuator, IHeuristicProvider
public class InputActionActuator : IActuator, IHeuristicProvider, IBuiltInActuator
{
readonly BehaviorParameters m_BehaviorParameters;
readonly InputAction m_Action;

/// <param name="adaptor">The <see cref="IRLActionInputAdaptor"/> that will convert data between ML-Agents
/// and the <see cref="InputSystem"/>.</param>
public InputActionActuator(InputDevice inputDevice, BehaviorParameters behaviorParameters,
InputAction action,
IRLActionInputAdaptor adaptor)
InputAction action,
IRLActionInputAdaptor adaptor)
{
m_BehaviorParameters = behaviorParameters;
Name = $"InputActionActuator-{action.name}";

Profiler.BeginSample("InputActionActuator.Heuristic");
m_InputAdaptor.WriteToHeuristic(m_Action, actionBuffersOut);
Profiler.EndSample();
}
/// <inheritdoc/>
public BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.InputActionActuator;
}
}
}

6
com.unity.ml-agents.extensions/Runtime/Input/InputActuatorComponent.cs


/// <see cref="InputActionActuator"/>s.
/// </summary>
[RequireComponent(typeof(PlayerInput), typeof(IInputActionAssetProvider))]
[AddComponentMenu("ML Agents/Input Actuator", (int)MenuGroup.Actuators)]
public class InputActuatorComponent : ActuatorComponent
{
InputActionAsset m_InputAsset;

}
var inputControlScheme = new InputControlScheme(
mlAgentsControlSchemeName,
mlAgentsControlSchemeName,
deviceRequirements);
return inputControlScheme;

var builder = new InputControlLayout.Builder()
.WithName(layoutName)
.WithFormat(mlAgentsLayoutFormat);
for(var i = 0; i < defaultMap.actions.Count; i++)
for (var i = 0; i < defaultMap.actions.Count; i++)
{
var action = defaultMap.actions[i];
builder.AddControl(action.name)

}, layoutName);
}
}

9
com.unity.ml-agents.extensions/Runtime/Match3/Match3Actuator.cs


/// Actuator for a Match3 game. It translates valid moves (defined by AbstractBoard.IsMoveValid())
into action masks, and applies the action to the board via AbstractBoard.MakeMove().
/// </summary>
public class Match3Actuator : IActuator, IHeuristicProvider
public class Match3Actuator : IActuator, IHeuristicProvider, IBuiltInActuator
{
protected AbstractBoard m_Board;
protected System.Random m_Random;

/// <inheritdoc/>
public void ResetData()
{
}
/// <inheritdoc/>
public BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.Match3Actuator;
}
IEnumerable<int> InvalidMoveIndices()

{
return 1;
}
}
}

1
com.unity.ml-agents.extensions/Runtime/Match3/Match3ActuatorComponent.cs


/// <summary>
/// Actuator component for a Match3 game. Generates a Match3Actuator at runtime.
/// </summary>
[AddComponentMenu("ML Agents/Match 3 Actuator", (int)MenuGroup.Actuators)]
public class Match3ActuatorComponent : ActuatorComponent
{
/// <summary>

2
com.unity.ml-agents.extensions/Runtime/Match3/Match3SensorComponent.cs


using Unity.MLAgents.Sensors;
using UnityEngine;
namespace Unity.MLAgents.Extensions.Match3
{

[AddComponentMenu("ML Agents/Match 3 Sensor", (int)MenuGroup.Sensors)]
public class Match3SensorComponent : SensorComponent
{
/// <summary>

2
com.unity.ml-agents.extensions/Runtime/Sensors/GridSensor.cs


/// <summary>
/// Grid-based sensor.
/// </summary>
[AddComponentMenu("ML Agents/Grid Sensor", (int)MenuGroup.Sensors)]
public class GridSensor : SensorComponent, ISensor, IBuiltInSensor
{
/// <summary>

{
return BuiltInSensorType.GridSensor;
}
/// <summary>
/// GetCompressedObservation - Calls Perceive then puts the data stored on the perception buffer

2
com.unity.ml-agents.extensions/Tests/Runtime/Input/Unity.ML-Agents.Extensions.Input.Tests.Runtime.asmdef


"versionDefines": [
{
"name": "com.unity.inputsystem",
"expression": "1.1.0-preview",
"expression": "1.1.0",
"define": "MLA_INPUT_TESTS"
}
],

2
com.unity.ml-agents.extensions/package.json


{
"name": "com.unity.ml-agents.extensions",
"displayName": "ML Agents Extensions",
"version": "0.0.1-preview",
"version": "0.1.0-preview",
"unity": "2018.4",
"description": "A source-only package for new features based on ML-Agents",
"dependencies": {

4
com.unity.ml-agents/Documentation~/com.unity.ml-agents.md


[unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[unity inference engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[package manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Installation.md
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Installation.md
[ML-Agents GitHub repo]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/com.unity.ml-agents.extensions
[ML-Agents GitHub repo]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/com.unity.ml-agents.extensions

4
com.unity.ml-agents/Runtime/Academy.cs


* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a
* Unity scene, please browse our documentation pages on GitHub:
* https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/docs/
* https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/docs/
*/
namespace Unity.MLAgents

/// fall back to inference or heuristic decisions. (You can also set agents to always use
/// inference or heuristics.)
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_12_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_13_docs/" +
"docs/Learning-Environment-Design.md")]
public class Academy : IDisposable
{

1
com.unity.ml-agents/Runtime/Actuators/ActionSpec.cs


/// <param name="numContinuousActions">The number of continuous actions available.</param>
/// <param name="discreteBranchSizes">The array of branch sizes for the discrete actions. Each index
/// contains the number of actions available for that branch.</param>
/// <returns>An ActionSpec initialized with the specified action sizes.</returns>
public ActionSpec(int numContinuousActions = 0, int[] discreteBranchSizes = null)
{
m_NumContinuousActions = numContinuousActions;

2
com.unity.ml-agents/Runtime/Actuators/IActionReceiver.cs


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <seealso cref="IActionReceiver.OnActionReceived"/>
void WriteDiscreteActionMask(IDiscreteActionMask actionMask);

2
com.unity.ml-agents/Runtime/Actuators/IDiscreteActionMask.cs


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>

8
com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs


/// <summary>
/// IActuator implementation that forwards calls to an <see cref="IActionReceiver"/> and an <see cref="IHeuristicProvider"/>.
/// </summary>
internal class VectorActuator : IActuator, IHeuristicProvider
internal class VectorActuator : IActuator, IHeuristicProvider, IBuiltInActuator
{
IActionReceiver m_ActionReceiver;
IHeuristicProvider m_HeuristicProvider;

/// <inheritdoc />
public string Name { get; }
/// <inheritdoc />
public virtual BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.VectorActuator;
}
}
}

47
com.unity.ml-agents/Runtime/Agent.cs


}
/// <summary>
/// Simple wrapper around VectorActuator that overrides GetBuiltInActuatorType
/// so that it can be distinguished from a standard VectorActuator.
/// </summary>
internal class AgentVectorActuator : VectorActuator
{
public AgentVectorActuator(IActionReceiver actionReceiver,
IHeuristicProvider heuristicProvider,
ActionSpec actionSpec,
string name = "VectorActuator"
) : base(actionReceiver, heuristicProvider, actionSpec, name)
{ }
public override BuiltInActuatorType GetBuiltInActuatorType()
{
return BuiltInActuatorType.AgentVectorActuator;
}
}
/// <summary>
/// An agent is an actor that can observe its environment, decide on the
/// best course of action using those observations, and execute those actions
/// within the environment.

/// [OnDisable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnDisable.html]
/// [OnBeforeSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnBeforeSerialize.html
/// [OnAfterSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnAfterSerialize.html
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design.md
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Readme.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Readme.md
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/" +
"docs/Learning-Environment-Design-Agents.md")]
[Serializable]
[RequireComponent(typeof(BehaviorParameters))]

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// </remarks>
/// <param name="reward">The new value of the reward.</param>
public void SetReward(float reward)

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
///</remarks>
/// <param name="increment">Incremental reward value.</param>
public void AddReward(float increment)

/// implementing a simple heuristic function can aid in debugging agent actions and interactions
/// with its environment.
///
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// <example>

// Support legacy OnActionReceived
// TODO don't set this up if the sizes are 0?
var param = m_PolicyFactory.BrainParameters;
m_VectorActuator = new VectorActuator(this, this, param.ActionSpec);
m_VectorActuator = new AgentVectorActuator(this, this, param.ActionSpec);
m_ActuatorManager = new ActuatorManager(attachedActuators.Length + 1);
m_LegacyActionCache = new float[m_VectorActuator.TotalNumberOfActions()];
m_LegacyHeuristicCache = new float[m_VectorActuator.TotalNumberOfActions()];

/// For more information about observations, see [Observations and Sensors].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// </remarks>
public virtual void CollectObservations(VectorSensor sensor)
{

///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <seealso cref="IActionReceiver.OnActionReceived"/>
public virtual void WriteDiscreteActionMask(IDiscreteActionMask actionMask)

///
/// For more information about implementing agent actions see [Agents - Actions].
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="actions">
/// Struct containing the buffers of actions to be executed at this step.

31
com.unity.ml-agents/Runtime/Analytics/Events.cs


public int InferenceDevice;
public List<EventObservationSpec> ObservationSpecs;
public EventActionSpec ActionSpec;
public List<EventActuatorInfo> ActuatorInfos;
public int MemorySize;
public long TotalWeightSizeBytes;
public string ModelHash;

NumContinuousActions = actionSpec.NumContinuousActions,
NumDiscreteActions = actionSpec.NumDiscreteActions,
BranchSizes = branchSizes,
};
}
}
/// <summary>
/// Information about an actuator.
/// </summary>
[Serializable]
internal struct EventActuatorInfo
{
public int BuiltInActuatorType;
public int NumContinuousActions;
public int NumDiscreteActions;
public static EventActuatorInfo FromActuator(IActuator actuator)
{
BuiltInActuatorType builtInActuatorType = Actuators.BuiltInActuatorType.Unknown;
if (actuator is IBuiltInActuator builtInActuator)
{
builtInActuatorType = builtInActuator.GetBuiltInActuatorType();
}
var actionSpec = actuator.ActionSpec;
return new EventActuatorInfo
{
BuiltInActuatorType = (int)builtInActuatorType,
NumContinuousActions = actionSpec.NumContinuousActions,
NumDiscreteActions = actionSpec.NumDiscreteActions
};
}
}

public string BehaviorName;
public List<EventObservationSpec> ObservationSpecs;
public EventActionSpec ActionSpec;
public List<EventActuatorInfo> ActuatorInfos;
/// <summary>
/// This will be the same as TrainingEnvironmentInitializedEvent if available, but

18
com.unity.ml-agents/Runtime/Analytics/InferenceAnalytics.cs


/// <param name="inferenceDevice">Whether inference is being performed on the CPU or GPU</param>
/// <param name="sensors">List of ISensors for the Agent. Used to generate information about the observation space.</param>
/// <param name="actionSpec">ActionSpec for the Agent. Used to generate information about the action space.</param>
/// <param name="actuators">List of IActuators for the Agent. Used to generate information about the action space.</param>
/// <returns></returns>
public static void InferenceModelSet(
NNModel nnModel,

ActionSpec actionSpec
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
// The event shouldn't be able to report if this is disabled but if we know we're not going to report

return;
}
var data = GetEventForModel(nnModel, behaviorName, inferenceDevice, sensors, actionSpec);
var data = GetEventForModel(nnModel, behaviorName, inferenceDevice, sensors, actionSpec, actuators);
//Debug.Log(JsonUtility.ToJson(data, true));
// Debug.Log(JsonUtility.ToJson(data, true));
#if UNITY_EDITOR
if (AnalyticsUtils.s_SendEditorAnalytics)
{

/// <param name="inferenceDevice"></param>
/// <param name="sensors"></param>
/// <param name="actionSpec"></param>
/// <param name="actuators"></param>
/// <returns></returns>
internal static InferenceEvent GetEventForModel(
NNModel nnModel,

ActionSpec actionSpec
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
var barracudaModel = ModelLoader.Load(nnModel);

foreach (var sensor in sensors)
{
inferenceEvent.ObservationSpecs.Add(EventObservationSpec.FromSensor(sensor));
}
inferenceEvent.ActuatorInfos = new List<EventActuatorInfo>(actuators.Count);
foreach (var actuator in actuators)
{
inferenceEvent.ActuatorInfos.Add(EventActuatorInfo.FromActuator(actuator));
}
inferenceEvent.TotalWeightSizeBytes = GetModelWeightSize(barracudaModel);

17
com.unity.ml-agents/Runtime/Analytics/TrainingAnalytics.cs


public static void RemotePolicyInitialized(
string fullyQualifiedBehaviorName,
IList<ISensor> sensors,
ActionSpec actionSpec
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
if (!IsAnalyticsEnabled())

return;
}
var data = GetEventForRemotePolicy(behaviorName, sensors, actionSpec);
var data = GetEventForRemotePolicy(behaviorName, sensors, actionSpec, actuators);
// Note - to debug, use JsonUtility.ToJson on the event.
// Debug.Log(
// $"Would send event {k_RemotePolicyInitializedEventName} with body {JsonUtility.ToJson(data, true)}"

#endif
}
static RemotePolicyInitializedEvent GetEventForRemotePolicy(
internal static RemotePolicyInitializedEvent GetEventForRemotePolicy(
ActionSpec actionSpec)
ActionSpec actionSpec,
IList<IActuator> actuators
)
{
var remotePolicyEvent = new RemotePolicyInitializedEvent();

foreach (var sensor in sensors)
{
remotePolicyEvent.ObservationSpecs.Add(EventObservationSpec.FromSensor(sensor));
}
remotePolicyEvent.ActuatorInfos = new List<EventActuatorInfo>(actuators.Count);
foreach (var actuator in actuators)
{
remotePolicyEvent.ActuatorInfos.Add(EventActuatorInfo.FromActuator(actuator));
}
remotePolicyEvent.MLAgentsEnvsVersion = s_TrainerPackageVersion;

9
com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs


#if UNITY_EDITOR || UNITY_STANDALONE_WIN || UNITY_STANDALONE_OSX || UNITY_STANDALONE_LINUX
#define MLA_SUPPORTED_TRAINING_PLATFORM
#endif
# if MLA_SUPPORTED_TRAINING_PLATFORM
using Grpc.Core;
#if UNITY_EDITOR
using UnityEditor;

/// <param name="initParametersOut">The External Initialization Parameters received.</param>
public bool Initialize(CommunicatorInitParameters initParameters, out UnityRLInitParameters initParametersOut)
{
#if MLA_SUPPORTED_TRAINING_PLATFORM
var academyParameters = new UnityRLInitializationOutputProto
{
Name = initParameters.name,

UpdateEnvironmentWithInput(input.RlInput);
initParametersOut = initializationInput.RlInitializationInput.ToUnityRLInitParameters();
return true;
#else
initParametersOut = new UnityRLInitParameters();
return false;
#endif
}
/// <summary>

3
com.unity.ml-agents/Runtime/Constants.cs


internal enum MenuGroup
{
Default = 0,
Sensors = 50
Sensors = 50,
Actuators = 100
}
}

2
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs


/// See [Imitation Learning - Recording Demonstrations] for more information.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// </remarks>
[RequireComponent(typeof(Agent))]
[AddComponentMenu("ML Agents/Demonstration Recorder", (int)MenuGroup.Default)]

2
com.unity.ml-agents/Runtime/DiscreteActionMasker.cs


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>

10
com.unity.ml-agents/Runtime/Policies/BarracudaPolicy.cs


private string m_BehaviorName;
/// <summary>
/// List of actuators, only used for analytics
/// </summary>
private IList<IActuator> m_Actuators;
/// <summary>
/// Whether or not we've tried to send analytics for this model. We only ever try to send once per policy,
/// and do additional deduplication in the analytics code.
/// </summary>

public BarracudaPolicy(
ActionSpec actionSpec,
IList<IActuator> actuators,
NNModel model,
InferenceDevice inferenceDevice,
string behaviorName

m_ModelRunner = modelRunner;
m_BehaviorName = behaviorName;
m_ActionSpec = actionSpec;
m_Actuators = actuators;
}
/// <inheritdoc />

m_BehaviorName,
m_ModelRunner.InferenceDevice,
sensors,
m_ActionSpec
m_ActionSpec,
m_Actuators
);
}
m_AgentId = info.episodeId;

6
com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs


"Either assign a model, or change to a different Behavior Type."
);
}
return new BarracudaPolicy(actionSpec, m_Model, m_InferenceDevice, m_BehaviorName);
return new BarracudaPolicy(actionSpec, actuatorManager, m_Model, m_InferenceDevice, m_BehaviorName);
return new RemotePolicy(actionSpec, FullyQualifiedBehaviorName);
return new RemotePolicy(actionSpec, actuatorManager, FullyQualifiedBehaviorName);
return new BarracudaPolicy(actionSpec, m_Model, m_InferenceDevice, m_BehaviorName);
return new BarracudaPolicy(actionSpec, actuatorManager, m_Model, m_InferenceDevice, m_BehaviorName);
}
else
{

10
com.unity.ml-agents/Runtime/Policies/RemotePolicy.cs


internal ICommunicator m_Communicator;
/// <summary>
/// List of actuators, only used for analytics
/// </summary>
private IList<IActuator> m_Actuators;
IList<IActuator> actuators,
string fullyQualifiedBehaviorName)
{
m_FullyQualifiedBehaviorName = fullyQualifiedBehaviorName;

m_Actuators = actuators;
}
/// <inheritdoc />

TrainingAnalytics.RemotePolicyInitialized(
m_FullyQualifiedBehaviorName,
sensors,
m_ActionSpec
m_ActionSpec,
m_Actuators
);
}
m_AgentId = info.episodeId;

38
com.unity.ml-agents/Runtime/Sensors/IBuiltInSensor.cs


/// </summary>
public enum BuiltInSensorType
{
/// <summary>
/// Default Sensor type if it cannot be determined.
/// </summary>
/// <summary>
/// The Vector sensor used by the agent.
/// </summary>
// Note that StackingSensor actually returns the wrapped sensor's type
/// <summary>
/// The Stacking Sensor type. NOTE: StackingSensor actually returns the wrapped sensor's type.
/// </summary>
/// <summary>
/// The RayPerception Sensor types, both 3D and 2D.
/// </summary>
/// <summary>
/// The observable attribute sensor type.
/// </summary>
/// <summary>
/// Sensors that use the Camera for observations.
/// </summary>
/// <summary>
/// Sensors that use RenderTextures for observations.
/// </summary>
/// <summary>
/// Sensors that use buffers or tensors for observations.
/// </summary>
/// <summary>
/// The sensors that observe properties of rigid bodies.
/// </summary>
/// <summary>
/// The sensors that observe Match 3 boards.
/// </summary>
/// <summary>
/// Sensors that break down the world into a grid of colliders to observe an area at a pre-defined granularity.
/// </summary>
GridSensor = 10
}

/// </summary>
public interface IBuiltInSensor
internal interface IBuiltInSensor
{
/// <summary>
/// Return the corresponding BuiltInSensorType for the sensor.

}
}

6
com.unity.ml-agents/Runtime/Sensors/ObservationWriter.cs


}
}
/// <summary>
/// Write the list of floats.
/// </summary>
/// <param name="data">The actual list of floats to write.</param>
/// <param name="writeOffset">Optional write offset to start writing from.</param>
public void AddList(IList<float> data, int writeOffset = 0)
{
if (m_Data != null)

var val = data[index];
m_Data[index + m_Offset + writeOffset] = val;
}
}
else

5
com.unity.ml-agents/Runtime/SideChannels/TrainingAnalyticsSideChannel.cs


namespace Unity.MLAgents.SideChannels
{
public class TrainingAnalyticsSideChannel : SideChannel
/// <summary>
/// Side Channel implementation for recording which training features are being used.
/// </summary>
internal class TrainingAnalyticsSideChannel : SideChannel
{
const string k_TrainingAnalyticsConfigId = "b664a4a9-d86f-5a5f-95cb-e8353a7e8356";

18
com.unity.ml-agents/Tests/Editor/Analytics/InferenceAnalyticsTests.cs


using System;
using System.Collections.Generic;
using NUnit.Framework;
using Unity.MLAgents.Sensors;

{
var sensors = new List<ISensor> { sensor_21_20_3.Sensor, sensor_20_22_3.Sensor };
var behaviorName = "continuousModel";
var actionSpec = GetContinuous2vis8vec2actionActionSpec();
var vectorActuator = new VectorActuator(null, actionSpec, "test'");
var actuators = new IActuator[] { vectorActuator };
InferenceDevice.CPU, sensors, GetContinuous2vis8vec2actionActionSpec()
InferenceDevice.CPU, sensors, actionSpec,
actuators
);
// The behavior name should be hashed, not pass-through.

Assert.AreEqual((int)DimensionProperty.None, continuousEvent.ObservationSpecs[0].DimensionInfos[2].Flags);
Assert.AreEqual("None", continuousEvent.ObservationSpecs[0].CompressionType);
Assert.AreEqual(Test3DSensor.k_BuiltInSensorType, continuousEvent.ObservationSpecs[0].BuiltInSensorType);
Assert.AreEqual((int)BuiltInActuatorType.VectorActuator, continuousEvent.ActuatorInfos[0].BuiltInActuatorType);
Assert.AreNotEqual(null, continuousEvent.ModelHash);
// Make sure nested fields get serialized

Assert.IsTrue(jsonString.Contains("NumDiscreteActions"));
Assert.IsTrue(jsonString.Contains("SensorName"));
Assert.IsTrue(jsonString.Contains("Flags"));
Assert.IsTrue(jsonString.Contains("ActuatorInfos"));
}
[Test]

using (new AnalyticsUtils.DisableAnalyticsSending())
{
var sensors = new List<ISensor> { sensor_21_20_3.Sensor, sensor_20_22_3.Sensor };
var policy = new BarracudaPolicy(GetContinuous2vis8vec2actionActionSpec(), continuousONNXModel, InferenceDevice.CPU, "testBehavior");
var policy = new BarracudaPolicy(
GetContinuous2vis8vec2actionActionSpec(),
Array.Empty<IActuator>(),
continuousONNXModel,
InferenceDevice.CPU,
"testBehavior"
);
policy.RequestDecision(new AgentInfo(), sensors);
}
Academy.Instance.Dispose();

38
com.unity.ml-agents/Tests/Editor/Analytics/TrainingAnalyticsTest.cs


using System;
using UnityEngine;
using Unity.Barracuda;
using UnityEditor;
namespace Unity.MLAgents.Tests.Analytics
{

}
[Test]
public void TestRemotePolicyEvent()
{
var behaviorName = "testBehavior";
var sensor1 = new Test3DSensor("SensorA", 21, 20, 3);
var sensor2 = new Test3DSensor("SensorB", 20, 22, 3);
var sensors = new List<ISensor> { sensor1, sensor2 };
var actionSpec = ActionSpec.MakeContinuous(2);
var vectorActuator = new VectorActuator(null, actionSpec, "test'");
var actuators = new IActuator[] { vectorActuator };
var remotePolicyEvent = TrainingAnalytics.GetEventForRemotePolicy(behaviorName, sensors, actionSpec, actuators);
// The behavior name should be hashed, not pass-through.
Assert.AreNotEqual(behaviorName, remotePolicyEvent.BehaviorName);
Assert.AreEqual(2, remotePolicyEvent.ObservationSpecs.Count);
Assert.AreEqual(3, remotePolicyEvent.ObservationSpecs[0].DimensionInfos.Length);
Assert.AreEqual(20, remotePolicyEvent.ObservationSpecs[0].DimensionInfos[0].Size);
Assert.AreEqual("None", remotePolicyEvent.ObservationSpecs[0].CompressionType);
Assert.AreEqual(Test3DSensor.k_BuiltInSensorType, remotePolicyEvent.ObservationSpecs[0].BuiltInSensorType);
Assert.AreEqual(2, remotePolicyEvent.ActionSpec.NumContinuousActions);
Assert.AreEqual(0, remotePolicyEvent.ActionSpec.NumDiscreteActions);
Assert.AreEqual(2, remotePolicyEvent.ActuatorInfos[0].NumContinuousActions);
Assert.AreEqual(0, remotePolicyEvent.ActuatorInfos[0].NumDiscreteActions);
}
[Test]
public void TestRemotePolicy()
{
if (Academy.IsInitialized)

using (new AnalyticsUtils.DisableAnalyticsSending())
{
var actionSpec = ActionSpec.MakeContinuous(3);
var policy = new RemotePolicy(actionSpec, "TestBehavior?team=42");
var policy = new RemotePolicy(actionSpec, Array.Empty<IActuator>(), "TestBehavior?team=42");
policy.RequestDecision(new AgentInfo(), new List<ISensor>());
}

86
config/ppo/Match3.yaml


default_settings:
  trainer_type: ppo
  hyperparameters:
    batch_size: 16
    buffer_size: 120
    learning_rate: 0.0003
    beta: 0.005
    epsilon: 0.2
    lambd: 0.99
    num_epoch: 3
    learning_rate_schedule: constant
  network_settings:
    normalize: true
    hidden_units: 256
    num_layers: 4
    vis_encode_type: match3
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5000000
  time_horizon: 128
  summary_freq: 10000
  threaded: true
Match3VectorObs:
  trainer_type: ppo
  hyperparameters:
    batch_size: 64
    buffer_size: 12000
    learning_rate: 0.0003
    beta: 0.001
    epsilon: 0.2
    lambd: 0.99
    num_epoch: 3
    learning_rate_schedule: constant
  network_settings:
    normalize: true
    hidden_units: 128
    num_layers: 2
    vis_encode_type: match3
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5000000
  time_horizon: 1000
  summary_freq: 10000
  threaded: true
Match3VisualObs:
  trainer_type: ppo
  hyperparameters:
    batch_size: 64
    buffer_size: 12000
    learning_rate: 0.0003
    beta: 0.001
    epsilon: 0.2
    lambd: 0.99
    num_epoch: 3
    learning_rate_schedule: constant
  network_settings:
    normalize: true
    hidden_units: 128
    num_layers: 2
    vis_encode_type: match3
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5000000
  time_horizon: 1000
  summary_freq: 10000
  threaded: true
batch_size: 64
buffer_size: 128
batch_size: 16
buffer_size: 120
network_settings:
hidden_units: 4
num_layers: 1

Match3GreedyHeuristic:
Match3SmartHeuristic:
batch_size: 64
buffer_size: 128
batch_size: 16
buffer_size: 120
network_settings:
hidden_units: 4
num_layers: 1

1
config/ppo/PyramidsRND.yaml


strength: 0.01
network_settings:
hidden_units: 64
num_layers: 3
learning_rate: 0.0001
keep_checkpoints: 5
max_steps: 3000000

4
docs/Installation-Anaconda-Windows.md


the ml-agents Conda environment by typing `activate ml-agents`)_:
```sh
git clone --branch release_12 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_13 https://github.com/Unity-Technologies/ml-agents.git
```
The `--branch release_12` option will switch to the tag of the latest stable
The `--branch release_13` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

6
docs/Installation.md


of our tutorials / guides assume you have access to our example environments).
```sh
git clone --branch release_12 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_13 https://github.com/Unity-Technologies/ml-agents.git
```
The `--branch release_12` option will switch to the tag of the latest stable
The `--branch release_13` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

ML-Agents Toolkit for your purposes. If you plan to contribute those changes
back, make sure to clone the `master` branch (by omitting `--branch release_12`
back, make sure to clone the `master` branch (by omitting `--branch release_13`
from the command above). See our
[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more
information on contributing to the ML-Agents Toolkit.

4
docs/Learning-Environment-Examples.md


- Observations and actions are defined with a sensor and actuator respectively.
- Float Properties: None
- Benchmark Mean Reward:
- 37.2 for visual observations
- 37.6 for vector observations
- 39.5 for visual observations
- 38.5 for vector observations
- 34.2 for simple heuristic (pick a random valid move)
- 37.0 for greedy heuristic (pick the highest-scoring valid move)

25
docs/ML-Agents-Overview.md


- [Model Types](#model-types)
- [Learning from Vector Observations](#learning-from-vector-observations)
- [Learning from Cameras using Convolutional Neural Networks](#learning-from-cameras-using-convolutional-neural-networks)
- [Learning from Variable Length Observations using Attention](#learning-from-variable-length-observations-using-attention)
- [Memory-enhanced Agents using Recurrent Neural Networks](#memory-enhanced-agents-using-recurrent-neural-networks)
- [Additional Features](#additional-features)
- [Summary and Next Steps](#summary-and-next-steps)

Regardless of the training method deployed, there are a few model types that
users can train using the ML-Agents Toolkit. This is due to the flexibility in
defining agent observations, which can include vector, ray cast and visual
defining agent observations, which include vector, ray cast and visual
observations. You can learn more about how to instrument an agent's observation
in the [Designing Agents](Learning-Environment-Design-Agents.md) guide.

The choice of the architecture depends on the visual complexity of the scene and
the available computational resources.
### Learning from Variable Length Observations using Attention
Using the ML-Agents Toolkit, it is possible to have agents learn from a
varying number of inputs. To do so, each agent can keep track of a buffer
of vector observations. At each step, the agent will go through all the
elements in the buffer and extract information, but the elements
in the buffer can change at every step.
This can be useful in scenarios in which the agents must keep track of
a varying number of elements throughout the episode. For example, in a game
where an agent must learn to avoid projectiles, the number of projectiles
can vary over time.
![Variable Length Observations Illustrated](images/variable-length-observation-illustrated.png)
You can learn more about variable length observations
[here](Learning-Environment-Design-Agents.md#variable-length-observations).
When variable length observations are utilized, the ML-Agents Toolkit
leverages attention networks to learn from a varying number of entities.
Agents using attention will ignore entities that are deemed not relevant
and pay special attention to entities relevant to the current situation
based on context.
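As a rough, conceptual illustration of attention over a variable number of entities (this is not the toolkit's implementation; every name and shape below is hypothetical):

```python
import numpy as np

def masked_attention(query, entities, mask):
    """Attend over a padded buffer of entity observations.

    query:    (d,)   summary of the agent's own state
    entities: (N, d) up to N entity observations, padded with zeros
    mask:     (N,)   1.0 for real entities, 0.0 for padding
    Returns a single (d,) vector that ignores the padded slots.
    """
    d = query.shape[0]
    scores = entities @ query / np.sqrt(d)      # similarity of each entity to the query
    scores = np.where(mask > 0, scores, -1e9)   # padded slots get ~zero attention weight
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ entities

# Example: only 2 of the 4 buffer slots hold real projectile observations this step.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
entities = np.zeros((4, 8))
entities[:2] = rng.normal(size=(2, 8))
mask = np.array([1.0, 1.0, 0.0, 0.0])
print(masked_attention(query, entities, mask).shape)  # (8,)
```

The masking is what lets the number of attended entities change from step to step without changing the network's input size.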
### Memory-enhanced Agents using Recurrent Neural Networks

2
docs/Training-on-Amazon-Web-Service.md


2. Clone the ML-Agents repo and install the required Python packages
```sh
git clone --branch release_12 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_13 https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents/
pip3 install -e .
```

4
docs/Unity-Inference-Engine.md


loading expects certain conventions for constants and tensor names. While it is
possible to construct a model that follows these conventions, we don't provide
any additional help for this. More details can be found in
[TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/com.unity.ml-agents/Runtime/Inference/TensorNames.cs)
[TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/com.unity.ml-agents/Runtime/Inference/TensorNames.cs)
[BarracudaModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs).
[BarracudaModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_13_docs/com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs).
If you wish to run inference on an externally trained model, you should use
Barracuda directly, instead of trying to run it through ML-Agents.

12
ml-agents/mlagents/trainers/action_info.py


class ActionInfo(NamedTuple):
"""
A NamedTuple containing actions and quantities related to the policy forward
pass. Additionally contains the agent ids in the corresponding DecisionStep.
:param action: The action output of the policy
:param env_action: The possibly clipped action to be executed in the environment
:param outputs: Dict of all quantities associated with the policy forward pass
:param agent_ids: List of int agent ids in DecisionStep
"""
value: Any
return ActionInfo([], [], [], {}, [])
return ActionInfo([], [], {}, [])
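For reference, a minimal sketch of the updated tuple as described by the docstring above; the type annotations (notably for `outputs`) are simplified assumptions, not the repository's definitions.

```python
from typing import Any, Dict, List, NamedTuple

class ActionInfo(NamedTuple):
    """Sketch of ActionInfo after this change (field order follows empty() below)."""

    action: Any              # raw action output of the policy
    env_action: Any          # possibly clipped action executed in the environment
    outputs: Dict[str, Any]  # all quantities from the policy forward pass (type assumed)
    agent_ids: List[int]     # int agent ids in the corresponding DecisionStep

    @staticmethod
    def empty() -> "ActionInfo":
        return ActionInfo([], [], {}, [])

print(ActionInfo.empty())
```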

2
ml-agents/mlagents/trainers/agent_processor.py


if stored_decision_step is not None and stored_take_action_outputs is not None:
obs = stored_decision_step.obs
if self.policy.use_recurrent:
memory = self.policy.retrieve_memories([global_agent_id])[0, :]
memory = self.policy.retrieve_previous_memories([global_agent_id])[0, :]
else:
memory = None
done = terminated # Since this is an ongoing step

1
ml-agents/mlagents/trainers/buffer.py


ENVIRONMENT_REWARDS = "environment_rewards"
MASKS = "masks"
MEMORY = "memory"
CRITIC_MEMORY = "critic_memory"
PREV_ACTION = "prev_action"
ADVANTAGES = "advantages"

140
ml-agents/mlagents/trainers/optimizer/torch_optimizer.py


from typing import Dict, Optional, Tuple, List
from mlagents.torch_utils import torch
import numpy as np
import math
from mlagents.trainers.buffer import AgentBuffer
from mlagents.trainers.buffer import AgentBuffer, AgentBufferField
from mlagents.trainers.trajectory import ObsUtil
from mlagents.trainers.torch.components.bc.module import BCModule
from mlagents.trainers.torch.components.reward_providers import create_reward_provider

self.global_step = torch.tensor(0)
self.bc_module: Optional[BCModule] = None
self.create_reward_signals(trainer_settings.reward_signals)
self.critic_memory_dict: Dict[str, torch.Tensor] = {}
if trainer_settings.behavioral_cloning is not None:
self.bc_module = BCModule(
self.policy,

default_num_epoch=3,
)
@property
def critic(self):
raise NotImplementedError
def update(self, batch: AgentBuffer, num_sequences: int) -> Dict[str, float]:
pass

reward_signal, self.policy.behavior_spec, settings
)
def _evaluate_by_sequence(
self, tensor_obs: List[torch.Tensor], initial_memory: np.ndarray
) -> Tuple[Dict[str, torch.Tensor], AgentBufferField, torch.Tensor]:
"""
Evaluate a trajectory sequence-by-sequence, assembling the result. This enables us to get the
intermediate memories for the critic.
:param tensor_obs: A List of tensors of shape (trajectory_len, <obs_dim>) that are the agent's
observations for this trajectory.
:param initial_memory: The memory that precedes this trajectory, of shape (1,1,<mem_size>), i.e.
what is returned as the output of a MemoryModule.
:return: A Tuple of the value estimates as a Dict of [name, tensor], an AgentBufferField of the initial
memories to be used during value function update, and the final memory at the end of the trajectory.
"""
num_experiences = tensor_obs[0].shape[0]
all_next_memories = AgentBufferField()
# In the buffer, the first sequence is the one that gets padded. So if seq_len = 3 and the
# trajectory is of length 10, the first sequence is [pad, pad, obs].
# Compute the number of real elements in this padded sequence.
leftover = num_experiences % self.policy.sequence_length
# Compute values for the potentially truncated initial sequence
seq_obs = []
first_seq_len = self.policy.sequence_length
for _obs in tensor_obs:
if leftover > 0:
first_seq_len = leftover
first_seq_obs = _obs[0:first_seq_len]
seq_obs.append(first_seq_obs)
# For the first sequence, the initial memory should be the one at the
# beginning of this trajectory.
for _ in range(first_seq_len):
all_next_memories.append(initial_memory.squeeze().detach().numpy())
init_values, _mem = self.critic.critic_pass(
seq_obs, initial_memory, sequence_length=first_seq_len
)
all_values = {
signal_name: [init_values[signal_name]]
for signal_name in init_values.keys()
}
# Evaluate the remaining sequences, carrying over _mem after each
# sequence
for seq_num in range(
1, math.ceil((num_experiences) / (self.policy.sequence_length))
):
seq_obs = []
for _ in range(self.policy.sequence_length):
all_next_memories.append(_mem.squeeze().detach().numpy())
for _obs in tensor_obs:
start = seq_num * self.policy.sequence_length - (
self.policy.sequence_length - leftover
)
end = (seq_num + 1) * self.policy.sequence_length - (
self.policy.sequence_length - leftover
)
seq_obs.append(_obs[start:end])
values, _mem = self.critic.critic_pass(
seq_obs, _mem, sequence_length=self.policy.sequence_length
)
for signal_name, _val in values.items():
all_values[signal_name].append(_val)
# Create one tensor per reward signal
all_value_tensors = {
signal_name: torch.cat(value_list, dim=0)
for signal_name, value_list in all_values.items()
}
next_mem = _mem
return all_value_tensors, all_next_memories, next_mem
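
As a quick illustration of the index arithmetic in `_evaluate_by_sequence` above, the sketch below (plain Python, with an arbitrarily chosen trajectory length of 10 and sequence length of 3) prints the slices that would be evaluated: a truncated first sequence of the leftover length, followed by full-length sequences.

```python
import math

sequence_length = 3   # assumed policy.sequence_length
num_experiences = 10  # assumed trajectory length

leftover = num_experiences % sequence_length  # 1
first_seq_len = leftover if leftover > 0 else sequence_length
print((0, first_seq_len))  # (0, 1) -> stored in the buffer as [pad, pad, obs]
for seq_num in range(1, math.ceil(num_experiences / sequence_length)):
    start = seq_num * sequence_length - (sequence_length - leftover)
    end = (seq_num + 1) * sequence_length - (sequence_length - leftover)
    print((start, end))  # (1, 4), (4, 7), (7, 10)
```
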
self, batch: AgentBuffer, next_obs: List[np.ndarray], done: bool
) -> Tuple[Dict[str, np.ndarray], Dict[str, float]]:
self,
batch: AgentBuffer,
next_obs: List[np.ndarray],
done: bool,
agent_id: str = "",
) -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]]:
"""
Get value estimates and memories for a trajectory, in batch form.
:param batch: An AgentBuffer that consists of a trajectory.
:param next_obs: the next observation (after the trajectory). Used for bootstrapping
if this is not a terminal trajectory.
:param done: Set true if this is a terminal trajectory.
:param agent_id: Agent ID of the agent that this trajectory belongs to.
:returns: A Tuple of the Value Estimates as a Dict of [name, np.ndarray(trajectory_len)],
the final value estimate as a Dict of [name, float], and optionally (if using memories)
an AgentBufferField of initial critic memories to be used during update.
"""
current_obs = ObsUtil.from_buffer(batch, n_obs)
if agent_id in self.critic_memory_dict:
memory = self.critic_memory_dict[agent_id]
else:
memory = (
torch.zeros((1, 1, self.critic.memory_size))
if self.policy.use_recurrent
else None
)
current_obs = [ModelUtils.list_to_tensor(obs) for obs in current_obs]
current_obs = [
ModelUtils.list_to_tensor(obs) for obs in ObsUtil.from_buffer(batch, n_obs)
]
memory = torch.zeros([1, 1, self.policy.m_size])
next_obs = [obs.unsqueeze(0) for obs in next_obs]
next_obs = [obs.unsqueeze(0) for obs in next_obs]
# If we're using LSTM, we want to get all the intermediate memories.
all_next_memories: Optional[AgentBufferField] = None
if self.policy.use_recurrent:
(
value_estimates,
all_next_memories,
next_memory,
) = self._evaluate_by_sequence(current_obs, memory)
else:
value_estimates, next_memory = self.critic.critic_pass(
current_obs, memory, sequence_length=batch.num_experiences
)
value_estimates, next_memory = self.policy.actor_critic.critic_pass(
current_obs, memory, sequence_length=batch.num_experiences
)
# Store the memory for the next trajectory
self.critic_memory_dict[agent_id] = next_memory
next_value_estimate, _ = self.policy.actor_critic.critic_pass(
next_value_estimate, _ = self.critic.critic_pass(
next_obs, next_memory, sequence_length=1
)

for k in next_value_estimate:
if not self.reward_signals[k].ignore_done:
next_value_estimate[k] = 0.0
return value_estimates, next_value_estimate
if agent_id in self.critic_memory_dict:
self.critic_memory_dict.pop(agent_id)
return value_estimates, next_value_estimate, all_next_memories

15
ml-agents/mlagents/trainers/policy/policy.py


self.network_settings: NetworkSettings = trainer_settings.network_settings
self.seed = seed
self.previous_action_dict: Dict[str, np.ndarray] = {}
self.previous_memory_dict: Dict[str, np.ndarray] = {}
self.memory_dict: Dict[str, np.ndarray] = {}
self.normalize = trainer_settings.network_settings.normalize
self.use_recurrent = self.network_settings.memory is not None

if memory_matrix is None:
return
# Pass old memories into previous_memory_dict
for agent_id in agent_ids:
if agent_id in self.memory_dict:
self.previous_memory_dict[agent_id] = self.memory_dict[agent_id]
for index, agent_id in enumerate(agent_ids):
self.memory_dict[agent_id] = memory_matrix[index, :]

memory_matrix[index, :] = self.memory_dict[agent_id]
return memory_matrix
def retrieve_previous_memories(self, agent_ids: List[str]) -> np.ndarray:
memory_matrix = np.zeros((len(agent_ids), self.m_size), dtype=np.float32)
for index, agent_id in enumerate(agent_ids):
if agent_id in self.previous_memory_dict:
memory_matrix[index, :] = self.previous_memory_dict[agent_id]
return memory_matrix
if agent_id in self.previous_memory_dict:
self.previous_memory_dict.pop(agent_id)
def make_empty_previous_action(self, num_agents: int) -> np.ndarray:
"""

62
ml-agents/mlagents/trainers/policy/torch_policy.py


from mlagents_envs.timers import timed
from mlagents.trainers.settings import TrainerSettings
from mlagents.trainers.torch.networks import (
SharedActorCritic,
SeparateActorCritic,
GlobalSteps,
)
from mlagents.trainers.torch.networks import SimpleActor, SharedActorCritic, GlobalSteps
from mlagents.trainers.torch.utils import ModelUtils
from mlagents.trainers.buffer import AgentBuffer

) # could be much simpler if TorchPolicy is nn.Module
self.grads = None
reward_signal_configs = trainer_settings.reward_signals
reward_signal_names = [key.value for key, _ in reward_signal_configs.items()]
ac_class = SeparateActorCritic
self.actor = SimpleActor(
observation_specs=self.behavior_spec.observation_specs,
network_settings=trainer_settings.network_settings,
action_spec=behavior_spec.action_spec,
conditional_sigma=self.condition_sigma_on_obs,
tanh_squash=tanh_squash,
)
self.shared_critic = False
ac_class = SharedActorCritic
self.actor_critic = ac_class(
observation_specs=self.behavior_spec.observation_specs,
network_settings=trainer_settings.network_settings,
action_spec=behavior_spec.action_spec,
stream_names=reward_signal_names,
conditional_sigma=self.condition_sigma_on_obs,
tanh_squash=tanh_squash,
)
reward_signal_configs = trainer_settings.reward_signals
reward_signal_names = [
key.value for key, _ in reward_signal_configs.items()
]
self.actor = SharedActorCritic(
observation_specs=self.behavior_spec.observation_specs,
network_settings=trainer_settings.network_settings,
action_spec=behavior_spec.action_spec,
stream_names=reward_signal_names,
conditional_sigma=self.condition_sigma_on_obs,
tanh_squash=tanh_squash,
)
self.shared_critic = True
self.m_size = self.actor_critic.memory_size
self.m_size = self.actor.memory_size
self.actor_critic.to(default_device())
self.actor.to(default_device())
self._clip_action = not tanh_squash
@property

"""
if self.normalize:
self.actor_critic.update_normalization(buffer)
self.actor.update_normalization(buffer)
@timed
def sample_actions(

:param seq_len: Sequence length when using RNN.
:return: Tuple of AgentAction, ActionLogProbs, entropies, and output memories.
"""
actions, log_probs, entropies, memories = self.actor_critic.get_action_stats(
actions, log_probs, entropies, memories = self.actor.get_action_and_stats(
obs, masks, memories, seq_len
)
return (actions, log_probs, entropies, memories)

masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
seq_len: int = 1,
) -> Tuple[ActionLogProbs, torch.Tensor, Dict[str, torch.Tensor]]:
log_probs, entropies, value_heads = self.actor_critic.get_stats_and_value(
) -> Tuple[ActionLogProbs, torch.Tensor]:
log_probs, entropies = self.actor.get_stats(
return log_probs, entropies, value_heads
return log_probs, entropies
@timed
def evaluate(

return ActionInfo(
action=run_out.get("action"),
env_action=run_out.get("env_action"),
value=run_out.get("value"),
outputs=run_out,
agent_ids=list(decision_requests.agent_id),
)

return self.get_current_step()
def load_weights(self, values: List[np.ndarray]) -> None:
self.actor_critic.load_state_dict(values)
self.actor.load_state_dict(values)
return copy.deepcopy(self.actor_critic.state_dict())
return copy.deepcopy(self.actor.state_dict())
return {"Policy": self.actor_critic, "global_step": self.global_step}
return {"Policy": self.actor, "global_step": self.global_step}

39
ml-agents/mlagents/trainers/ppo/optimizer_torch.py


from typing import Dict, cast
from mlagents.torch_utils import torch
from mlagents.torch_utils import torch, default_device
from mlagents.trainers.buffer import AgentBuffer, BufferKey, RewardSignalUtil

from mlagents.trainers.settings import TrainerSettings, PPOSettings
from mlagents.trainers.torch.networks import ValueNetwork
from mlagents.trainers.torch.agent_action import AgentAction
from mlagents.trainers.torch.action_log_probs import ActionLogProbs
from mlagents.trainers.torch.utils import ModelUtils

# Create the graph here to give more granular control of the TF graph to the Optimizer.
super().__init__(policy, trainer_settings)
params = list(self.policy.actor_critic.parameters())
reward_signal_configs = trainer_settings.reward_signals
reward_signal_names = [key.value for key, _ in reward_signal_configs.items()]
if policy.shared_critic:
self._critic = policy.actor
else:
self._critic = ValueNetwork(
reward_signal_names,
policy.behavior_spec.observation_specs,
network_settings=trainer_settings.network_settings,
)
self._critic.to(default_device())
params = list(self.policy.actor.parameters()) + list(self._critic.parameters())
self.hyperparameters: PPOSettings = cast(
PPOSettings, trainer_settings.hyperparameters
)

self.stream_names = list(self.reward_signals.keys())
@property
def critic(self):
return self._critic
def ppo_value_loss(
self,
values: Dict[str, torch.Tensor],

if len(memories) > 0:
memories = torch.stack(memories).unsqueeze(0)
log_probs, entropy, values = self.policy.evaluate_actions(
# Get value memories
value_memories = [
ModelUtils.list_to_tensor(batch[BufferKey.CRITIC_MEMORY][i])
for i in range(
0, len(batch[BufferKey.CRITIC_MEMORY]), self.policy.sequence_length
)
]
if len(value_memories) > 0:
value_memories = torch.stack(value_memories).unsqueeze(0)
log_probs, entropy = self.policy.evaluate_actions(
)
values, _ = self.critic.critic_pass(
current_obs,
memories=value_memories,
sequence_length=self.policy.sequence_length,
)
old_log_probs = ActionLogProbs.from_buffer(batch).flatten()
log_probs = log_probs.flatten()

4
ml-agents/mlagents/trainers/ppo/trainer.py


self.policy.update_normalization(agent_buffer_trajectory)
# Get all value estimates
value_estimates, value_next = self.optimizer.get_trajectory_value_estimates(
value_estimates, value_next, value_memories = self.optimizer.get_trajectory_value_estimates(
if value_memories is not None:
agent_buffer_trajectory[BufferKey.CRITIC_MEMORY].set(value_memories)
for name, v in value_estimates.items():
agent_buffer_trajectory[RewardSignalUtil.value_estimates_key(name)].extend(

96
ml-agents/mlagents/trainers/sac/optimizer_torch.py


def __init__(self, policy: TorchPolicy, trainer_params: TrainerSettings):
super().__init__(policy, trainer_params)
reward_signal_configs = trainer_params.reward_signals
reward_signal_names = [key.value for key, _ in reward_signal_configs.items()]
if policy.shared_critic:
raise UnityTrainerException("SAC does not support SharedActorCritic")
self._critic = ValueNetwork(
reward_signal_names,
policy.behavior_spec.observation_specs,
policy.network_settings,
)
hyperparameters: SACSettings = cast(SACSettings, trainer_params.hyperparameters)
self.tau = hyperparameters.tau
self.init_entcoef = hyperparameters.init_entcoef

}
self._action_spec = self.policy.behavior_spec.action_spec
self.value_network = TorchSACOptimizer.PolicyValueNetwork(
self.q_network = TorchSACOptimizer.PolicyValueNetwork(
self.stream_names,
self.policy.behavior_spec.observation_specs,
policy_network_settings,

self.policy.behavior_spec.observation_specs,
policy_network_settings,
)
ModelUtils.soft_update(
self.policy.actor_critic.critic, self.target_network, 1.0
)
ModelUtils.soft_update(self._critic, self.target_network, 1.0)
# We create one entropy coefficient per action, whether discrete or continuous.
_disc_log_ent_coef = torch.nn.Parameter(

self.target_entropy = TorchSACOptimizer.TargetEntropy(
continuous=_cont_target, discrete=_disc_target
)
policy_params = list(self.policy.actor_critic.network_body.parameters()) + list(
self.policy.actor_critic.action_model.parameters()
)
value_params = list(self.value_network.parameters()) + list(
self.policy.actor_critic.critic.parameters()
policy_params = list(self.policy.actor.parameters())
value_params = list(self.q_network.parameters()) + list(
self._critic.parameters()
)
logger.debug("value_vars")

)
self._move_to_device(default_device())
@property
def critic(self):
return self._critic
self.value_network.to(device)
self._critic.to(device)
self.q_network.to(device)
def sac_q_loss(
self,

for i in range(0, len(batch[BufferKey.MEMORY]), self.policy.sequence_length)
]
# LSTM shouldn't have sequence length < 1, but guard against indexing out of bounds if it does.
value_memories_list = [
ModelUtils.list_to_tensor(batch[BufferKey.CRITIC_MEMORY][i])
for i in range(
0, len(batch[BufferKey.CRITIC_MEMORY]), self.policy.sequence_length
)
]
next_memories_list = [
next_value_memories_list = [
batch[BufferKey.MEMORY][i][self.policy.m_size // 2 :]
batch[BufferKey.CRITIC_MEMORY][i]
offset, len(batch[BufferKey.MEMORY]), self.policy.sequence_length
offset, len(batch[BufferKey.CRITIC_MEMORY]), self.policy.sequence_length
next_memories = torch.stack(next_memories_list).unsqueeze(0)
value_memories = torch.stack(value_memories_list).unsqueeze(0)
next_value_memories = torch.stack(next_value_memories_list).unsqueeze(0)
next_memories = None
# Q network memories are 0'ed out, since we don't have them during inference.
value_memories = None
next_value_memories = None
# Q and V network memories are 0'ed out, since we don't have them during inference.
torch.zeros_like(next_memories) if next_memories is not None else None
torch.zeros_like(next_value_memories)
if next_value_memories is not None
else None
self.value_network.q1_network.network_body.copy_normalization(
self.policy.actor_critic.network_body
self.q_network.q1_network.network_body.copy_normalization(
self.policy.actor.network_body
self.value_network.q2_network.network_body.copy_normalization(
self.policy.actor_critic.network_body
self.q_network.q2_network.network_body.copy_normalization(
self.policy.actor.network_body
self.policy.actor_critic.network_body
self.policy.actor.network_body
(
sampled_actions,
log_probs,
_,
value_estimates,
_,
) = self.policy.actor_critic.get_action_stats_and_value(
self._critic.network_body.copy_normalization(self.policy.actor.network_body)
sampled_actions, log_probs, _, _, = self.policy.actor.get_action_and_stats(
value_estimates, _ = self._critic.critic_pass(
current_obs, value_memories, sequence_length=self.policy.sequence_length
)
q1p_out, q2p_out = self.value_network(
q1p_out, q2p_out = self.q_network(
current_obs,
cont_sampled_actions,
memories=q_memories,

q1_out, q2_out = self.value_network(
q1_out, q2_out = self.q_network(
current_obs,
cont_actions,
memories=q_memories,

with torch.no_grad():
target_values, _ = self.target_network(
next_obs,
memories=next_memories,
memories=next_value_memories,
sequence_length=self.policy.sequence_length,
)
masks = ModelUtils.list_to_tensor(batch[BufferKey.MASKS], dtype=torch.bool)

policy_loss = self.sac_policy_loss(log_probs, q1p_out, masks)
entropy_loss = self.sac_entropy_loss(log_probs, masks)
total_value_loss = q1_loss + q2_loss + value_loss
total_value_loss = q1_loss + q2_loss
if self.policy.shared_critic:
policy_loss += value_loss
else:
total_value_loss += value_loss
decay_lr = self.decay_learning_rate.get_value(self.policy.get_current_step())
ModelUtils.update_learning_rate(self.policy_optimizer, decay_lr)

self.entropy_optimizer.step()
# Update target network
ModelUtils.soft_update(
self.policy.actor_critic.critic, self.target_network, self.tau
)
ModelUtils.soft_update(self._critic, self.target_network, self.tau)
update_stats = {
"Losses/Policy Loss": policy_loss.item(),
"Losses/Value Loss": value_loss.item(),

def get_modules(self):
modules = {
"Optimizer:value_network": self.value_network,
"Optimizer:value_network": self.q_network,
"Optimizer:target_network": self.target_network,
"Optimizer:policy_optimizer": self.policy_optimizer,
"Optimizer:value_optimizer": self.value_optimizer,

9
ml-agents/mlagents/trainers/sac/trainer.py


self.collected_rewards[name][agent_id] += np.sum(evaluate_result)
# Get all value estimates for reporting purposes
value_estimates, _ = self.optimizer.get_trajectory_value_estimates(
(
value_estimates,
_,
value_memories,
) = self.optimizer.get_trajectory_value_estimates(
if value_memories is not None:
agent_buffer_trajectory[BufferKey.CRITIC_MEMORY].set(value_memories)
for name, v in value_estimates.items():
self._stats_reporter.add_stat(
f"Policy/{self.optimizer.reward_signals[name].name.capitalize()} Value",

7
ml-agents/mlagents/trainers/tests/test_agent_processor.py


def create_mock_policy():
mock_policy = mock.Mock()
mock_policy.reward_signals = {}
mock_policy.retrieve_memories.return_value = np.zeros((1, 1), dtype=np.float32)
mock_policy.retrieve_previous_memories.return_value = np.zeros(
(1, 1), dtype=np.float32
)
mock_policy.retrieve_previous_action.return_value = np.zeros((1, 1), dtype=np.int32)
return mock_policy

env_action=ActionTuple(
continuous=np.array([[0.1]] * num_agents, dtype=np.float32)
),
value=[0.1] * num_agents,
outputs=fake_action_outputs,
agent_ids=agent_ids,
)

fake_action_info = ActionInfo(
action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
env_action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
value=[0.1],
outputs=fake_action_outputs,
agent_ids=mock_decision_step.agent_id,
)

fake_action_info = ActionInfo(
action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
env_action=ActionTuple(continuous=np.array([[0.1]], dtype=np.float32)),
value=[0.1],
outputs=fake_action_outputs,
agent_ids=mock_decision_step.agent_id,
)

4
ml-agents/mlagents/trainers/tests/torch/saver/test_saver.py


"""
Make sure two policies have the same output for the same input.
"""
policy1.actor_critic = policy1.actor_critic.to(default_device())
policy2.actor_critic = policy2.actor_critic.to(default_device())
policy1.actor = policy1.actor.to(default_device())
policy2.actor = policy2.actor.to(default_device())
decision_step, _ = mb.create_steps_from_behavior_spec(
policy1.behavior_spec, num_agents=1

23
ml-agents/mlagents/trainers/tests/torch/test_networks.py


from mlagents.trainers.torch.networks import (
NetworkBody,
ValueNetwork,
SimpleActor,
SeparateActorCritic,
)
from mlagents.trainers.settings import NetworkSettings
from mlagents_envs.base_env import ActionSpec

assert _out[0] == pytest.approx(1.0, abs=0.1)
@pytest.mark.parametrize("ac_type", [SharedActorCritic, SeparateActorCritic])
@pytest.mark.parametrize("shared", [True, False])
def test_actor_critic(ac_type, lstm):
def test_actor_critic(lstm, shared):
obs_size = 4
network_settings = NetworkSettings(
memory=NetworkSettings.MemorySettings() if lstm else None, normalize=True

stream_names = [f"stream_name{n}" for n in range(4)]
# action_spec = ActionSpec.create_continuous(act_size[0])
action_spec = ActionSpec(act_size, tuple(act_size for _ in range(act_size)))
actor = ac_type(obs_spec, network_settings, action_spec, stream_names)
if shared:
actor = critic = SharedActorCritic(
obs_spec, network_settings, action_spec, stream_names, network_settings
)
else:
actor = SimpleActor(obs_spec, network_settings, action_spec)
critic = ValueNetwork(stream_names, obs_spec, network_settings)
if lstm:
sample_obs = torch.ones((1, network_settings.memory.sequence_length, obs_size))
memories = torch.ones(

# memories isn't always set to None; the network should be able to
# deal with that.
# Test critic pass
value_out, memories_out = actor.critic_pass([sample_obs], memories=memories)
value_out, memories_out = critic.critic_pass([sample_obs], memories=memories)
for stream in stream_names:
if lstm:
assert value_out[stream].shape == (network_settings.memory.sequence_length,)

# Test get action stats and_value
action, log_probs, entropies, value_out, mem_out = actor.get_action_stats_and_value(
action, log_probs, entropies, mem_out = actor.get_action_and_stats(
[sample_obs], memories=memories, masks=mask
)
if lstm:

if mem_out is not None:
assert mem_out.shape == memories.shape
for stream in stream_names:
if lstm:
assert value_out[stream].shape == (network_settings.memory.sequence_length,)
else:
assert value_out[stream].shape == (1,)

4
ml-agents/mlagents/trainers/tests/torch/test_policy.py


if len(memories) > 0:
memories = torch.stack(memories).unsqueeze(0)
log_probs, entropy, values = policy.evaluate_actions(
log_probs, entropy = policy.evaluate_actions(
tensor_obs,
masks=act_masks,
actions=agent_action,

assert log_probs.flatten().shape == (64, _size)
assert entropy.shape == (64,)
for val in values.values():
assert val.shape == (64,)
@pytest.mark.parametrize("discrete", [True, False], ids=["discrete", "continuous"])

12
ml-agents/mlagents/trainers/tests/torch/test_ppo.py


RewardSignalUtil.value_estimates_key("extrinsic"),
],
)
# Copy memories to critic memories
copy_buffer_fields(update_buffer, BufferKey.MEMORY, [BufferKey.CRITIC_MEMORY])
return_stats = optimizer.update(
update_buffer,

RewardSignalUtil.value_estimates_key("curiosity"),
],
)
# Copy memories to critic memories
copy_buffer_fields(update_buffer, BufferKey.MEMORY, [BufferKey.CRITIC_MEMORY])
optimizer.update(
update_buffer,

action_spec=DISCRETE_ACTION_SPEC if discrete else CONTINUOUS_ACTION_SPEC,
max_step_complete=True,
)
run_out, final_value_out = optimizer.get_trajectory_value_estimates(
run_out, final_value_out, all_memories = optimizer.get_trajectory_value_estimates(
if all_memories is not None:
assert len(all_memories) == 15
run_out, final_value_out = optimizer.get_trajectory_value_estimates(
run_out, final_value_out, _ = optimizer.get_trajectory_value_estimates(
trajectory.to_agentbuffer(), trajectory.next_obs, done=True
)
for key, val in final_value_out.items():

# Check if we ignore terminal states properly
optimizer.reward_signals["extrinsic"].use_terminal_states = False
run_out, final_value_out = optimizer.get_trajectory_value_estimates(
run_out, final_value_out, _ = optimizer.get_trajectory_value_estimates(
trajectory.to_agentbuffer(), trajectory.next_obs, done=False
)
for key, val in final_value_out.items():

4
ml-agents/mlagents/trainers/tests/torch/test_sac.py


)
# Test update
update_buffer = mb.simulate_rollout(
BUFFER_INIT_SAMPLES, optimizer.policy.behavior_spec, memory_size=24
BUFFER_INIT_SAMPLES, optimizer.policy.behavior_spec, memory_size=12
# Mock out value memories
update_buffer[BufferKey.CRITIC_MEMORY] = update_buffer[BufferKey.MEMORY]
return_stats = optimizer.update(
update_buffer,
num_sequences=update_buffer.num_experiences // optimizer.policy.sequence_length,

6
ml-agents/mlagents/trainers/tests/torch/test_simple_rl.py


new_hyperparams = attr.evolve(
SAC_TORCH_CONFIG.hyperparameters,
batch_size=256,
learning_rate=1e-4,
learning_rate=3e-4,
buffer_init_steps=1000,
steps_per_update=2,
)

network_settings=new_networksettings,
max_steps=4000,
)
check_environment_trains(env, {BRAIN_NAME: config}, training_seed=1213)
check_environment_trains(env, {BRAIN_NAME: config}, training_seed=1337)
@pytest.mark.parametrize("action_sizes", [(0, 1), (1, 0)])

num_visual=1,
num_vector=0,
action_sizes=action_sizes,
step_size=0.2,
step_size=0.3,
)
bc_settings = BehavioralCloningSettings(demo_path=demo_path, steps=1500)
reward_signals = {

2
ml-agents/mlagents/trainers/torch/components/bc/module.py


self.decay_learning_rate = ModelUtils.DecayedValue(
learning_rate_schedule, self.current_lr, 1e-10, self._anneal_steps
)
params = self.policy.actor_critic.parameters()
params = self.policy.actor.parameters()
self.optimizer = torch.optim.Adam(params, lr=self.current_lr)
_, self.demonstration_buffer = demo_to_buffer(
settings.demo_path, policy.sequence_length, policy.behavior_spec

2
ml-agents/mlagents/trainers/torch/model_serialization.py


with exporting_to_onnx():
torch.onnx.export(
self.policy.actor_critic,
self.policy.actor,
self.dummy_input,
onnx_output_path,
opset_version=SerializationSettings.onnx_opset,

282
ml-agents/mlagents/trainers/torch/networks.py


return encoding, memories
class ValueNetwork(nn.Module):
class Critic(abc.ABC):
@abc.abstractmethod
def update_normalization(self, buffer: AgentBuffer) -> None:
"""
Updates normalization of the Critic based on the provided AgentBuffer.
:param buffer: The AgentBuffer of experiences used to update normalization.
"""
pass
def critic_pass(
self,
inputs: List[torch.Tensor],
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
"""
Get value outputs for the given obs.
:param inputs: List of inputs as tensors.
:param memories: Tensor of memories, if using memory. Otherwise, None.
:returns: A Tuple of a Dict mapping reward stream name to value tensor, and the updated memories (if using memory).
"""
pass
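
For orientation, a minimal sketch of what implementing this interface looks like, assuming Critic is importable from mlagents.trainers.torch.networks as introduced above; the class and reward-stream name are hypothetical.

```python
import torch
from mlagents.trainers.torch.networks import Critic  # as defined in the diff above

class ConstantCritic(Critic):
    """Toy critic that returns a constant value estimate for one reward stream."""

    def update_normalization(self, buffer) -> None:
        pass  # nothing to normalize in this toy example

    def critic_pass(self, inputs, memories=None, sequence_length=1):
        batch_size = inputs[0].shape[0]
        # One value per experience for an assumed "extrinsic" reward stream.
        return {"extrinsic": torch.zeros(batch_size)}, memories
```
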
class ValueNetwork(nn.Module, Critic):
def __init__(
self,
stream_names: List[str],

encoding_size = network_settings.hidden_units
self.value_heads = ValueHeads(stream_names, encoding_size, outputs_per_stream)
def update_normalization(self, buffer: AgentBuffer) -> None:
self.network_body.update_normalization(buffer)
def critic_pass(
self,
inputs: List[torch.Tensor],
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
value_outputs, critic_mem_out = self.forward(
inputs, memories=memories, sequence_length=sequence_length
)
return value_outputs, critic_mem_out
def forward(
self,
inputs: List[torch.Tensor],

"""
pass
def get_action_stats(
def get_action_and_stats(
self,
inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,

"""
Returns sampled actions.
If memory is enabled, return the memories as well.
:param vec_inputs: A List of vector inputs as tensors.
:param vis_inputs: A List of visual inputs as tensors.
:param inputs: A List of inputs as tensors.
:param masks: If using discrete actions, a Tensor of action masks.
:param memories: If using memory, a Tensor of initial memories.
:param sequence_length: If using memory, the sequence length.

pass
@abc.abstractmethod
def forward(
self,
vec_inputs: List[torch.Tensor],
vis_inputs: List[torch.Tensor],
var_len_inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
) -> Tuple[Union[int, torch.Tensor], ...]:
"""
Forward pass of the Actor for inference. This is required for export to ONNX, and
the inputs and outputs of this method should not be changed without a respective change
in the ONNX export code.
"""
pass
class ActorCritic(Actor):
@abc.abstractmethod
def critic_pass(
def get_stats(
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
"""
Get value outputs for the given obs.
:param inputs: List of inputs as tensors.
:param memories: Tensor of memories, if using memory. Otherwise, None.
:returns: Dict of reward stream to output tensor for values.
"""
pass
@abc.abstractmethod
def get_action_stats_and_value(
self,
inputs: List[torch.Tensor],
actions: AgentAction,
) -> Tuple[
AgentAction, ActionLogProbs, torch.Tensor, Dict[str, torch.Tensor], torch.Tensor
]:
) -> Tuple[ActionLogProbs, torch.Tensor]:
Returns sampled actions and value estimates.
Returns log_probs for actions and entropies.
:param inputs: A List of vector inputs as tensors.
:param inputs: A List of inputs as tensors.
:param actions: AgentAction of actions.
:return: A Tuple of AgentAction, ActionLogProbs, entropies, Dict of reward signal
name to value estimate, and memories. Memories will be None if not using memory.
:return: A Tuple of AgentAction, ActionLogProbs, entropies, and memories.
Memories will be None if not using memory.
@abc.abstractproperty
def memory_size(self):
@abc.abstractmethod
def forward(
self,
vec_inputs: List[torch.Tensor],
vis_inputs: List[torch.Tensor],
var_len_inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
) -> Tuple[Union[int, torch.Tensor], ...]:
Returns the size of the memory (same size used as input and output in the other
methods) used by this Actor.
Forward pass of the Actor for inference. This is required for export to ONNX, and
the inputs and outputs of this method should not be changed without a respective change
in the ONNX export code.
"""
pass

def update_normalization(self, buffer: AgentBuffer) -> None:
self.network_body.update_normalization(buffer)
def get_action_stats(
def get_action_and_stats(
self,
inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,

action, log_probs, entropies = self.action_model(encoding, masks)
return action, log_probs, entropies, memories
def get_stats(
self,
inputs: List[torch.Tensor],
actions: AgentAction,
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[ActionLogProbs, torch.Tensor]:
encoding, actor_mem_outs = self.network_body(
inputs, memories=memories, sequence_length=sequence_length
)
log_probs, entropies = self.action_model.evaluate(encoding, masks, actions)
return log_probs, entropies
def forward(
self,
vec_inputs: List[torch.Tensor],

return tuple(export_out)
class SharedActorCritic(SimpleActor, ActorCritic):
class SharedActorCritic(SimpleActor, Critic):
def __init__(
self,
observation_specs: List[ObservationSpec],

inputs, memories=memories, sequence_length=sequence_length
)
return self.value_heads(encoding), memories_out
def get_stats_and_value(
self,
inputs: List[torch.Tensor],
actions: AgentAction,
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[ActionLogProbs, torch.Tensor, Dict[str, torch.Tensor]]:
encoding, memories = self.network_body(
inputs, memories=memories, sequence_length=sequence_length
)
log_probs, entropies = self.action_model.evaluate(encoding, masks, actions)
value_outputs = self.value_heads(encoding)
return log_probs, entropies, value_outputs
def get_action_stats_and_value(
self,
inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[
AgentAction, ActionLogProbs, torch.Tensor, Dict[str, torch.Tensor], torch.Tensor
]:
encoding, memories = self.network_body(
inputs, memories=memories, sequence_length=sequence_length
)
action, log_probs, entropies = self.action_model(encoding, masks)
value_outputs = self.value_heads(encoding)
return action, log_probs, entropies, value_outputs, memories
class SeparateActorCritic(SimpleActor, ActorCritic):
def __init__(
self,
observation_specs: List[ObservationSpec],
network_settings: NetworkSettings,
action_spec: ActionSpec,
stream_names: List[str],
conditional_sigma: bool = False,
tanh_squash: bool = False,
):
self.use_lstm = network_settings.memory is not None
super().__init__(
observation_specs,
network_settings,
action_spec,
conditional_sigma,
tanh_squash,
)
self.stream_names = stream_names
self.critic = ValueNetwork(stream_names, observation_specs, network_settings)
@property
def memory_size(self) -> int:
return self.network_body.memory_size + self.critic.memory_size
def _get_actor_critic_mem(
self, memories: Optional[torch.Tensor] = None
) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]:
if self.use_lstm and memories is not None:
# Split memories in half: the front half is used by the actor, the back half by the critic
actor_mem, critic_mem = torch.split(memories, self.memory_size // 2, dim=-1)
actor_mem, critic_mem = actor_mem.contiguous(), critic_mem.contiguous()
else:
critic_mem = None
actor_mem = None
return actor_mem, critic_mem
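
For reference, a small sketch (plain PyTorch, with a made-up memory size of 8) of the split performed above: the combined memory tensor is cut in half along the last dimension, the front half feeding the actor and the back half the critic.

```python
import torch

memory_size = 8  # assumed combined actor + critic memory size
memories = torch.arange(float(memory_size)).reshape(1, 1, memory_size)
actor_mem, critic_mem = torch.split(memories, memory_size // 2, dim=-1)
print(actor_mem.shape, critic_mem.shape)  # torch.Size([1, 1, 4]) torch.Size([1, 1, 4])
```
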
def critic_pass(
self,
inputs: List[torch.Tensor],
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor]:
actor_mem, critic_mem = self._get_actor_critic_mem(memories)
value_outputs, critic_mem_out = self.critic(
inputs, memories=critic_mem, sequence_length=sequence_length
)
if actor_mem is not None:
# Make memories with the actor mem unchanged
memories_out = torch.cat([actor_mem, critic_mem_out], dim=-1)
else:
memories_out = None
return value_outputs, memories_out
def get_stats_and_value(
self,
inputs: List[torch.Tensor],
actions: AgentAction,
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[ActionLogProbs, torch.Tensor, Dict[str, torch.Tensor]]:
actor_mem, critic_mem = self._get_actor_critic_mem(memories)
encoding, actor_mem_outs = self.network_body(
inputs, memories=actor_mem, sequence_length=sequence_length
)
log_probs, entropies = self.action_model.evaluate(encoding, masks, actions)
value_outputs, critic_mem_outs = self.critic(
inputs, memories=critic_mem, sequence_length=sequence_length
)
return log_probs, entropies, value_outputs
def get_action_stats(
self,
inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[AgentAction, ActionLogProbs, torch.Tensor, torch.Tensor]:
actor_mem, critic_mem = self._get_actor_critic_mem(memories)
action, log_probs, entropies, actor_mem_out = super().get_action_stats(
inputs, masks=masks, memories=actor_mem, sequence_length=sequence_length
)
if critic_mem is not None:
# Make memories with the actor mem unchanged
memories_out = torch.cat([actor_mem_out, critic_mem], dim=-1)
else:
memories_out = None
return action, log_probs, entropies, memories_out
def get_action_stats_and_value(
self,
inputs: List[torch.Tensor],
masks: Optional[torch.Tensor] = None,
memories: Optional[torch.Tensor] = None,
sequence_length: int = 1,
) -> Tuple[
AgentAction, ActionLogProbs, torch.Tensor, Dict[str, torch.Tensor], torch.Tensor
]:
actor_mem, critic_mem = self._get_actor_critic_mem(memories)
encoding, actor_mem_outs = self.network_body(
inputs, memories=actor_mem, sequence_length=sequence_length
)
action, log_probs, entropies = self.action_model(encoding, masks)
value_outputs, critic_mem_outs = self.critic(
inputs, memories=critic_mem, sequence_length=sequence_length
)
if self.use_lstm:
mem_out = torch.cat([actor_mem_outs, critic_mem_outs], dim=-1)
else:
mem_out = None
return action, log_probs, entropies, value_outputs, mem_out
def update_normalization(self, buffer: AgentBuffer) -> None:
super().update_normalization(buffer)
self.critic.network_body.update_normalization(buffer)
class GlobalSteps(nn.Module):

1
utils/make_readme_table.py


ReleaseInfo("release_10", "1.6.0", "0.22.0", "November 18, 2020"),
ReleaseInfo("release_11", "1.7.0", "0.23.0", "December 21, 2020"),
ReleaseInfo("release_12", "1.7.2", "0.23.0", "December 22, 2020"),
ReleaseInfo("release_13", "1.8.0", "0.24.0", "February 17, 2021"),
# Verified releases
ReleaseInfo("", "1.0.6", "0.16.1", "November 16, 2020", is_verified=True),
ReleaseInfo("", "1.0.5", "0.16.1", "September 23, 2020", is_verified=True),

69
utils/validate_release_links.py


import os
import re
import subprocess
import tempfile
MATCH_ANY = re.compile(r"(?s).*")
"README.md": re.compile(r"\*\*Release [0-9]+\*\*"),
"docs/Versioning.md": None,
"com.unity.ml-agents/CHANGELOG.md": None,
"utils/make_readme_table.py": None,
"utils/validate_release_links.py": None,
"README.md": re.compile(r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"),
"docs/Versioning.md": MATCH_ANY,
"com.unity.ml-agents/CHANGELOG.md": MATCH_ANY,
"utils/make_readme_table.py": MATCH_ANY,
"utils/validate_release_links.py": MATCH_ANY,
}

raise RuntimeError("Can't determine release tag")
def check_file(filename: str, global_allow_pattern: Pattern) -> List[str]:
def check_file(
filename: str, global_allow_pattern: Pattern, release_tag: str
) -> List[str]:
with open(filename) as f:
for line in f:
if not RELEASE_PATTERN.search(line):
continue
with tempfile.TemporaryDirectory() as tempdir:
if not os.path.exists(tempdir):
os.makedirs(tempdir)
new_file_name = os.path.join(tempdir, os.path.basename(filename))
with open(new_file_name, "w+") as new_file:
# default to match everything if there is nothing in the ALLOW_LIST
allow_list_pattern = ALLOW_LIST.get(filename, None)
with open(filename) as f:
for line in f:
keep_line = True
keep_line = not RELEASE_PATTERN.search(line)
keep_line |= global_allow_pattern.search(line) is not None
keep_line |= (
allow_list_pattern is not None
and allow_list_pattern.search(line) is not None
)
if global_allow_pattern.search(line):
continue
if filename in ALLOW_LIST:
if ALLOW_LIST[filename] is None or ALLOW_LIST[filename].search(line):
continue
if keep_line:
new_file.write(line)
else:
bad_lines.append(f"{filename}: {line}")
new_file.write(
re.sub(r"release_[0-9]+", fr"{release_tag}", line)
)
if bad_lines:
if os.path.exists(filename):
os.remove(filename)
os.rename(new_file_name, filename)
bad_lines.append(f"{filename}: {line.strip()}")
def check_all_files(allow_pattern: Pattern) -> List[str]:
def check_all_files(allow_pattern: Pattern, release_tag: str) -> List[str]:
"""
Validate all files tracked by git.
:param allow_pattern:

for file_name in git_ls_files():
if "localized" in file_name or os.path.splitext(file_name)[1] not in file_types:
continue
bad_lines += check_file(file_name, allow_pattern)
bad_lines += check_file(file_name, allow_pattern, release_tag)
return bad_lines

print(f"Release tag: {release_tag}")
allow_pattern = re.compile(f"{release_tag}(_docs)*")
bad_lines = check_all_files(allow_pattern)
bad_lines = check_all_files(allow_pattern, release_tag)
for line in bad_lines:
print(line)
print("*************************************************************")
f"Found lines referring to previous release. Either update the files, or add an exclusion to {__file__}"
"This script attempted to fix the above errors. Please double "
+ "check them to make sure the replacements were done correctly"
for line in bad_lines:
print(line)
sys.exit(1 if bad_lines else 0)

1001
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx
File diff is too large to display.

15
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.onnx.meta


fileFormatVersion: 2
guid: 28ccdfd7cb3d941ce8af0ab89e06130a
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj
11400002: model data
externalObjects: {}
userData:
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
optimizeModel: 1
forceArbitraryBatchSize: 1
treatErrorsAsWarnings: 0
importMode: 1

49
com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs


namespace Unity.MLAgents.Actuators
{
/// <summary>
/// Identifiers for "built in" actuator types.
/// These are only used for analytics, and should not be used for any runtime decisions.
///
/// NOTE: Do not renumber these, since the values are used for analytics. Renaming is allowed though.
/// </summary>
public enum BuiltInActuatorType
{
/// <summary>
/// Default actuator type if it cannot be determined.
/// </summary>
Unknown = 0,
/// <summary>
/// VectorActuator used by the Agent
/// </summary>
AgentVectorActuator = 1,
/// <summary>
/// Corresponds to <see cref="VectorActuator"/>
/// </summary>
VectorActuator = 2,
/// <summary>
/// Corresponds to the Match3Actuator in com.unity.ml-agents.extensions.
/// </summary>
Match3Actuator = 3,
/// <summary>
/// Corresponds to the InputActionActuator in com.unity.ml-agents.extensions.
/// </summary>
InputActionActuator = 4,
}
/// <summary>
/// Interface for actuators that are provided as part of ML-Agents.
/// User-implemented actuators don't need to use this interface.
/// </summary>
internal interface IBuiltInActuator
{
/// <summary>
/// Return the corresponding BuiltInActuatorType for the actuator.
/// </summary>
/// <returns>A BuiltInActuatorType corresponding to the actuator.</returns>
BuiltInActuatorType GetBuiltInActuatorType();
}
}

3
com.unity.ml-agents/Runtime/Actuators/IBuiltInActuator.cs.meta


fileFormatVersion: 2
guid: e3d7ef9a9a5043549cc5c0bbee520810
timeCreated: 1613514041

1001
docs/images/variable-length-observation-illustrated.png
File diff is too large to display.

1001
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn
File diff is too large to display.

11
Project/Assets/ML-Agents/Examples/Match3/TFModels/Match3VisualObs.nn.meta


fileFormatVersion: 2
guid: 48d14da88fea74d0693c691c6e3f2e34
ScriptedImporter:
fileIDToRecycleName:
11400000: main obj
11400002: model data
externalObjects: {}
userData:
assetBundleName:
assetBundleVariant:
script: {fileID: 11500000, guid: 19ed1486aa27d4903b34839f37b8f69f, type: 3}