
Doc review (#3803)

* Edit and review package docs.

* Filter out test and internal namespaces.

* remove offsetStep field that was accidentally revivified

* Resolving review comments

* Update com.unity.ml-agents/Runtime/Agent.cs

* fix trailing whitespace

* Revised Agent class intro and step description

* Fixed a few missed comments.

* removed prerelease warning

Co-authored-by: Chris Elion <chris.elion@unity3d.com>
/develop/dockerfile
GitHub, 5 years ago
Current commit
256431f7
11 files changed: 623 insertions, 191 deletions
  1. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (78 changes)
  2. com.unity.ml-agents/Runtime/Academy.cs (64 changes)
  3. com.unity.ml-agents/Runtime/Agent.cs (551 changes)
  4. com.unity.ml-agents/Runtime/Communicator/ICommunicator.cs (4 changes)
  5. com.unity.ml-agents/Runtime/DecisionRequester.cs (13 changes)
  6. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (13 changes)
  7. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs (3 changes)
  8. com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (39 changes)
  9. com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs (5 changes)
  10. com.unity.ml-agents/Runtime/Policies/BrainParameters.cs (27 changes)
  11. com.unity.ml-agents/Documentation~/filter.yml (17 changes)

78
com.unity.ml-agents/Documentation~/com.unity.ml-agents.md


# About ML-Agents package (`com.unity.ml-agents`)
The Unity ML-Agents package contains the C# SDK for the
[Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
The Unity ML-Agents package contains the C# SDK for the [Unity ML-Agents Toolkit].
The package provides the ability for any Unity scene to be converted into a learning
environment where character behaviors can be trained using a variety of machine learning
algorithms. Additionally, it enables any trained behavior to be embedded back into the Unity
scene. More specifically, the package provides the following core functionalities:
* Define Agents: entities whose behavior will be learned. Agents are entities
that generate observations (through sensors), take actions and receive rewards from
the environment.
The package allows you to convert any Unity scene into a learning
environment and train character behaviors using a variety of machine learning
algorithms. Additionally, it allows you to embed these trained behaviors back into
Unity scenes to control your characters. More specifically, the package provides
the following core functionalities:
* Define Agents: entities, or characters, whose behavior will be learned. Agents are entities
that generate observations (through sensors), take actions, and receive rewards from
the environment.
share the same Behavior and a scene may have multiple Behaviors.
* Record demonstrations of an agent within the Editor. These demonstrations can be
valuable to train a behavior for that agent.
* Embedding a trained behavior into the scene via the
[Unity Inference Engine](https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html).
Thus an Agent can switch from a learning behavior to an inference behavior.
share the same Behavior and a scene may have multiple Behaviors.
* Record demonstrations of an agent within the Editor. You can use demonstrations
to help train a behavior for that agent.
* Embedding a trained behavior into the scene via the [Unity Inference Engine].
Embedded behaviors allow you to switch an Agent between learning and inference.
Note that this package does not contain the machine learning algorithms for training
behaviors. It relies on a Python package to orchestrate the training. This package
only enables instrumenting a Unity scene and setting it up for training, and then
embedding the trained model back into your Unity scene.
## Preview package
This package is available as a preview, so it is not ready for production use.
The features and documentation in this package might change before it is verified for release.
Note that the *ML-Agents* package does not contain the machine learning algorithms for training
behaviors. The *ML-Agents* package only supports instrumenting a Unity scene, setting it up for
training, and then embedding the trained model back into your Unity scene. The machine learning
algorithms that orchestrate training are part of the companion [Python package].
## Package contents

|**Location**|**Description**|
|---|---|
|*Runtime*|Contains core C# APIs for integrating ML-Agents into your Unity scene. |
|*Tests*|Contains the unit tests for the package.|
<a name="Installation"></a>
To install this package, follow the instructions in the
[Package Manager documentation](https://docs.unity3d.com/Manual/upm-ui-install.html).
To install this *ML-Agents* package, follow the instructions in the [Package Manager documentation].
To install the Python package to enable training behaviors, follow the instructions on our
[GitHub repository](https://github.com/Unity-Technologies/ml-agents/blob/latest_release/docs/Installation.md).
To install the companion Python package to enable training behaviors, follow the
[installation instructions] on our [GitHub repository].
This version of the Unity ML-Agents package is compatible with the following versions of the Unity Editor:
This version of the Unity ML-Agents package is compatible with the following versions of the
Unity Editor:
* 2018.4 and later (recommended)
* 2018.4 and later
## Known limitations

Currently the speed of the game physics can only be increased to 100x real-time.
The Academy also moves in time with FixedUpdate() rather than Update(), so game
behavior implemented in Update() may be out of sync with the agent decision
making. See
[Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
for more information.
making. See [Execution Order of Event Functions] for more information.
You can control the frequency of Academy stepping by calling
`Academy.Instance.DisableAutomaticStepping()`, and then calling
`Academy.Instance.EnvironmentStep()` manually at the desired frequency.
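As a rough sketch (not part of the package documentation), manual stepping could be driven from a small MonoBehaviour like the following; the class name `ManualStepper` is hypothetical:

using MLAgents;
using UnityEngine;

// Hypothetical helper that steps the Academy manually each physics update.
public class ManualStepper : MonoBehaviour
{
    void Awake()
    {
        // Stop the Academy from stepping itself during FixedUpdate.
        Academy.Instance.DisableAutomaticStepping();
    }

    void FixedUpdate()
    {
        // Advance the Academy (and all Agents) by one environment step.
        Academy.Instance.EnvironmentStep();
    }
}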

If you are new to the Unity ML-Agents package, or have a question after reading
the documentation, you can check out our
[GitHub Repository](https://github.com/Unity-Technologies/ml-agents), which
also includes a number of ways to
[connect with us](https://github.com/Unity-Technologies/ml-agents#community-and-feedback),
including our [ML-Agents Forum](https://forum.unity.com/forums/ml-agents.453/).
[GitHub Repository], which also includes a number of ways to [connect with us],
including our [ML-Agents Forum].
[Unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[Unity Inference Engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[Package Manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/latest_release/docs/Installation.md
[GitHub Repository]: https://github.com/Unity-Technologies/ml-agents
[Python package]: https://github.com/Unity-Technologies/ml-agents
[Execution Order of Event Functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html
[connect with us]: https://github.com/Unity-Technologies/ml-agents#community-and-feedback
[ML-Agents Forum]: https://forum.unity.com/forums/ml-agents.453/

64
com.unity.ml-agents/Runtime/Academy.cs


* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a
* Unity scene, please browse our documentation pages on GitHub:
* https://github.com/Unity-Technologies/ml-agents/blob/master/docs/
* https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/
*/
namespace MLAgents

}
/// <summary>
/// An Academy is where Agent objects go to train their behaviors.
/// The Academy singleton manages agent training and decision making.
/// When an academy is run, it can either be in inference or training mode.
/// The mode is determined by the presence or absence of a Communicator. In
/// the presence of a communicator, the academy is run in training mode where
/// the states and observations of each agent are sent through the
/// communicator. In the absence of a communicator, the academy is run in
/// inference mode where the agent behavior is determined by the Policy
/// attached to it.
/// Access the Academy singleton through the <see cref="Instance"/>
/// property. The Academy instance is initialized the first time it is accessed (which will
/// typically be by the first <see cref="Agent"/> initialized in a scene).
///
/// At initialization, the Academy attempts to connect to the Python training process through
/// the external communicator. If successful, the training process can train <see cref="Agent"/>
/// instances. When you set an agent's <see cref="BehaviorParameters.behaviorType"/> setting
/// to <see cref="BehaviorType.Default"/>, the agent exchanges data with the training process
/// to make decisions. If no training process is available, agents with the default behavior
/// fall back to inference or heuristic decisions. (You can also set agents to always use
/// inference or heuristics.)
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
"docs/Learning-Environment-Design.md")]

static Lazy<Academy> s_Lazy = new Lazy<Academy>(() => new Academy());
/// <summary>
/// True if the Academy is initialized, false otherwise.
/// Reports whether the Academy has been initialized yet.
/// <value><c>True</c> if the Academy is initialized, <c>false</c> otherwise.</value>
public static bool IsInitialized
{
get { return s_Lazy.IsValueCreated; }
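As a small illustrative sketch (the class name is hypothetical), checking `IsInitialized` avoids forcing the Academy to initialize just to be queried:

using MLAgents;
using UnityEngine;

public class AcademyStatsExample : MonoBehaviour   // hypothetical example class
{
    void OnDestroy()
    {
        // Reading Academy.Instance would initialize the Academy, so check first.
        if (Academy.IsInitialized)
        {
            Debug.Log("Total Academy steps: " + Academy.Instance.TotalStepCount);
        }
    }
}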

/// The singleton Academy object.
/// </summary>
/// <value>Getting the instance initializes the Academy, if necessary.</value>
/// Returns whether or not the communicator is on.
/// Reports whether or not the communicator is on.
/// <returns>
/// <c>true</c>, if communicator is on, <c>false</c> otherwise.
/// </returns>
/// <seealso cref="ICommunicator"/>
/// <value>
/// <c>True</c>, if communicator is on, <c>false</c> otherwise.
/// </value>
public bool IsCommunicatorOn
{
get { return Communicator != null; }

}
// The Academy uses a series of events to communicate with agents
// to facilitate synchronization. More specifically, it ensure
// that all the agents performs their steps in a consistent order (i.e. no
// to facilitate synchronization. More specifically, it ensures
// that all the agents perform their steps in a consistent order (i.e. no
// agent can act based on a decision before another agent has had a chance
// to request a decision).

// Signals to all the listeners that the academy is being destroyed
internal event Action DestroyAction;
// Signals the Agent that a new step is about to start.
// Signals to the Agent that a new step is about to start.
// This will mark the Agent as Done if it has reached its maxSteps.
internal event Action AgentIncrementStep;

/// <summary>
/// Determines whether or not the Academy is automatically stepped during the FixedUpdate phase.
/// </summary>
/// <value>Set <c>true</c> to enable automatic stepping; <c>false</c> to disable.</value>
public bool AutomaticSteppingEnabled
{
get { return m_FixedUpdateStepper != null; }

}
/// <summary>
/// Initializes the environment, configures it and initialized the Academy.
/// Initializes the environment, configures it and initializes the Academy.
/// </summary>
void InitializeEnvironment()
{

}
/// <summary>
/// Returns the current episode counter.
/// The current episode count.
/// <returns>
/// <value>
/// </returns>
/// </value>
public int EpisodeCount
{
get { return m_EpisodeCount; }

/// Returns the current step counter (within the current episode).
/// The current step count (within the current episode).
/// <returns>
/// <value>
/// </returns>
/// </value>
public int StepCount
{
get { return m_StepCount; }

/// Returns the total step counter.
/// Returns the total step count.
/// <returns>
/// <value>
/// </returns>
/// </value>
public int TotalStepCount
{
get { return m_TotalStepCount; }

}
/// <summary>
/// Performs a single environment update to the Academy, and Agent
/// Performs a single environment update of the Academy and Agent
/// objects within the environment.
/// </summary>
public void EnvironmentStep()

551
com.unity.ml-agents/Runtime/Agent.cs


public float[] storedVectorActions;
/// <summary>
/// For discrete control, specifies the actions that the agent cannot take. Is true if
/// the action is masked.
/// For discrete control, specifies the actions that the agent cannot take.
/// An element of the mask array is <c>true</c> if the action is prohibited.
/// Current agent reward.
/// The current agent reward.
/// </summary>
public float reward;

}
/// <summary>
/// Agent MonoBehaviour class that is attached to a Unity GameObject, making it
/// an Agent. An agent produces observations and takes actions in the
/// environment. Observations are determined by the cameras attached
/// to the agent in addition to the vector observations implemented by the
/// user in <see cref="Agent.CollectObservations(VectorSensor)"/>.
/// On the other hand, actions are determined by decisions produced by a Policy.
/// Currently, this class is expected to be extended to implement the desired agent behavior.
/// An agent is an actor that can observe its environment, decide on the
/// best course of action using those observations, and execute those actions
/// within the environment.
/// Simply speaking, an agent roams through an environment and at each step
/// of the environment extracts its current observation, sends them to its
/// policy and in return receives an action. In practice,
/// however, an agent need not send its observation at every step since very
/// little may have changed between successive steps.
/// Use the Agent class as the subclass for implementing your own agents. Add
/// your Agent implementation to a [GameObject] in the [Unity scene] that serves
/// as the agent's environment.
/// At any step, an agent may be considered done due to a variety of reasons:
/// - The agent reached an end state within its environment.
/// - The agent reached the maximum # of steps (i.e. timed out).
/// - The academy reached the maximum # of steps (forced agent to be done).
/// Agents in an environment operate in *steps*. At each step, an agent collects observations,
/// passes them to its decision-making policy, and receives an action vector in response.
/// Here, an agent reaches an end state if it completes its task successfully
/// or somehow fails along the way. In the case where an agent is done before
/// the academy, it either resets and restarts, or just lingers until the
/// academy is done.
/// Agents make observations using <see cref="ISensor"/> implementations. The ML-Agents
/// API provides implementations for visual observations (<see cref="CameraSensor"/>)
/// raycast observations (<see cref="RayPerceptionSensor"/>), and arbitrary
/// data observations (<see cref="VectorSensor"/>). You can add the
/// <see cref="CameraSensorComponent"/> and <see cref="RayPerceptionSensorComponent2D"/> or
/// <see cref="RayPerceptionSensorComponent3D"/> components to an agent's [GameObject] to use
/// those sensor types. You can implement the <see cref="CollectObservations(VectorSensor)"/>
/// function in your Agent subclass to use a vector observation. The Agent class calls this
/// function before it uses the observation vector to make a decision. (If you only use
/// visual or raycast observations, you do not need to implement
/// <see cref="CollectObservations"/>.)
/// An important note regarding steps and episodes is due. Here, an agent step
/// corresponds to an academy step, which also corresponds to Unity
/// environment step (i.e. each FixedUpdate call). This is not the case for
/// episodes. The academy controls the global episode count and each agent
/// controls its own local episode count and can reset and start a new local
/// episode independently (based on its own experience). Thus an academy
/// (global) episode can be viewed as the upper-bound on an agents episode
/// length and that within a single global episode, an agent may have completed
/// multiple local episodes. Consequently, if an agent max step is
/// set to a value larger than the academy max steps value, then the academy
/// value takes precedence (since the agent max step will never be reached).
/// Assign a decision making policy to an agent using a <see cref="BehaviorParameters"/>
/// component attached to the agent's [GameObject]. The <see cref="BehaviorType"/> setting
/// determines how decisions are made:
/// Lastly, note that at any step the policy to the agent is allowed to
/// change model with <see cref="SetModel"/>.
/// * <see cref="BehaviorType.Default"/>: decisions are made by the external process,
/// when connected. Otherwise, decisions are made using inference. If no inference model
/// is specified in the BehaviorParameters component, then heuristic decision
/// making is used.
/// * <see cref="BehaviorType.InferenceOnly"/>: decisions are always made using the trained
/// model specified in the <see cref="BehaviorParameters"/> component.
/// * <see cref="BehaviorType.HeuristicOnly"/>: when a decision is needed, the agent's
/// <see cref="Heuristic"/> function is called. Your implementation is responsible for
/// providing the appropriate action.
/// Implementation-wise, it is required that this class is extended and the
/// virtual methods overridden. For sample implementations of agent behavior,
/// see the Examples/ directory within this Unity project.
/// To trigger an agent decision automatically, you can attach a <see cref="DecisionRequester"/>
/// component to the Agent game object. You can also call the agent's <see cref="RequestDecision"/>
/// function manually. You only need to call <see cref="RequestDecision"/> when the agent is
/// in a position to act upon the decision. In many cases, this will be every [FixedUpdate]
/// callback, but could be less frequent. For example, an agent that hops around its environment
/// can only take an action when it touches the ground, so several frames might elapse between
/// one decision and the need for the next.
///
/// Use the <see cref="OnActionReceived"/> function to implement the actions your agent can take,
/// such as moving to reach a goal or interacting with its environment.
///
/// When you call <see cref="EndEpisode"/> on an agent or the agent reaches its <see cref="maxStep"/> count,
/// its current episode ends. You can reset the agent -- or remove it from the
/// environment -- by implementing the <see cref="OnEpisodeBegin"/> function. An agent also
/// becomes done when the <see cref="Academy"/> resets the environment, which only happens when
/// the <see cref="Academy"/> receives a reset signal from an external process via the
/// <see cref="Academy.Communicator"/>.
///
/// The Agent class extends the Unity [MonoBehaviour] class. You can implement the
/// standard [MonoBehaviour] functions as needed for your agent. Since an agent's
/// observations and actions typically take place during the [FixedUpdate] phase, you should
/// only use the [MonoBehaviour.Update] function for cosmetic purposes. If you override the [MonoBehaviour]
/// methods, [OnEnable()] or [OnDisable()], always call the base Agent class implementations.
///
/// You can implement the <see cref="Heuristic"/> function to specify agent actions using
/// your own heuristic algorithm. Implementing a heuristic function can be useful
/// for debugging. For example, you can use keyboard input to select agent actions in
/// order to manually control an agent's behavior.
///
/// Note that you can change the inference model assigned to an agent at any step
/// by calling <see cref="SetModel"/>.
///
/// See [Agents] and [Reinforcement Learning in Unity] in the [Unity ML-Agents Toolkit manual] for
/// more information on creating and training agents.
///
/// For sample implementations of agent behavior, see the examples available in the
/// [Unity ML-Agents Toolkit] on Github.
///
/// [MonoBehaviour]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.html
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Unity scene]: https://docs.unity3d.com/Manual/CreatingScenes.html
/// [FixedUpdate]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.FixedUpdate.html
/// [MonoBehaviour.Update]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.Update.html
/// [OnEnable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnEnable.html
/// [OnDisable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnDisable.html
/// [OnBeforeSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnBeforeSerialize.html
/// [OnAfterSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnAfterSerialize.html
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design.md
/// [Unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Readme.md
///
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/master/" +
"docs/Learning-Environment-Design-Agents.md")]

/// <summary>
/// The maximum number of steps the agent takes before being done.
/// </summary>
/// <value>The maximum steps for an agent to take before it resets; or 0 for
/// unlimited steps.</value>
/// If set to 0, the agent can only be set to done programmatically (or
/// when the Academy is done).
/// If set to any positive integer, the agent will be set to done after
/// that many steps. Note that setting the max step to a value greater
/// than the academy max step value renders it useless.
/// The max step value determines the maximum length of an agent's episodes.
/// Set to a positive integer to limit the episode length to that many steps.
/// Set to 0 for unlimited episode length.
///
/// When an episode ends and a new one begins, the Agent object's
/// <seealso cref="OnEpisodeBegin"/> function is called. You can implement
/// <see cref="OnEpisodeBegin"/> to reset the agent or remove it from the
/// environment. An agent's episode can also end if you call its <seealso cref="EndEpisode"/>
/// method or an external process resets the environment through the <see cref="Academy"/>.
///
/// Consider limiting the number of steps in an episode to avoid wasting time during
/// training. If you set the max step value to a reasonable estimate of the time it should
/// take to complete a task, then agents that haven’t succeeded in that time frame will
/// reset and start a new training episode rather than continue to fail.
/// <example>
/// To use a step limit when training while allowing agents to run without resetting
/// outside of training, you can set the max step to 0 in <see cref="Initialize"/>
/// if the <see cref="Academy"/> is not connected to an external process.
/// <code>
/// using MLAgents;
///
/// public class MyAgent : Agent
/// {
/// public override void Initialize()
/// {
/// if (!Academy.Instance.IsCommunicatorOn)
/// {
/// this.maxStep = 0;
/// }
/// }
/// }
/// </code>
/// **Note:** in general, you should limit the differences between the code you execute
/// during training and the code you run during inference.
/// </example>
[HideInInspector] public int maxStep;
/// Current Agent information (message sent to Brain).

/// <summary>
/// Called when the attached <see cref="GameObject"/> becomes enabled and active.
/// </summary>
/// <remarks>
/// This function initializes the Agent instance, if it hasn't been initialized yet.
/// Always call the base Agent class version of this function if you implement `OnEnable()`
/// in your own Agent subclasses.
/// </remarks>
/// <example>
/// <code>
/// protected override void OnEnable()
/// {
/// base.OnEnable();
/// // additional OnEnable logic...
/// }
/// </code>
/// </example>
protected virtual void OnEnable()
{
LazyInitialize();

/// <inheritdoc cref="OnBeforeSerialize"/>
/// Called by Unity immediately before serializing this object.
/// <remarks>
/// The Agent class uses OnBeforeSerialize() for internal housekeeping. Call the
/// base class implementation if you need your own custom serialization logic.
///
/// See [OnBeforeSerialize] for more information.
///
/// [OnBeforeSerialize]: https://docs.unity3d.com/ScriptReference/ISerializationCallbackReceiver.OnBeforeSerialize.html
/// </remarks>
/// <example>
/// <code>
/// public new void OnBeforeSerialize()
/// {
/// base.OnBeforeSerialize();
/// // additional serialization logic...
/// }
/// </code>
/// </example>
public void OnBeforeSerialize()
{
// Manages a serialization upgrade issue from v0.13 to v0.14 where maxStep moved

}
/// <summary>
/// <inheritdoc cref="OnAfterDeserialize"/>
/// Called by Unity immediately after deserializing this object.
/// <remarks>
/// The Agent class uses OnAfterDeserialize() for internal housekeeping. Call the
/// base class implementation if you need your own custom deserialization logic.
///
/// See [OnAfterDeserialize] for more information.
///
/// [OnAfterDeserialize]: https://docs.unity3d.com/ScriptReference/ISerializationCallbackReceiver.OnAfterDeserialize.html
/// </remarks>
/// <example>
/// <code>
/// public new void OnAfterDeserialize()
/// {
/// base.OnAfterDeserialize();
/// // additional deserialization logic...
/// }
/// </code>
/// </example>
public void OnAfterDeserialize()
{
// Manages a serialization upgrade issue from v0.13 to v0.14 where maxStep moved

/// <summary>
/// Initializes the agent. Can be safely called multiple times.
/// </summary>
/// <remarks>
/// This function calls your <seealso cref="Initialize"/> implementation, if one exists.
/// </remarks>
public void LazyInitialize()
{
if (m_Initialized)

}
/// <summary>
/// Reason that the Agent is being considered "done"
/// The reason that the Agent has been set to "done".
/// The <see cref="Done"/> method was called.
/// The <see cref="EndEpisode"/> method was called.
/// </summary>
DoneCalled,

MaxStepReached,
/// <summary>
/// The Agent was disabled
/// The Agent was disabled.
/// </summary>
Disabled,
}

/// </summary>
/// <remarks>
/// Always call the base Agent class version of this function if you implement `OnDisable()`
/// in your own Agent subclasses.
/// </remarks>
/// <example>
/// <code>
/// protected override void OnDisable()
/// {
/// base.OnDisable();
/// // additional OnDisable logic...
/// }
/// </code>
/// </example>
/// <seealso cref="OnEnable"/>
protected virtual void OnDisable()
{
DemonstrationWriters.Clear();

}
/// <summary>
/// Updates the Model for the agent. Any model currently assigned to the
/// agent will be replaced with the provided one. If the arguments are
/// identical to the current parameters of the agent, the model will
/// remain unchanged.
/// Updates the Model assigned to this Agent instance.
/// <remarks>
/// If the agent already has an assigned model, that model is replaced with the
/// the provided one. However, if you call this function with arguments that are
/// identical to the current parameters of the agent, then no changes are made.
///
/// **Note:** the <paramref name="behaviorName"/> parameter is ignored when not training.
/// The <paramref name="model"/> and <paramref name="inferenceDevice"/> parameters
/// are ignored when not using inference.
/// </remarks>
/// <param name = "inferenceDevice"> Define on what device the model
/// <param name = "inferenceDevice"> Define the device on which the model
/// will be run.</param>
public void SetModel(
string behaviorName,

/// Overrides the current step reward of the agent and updates the episode
/// reward accordingly.
/// </summary>
/// <remarks>
/// This function replaces any rewards given to the agent during the current step.
/// Use <see cref="AddReward(float)"/> to incrementally change the reward rather than
/// overriding it.
///
/// Typically, you assign rewards in the Agent subclass's <see cref="OnActionReceived(float[])"/>
/// implementation after carrying out the received action and evaluating its success.
///
/// Rewards are used during reinforcement learning; they are ignored during inference.
///
/// See [Agents - Rewards] for general advice on implementing rewards and [Reward Signals]
/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Reward-Signals.md
/// </remarks>
/// <param name="reward">The new value of the reward.</param>
public void SetReward(float reward)
{

/// <summary>
/// Increments the step and episode rewards by the provided value.
/// </summary>
/// <remarks>Use a positive reward to reinforce desired behavior. You can use a
/// negative reward to penalize mistakes. Use <seealso cref="SetReward(float)"/> to
/// set the reward assigned to the current step with a specific value rather than
/// increasing or decreasing it.
///
/// Typically, you assign rewards in the Agent subclass's <see cref="OnActionReceived(float[])"/>
/// implementation after carrying out the received action and evaluating its success.
///
/// Rewards are used during reinforcement learning; they are ignored during inference.
///
/// See [Agents - Rewards] for general advice on implementing rewards and [Reward Signals]
/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Reward-Signals.md
///</remarks>
/// <param name="increment">Incremental reward value.</param>
public void AddReward(float increment)
{

}
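As a small, hypothetical illustration of combining these reward calls (the goal reference and distance threshold are placeholders, not package APIs):

using MLAgents;
using UnityEngine;

public class GoalAgent : Agent             // hypothetical example class
{
    public Transform goal;                 // placeholder, assigned in the Inspector

    public override void OnActionReceived(float[] vectorAction)
    {
        // (movement code omitted)
        AddReward(-0.0005f);               // small per-step penalty to encourage speed

        if (Vector3.Distance(transform.position, goal.position) < 0.5f)
        {
            SetReward(1.0f);               // overrides the step reward on success
            EndEpisode();
        }
    }
}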
/// <summary>
/// Sets the done flag to true.
/// Sets the done flag to true and resets the agent.
/// <seealso cref="OnEpisodeBegin"/>
public void EndEpisode()
{
NotifyAgentDone(DoneReason.DoneCalled);

/// <summary>
/// Is called when the agent must request the brain for a new decision.
/// Requests a new decision for this agent.
/// <remarks>
/// Call `RequestDecision()` whenever an agent needs a decision. You often
/// want to request a decision every environment step. However, if an agent
/// cannot use the decision every step, then you can request a decision less
/// frequently.
///
/// You can add a <seealso cref="DecisionRequester"/> component to the agent's
/// [GameObject] to drive the agent's decision making. When you use this component,
/// do not call `RequestDecision()` separately.
///
/// Note that this function calls <seealso cref="RequestAction"/>; you do not need to
/// call both functions at the same time.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
public void RequestDecision()
{
m_RequestDecision = true;

/// <summary>
/// Is called then the agent must perform a new action.
/// Requests an action for this agent.
/// <remarks>
/// Call `RequestAction()` to repeat the previous action returned by the agent's
/// most recent decision. A new decision is not requested. When you call this function,
/// the Agent instance invokes <seealso cref="OnActionReceived(float[])"/> with the
/// existing action vector.
///
/// You can use `RequestAction()` in situations where an agent must take an action
/// every update, but doesn't need to make a decision as often. For example, an
/// agent that moves through its environment might need to apply an action to keep
/// moving, but only needs to make a decision to change course or speed occasionally.
///
/// You can add a <seealso cref="DecisionRequester"/> component to the agent's
/// [GameObject] to drive the agent's decision making and action frequency. When you
/// use this component, do not call `RequestAction()` separately.
///
/// Note that <seealso cref="RequestDecision"/> calls `RequestAction()`; you do not need to
/// call both functions at the same time.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
public void RequestAction()
{
m_RequestAction = true;

}
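The pattern described above might look like the following hypothetical sketch for an agent that can only act while touching the ground (the `m_Grounded` flag is a placeholder set elsewhere, for example from collision callbacks):

using MLAgents;
using UnityEngine;

public class HopperAgent : Agent           // hypothetical example class
{
    bool m_Grounded;                       // placeholder: set from collision callbacks

    void FixedUpdate()
    {
        if (m_Grounded)
        {
            RequestDecision();             // also triggers RequestAction()
        }
        else
        {
            RequestAction();               // repeat the last action while airborne
        }
    }
}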
/// <summary>
/// Initializes the agent, called once when the agent is enabled. Can be
/// left empty if there is no special, unique set-up behavior for the
/// agent.
/// Implement `Initialize()` to perform one-time initialization or set up of the
/// Agent instance.
/// One sample use is to store local references to other objects in the
/// scene which would facilitate computing this agents observation.
/// `Initialize()` is called once when the agent is first enabled. If, for example,
/// the Agent object needs references to other [GameObjects] in the scene, you
/// can collect and store those references here.
///
/// Note that <seealso cref="OnEpisodeBegin"/> is called at the start of each of
/// the agent's "episodes". You can use that function for items that need to be reset
/// for each episode.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// When the Agent uses Heuristics, it will call this method every time it
/// needs an action. This can be used for debugging or controlling the agent
/// with keyboard. This can also be useful to record demonstrations for imitation learning.
/// Implement `Heuristic()` to choose an action for this agent using a custom heuristic.
/// <param name="actionsOut">An array corresponding to the next action of the Agent</param>
/// <remarks>
/// Implement this function to provide custom decision making logic or to support manual
/// control of an agent using keyboard, mouse, or game controller input.
///
/// Your heuristic implementation can use any decision making logic you specify. Assign decision
/// values to the float[] array, <paramref name="actionsOut"/>, passed to your function as a parameter.
/// Add values to the array at the same indexes as they are used in your
/// <seealso cref="OnActionReceived(float[])"/> function, which receives this array and
/// implements the corresponding agent behavior. See [Actions] for more information
/// about agent actions.
///
/// An agent calls this `Heuristic()` function to make a decision when you set its behavior
/// type to <see cref="BehaviorType.HeuristicOnly"/>. The agent also calls this function if
/// you set its behavior type to <see cref="BehaviorType.Default"/> when the
/// <see cref="Academy"/> is not connected to an external training process and you do not
/// assign a trained model to the agent.
///
/// To perform imitation learning, implement manual control of the agent in the `Heuristic()`
/// function so that you can record the demonstrations required for the imitation learning
/// algorithms. (Attach a [Demonstration Recorder] component to the agent's [GameObject] to
/// record the demonstration session to a file.)
///
/// Even when you don’t plan to use heuristic decisions for an agent or imitation learning,
/// implementing a simple heuristic function can aid in debugging agent actions and interactions
/// with its environment.
///
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Training-Imitation-Learning.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#actions
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// <example>
/// The following example illustrates a `Heuristic()` function that provides WASD-style
/// keyboard control for an agent that can move in two dimensions as well as jump. See
/// [Input Manager] for more information about the built-in Unity input functions.
/// You can also use the [Input System package], which provides a more flexible and
/// configurable input system.
/// <code>
/// public override void Heuristic(float[] actionsOut)
/// {
/// actionsOut[0] = Input.GetAxis("Horizontal");
/// actionsOut[1] = Input.GetKey(KeyCode.Space) ? 1.0f : 0.0f;
/// actionsOut[2] = Input.GetAxis("Vertical");
/// }
/// </code>
/// [Input Manager]: https://docs.unity3d.com/Manual/class-InputManager.html
/// [Input System package]: https://docs.unity3d.com/Packages/com.unity.inputsystem@1.0/manual/index.html
/// </example>
/// <seealso cref="OnActionReceived(float[])"/>
public virtual void Heuristic(float[] actionsOut)
{
Debug.LogWarning("Heuristic method called but not implemented. Returning placeholder actions.");

}
/// <summary>
/// Collects the vector observations of the agent.
/// The agent observation describes the current environment from the
/// perspective of the agent.
/// Implement `CollectObservations()` to collect the vector observations of
/// the agent for the step. The agent observation describes the current
/// environment from the perspective of the agent.
/// An agents observation is any environment information that helps
/// the Agent achieve its goal. For example, for a fighting Agent, its
/// An agent's observation is any environment information that helps
/// the agent achieve its goal. For example, for a fighting agent, its
/// Recall that an Agent may attach vector or visual observations.
/// Vector observations are added by calling the provided helper methods
/// on the VectorSensor input:
///
/// You can use a combination of vector, visual, and raycast observations for an
/// agent. If you only use visual or raycast observations, you do not need to
/// implement a `CollectObservations()` function.
///
/// Add vector observations to the <paramref name="sensor"/> parameter passed to
/// this method by calling the <seealso cref="VectorSensor"/> helper methods:
/// - <see cref="VectorSensor.AddObservation(int)"/>
/// - <see cref="VectorSensor.AddObservation(float)"/>
/// - <see cref="VectorSensor.AddObservation(Vector3)"/>

/// - <see cref="VectorSensor.AddObservation(IEnumerable{float})"/>
/// - <see cref="VectorSensor.AddOneHotObservation(int, int)"/>
/// Depending on your environment, any combination of these helpers can
/// be used. They just need to be used in the exact same order each time
/// this method is called and the resulting size of the vector observation
/// needs to match the vectorObservationSize attribute of the linked Brain.
/// Visual observations are implicitly added from the cameras attached to
/// the Agent.
///
/// You can use any combination of these helper functions to build the agent's
/// vector of observations. You must build the vector in the same order
/// each time `CollectObservations()` is called and the length of the vector
/// must always be the same. In addition, the length of the observation must
/// match the <see cref="BrainParameters.vectorObservationSize"/>
/// attribute of the linked Brain, which is set in the Editor on the
/// **Behavior Parameters** component attached to the agent's [GameObject].
///
/// For more information about observations, see [Observations and Sensors].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// </remarks>
public virtual void CollectObservations(VectorSensor sensor)
{

}
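A short, hypothetical `CollectObservations()` sketch using several of the helper methods listed above (the fighting-agent fields are placeholders, echoing the example mentioned earlier in these docs, and the `MLAgents.Sensors` namespace is an assumption):

using MLAgents;
using MLAgents.Sensors;
using UnityEngine;

public class FighterAgent : Agent                        // hypothetical example class
{
    public Transform target;                             // placeholder reference
    float m_Health = 1f;                                 // placeholder state
    int m_WeaponIndex;                                   // placeholder state, 0..3

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.position);       // 3 floats
        sensor.AddObservation(target.position);          // 3 floats
        sensor.AddObservation(m_Health);                 // 1 float
        sensor.AddOneHotObservation(m_WeaponIndex, 4);   // 4 floats
        // Total = 11, which must match the vector observation size in Behavior Parameters.
    }
}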
/// <summary>
/// Collects the masks for discrete actions.
/// When using discrete actions, the agent will not perform the masked action.
/// Implement `CollectDiscreteActionMasks()` to collect the masks for discrete
/// actions. When using discrete actions, the agent will not perform the masked
/// action.
/// </summary>
/// <param name="actionMasker">
/// The action masker for the agent.

/// action by masking it with <see cref="DiscreteActionMasker.SetMask(int, IEnumerable{int})"/>
/// action by masking it with <see cref="DiscreteActionMasker.SetMask(int, IEnumerable{int})"/>.
///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Design-Agents.md#actions
/// <seealso cref="OnActionReceived(float[])"/>
/// Specifies the agent behavior at every step based on the provided
/// action.
/// Implement `OnActionReceived()` to specify agent behavior at every step, based
/// on the provided action.
/// <remarks>
/// An action is passed to this function in the form of an array vector. Your
/// implementation must use the array to direct the agent's behavior for the
/// current step.
///
/// You decide how many elements you need in the action array to control your
/// agent and what each element means. For example, if you want to apply a
/// force to move an agent around the environment, you can arbitrarily pick
/// three values in the action array to use as the force components. During
/// training, the agent's policy learns to set those particular elements of
/// the array to maximize the training rewards the agent receives. (Of course,
/// if you implement a <seealso cref="Heuristic"/> function, it must use the same
/// elements of the action array for the same purpose since there is no learning
/// involved.)
///
/// Actions for an agent can be either *Continuous* or *Discrete*. Specify which
/// type of action space an agent uses, along with the size of the action array,
/// in the <see cref="BrainParameters"/> of the agent's associated
/// <see cref="BehaviorParameters"/> component.
///
/// When an agent uses the continuous action space, the values in the action
/// array are floating point numbers. You should clamp the values to the range,
/// -1..1, to increase numerical stability during training.
///
/// When an agent uses the discrete action space, the values in the action array
/// are integers that each represent a specific, discrete action. For example,
/// you could define a set of discrete actions such as:
///
/// <code>
/// 0 = Do nothing
/// 1 = Move one space left
/// 2 = Move one space right
/// 3 = Move one space up
/// 4 = Move one space down
/// </code>
///
/// When making a decision, the agent picks one of the five actions and puts the
/// corresponding integer value in the action vector. For example, if the agent
/// decided to move left, the action vector parameter would contain an array with
/// a single element with the value 1.
///
/// You can define multiple sets, or branches, of discrete actions to allow an
/// agent to perform simultaneous, independent actions. For example, you could
/// use one branch for movement and another branch for throwing a ball left, right,
/// up, or down, to allow the agent to do both in the same step.
///
/// The action vector of a discrete action space contains one element for each
/// branch. The value of each element is the integer representing the chosen
/// action for that branch. The agent always chooses one action for each
/// branch.
///
/// When you use the discrete action space, you can prevent the training process
/// or the neural network model from choosing specific actions in a step by
/// implementing the <see cref="CollectDiscreteActionMasks(DiscreteActionMasker)"/>
/// function. For example, if your agent is next to a wall, you could mask out any
/// actions that would result in the agent trying to move into the wall.
///
/// For more information about implementing agent actions see [Agents - Actions].
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// Vector action. Note that for discrete actions, the provided array
/// will be of length 1.
/// An array containing the action vector. The length of the array is specified
/// by the <see cref="BrainParameters"/> of the agent's associated
/// <see cref="BehaviorParameters"/> component.
/// Specifies the agent behavior when being reset, which can be due to
/// the agent or Academy being done (i.e. completion of local or global
/// episode).
/// Implement `OnEpisodeBegin()` to set up an Agent instance at the beginning
/// of an episode.
public virtual void OnEpisodeBegin(){}
/// <seealso cref="Initialize"/>
/// <seealso cref="EndEpisode"/>
public virtual void OnEpisodeBegin() {}
/// Returns the last action that was decided on by the Agent
/// Returns the last action that was decided on by the Agent.
/// The last action that was decided by the Agent (or null if no decision has been made)
/// The last action that was decided by the Agent (or null if no decision has been made).
/// <seealso cref="OnActionReceived(float[])"/>
public float[] GetAction()
{
return m_Action.vectorActions;

/// An internal reset method that updates internal data structures in
/// addition to calling <see cref="AgentReset"/>.
/// addition to calling <see cref="OnEpisodeBegin"/>.
/// </summary>
void _AgentReset()
{

/// <summary>
/// Scales continuous action from [-1, 1] to arbitrary range.
/// </summary>
/// <param name="rawAction"></param>
/// <param name="min"></param>
/// <param name="max"></param>
/// <returns></returns>
/// <param name="rawAction">The input action value.</param>
/// <param name="min">The minimum output value.</param>
/// <param name="max">The maximum output value.</param>
/// <returns>The <paramref name="rawAction"/> scaled from [-1,1] to
/// [<paramref name="min"/>, <paramref name="max"/>].</returns>
protected static float ScaleAction(float rawAction, float min, float max)
{
var middle = (min + max) / 2;

/// <summary>
/// Signals the agent that it must sent its decision to the brain.
/// Signals the agent that it must send its decision to the brain.
/// </summary>
void SendInfo()
{

4
com.unity.ml-agents/Runtime/Communicator/ICommunicator.cs


internal struct UnityRLInitParameters
{
/// <summary>
/// An RNG seed sent from the python process to Unity.
/// A random number generator (RNG) seed sent from the python process to Unity.
/// </summary>
public int seed;

}
/// <summary>
/// Delegate for handling quite events sent back from the communicator.
/// Delegate for handling quit events sent back from the communicator.
/// </summary>
internal delegate void QuitCommandHandler();

13
com.unity.ml-agents/Runtime/DecisionRequester.cs


namespace MLAgents
{
/// <summary>
/// A component that when attached to an Agent will automatically request decisions from it
/// at regular intervals.
/// The DecisionRequester component automatically requests decisions for an
/// <see cref="Agent"/> instance at regular intervals.
/// <remarks>
/// Attach a DecisionRequester component to the same [GameObject] as the
/// <see cref="Agent"/> component.
///
/// The DecisionRequester component provides a convenient and flexible way to
/// trigger the agent decision making process. Without a DecisionRequester,
/// your <see cref="Agent"/> implmentation must manually call its
/// <seealso cref="Agent.RequestDecision"/> function.
/// </remarks>
[AddComponentMenu("ML Agents/Decision Requester", (int)MenuGroup.Default)]
[RequireComponent(typeof(Agent))]
public class DecisionRequester : MonoBehaviour
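A hypothetical setup sketch; the component is normally added in the Inspector, and the `DecisionPeriod` field name is an assumption about this package version:

using MLAgents;
using UnityEngine;

// DecisionRequester requires an Agent on the same GameObject.
public class DecisionSetupExample : MonoBehaviour    // hypothetical example class
{
    void Awake()
    {
        var requester = gameObject.AddComponent<DecisionRequester>();
        requester.DecisionPeriod = 5;   // assumption: request a decision every 5th step
    }
}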

13
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs


namespace MLAgents.Demonstrations
{
/// <summary>
/// Demonstration Recorder Component.
/// The Demonstration Recorder component facilitates the recording of demonstrations
/// used for imitation learning.
/// <remarks>Add this component to the [GameObject] containing an <see cref="Agent"/>
/// to enable recording the agent for imitation learning. You must implement the
/// <see cref="Agent.Heuristic"/> function of the agent to provide manual control
/// in order to record demonstrations.
///
/// See [Imitation Learning - Recording Demonstrations] for more information.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Training-Imitation-Learning.md#recording-demonstrations
/// </remarks>
[RequireComponent(typeof(Agent))]
[AddComponentMenu("ML Agents/Demonstration Recorder", (int)MenuGroup.Default)]
public class DemonstrationRecorder : MonoBehaviour

3
com.unity.ml-agents/Runtime/Demonstrations/DemonstrationWriter.cs


/// <summary>
/// Responsible for writing demonstration data to stream (typically a file stream).
/// </summary>
/// <seealso cref="DemonstrationRecorder"/>
/// Number of bytes reserved for the Demonstration metadata at the start of the demo file.
/// Number of bytes reserved for the <see cref="Demonstration"/> metadata at the start of the demo file.
/// </summary>
internal const int MetaDataBytes = 32;

39
com.unity.ml-agents/Runtime/DiscreteActionMasker.cs


namespace MLAgents
{
/// <summary>
/// The DiscreteActionMasker class represents a set of masked (disallowed) actions and
/// provides utilities for setting and retrieving them.
/// </summary>
/// <remarks>
/// may be illegal (e.g. the King in Chess taking a move to the left if it is already in the
/// left side of the board). This class represents the set of masked actions and provides
/// the utilities for setting and retrieving them.
/// </summary>
/// may be illegal. For example, if an agent is adjacent to a wall or other obstacle
/// you could mask any actions that direct the agent to move into the blocked space.
/// </remarks>
public class DiscreteActionMasker
{
/// When using discrete control, is the starting indices of the actions

}
/// <summary>
/// Modifies an action mask for discrete control agents. When used, the agent will not be
/// able to perform the actions passed as argument at the next decision for the specified
/// action branch. The actionIndices correspond to the action options the agent will
/// be unable to perform.
/// Modifies an action mask for discrete control agents.
/// <param name="branch">The branch for which the actions will be masked</param>
/// <param name="actionIndices">The indices of the masked actions</param>
/// <remarks>
/// When used, the agent will not be able to perform the actions passed as argument
/// at the next decision for the specified action branch. The actionIndices correspond
/// to the action options the agent will be unable to perform.
///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/0.15.1/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>
public void SetMask(int branch, IEnumerable<int> actionIndices)
{
// If the branch does not exist, raise an error

}
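A hypothetical masking sketch from an Agent subclass, following the wall example above (the wall flag is a placeholder): it prevents the "move left" option, index 1 on branch 0, for the next decision:

using MLAgents;

public class WallAwareAgent : Agent        // hypothetical example class
{
    bool m_WallOnLeft;                     // placeholder: set elsewhere by the agent

    public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
    {
        if (m_WallOnLeft)
        {
            actionMasker.SetMask(0, new[] { 1 });   // mask action index 1 on branch 0
        }
    }
}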
/// <summary>
/// Get the current mask for an agent
/// Get the current mask for an agent.
/// </summary>
/// <returns>A mask for the agent. A boolean array of length equal to the total number of
/// actions.</returns>

}
/// <summary>
/// Resets the current mask for an agent
/// Resets the current mask for an agent.
/// </summary>
internal void ResetMask()
{

}
/// <summary>
/// Checks if all the actions in the input branch are masked
/// Checks if all the actions in the input branch are masked.
/// <param name="branch"> The index of the branch to check</param>
/// <returns> True if all the actions of the branch are masked</returns>
/// <param name="branch"> The index of the branch to check.</param>
/// <returns> True if all the actions of the branch are masked.</returns>
bool AreAllActionsMasked(int branch)
{
if (m_CurrentMask == null)

5
com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs


/// <summary>
/// The Factory to generate policies.
/// A component for setting an <seealso cref="Agent"/> instance's behavior and
/// brain properties.
/// <remarks>At runtime, this component generates the agent's policy objects
/// according to the settings you specified in the Editor.</remarks>
[AddComponentMenu("ML Agents/Behavior Parameters", (int)MenuGroup.Default)]
public class BehaviorParameters : MonoBehaviour
{

27
com.unity.ml-agents/Runtime/Policies/BrainParameters.cs


}
/// <summary>
/// Holds information about the Brain. It defines what are the inputs and outputs of the
/// Holds information about the brain. It defines what are the inputs and outputs of the
/// <remarks>
/// Set brain parameters for an <see cref="Agent"/> instance using the
/// <seealso cref="BehaviorParameters"/> component attached to the agent's [GameObject].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// If continuous : The length of the float vector that represents the state.
/// If discrete : The number of possible values the state can take.
/// The size of the observation space.
/// <remarks>An agent creates the observation vector in its
/// <see cref="Agent.CollectObservations(Sensors.VectorSensor)"/>
/// implementation.</remarks>
/// <value>
/// The length of the vector containing observation values.
/// </value>
public int vectorObservationSize = 1;
/// <summary>

[Range(1, 50)] public int numStackedVectorObservations = 1;
/// <summary>
/// If continuous : The length of the float vector that represents the action.
/// If discrete : The number of possible values the action can take.
/// The size of the action space.
/// <remarks>The size specified is interpreted differently depending on whether
/// the agent uses the continuous or the discrete action space.</remarks>
/// <value>
/// For the continuous action space: the length of the float vector that represents
/// the action.
/// For the discrete action space: the number of branches in the action space.
/// </value>
public int[] vectorActionSize = new[] {1};
/// <summary>

17
com.unity.ml-agents/Documentation~/filter.yml


apiRules:
  - exclude:
      uidRegex: ^MLAgents\.Tests$
      type: Namespace
  - exclude:
      uidRegex: ^MLAgents\.CommunicatorObjects$
      type: Namespace
  - exclude:
      uidRegex: ^MLAgents\.Tests\.Communicator$
      type: Namespace
  - exclude:
      uidRegex: ^MLAgents\.Editor$
      type: Namespace
  - exclude:
      uidRegex: ^MLAgentsExamples$
      type: Namespace