
Documentation improvements to ML-Agents overview

- Incorporated feedback provided offline
- Fixed capitalizations of Agent/agent
- Re-organized trainers and features sections (renamed files accordingly)
- Updated Agent Editor (code) for the ODD feature
- Added a summary and next steps section
Branch: develop-generalizationTraining-TrainerController
Marwan Mattar, 6 years ago
Commit 5940c478
7 files changed, 72 insertions(+), 55 deletions(-)
1. docs/Background-Machine-Learning.md (22 changes)
2. docs/Background-Unity.md (4 changes)
3. docs/ML-Agents-Overview.md (97 changes)
4. docs/Readme.md (2 changes)
5. unity-environment/Assets/ML-Agents/Editor/AgentEditor.cs (2 changes)
6. docs/Feature-On-Demand-Decision.md (renamed, 0 changes)
7. docs/Feature-Memory.md (renamed, 0 changes)

docs/Background-Machine-Learning.md (22 changes)


# Background: Machine Learning
**Work In Progress**
- We will not attempt to provide a thorough treatment of machine learning
- as there are fantastic resources online. However, given that a number
- of users of ML-Agents might not have a formal machine learning background,
- this section provides an overview of terminology to facilitate the
- understanding of ML-Agents.
+ Given that a number of users of ML-Agents might not have a formal machine
+ learning background, this page provides an overview to facilitate the
+ understanding of ML-Agents. However, we will not attempt to provide a thorough
+ treatment of machine learning as there are fantastic resources online.
- Machine learning, a branch of artificial intelligence, focuses on learning patterns
- from data. The three main classes of machine learning algorithms include:
- unsupervised learning, supervised learning and reinforcement learning.
- Each class of algorithm learns from a different type of data. The following paragraphs
- provide an overview for each of these classes of machine learning, as well as introductory examples.
+ Machine learning, a branch of artificial intelligence, focuses on learning
+ patterns from data. The three main classes of machine learning algorithms
+ include: unsupervised learning, supervised learning and reinforcement learning.
+ Each class of algorithm learns from a different type of data. The following
+ paragraphs provide an overview for each of these classes of machine learning,
+ as well as introductory examples.
## Unsupervised Learning

docs/Background-Unity.md (4 changes)


[Unity Manual](https://docs.unity3d.com/Manual/index.html) and
[Tutorials page](https://unity3d.com/learn/tutorials). The
[Roll-a-ball tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial)
- is sufficient to learn all the basic concepts of Unity to get started with
- ML-Agents:
+ is a fantastic resource to learn all the basic concepts of Unity to get started
+ with ML-Agents:
* [Editor](https://docs.unity3d.com/Manual/UsingTheEditor.html)
* [Interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)
* [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)

docs/ML-Agents-Overview.md (97 changes)


pages that include overviews and helpful resources on the
[Unity Engine](Background-Unity.md),
[machine learning](Background-Machine-Learning.md) and
- [TensorFlow](Background-TensorFlow.md).
+ [TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing
+ the relevant background pages if you're not familiar with a Unity scene,
+ basic machine learning concepts or have not previously heard of TensorFlow.
The remainder of this page contains a deep dive into ML-Agents, its key
components, different training modes and scenarios. By the end of it, you

are communicated to the agent, so they need to be set up in a manner where
maximizing reward generates the desired optimal behavior.
- After defining these three entities (called a **reinforcement learning task**),
+ After defining these three entities (the building blocks of a
+ **reinforcement learning task**),
we can now _train_ the medic's behavior. This is achieved by simulating the
environment for many trials where the medic, over time, learns what is the
optimal action to take for every observation it measures by maximizing

The Learning Environment contains three additional components that help
organize the Unity scene:
- * **Agents** - which is attached to each agent and handles generating its
- observations, performing the actions it receives and assigning a reward
- (positive / negative) when appropriate. Each Agent is linked to exactly one
- Brain.
+ * **Agents** - which is attached to a Unity GameObject (any character within a
+ scene) and handles generating its observations, performing the actions it
+ receives and assigning a reward (positive / negative) when appropriate.
+ Each Agent is linked to exactly one Brain.
- In essence, the Brain is what holds on to the policy for each agent and
- determines which actions the agent should take at each instance. More
+ In essence, the Brain is what holds on to the policy for each Agent and
+ determines which actions the Agent should take at each instance. More
specifically, it is the component that receives the observations and rewards
from the Agent and returns an action.
* **Academy** - which orchestrates the observation and decision making process.
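[Editor's note] To make the Agent component concrete, here is a minimal sketch of what an Agent subclass might look like, assuming the `CollectObservations` / `AgentAction` overrides exposed by the Agent base class around the time of this commit; the class name `MedicAgent` and the specific observations, actions and rewards are illustrative only:

```csharp
using UnityEngine;

// Minimal sketch, not a definitive implementation: a hypothetical Agent
// for the medic example. Assumes the ML-Agents Agent base class with
// CollectObservations / AgentAction overrides; exact signatures may
// differ between releases.
public class MedicAgent : Agent
{
    public override void CollectObservations()
    {
        // Generate the observations that the linked Brain receives.
        AddVectorObs(transform.position);
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Perform the action chosen by the Brain, e.g. move on the X/Z plane.
        transform.Translate(vectorAction[0], 0f, vectorAction[1]);

        // Assign a reward (positive / negative) when appropriate.
        AddReward(-0.005f); // small time penalty to encourage fast revivals
    }
}
```

Each such Agent is then linked to a Brain, matching the one-Brain-per-Agent relationship described in the list above.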

Every Learning Environment will always have one global Academy and one Agent
for every character in the scene. While each Agent must be linked to a Brain,
- it is possible for Agent that have similar observations and actions to be
+ it is possible for Agents that have similar observations and actions to be
linked to the same Brain. In our sample game, we have two teams each with
their own medic. Thus we will have two Agents in our Learning Environment,
one for each medic, but both of these medics can be linked to the same Brain.

while the Agents connected to it (in this case the medics) can each have
their own, unique observation and action values. If we expanded our game
to include tank driver NPCs, then the Agent attached to those characters
- cannot share a Brain with the Agent linked to the medics.
+ cannot share a Brain with the Agent linked to the medics (medics and drivers
+ have different actions).
<p align="center">
<img src="images/learning_environment_example.png"

observations and rewards collected by the Brain are forwarded to the Python
API through the External Communicator. The Python API then returns the
corresponding action that needs to be taken by the Agent.
- * **Internal** - where decisions are made using an embedded TensorFlow model.
+ * **Internal** - where decisions are made using an embedded
+ [TensorFlow](Background-TensorFlow.md) model.
The embedded TensorFlow model represents a learned policy and the Brain
directly uses this model to determine the action for each Agent.
* **Player** - where decisions are made using real input from a keyboard or

_An example of how a scene containing multiple Agents and Brains might be
configured._
ML-Agents includes several
[example environments](Learning-Environment-Examples.md) and a
[Making a new Learning Environment](Learning-Environment-Create-New.md)
tutorial to help you get started.
## Training Modes
Given the flexibility of ML-Agents, there are a few ways in which training and

The
[Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)
- tutorial covers this training mode with the Balance Ball sample environment.
+ tutorial covers this training mode with the **Balance Ball** sample environment.
### Custom Training and Inference

dynamically adjusted based on training progress.
The [Curriculum Learning](Training-Curriculum-Learning.md)
- tutorial covers this training mode with the Wall Area sample environment.
+ tutorial covers this training mode with the **Wall Area** sample environment.
- ### Imitation Learning (coming soon)
+ ### Imitation Learning
It is often more intuitive to simply demonstrate the behavior we
want an agent to perform, rather than attempting to have it learn via

will then use these pairs of observations and actions from the human player
to learn a policy.
- The [Imitation Learning](Training-Imitation-Learning.md) tutorial covers this training
- mode with the **Anti-Graviator** sample environment.
- ### Recurrent Neural Networks
- In some scenarios, agents must learn to remember the past in order to take the
- best decision. When an agent only has partial observability of the environment,
- keeping track of past observations can help the agent learn. We provide an
- implementation of [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory) in
- our trainers that enable the agent to store memories to be used in future steps.
- The [Training with LSTM](Training-LSTM.md) tutorial covers this feature and
- the **Hallway** environment demonstrates its capabilities.
+ The [Imitation Learning](Training-Imitation-Learning.md) tutorial covers this
+ training mode with the **Banana Collector** sample environment.
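[Editor's note] As a rough illustration of the idea behind imitation learning, the sketch below records (observation, action) pairs while a human plays; a policy can then be fit to those pairs with supervised learning. Everything here (the class, its fields and the choice of observations) is hypothetical and not part of the ML-Agents API:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch: record (observation, action) pairs from a
// human-controlled character. A policy can later be trained on these
// pairs with supervised learning. Not part of the ML-Agents API.
public class DemonstrationRecorder : MonoBehaviour
{
    struct DemoStep
    {
        public float[] observation; // what the character sensed this step
        public float[] action;      // what the human did this step
    }

    readonly List<DemoStep> steps = new List<DemoStep>();

    void FixedUpdate()
    {
        steps.Add(new DemoStep
        {
            observation = new[] { transform.position.x, transform.position.z },
            action = new[] { Input.GetAxis("Horizontal"), Input.GetAxis("Vertical") }
        });
    }
}
```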
## Flexible Training Scenarios

additional features which improve the flexibility and interpretability of the
training process.
+ * **On Demand Decision Making** - With ML-Agents it is possible to have agents
+ request decisions only when needed as opposed to requesting decisions at
+ every step of the environment. This enables training of turn based games,
+ games where agents must react to events or games where agents can take
+ actions of variable duration. Switching between decision taking at every
+ step and on-demand-decision is one button click away. You can learn more
+ about the on-demand-decision feature [here](Feature-On-Demand-Decision.md).
+ * **Memory-enhanced Agents** - In some scenarios, agents must learn to
+ remember the past in order to take the best decision. When an agent only has
+ partial observability of the environment, keeping track of past observations
+ can help the agent learn. We provide an implementation of _Long Short-term
+ Memory_ ([LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory))
+ in our trainers that enable the agent to store memories to be used in future
+ steps. You can learn more about enabling LSTM during training
+ [here](Feature-Memory.md).
* **Monitoring Agent’s Decision Making** - Since communication in ML-Agents
is a two-way street, we provide an agent Monitor class in Unity which can
display aspects of the trained agent, such as the agents perception on how

learn to integrate information from multiple visual streams. This can be
helpful in several scenarios such as training a self-driving car which requires
multiple cameras with different viewpoints, or a navigational agent which might
- need to integrate aerial and first-person visuals.
+ need to integrate aerial and first-person visuals. You can learn more about
+ adding visual observations to an agent
+ [here](Learning-Environment-Design-Agents.md#visual-observations).
* **Broadcasting** - As discussed earlier, an External Brain sends the
observations for all its Agents to the Python API by default. This is helpful

particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature [here](Feature-Broadcasting.md).
- * **On Demand Decision** - With ML-Agents it is possible to have agents
- request decisions only when needed as opposed to requesting decisions at
- every step. This enables training of turn based games, games where agents
- must react to events or games where agents can take actions of variable
- duration. Switching between decision taking at every step and
- on-demand-decision is one button click away. You can learn more about the
- on-demand-decision feature [here](Learning-Environment-On-Demand-Decision)
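[Editor's note] For a sense of how the on-demand decision feature is used from code, here is a minimal sketch assuming the `RequestDecision()` method on Agent that accompanies the "On Demand Decisions" checkbox in the AgentEditor change further down this page; `TurnBasedAgent` and `OnTurnStarted` are hypothetical names:

```csharp
// Minimal sketch of on-demand decisions: rather than asking the Brain for
// an action at every environment step, the Agent requests one only when a
// game event occurs. Assumes Agent.RequestDecision() exists, as implied by
// the "On Demand Decisions" editor checkbox; the class and method names
// here are hypothetical.
public class TurnBasedAgent : Agent
{
    // Called by a (hypothetical) turn manager when this unit's turn begins.
    public void OnTurnStarted()
    {
        RequestDecision(); // triggers one observe-decide-act cycle
    }
}
```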
+ ## Summary and Next Steps
+ To briefly summarize: ML-Agents enables games and simulations built in Unity
+ to serve as the platform for training intelligent agents. It is designed
+ to enable a large variety of training modes and scenarios and comes packed
+ with several features to enable researchers and developers to leverage
+ (and enhance) machine learning within Unity.
+ To help you use ML-Agents, we've created several in-depth tutorials
+ for [installing ML-Agents](Installation.md),
+ [getting started](Getting-Started-with-Balance-Ball.md)
+ with a sample Balance Ball environment (one of our many
+ [sample environments](Learning-Environment-Examples.md)) and
+ [making your own environment](Learning-Environment-Create-New.md).

docs/Readme.md (2 changes)


* [Training with Proximal Policy Optimization](Training-PPO.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
- * [Training with LSTM](Training-LSTM.md)
+ * [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)

unity-environment/Assets/ML-Agents/Editor/AgentEditor.cs (2 changes)


  EditorGUILayout.PropertyField(
      isODD,
      new GUIContent(
-         "On Demand Decision",
+         "On Demand Decisions",
          "If checked, you must manually request decisions."));
  if (!isODD.boolValue)
  {

/docs/Learning-Environment-On-Demand-Decision.md → /docs/Feature-On-Demand-Decision.md

/docs/Training-LSTM.md → /docs/Feature-Memory.md
