
Documentation improvements to ML-Agents overview

- Incorporated feedback provided offline
- Fixed capitalizations of Agent/agent
- Re-organized trainers and features sections (renamed files accordingly)
- Updated Agent Editor (code) for the ODD feature
- Added a summary and next steps section
Branch: develop-generalizationTraining-TrainerController
Marwan Mattar, 6 years ago
Commit 5940c478
7 files changed, 72 insertions(+), 55 deletions(-)
1. docs/Background-Machine-Learning.md (22 changes)
2. docs/Background-Unity.md (4 changes)
3. docs/ML-Agents-Overview.md (97 changes)
4. docs/Readme.md (2 changes)
5. unity-environment/Assets/ML-Agents/Editor/AgentEditor.cs (2 changes)
6. docs/Feature-On-Demand-Decision.md (renamed, 0 changes)
7. docs/Feature-Memory.md (renamed, 0 changes)

docs/Background-Machine-Learning.md (22 changes)


# Background: Machine Learning
**Work In Progress**
- We will not attempt to provide a thorough treatment of machine learning
- as there are fantastic resources online. However, given that a number
- of users of ML-Agents might not have a formal machine learning background,
- this section provides an overview of terminology to facilitate the
- understanding of ML-Agents.
+ Given that a number of users of ML-Agents might not have a formal machine
+ learning background, this page provides an overview to facilitate the
+ understanding of ML-Agents. However, we will not attempt to provide a thorough
+ treatment of machine learning as there are fantastic resources online.
- Machine learning, a branch of artificial intelligence, focuses on learning patterns
- from data. The three main classes of machine learning algorithms include:
- unsupervised learning, supervised learning and reinforcement learning.
- Each class of algorithm learns from a different type of data. The following paragraphs
- provide an overview for each of these classes of machine learning, as well as introductory examples.
+ Machine learning, a branch of artificial intelligence, focuses on learning
+ patterns from data. The three main classes of machine learning algorithms
+ include: unsupervised learning, supervised learning and reinforcement learning.
+ Each class of algorithm learns from a different type of data. The following
+ paragraphs provide an overview for each of these classes of machine learning,
+ as well as introductory examples.
## Unsupervised Learning

docs/Background-Unity.md (4 changes)


[Unity Manual](https://docs.unity3d.com/Manual/index.html) and
[Tutorials page](https://unity3d.com/learn/tutorials). The
[Roll-a-ball tutorial](https://unity3d.com/learn/tutorials/s/roll-ball-tutorial)
- is sufficient to learn all the basic concepts of Unity to get started with
- ML-Agents:
+ is a fantastic resource to learn all the basic concepts of Unity to get started
+ with ML-Agents:
* [Editor](https://docs.unity3d.com/Manual/UsingTheEditor.html)
* [Interface](https://docs.unity3d.com/Manual/LearningtheInterface.html)
* [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)

docs/ML-Agents-Overview.md (97 changes)


pages that include overviews and helpful resources on the
[Unity Engine](Background-Unity.md),
[machine learning](Background-Machine-Learning.md) and
- [TensorFlow](Background-TensorFlow.md).
+ [TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing
+ the relevant background pages if you're not familiar with a Unity scene,
+ basic machine learning concepts or have not previously heard of TensorFlow.
The remainder of this page contains a deep dive into ML-Agents, its key
components, different training modes and scenarios. By the end of it, you

are communicated to the agent, so they need to be set up in a manner where
maximizing reward generates the desired optimal behavior.
- After defining these three entities (called a **reinforcement learning task**),
+ After defining these three entities (the building blocks of a
+ **reinforcement learning task**),
we can now _train_ the medic's behavior. This is achieved by simulating the
environment for many trials where the medic, over time, learns what is the
optimal action to take for every observation it measures by maximizing

The Learning Environment contains three additional components that help
organize the Unity scene:
- * **Agents** - which is attached to each agent and handles generating its
- observations, performing the actions it receives and assigning a reward
- (positive / negative) when appropriate. Each Agent is linked to exactly one
- Brain.
+ * **Agents** - which is attached to a Unity GameObject (any character within a
+ scene) and handles generating its observations, performing the actions it
+ receives and assigning a reward (positive / negative) when appropriate.
+ Each Agent is linked to exactly one Brain.
- In essence, the Brain is what holds on to the policy for each agent and
- determines which actions the agent should take at each instance. More
+ In essence, the Brain is what holds on to the policy for each Agent and
+ determines which actions the Agent should take at each instance. More
specifically, it is the component that receives the observations and rewards
from the Agent and returns an action.
* **Academy** - which orchestrates the observation and decision making process.
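[Editor's note] To make the Agent component concrete, here is a minimal sketch of what an Agent subclass might look like, assuming the `CollectObservations` / `AgentAction` overrides exposed by the Agent base class around the time of this commit; the class name `MedicAgent` and the specific observations, actions and rewards are illustrative only:

```csharp
using UnityEngine;

// Minimal sketch, not a definitive implementation: a hypothetical Agent
// for the medic example. Assumes the ML-Agents Agent base class with
// CollectObservations / AgentAction overrides; exact signatures may
// differ between releases.
public class MedicAgent : Agent
{
    public override void CollectObservations()
    {
        // Generate the observations that the linked Brain receives.
        AddVectorObs(transform.position);
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Perform the action chosen by the Brain, e.g. move on the X/Z plane.
        transform.Translate(vectorAction[0], 0f, vectorAction[1]);

        // Assign a reward (positive / negative) when appropriate.
        AddReward(-0.005f); // small time penalty to encourage fast revivals
    }
}
```

Each such Agent is then linked to a Brain, matching the one-Brain-per-Agent relationship described in the list above.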

Every Learning Environment will always have one global Academy and one Agent
for every character in the scene. While each Agent must be linked to a Brain,
- it is possible for Agent that have similar observations and actions to be
+ it is possible for Agents that have similar observations and actions to be
linked to the same Brain. In our sample game, we have two teams each with
their own medic. Thus we will have two Agents in our Learning Environment,
one for each medic, but both of these medics can be linked to the same Brain.

while the Agents connected to it (in this case the medics) can each have
their own, unique observation and action values. If we expanded our game
to include tank driver NPCs, then the Agent attached to those characters
- cannot share a Brain with the Agent linked to the medics.
+ cannot share a Brain with the Agent linked to the medics (medics and drivers
+ have different actions).
<p align="center">
<img src="images/learning_environment_example.png"

observations and rewards collected by the Brain are forwarded to the Python
API through the External Communicator. The Python API then returns the
corresponding action that needs to be taken by the Agent.
- * **Internal** - where decisions are made using an embedded TensorFlow model.
+ * **Internal** - where decisions are made using an embedded
+ [TensorFlow](Background-TensorFlow.md) model.
The embedded TensorFlow model represents a learned policy and the Brain
directly uses this model to determine the action for each Agent.
* **Player** - where decisions are made using real input from a keyboard or

_An example of how a scene containing multiple Agents and Brains might be
configured._
ML-Agents includes several
[example environments](Learning-Environment-Examples.md) and a
[Making a new Learning Environment](Learning-Environment-Create-New.md)
tutorial to help you get started.
## Training Modes
Given the flexibility of ML-Agents, there are a few ways in which training and

The
[Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)
- tutorial covers this training mode with the Balance Ball sample environment.
+ tutorial covers this training mode with the **Balance Ball** sample environment.
### Custom Training and Inference

dynamically adjusted based on training progress.
The [Curriculum Learning](Training-Curriculum-Learning.md)
- tutorial covers this training mode with the Wall Area sample environment.
+ tutorial covers this training mode with the **Wall Area** sample environment.
- ### Imitation Learning (coming soon)
+ ### Imitation Learning
It is often more intuitive to simply demonstrate the behavior we
want an agent to perform, rather than attempting to have it learn via

will then use these pairs of observations and actions from the human player
to learn a policy.
- The [Imitation Learning](Training-Imitation-Learning.md) tutorial covers this training
- mode with the **Anti-Graviator** sample environment.
- ### Recurrent Neural Networks
- In some scenarios, agents must learn to remember the past in order to take the
- best decision. When an agent only has partial observability of the environment,
- keeping track of past observations can help the agent learn. We provide an
- implementation of [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory) in
- our trainers that enable the agent to store memories to be used in future steps.
- The [Training with LSTM](Training-LSTM.md) tutorial covers this feature and
- the **Hallway** environment demonstrates its capabilities.
+ The [Imitation Learning](Training-Imitation-Learning.md) tutorial covers this
+ training mode with the **Banana Collector** sample environment.
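[Editor's note] As a rough illustration of the idea behind imitation learning, the sketch below records (observation, action) pairs while a human plays; a policy can then be fit to those pairs with supervised learning. Everything here (the class, its fields and the choice of observations) is hypothetical and not part of the ML-Agents API:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch: record (observation, action) pairs from a
// human-controlled character. A policy can later be trained on these
// pairs with supervised learning. Not part of the ML-Agents API.
public class DemonstrationRecorder : MonoBehaviour
{
    struct DemoStep
    {
        public float[] observation; // what the character sensed this step
        public float[] action;      // what the human did this step
    }

    readonly List<DemoStep> steps = new List<DemoStep>();

    void FixedUpdate()
    {
        steps.Add(new DemoStep
        {
            observation = new[] { transform.position.x, transform.position.z },
            action = new[] { Input.GetAxis("Horizontal"), Input.GetAxis("Vertical") }
        });
    }
}
```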
## Flexible Training Scenarios

additional features which improve the flexibility and interpretability of the
training process.
+ * **On Demand Decision Making** - With ML-Agents it is possible to have agents
+ request decisions only when needed as opposed to requesting decisions at
+ every step of the environment. This enables training of turn based games,
+ games where agents must react to events or games where agents can take
+ actions of variable duration. Switching between decision taking at every
+ step and on-demand-decision is one button click away. You can learn more
+ about the on-demand-decision feature [here](Feature-On-Demand-Decision.md).
+ * **Memory-enhanced Agents** - In some scenarios, agents must learn to
+ remember the past in order to take the best decision. When an agent only has
+ partial observability of the environment, keeping track of past observations
+ can help the agent learn. We provide an implementation of _Long Short-term
+ Memory_ ([LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory))
+ in our trainers that enable the agent to store memories to be used in future
+ steps. You can learn more about enabling LSTM during training
+ [here](Feature-Memory.md).
* **Monitoring Agent’s Decision Making** - Since communication in ML-Agents
is a two-way street, we provide an agent Monitor class in Unity which can
display aspects of the trained agent, such as the agents perception on how

learn to integrate information from multiple visual streams. This can be
helpful in several scenarios such as training a self-driving car which requires
multiple cameras with different viewpoints, or a navigational agent which might
- need to integrate aerial and first-person visuals.
+ need to integrate aerial and first-person visuals. You can learn more about
+ adding visual observations to an agent
+ [here](Learning-Environment-Design-Agents.md#visual-observations).
* **Broadcasting** - As discussed earlier, an External Brain sends the
observations for all its Agents to the Python API by default. This is helpful

particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature [here](Feature-Broadcasting.md).
- * **On Demand Decision** - With ML-Agents it is possible to have agents
- request decisions only when needed as opposed to requesting decisions at
- every step. This enables training of turn based games, games where agents
- must react to events or games where agents can take actions of variable
- duration. Switching between decision taking at every step and
- on-demand-decision is one button click away. You can learn more about the
- on-demand-decision feature [here](Learning-Environment-On-Demand-Decision)
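[Editor's note] For a sense of how the on-demand decision feature is used from code, here is a minimal sketch assuming the `RequestDecision()` method on Agent that accompanies the "On Demand Decisions" checkbox in the AgentEditor change further down this page; `TurnBasedAgent` and `OnTurnStarted` are hypothetical names:

```csharp
// Minimal sketch of on-demand decisions: rather than asking the Brain for
// an action at every environment step, the Agent requests one only when a
// game event occurs. Assumes Agent.RequestDecision() exists, as implied by
// the "On Demand Decisions" editor checkbox; the class and method names
// here are hypothetical.
public class TurnBasedAgent : Agent
{
    // Called by a (hypothetical) turn manager when this unit's turn begins.
    public void OnTurnStarted()
    {
        RequestDecision(); // triggers one observe-decide-act cycle
    }
}
```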
+ ## Summary and Next Steps
+ To briefly summarize: ML-Agents enables games and simulations built in Unity
+ to serve as the platform for training intelligent agents. It is designed
+ to enable a large variety of training modes and scenarios and comes packed
+ with several features to enable researchers and developers to leverage
+ (and enhance) machine learning within Unity.
+ To help you use ML-Agents, we've created several in-depth tutorials
+ for [installing ML-Agents](Installation.md),
+ [getting started](Getting-Started-with-Balance-Ball.md)
+ with a sample Balance Ball environment (one of our many
+ [sample environments](Learning-Environment-Examples.md)) and
+ [making your own environment](Learning-Environment-Create-New.md).

docs/Readme.md (2 changes)


* [Training with Proximal Policy Optimization](Training-PPO.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
- * [Training with LSTM](Training-LSTM.md)
+ * [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)

unity-environment/Assets/ML-Agents/Editor/AgentEditor.cs (2 changes)


  EditorGUILayout.PropertyField(
      isODD,
      new GUIContent(
-         "On Demand Decision",
+         "On Demand Decisions",
          "If checked, you must manually request decisions."));
  if (!isODD.boolValue)
  {

/docs/Learning-Environment-On-Demand-Decision.md → /docs/Feature-On-Demand-Decision.md

/docs/Training-LSTM.md → /docs/Feature-Memory.md
