Documentation 0.5 Release Check List (Part 1) (#1154)

6 years ago · 2cd8e250
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md

 ## Attribution

-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at https://www.contributor-covenant.org/version/1/4/code-of-conduct/
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 1.4, available at
+https://www.contributor-covenant.org/version/1/4/code-of-conduct/

 [homepage]: https://www.contributor-covenant.org
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
 # Contribution Guidelines

-Thank you for your interest in contributing to the ML-Agents toolkit! We are incredibly
-excited to see how members of our community will use and extend the ML-Agents toolkit.
-To facilitate your contributions, we've outlined a brief set of guidelines
-to ensure that your extensions can be easily integrated.
+Thank you for your interest in contributing to the ML-Agents toolkit! We are
+incredibly excited to see how members of our community will use and extend the
+ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
+of guidelines to ensure that your extensions can be easily integrated.
-### Communication
+## Communication
-First, please read through our [code of conduct](CODE_OF_CONDUCT.md), 
-as we expect all our contributors to follow it.
+First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
+expect all our contributors to follow it.
-Second, before starting on a project that you intend to contribute
-to the ML-Agents toolkit (whether environments or modifications to the codebase), 
-we **strongly** recommend posting on our 
-[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
-briefly outlining the changes you plan to make. This will enable us to provide
-some context that may be helpful for you. This could range from advice and 
-feedback on how to optimally perform your changes or reasons for not doing it.
+Second, before starting on a project that you intend to contribute to the
+ML-Agents toolkit (whether environments or modifications to the codebase), we
+**strongly** recommend posting on our
+[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
+and briefly outlining the changes you plan to make. This will enable us to
+provide some context that may be helpful for you. This could range from advice
+and feedback on how to optimally perform your changes to reasons for not making
+them.
-### Git Branches
+## Git Branches
-Starting with v0.3, we adopted the 
+Starting with v0.3, we adopted the
-Consequently, the `master` branch corresponds to the latest release of 
+Consequently, the `master` branch corresponds to the latest release of
+
-* Corresponding changes to documentation, unit tests and sample environments 
-(if applicable)
+* Corresponding changes to documentation, unit tests and sample environments (if
+  applicable)
-### Environments
+## Environments
-We are also actively open to adding community contributed environments as 
-examples, as long as they are small, simple, demonstrate a unique feature of 
-the platform, and provide a unique non-trivial challenge to modern 
+We are also actively open to adding community contributed environments as
+examples, as long as they are small, simple, demonstrate a unique feature of
+the platform, and provide a unique non-trivial challenge to modern
-PR explaining the nature of the environment and task. 
+PR explaining the nature of the environment and task.
-### Style Guide
+## Style Guide
-When performing changes to the codebase, ensure that you follow the style
-guide of the file you're modifying. For Python, we follow 
-[PEP 8](https://www.python.org/dev/peps/pep-0008/). For C#, we will soon be
-adding a formal style guide for our repository.
+When performing changes to the codebase, ensure that you follow the style guide
+of the file you're modifying. For Python, we follow
+[PEP 8](https://www.python.org/dev/peps/pep-0008/).
+For C#, we will soon be adding a formal style guide for our repository.
--- a/README.md
+++ b/README.md

 # Unity ML-Agents Toolkit (Beta)

-**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin 
-that enables games and simulations to serve as environments for training
-intelligent agents. Agents can be trained using reinforcement learning,
-imitation learning, neuroevolution, or other machine learning methods through
-a simple-to-use Python API. We also provide implementations (based on
-TensorFlow) of state-of-the-art algorithms to enable game developers
-and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
-These trained agents can be used for multiple purposes, including
-controlling NPC behavior (in a variety of settings such as multi-agent and
-adversarial), automated testing of game builds and evaluating different game
-design decisions pre-release. The ML-Agents toolkit is mutually beneficial for both game
-developers and AI researchers as it provides a central platform where advances
-in AI can be evaluated on Unity’s rich environments and then made accessible
-to the wider research and game developer communities. 
+**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source
+Unity plugin that enables games and simulations to serve as environments for
+training intelligent agents. Agents can be trained using reinforcement learning,
+imitation learning, neuroevolution, or other machine learning methods through a
+simple-to-use Python API. We also provide implementations (based on TensorFlow)
+of state-of-the-art algorithms to enable game developers and hobbyists to easily
+train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
+used for multiple purposes, including controlling NPC behavior (in a variety of
+settings such as multi-agent and adversarial), automated testing of game builds
+and evaluating different game design decisions pre-release. The ML-Agents
+toolkit is mutually beneficial for both game developers and AI researchers as it
+provides a central platform where advances in AI can be evaluated on Unity’s
+rich environments and then made accessible to the wider research and game
+developer communities.
+
-* Train memory-enhanced Agents using deep reinforcement learning
+* Train memory-enhanced agents using deep reinforcement learning
-* Broadcasting of Agent behavior for supervised learning
+* Broadcasting of agent behavior for supervised learning
-* Flexible Agent control with On Demand Decision Making
+* Flexible agent control with On Demand Decision Making
-* For more information, in addition to installation and usage
-instructions, see our [documentation home](docs/Readme.md).
-* If you have
-used a version of the ML-Agents toolkit prior to v0.4, we strongly recommend 
-our [guide on migrating from earlier versions](docs/Migrating.md).
+* For more information, in addition to installation and usage instructions, see
+  our [documentation home](docs/Readme.md).
+* If you have used a version of the ML-Agents toolkit prior to v0.4, we strongly
+  recommend our [guide on migrating from earlier versions](docs/Migrating.md).
- Overviewing reinforcement learning concepts
-([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
-and [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
- [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
- [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/) announcing the winners of our
-[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
- [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
-overviewing how Unity can be leveraged as a simulator to design safer cities.
+
+* Overviewing reinforcement learning concepts
+  ([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
+  and
+  [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
+* [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
+* [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/)
+  announcing the winners of our
+  [first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
+* [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
+  overviewing how Unity can be leveraged as a simulator to design safer cities.
- [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
- [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
- [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
+
+* [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
+* [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
+* [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
-The ML-Agents toolkit is an open-source project and we encourage and welcome contributions.
-If you wish to contribute, be sure to review our 
-[contribution guidelines](CONTRIBUTING.md) and 
+The ML-Agents toolkit is an open-source project and we encourage and welcome
+contributions. If you wish to contribute, be sure to review our
+[contribution guidelines](CONTRIBUTING.md) and
+
-[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
-to connect with others using the ML-Agents toolkit and Unity developers enthusiastic
-about machine learning. We use that channel to surface updates
-regarding the ML-Agents toolkit (and, more broadly, machine learning in games).
-* If you run into any problems using the ML-Agents toolkit, 
-[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
-make sure to include as much detail as possible.
+  [Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
+  to connect with others using the ML-Agents toolkit and Unity developers
+  enthusiastic about machine learning. We use that channel to surface updates
+  regarding the ML-Agents toolkit (and, more broadly, machine learning in
+  games).
+* If you run into any problems using the ML-Agents toolkit,
+  [submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
+  make sure to include as much detail as possible.

 For any other questions or feedback, connect directly with the ML-Agents
 team at ml-agents@unity3d.com.
 translating more pages and to other languages. Consequently,
 we welcome any enhancements and improvements from the community.

- [Chinese](docs/localized/zh-CN/)
+* [Chinese](docs/localized/zh-CN/)

 ## License

--- a/docs/API-Reference.md
+++ b/docs/API-Reference.md
 # API Reference

 Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
-documented to be compatabile with
+documented to be compatible with
 [Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
 documentation.

--- a/docs/Background-TensorFlow.md
+++ b/docs/Background-TensorFlow.md
 performing computations using data flow graphs, the underlying representation of
 deep learning models. It facilitates training and inference on CPUs and GPUs in
 a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
-train the behavior of an Agent, the output is a TensorFlow model (.bytes) file
+train the behavior of an agent, the output is a TensorFlow model (.bytes) file
 that you can then embed within an Internal Brain. Unless you implement a new
 algorithm, the use of TensorFlow is mostly abstracted away and behind the
 scenes.
--- a/docs/Basic-Guide.md
+++ b/docs/Basic-Guide.md
 # Basic Guide

-This guide will show you how to use a pretrained model in an example Unity
+This guide will show you how to use a pre-trained model in an example Unity
 environment, and show you how to train the model yourself.

 If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
 In order to use the ML-Agents toolkit within Unity, you need to change some
 Unity settings first. Also [TensorFlowSharp
 plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
-is needed for you to use pretrained model within Unity, which is based on the
+is needed for you to use a pre-trained model within Unity, which is based on the
 [TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).

 1. Launch Unity
 `None` if you want to interact with the current scene in the Unity Editor.

 More information and documentation is provided in the
-[Python API](../ml-agents/README.md) page.
+[Python API](Python-API.md) page.

 ## Training the Brain with Reinforcement Learning

-the brain used by the agents to **External**. This allows the agents to
+the Brain used by the Agents to **External**. This allows the Agents to
 communicate with the external training process when making their decisions.

 1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
 ### Training the environment

 1. Open a command or terminal window.
-2. Nagivate to the folder where you installed the ML-Agents toolkit.
+2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
+   **Note**: If you followed the default [installation](Installation.md), then
+   you should be able to run `mlagents-learn` from any directory.
-   Where:
+   where:
-      trainer configuration. The defaults used by environments in the ML-Agents
-      SDK can be found in `config/trainer_config.yaml`.
+      trainer configuration. The defaults used by example environments included
+      in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
-    - And the `--train` tells `mlagents-learn` to run a training session (rather
+    - `--train` tells `mlagents-learn` to run a training session (rather
-4. When the message _"Start training by pressing the Play button in the Unity
+4. If you cloned the ML-Agents repo, then you can simply run
+      ```sh
+      mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
+      ```
+5. When the message _"Start training by pressing the Play button in the Unity
   Editor"_ is displayed on the screen, you can press the :arrow_forward: button
   in Unity to start training in the Editor.
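
The `mlagents-learn` invocation from the steps above can also be assembled from a script, for instance when queuing several runs. This is a minimal sketch, not part of the toolkit: `build_learn_command` is a hypothetical helper, and it uses only the arguments shown in this guide (the trainer config path, `--run-id`, and `--train`).

```python
from typing import List


def build_learn_command(config_path: str, run_id: str, train: bool = True) -> List[str]:
    """Return the argv list for an `mlagents-learn` call (hypothetical helper)."""
    command = ["mlagents-learn", config_path, f"--run-id={run_id}"]
    if train:
        # --train asks for a training session rather than inference.
        command.append("--train")
    return command


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to launch it.
    print(" ".join(build_learn_command("config/trainer_config.yaml", "firstRun")))
```

Building the command as a list rather than a single string avoids shell quoting issues if the list is later handed to `subprocess.run`.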

 '--train': True,
 '--worker-id': '0',
 '<trainer-config-path>': 'config/trainer_config.yaml'}
+INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
 ```

 **Note**: If you're using Anaconda, don't forget to activate the ml-agents
 like this:

 ```console
-INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
 INFO:mlagents.envs:
 'Ball3DAcademy' started successfully!
 Unity Academy name: Ball3DAcademy
 `models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes` where
 `<academy_name>` is the name of the Academy GameObject in the current scene.
 This file corresponds to your model's latest checkpoint. You can now embed this
-trained model into your internal brain by following the steps below, which is
+trained model into your Internal Brain by following the steps below, which is
 similar to the steps described
 [above](#play-an-example-environment-using-pretrained-model).
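
As an illustration of the naming pattern described above, the expected location of the saved model can be derived from the run identifier and the Academy name. This is a sketch only: `expected_model_path` is a hypothetical helper, and it assumes the path is relative to the directory where training was started.

```python
from pathlib import PurePosixPath


def expected_model_path(run_id: str, academy_name: str) -> PurePosixPath:
    """Build models/<run-identifier>/editor_<academy_name>_<run-identifier>.bytes."""
    return PurePosixPath("models") / run_id / f"editor_{academy_name}_{run_id}.bytes"


if __name__ == "__main__":
    # Run id and Academy name taken from the examples in this guide.
    print(expected_model_path("firstRun", "Ball3DAcademy"))
```

For the run id `firstRun` and the `Ball3DAcademy` scene this yields `models/firstRun/editor_Ball3DAcademy_firstRun.bytes`.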

  page.
 - For a more detailed walk-through of our 3D Balance Ball environment, check out
  the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
- For a "Hello World" introduction to creating your own learning environment,
+- For a "Hello World" introduction to creating your own Learning Environment,
  check out the [Making a New Learning
  Environment](Learning-Environment-Create-New.md) page.
 - For a series of YouTube video tutorials, check out the
--- a/docs/FAQ.md
+++ b/docs/FAQ.md

 ## TensorFlowSharp flag not turned on

-If you have already imported the TensorFlowSharp plugin, but havn't set
+If you have already imported the TensorFlowSharp plugin, but haven't set
-You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
+You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
 ```

 This error message occurs because the TensorFlowSharp plugin won't be usage
-## Tensorflow epsilon placeholder error
+## TensorFlow epsilon placeholder error
-If you have a graph placeholder set in the internal Brain inspector that is not
+If you have a graph placeholder set in the Internal Brain inspector that is not
-UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
+UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <some_brain_name>, there are no FloatingPoint placeholder named <some_placeholder_name>.
-Similarly, if you have a graph scope set in the internal Brain inspector that is
+Similarly, if you have a graph scope set in the Internal Brain inspector that is
 not correctly set, you will see some error like this:

 ```console
-Solution: Make sure your Graph Scope field matches the corresponding brain
-object name in your Hierachy Inspector when there is multiple brain.
+Solution: Make sure your Graph Scope field matches the corresponding Brain
+object name in your Hierarchy Inspector when there are multiple Brains.

 ## Environment Permission Error

 ## Mean reward : nan

 If you receive a message `Mean reward : nan` when attempting to train a model
-using PPO, this is due to the episodes of the learning environment not
+using PPO, this is due to the episodes of the Learning Environment not
 terminating. In order to address this, set `Max Steps` for either the Academy or
 Agents within the Scene Inspector to a value greater than 0. Alternatively, it
 is possible to manually set `done` conditions for episodes from within scripts
--- a/docs/Feature-Memory.md
+++ b/docs/Feature-Memory.md
-# Memory-enhanced Agents using Recurrent Neural Networks
+# Memory-enhanced agents using Recurrent Neural Networks

 ## What are memories for

--- a/docs/Feature-Monitor.md
+++ b/docs/Feature-Monitor.md

 You can track many different things both related and unrelated to the agents
 themselves. By default, the Monitor is only active in the *inference* phase, so
-not during training. To change this behaviour, you can activate or deactivate it
+not during training. To change this behavior, you can activate or deactivate it
 by calling `SetActive(boolean)`. For example, to also show the monitor during
 training, you can call it in the `InitializeAcademy()` method of your `Academy`:

--- a/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/Getting-Started-with-Balance-Ball.md

 This tutorial walks through the end-to-end process of opening an ML-Agents
 toolkit example environment in Unity, building the Unity executable, training an
-agent in it, and finally embedding the trained model into the Unity environment.
+Agent in it, and finally embedding the trained model into the Unity environment.

 The ML-Agents toolkit includes a number of [example
 environments](Learning-Environment-Examples.md) which you can examine to help
 This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
 contains a number of platforms and balls (which are all copies of each other).
 Each platform tries to keep its ball from falling by rotating either
-horizontally or vertically. In this environment, a platform is an **agent** that
+horizontally or vertically. In this environment, a platform is an **Agent** that
 receives a reward for every step that it balances the ball. An agent is also
 penalized with a negative reward for dropping the ball. The goal of the training
 process is to have the platforms learn to never drop the ball.

 The first thing you may notice after opening the 3D Balance Ball scene is that
 it contains not one, but several platforms.  Each platform in the scene is an
-independent agent, but they all share the same brain. 3D Balance Ball does this
+independent agent, but they all share the same Brain. 3D Balance Ball does this
 to speed up training since all twelve agents contribute to training in parallel.

 ### Academy
 and **Inference Configuration**  properties set the graphics and timescale
 properties for the Unity application. The Academy uses the **Training
 Configuration**  during training and the **Inference Configuration** when not
-training. (*Inference* means that the agent is using a trained model or
+training. (*Inference* means that the Agent is using a trained model or
 heuristics or direct control — in other words, whenever **not** training.)
 Typically, you set low graphics quality and a high time scale for the **Training
 configuration** and a high graphics quality and the timescale to `1.0` for the

 * Academy.InitializeAcademy() — Called once when the environment is launched.
 * Academy.AcademyStep() — Called at every simulation step before
-  Agent.AgentAction() (and after the agents collect their observations).
+  Agent.AgentAction() (and after the Agents collect their observations).
-The 3D Balance Ball environment does not use these functions — each agent resets
+The 3D Balance Ball environment does not use these functions — each Agent resets
-environment around the agents.
+environment around the Agents.
-the Academy.) All the agents in the 3D Balance Ball environment use the same
-Brain instance. A Brain doesn't store any information about an agent, it just
-routes the agent's collected observations to the decision making process and
-returns the chosen action to the agent. Thus, all agents can share the same
-brain, but act independently. The Brain settings tell you quite a bit about how
-an agent works.
+the Academy.) All the Agents in the 3D Balance Ball environment use the same
+Brain instance. A Brain doesn't store any information about an Agent, it just
+routes the Agent's collected observations to the decision making process and
+returns the chosen action to the Agent. Thus, all Agents can share the same
+Brain, but act independently. The Brain settings tell you quite a bit about how
+an Agent works.
-The **Brain Type** determines how an agent makes its decisions. The **External**
+The **Brain Type** determines how an Agent makes its decisions. The **External**
-agents; use **Internal** when using the trained model. The **Heuristic** brain
-allows you to hand-code the agent's logic by extending the Decision class.
-Finally, the **Player** brain lets you map keyboard commands to actions, which
+Agents; use **Internal** when using the trained model. The **Heuristic** Brain
+allows you to hand-code the Agent's logic by extending the Decision class.
+Finally, the **Player** Brain lets you map keyboard commands to actions, which
-of brains do what you need, you can implement your own CoreBrain to create your
+of Brains do what you need, you can implement your own CoreBrain to create your
 own type.

 In this tutorial, you will set the **Brain Type** to **External** for training;

 The Brain instance used in the 3D Balance Ball example uses the **Continuous**
 vector observation space with a **State Size** of 8. This means that the feature
-vector containing the agent's observations contains eight elements: the `x` and
+vector containing the Agent's observations contains eight elements: the `x` and
-defined in the agent's `CollectObservations()` function.)
+defined in the Agent's `CollectObservations()` function.)
-An agent is given instructions from the brain in the form of *actions*.
+An Agent is given instructions from the Brain in the form of *actions*.
-element of the vector means is defined by the agent logic (the PPO training
+element of the vector means is defined by the Agent logic (the PPO training
-element might represent a force or torque applied to a `RigidBody` in the agent.
+element might represent a force or torque applied to a `Rigidbody` in the Agent.
-given to the agent is an array of indeces into tables.
+given to the Agent is an array of indices into tables.

 The 3D Balance Ball example is programmed to use both types of vector action
 space. You can try training with both settings to observe whether there is a
 Platform GameObjects. The base Agent object has a few properties that affect its
 behavior:

-* **Brain** — Every agent must have a Brain. The brain determines how an agent
-  makes decisions. All the agents in the 3D Balance Ball scene share the same
-  brain.
-* **Visual Observations** — Defines any Camera objects used by the agent to
+* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
+  makes decisions. All the Agents in the 3D Balance Ball scene share the same
+  Brain.
+* **Visual Observations** — Defines any Camera objects used by the Agent to
-* **Max Step** — Defines how many simulation steps can occur before the agent
-  decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
-* **Reset On Done** — Defines whether an agent starts over when it is finished.
-  3D Balance Ball sets this true so that the agent restarts after reaching the
+* **Max Step** — Defines how many simulation steps can occur before the Agent
+  decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
+* **Reset On Done** — Defines whether an Agent starts over when it is finished.
+  3D Balance Ball sets this true so that the Agent restarts after reaching the
-Perhaps the more interesting aspect of an agent is the Agent subclass
-implementation. When you create an agent, you must extend the base Agent class.
+Perhaps the more interesting aspect of an Agent is the Agent subclass
+implementation. When you create an Agent, you must extend the base Agent class.
-* Agent.AgentReset() — Called when the Agent resets, including at the beginning
+* Agent.AgentReset() — Called when the Agent resets, including at the beginning
-* Agent.CollectObservations() — Called every simulation step. Responsible for
-  collecting the agent's observations of the environment. Since the Brain
-  instance assigned to the agent is set to the continuous vector observation
+* Agent.CollectObservations() — Called every simulation step. Responsible for
+  collecting the Agent's observations of the environment. Since the Brain
+  instance assigned to the Agent is set to the continuous vector observation
-* Agent.AgentAction() — Called every simulation step. Receives the action chosen
-  by the brain. The Ball3DAgent example handles both the continuous and the
+* Agent.AgentAction() — Called every simulation step. Receives the action chosen
+  by the Brain. The Ball3DAgent example handles both the continuous and the
-  assigns a reward to the agent; in this example, an agent receives a small
+  assigns a reward to the Agent; in this example, an Agent receives a small
-  negative reward for dropping the ball. An agent is also marked as done when it
+  negative reward for dropping the ball. An Agent is also marked as done when it
  drops the ball so that it will reset with a new ball for the next simulation
  step.

 explaining it.

 To train the agents within the Ball Balance environment, we will be using the
-python package. We have provided a convenient script called `mlagents-learn`
+Python package. We have provided a convenient script called `mlagents-learn`
 which accepts arguments used to configure both training and inference phases.

 We can use `run_id` to identify the experiment and create a folder where the
 The `--train` flag tells the ML-Agents toolkit to run in training mode.

 **Note**: You can train using an executable rather than the Editor. To do so,
-follow the intructions in [Using an
-Executable](Learning-Environment-Executable.md).
+follow the instructions in
+[Using an Executable](Learning-Environment-Executable.md).

 ### Observing Training Progress


 Once the training process completes, and the training process saves the model
 (denoted by the `Saved Model` message) you can add it to the Unity project and
-use it with agents having an **Internal** brain type. **Note:** Do not just
+use it with Agents having an **Internal** Brain type. **Note:** Do not just
 close the Unity Window once the `Saved Model` message appears. Either wait for
 the training process to close the window or press Ctrl+C at the command-line
 prompt. If you simply close the window manually, the .bytes file containing the
 To embed the trained model into Unity, follow the later part of [Training the
 Brain with Reinforcement
 Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
-of the Basic Buides page.
+of the Basic Guide page.
--- a/docs/Glossary.md
+++ b/docs/Glossary.md
  logic should not be placed here.
 * **External Coordinator** - ML-Agents class responsible for communication with
  outside processes (in this case, the Python API).
-* **Trainer** - Python class which is responsible for training a given external
-  brain. Contains TensorFlow graph which makes decisions for external brain.
+* **Trainer** - Python class which is responsible for training a given External
+  Brain. Contains TensorFlow graph which makes decisions for External Brain.
--- a/docs/Installation-Windows.md
+++ b/docs/Installation-Windows.md

 Next, install `tensorflow`. Install this package using `pip` - which is a
 package management system used to install Python packages. Latest versions of
-Tensorflow won't work, so you will need to make sure that you install version
+TensorFlow won't work, so you will need to make sure that you install version
 1.7.1. In the same Anaconda Prompt, type in the following command _(make sure
 you are connected to the internet)_:


 Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
 Anaconda Prompt with the Conda environment ml-agents activated, type in the
-following command to uninstall the tensorflow for cpu and install the tensorflow
+following command to uninstall TensorFlow for CPU and install TensorFlow
 for GPU _(make sure you are connected to the internet)_:

 ```sh
--- a/docs/Installation.md
+++ b/docs/Installation.md
       width="500" border="10" />
 </p>

-## Clone the Ml-Agents Repository
+## Clone the ML-Agents Toolkit Repository
-The `UnitySDK` directory in this repository contains the Unity Assets to add
-to your projects. The `python` directory contains python packages which provide
-trainers, a python API to interface with Unity, and a package to interface with
-OpenAI Gym.
+The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
+It also contains many [example environments](Learning-Environment-Examples.md)
+that can be used to help get you familiar with Unity.
+
+The `ml-agents` subdirectory contains Python packages which provide
+trainers and a Python API to interface with Unity.
+
+The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.
-## Install Python (with Dependencies)
+## Install Python and mlagents Package

 In order to use the ML-Agents toolkit, you need Python 3.6 along with the
 dependencies listed in the [requirements file](../ml-agents/requirements.txt).

 ### Mac and Unix Users

-[Download](https://www.python.org/downloads/) and install Python 3 if you do not
+[Download](https://www.python.org/downloads/) and install Python 3.6 if you do not
-If your Python environment doesn't include `pip`, see these
+If your Python environment doesn't include `pip3`, see these
-To install dependencies, enter the `ml-agents/` directory and run from
-the command line:
+To install the dependencies and `mlagents` Python package, enter the
+`ml-agents/` subdirectory and run from the command line:
+
+    pip3 install .
-    pip install .
+If you installed this correctly, you should be able to run
+`mlagents-learn --help`

 ## Docker-based Installation

--- a/docs/Learning-Environment-Best-Practices.md
+++ b/docs/Learning-Environment-Best-Practices.md
  ([learn more here](Training-Curriculum-Learning.md)).
 * When possible, it is often helpful to ensure that you can complete the task by
  using a Player Brain to control the agent.
-* It is often helpful to make many copies of the agent, and attach the brain to
-  be trained to all of these agents. In this way the brain can get more feedback
+* It is often helpful to make many copies of the agent, and attach the Brain to
+  be trained to all of these agents. In this way the Brain can get more feedback
  information from all of these agents, which helps it train faster.

 ## Rewards
--- a/docs/Learning-Environment-Create-New.md
+++ b/docs/Learning-Environment-Create-New.md

 This tutorial walks through the process of creating a Unity Environment. A Unity
 Environment is an application built using the Unity Engine which can be used to
-train Reinforcement Learning agents.
+train Reinforcement Learning Agents.

 ![A simple ML-Agents environment](images/mlagents-NewTutSplash.png)

   methods to update the scene independently of any agents. For example, you can
   add, move, or delete agents and other entities in the environment.
 3. Add one or more Brain objects to the scene as children of the Academy.
-4. Implement your Agent subclasses. An Agent subclass defines the code an agent
+4. Implement your Agent subclasses. An Agent subclass defines the code an Agent
-   optional methods to reset the agent when it has finished or failed its task.
+   optional methods to reset the Agent when it has finished or failed its task.
-   in the scene that represents the agent in the simulation. Each Agent object
+   in the scene that represents the Agent in the simulation. Each Agent object
   must be assigned a Brain object.
 6. If training, set the Brain type to External and
   [run the training process](Training-ML-Agents.md).  

 Next, we will create a very simple scene to act as our ML-Agents environment.
 The "physical" components of the environment include a Plane to act as the floor
-for the agent to move around on, a Cube to act as the goal or target for the
-agent to seek, and a Sphere to represent the agent itself.
+for the Agent to move around on, a Cube to act as the goal or target for the
+agent to seek, and a Sphere to represent the Agent itself.

 ### Create the floor plane

   leave it alone for now.

 So far, these are the basic steps that you would use to add ML-Agents to any
-Unity project. Next, we will add the logic that will let our agent learn to roll
+Unity project. Next, we will add the logic that will let our Agent learn to roll
 to the cube using reinforcement learning.

 In this simple scenario, we don't use the Academy object to control the

 ### Initialization and Resetting the Agent

-When the agent reaches its target, it marks itself done and its agent reset
-function moves the target to a random location. In addition, if the agent rolls
+When the Agent reaches its target, it marks itself done and its Agent reset
+function moves the target to a random location. In addition, if the Agent rolls
 off the platform, the reset function puts it back onto the floor.

 To move the target GameObject, we need a reference to its Transform (which
 allowing you to choose which GameObject to use as the target in the Unity
-Editor. To reset the agent's velocity (and later to apply force to move the
+Editor. To reset the Agent's velocity (and later to apply force to move the
 agent) we need a reference to the Rigidbody component. A
 [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's
 primary element for physics simulation. (See
    {
        if (this.transform.position.y < -1.0)
        {  
-            // The agent fell
+            // The Agent fell
            this.transform.position = Vector3.zero;
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
 ### Observing the Environment

 The Agent sends the information we collect to the Brain, which uses it to make a
-decision. When you train the agent (or use a trained model), the data is fed
-into a neural network as a feature vector. For an agent to successfully learn a
+decision. When you train the Agent (or use a trained model), the data is fed
+into a neural network as a feature vector. For an Agent to successfully learn a
-In our case, the information our agent collects includes:
+In our case, the information our Agent collects includes:
-  training. Note that the agent only collects the x and z coordinates since the
+  training. Note that the Agent only collects the x and z coordinates since the
  floor is aligned with the x-z plane and the y component of the target's
  position never changes.

 AddVectorObs(relativePosition.z / 5);
 ```

-* Position of the agent itself within the confines of the floor. This data is
-  collected as the agent's distance from each edge of the floor.
+* Position of the Agent itself within the confines of the floor. This data is
+  collected as the Agent's distance from each edge of the floor.

 ```csharp
 // Distance to edges of platform
 AddVectorObs((this.transform.position.z - 5) / 5);
 ```

-* The velocity of the agent. This helps the agent learn to control its speed so
+* The velocity of the Agent. This helps the Agent learn to control its speed so
  it doesn't overshoot the target and roll off the platform.

 ```csharp
 `AgentAction()` function. The number of elements in this array is determined by
 the `Vector Action Space Type` and `Vector Action Space Size` settings of the
 agent's Brain. The RollerAgent uses the continuous vector action space and needs
-two continuous control signals from the brain. Thus, we will set the Brain
+two continuous control signals from the Brain. Thus, we will set the Brain
-axis. (If we allowed the agent to move in three dimensions, then we would need
+axis. (If we allowed the Agent to move in three dimensions, then we would need
 to set `Vector Action Size` to 3. Each of these values returned by the network
 is between `-1` and `1`. Note the Brain really has no idea what the values in
 the action array mean. The training process just adjusts the action values in
 ### Rewards

 Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
-function. The learning algorithm uses the rewards assigned to the agent at each
+function. The learning algorithm uses the rewards assigned to the Agent at each
-the agent the optimal actions. You want to reward an agent for completing the
-assigned task (reaching the Target cube, in this case) and punish the agent if
+the Agent the optimal actions. You want to reward an Agent for completing the
+assigned task (reaching the Target cube, in this case) and punish the Agent if
-training with sub-rewards that encourage behavior that helps the agent complete
+training with sub-rewards that encourage behavior that helps the Agent complete
-the agent moves closer to the target in a step and a small negative reward at
-each step which encourages the agent to complete its task quickly.
+the Agent moves closer to the target in a step and a small negative reward at
+each step which encourages the Agent to complete its task quickly.
-agent as finished by setting the agent to done.
+agent as finished by setting the Agent to done.

 ```csharp
 float distanceToTarget = Vector3.Distance(this.transform.position,
 }
 ```

-**Note:** When you mark an agent as done, it stops its activity until it is
-reset. You can have the agent reset immediately, by setting the
+**Note:** When you mark an Agent as done, it stops its activity until it is
+reset. You can have the Agent reset immediately, by setting the
-It can also encourage an agent to finish a task more quickly to assign a
+It can also encourage an Agent to finish a task more quickly to assign a
 negative reward at each step:

 ```csharp

-Finally, to punish the agent for falling off the platform, assign a large
-negative reward and, of course, set the agent to done so that it resets itself
+Finally, to punish the Agent for falling off the platform, assign a large
+negative reward and, of course, set the Agent to done so that it resets itself
 in the next step:

 ```csharp
 Now that all the GameObjects and ML-Agent components are in place, it is time
 to connect everything together in the Unity Editor. This involves assigning the
 Brain object to the Agent, changing some of the Agent Components properties, and
-setting the Brain properties so that they are compatible with our agent code.
+setting the Brain properties so that they are compatible with our Agent code.

 1. Expand the Academy GameObject in the Hierarchy window, so that the Brain
   object is visible.

 It is always a good idea to test your environment manually before embarking on
 an extended training run. The reason we have left the Brain set to the
-**Player** type is so that we can control the agent using direct keyboard
+**Player** type is so that we can control the Agent using direct keyboard
 control. But first, you need to define the keyboard to action mapping. Although
 the RollerAgent only has an `Action Size` of two, we will use one key to specify
 positive values and one to specify negative values for each action, for a total
 `AgentAction()` function. **Value** is assigned to action[Index] when **Key** is
 pressed.

-Press **Play** to run the scene and use the WASD keys to move the agent around
+Press **Play** to run the scene and use the WASD keys to move the Agent around
-Console window and that the agent resets when it reaches its target or falls
+Console window and that the Agent resets when it reaches its target or falls
-includes a convenient Monitor class that you can use to easily display agent
+includes a convenient Monitor class that you can use to easily display Agent
 status information in the Game window.

 One additional test you can perform is to first ensure that your environment and
 Keep in mind:

 * There can only be one Academy game object in a scene.
-* You can have multiple Brain game objects but they must be child of the Academy game object.  
+* You can have multiple Brain game objects but they must be children of the Academy
+  game object.

 Here is an example of what your scene hierarchy should look like:

--- a/docs/Learning-Environment-Design-Academy.md
+++ b/docs/Learning-Environment-Design-Academy.md
 # Creating an Academy

 An Academy orchestrates all the Agent and Brain objects in a Unity scene. Every
-scene containing agents must contain a single Academy. To use an Academy, you
+scene containing Agents must contain a single Academy. To use an Academy, you
 must create your own subclass. However, all the methods you can override are
 optional.

 ## Resetting an Environment

 Implement an `AcademyReset()` function to alter the environment at the start of
-each episode. For example, you might want to reset an agent to its starting
+each episode. For example, you might want to reset an Agent to its starting
 position or move a goal to a random position. An environment resets when the
 Academy `Max Steps` count is reached.

 ## Controlling an Environment

 The `AcademyStep()` function is called at every step in the simulation before
-any agents are updated. Use this function to update objects in the environment
+any Agents are updated. Use this function to update objects in the environment
 at every step or during the episode between environment resets. For example, if
 you want to add elements to the environment at random intervals, you can put the
 logic for creating them in the `AcademyStep()` function.
--- a/docs/Learning-Environment-Design-Agents.md
+++ b/docs/Learning-Environment-Design-Agents.md
 # Agents

 An agent is an actor that can observe its environment and decide on the best
-course of action using those observations. Create agents in Unity by extending
+course of action using those observations. Create Agents in Unity by extending
-successfully learn are the observations the agent collects and, for
-reinforcement learning, the reward you assign to estimate the value of the
+successfully learn are the observations the agent collects for
+reinforcement learning and the reward you assign to estimate the value of the
-An agent passes its observations to its brain. The brain, then, makes a decision
+An Agent passes its observations to its Brain. The Brain, then, makes a decision
 and passes the chosen action back to the agent. Your agent code must execute the
 action, for example, move the agent in one direction or another. In order to
 [train an agent using reinforcement learning](Learning-Environment-Design.md),

-The Brain class abstracts out the decision making logic from the agent itself so
-that you can use the same brain in multiple agents. How a brain makes its
-decisions depends on the type of brain it is. An **External** brain simply
-passes the observations from its agents to an external process and then passes
-the decisions made externally back to the agents. An **Internal** brain uses the
+The Brain class abstracts out the decision making logic from the Agent itself so
+that you can use the same Brain in multiple Agents. How a Brain makes its
+decisions depends on the type of Brain it is. An **External** Brain simply
+passes the observations from its Agents to an external process and then passes
+the decisions made externally back to the Agents. An **Internal** Brain uses the
-parameters in search of a better decision). The other types of brains do not
+parameters in search of a better decision). The other types of Brains do not
 directly involve training, but you might find them useful as part of a training
 project. See [Brains](Learning-Environment-Design-Brains.md).
  
 of simulation steps (the frequency defaults to once-per-step). You can also set
-up an agent to request decisions on demand. Making decisions at regular step
+up an Agent to request decisions on demand. Making decisions at regular step
-decisions on demand is generally appropriate for situations where agents only
+decisions on demand is generally appropriate for situations where Agents only
 respond to specific events or take actions of variable duration. For example, an
 agent in a robotic simulator that must provide fine-control of joint torques
 should make its decisions every step of the simulation. On the other hand, an
 To control the frequency of step-based decision making, set the **Decision
 Frequency** value for the Agent object in the Unity Inspector window. Agents
 using the same Brain instance can use a different frequency. During simulation
-steps in which no decision is requested, the agent receives the same action
+steps in which no decision is requested, the Agent receives the same action
-On demand decision making allows agents to request decisions from their brains
+On demand decision making allows Agents to request decisions from their Brains
 only when needed instead of receiving decisions at a fixed frequency. This is
 useful when the agents commit to an action for a variable number of steps or
 when the agents cannot make decisions at the same time. This is typically the case
-When you turn on **On Demand Decisions** for an agent, your agent code must call
+When you turn on **On Demand Decisions** for an Agent, your agent code must call
-of the observation-decision-action-reward cycle. The Brain invokes the agent's
+of the observation-decision-action-reward cycle. The Brain invokes the Agent's
-`AgentAction()` method. The Brain waits for the agent to request the next
+`AgentAction()` method. The Brain waits for the Agent to request the next
 decision before starting another iteration.
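
The difference between deciding and acting at a fixed **Decision Frequency** can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not toolkit code: the function name `actions_per_step` is hypothetical, and the choice to request the first decision on step 0 is an assumption.

```python
def actions_per_step(decisions, decision_frequency, num_steps):
    """Return the action applied at each simulation step.

    A new decision is taken every `decision_frequency` steps; on the
    steps in between, the most recent action is reused.
    Assumption: the first decision is requested on step 0.
    """
    if decision_frequency < 1:
        raise ValueError("decision_frequency must be at least 1")
    applied = []
    current = None
    decision_iter = iter(decisions)
    for step in range(num_steps):
        if step % decision_frequency == 0:
            # A decision is requested on this step.
            current = next(decision_iter)
        # An action is performed on every step, new decision or not.
        applied.append(current)
    return applied


schedule = actions_per_step(["a0", "a1", "a2"], decision_frequency=3, num_steps=7)
print(schedule)
```

With a frequency of 3, each decision is applied for three consecutive steps before the next one replaces it.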

 ## Observations
  point numbers.
 * **Visual Observations** — one or more camera images.

-When you use vector observations for an agent, implement the
+When you use vector observations for an Agent, implement the
-to implement the `CollectObservations()` method when your agent uses visual
+to implement the `CollectObservations()` method when your Agent uses visual
 observations (unless it also uses vector observations).

 ### Vector Observation Space: Feature Vectors
-class calls the `CollectObservations()` method of each of its agents. Your
+class calls the `CollectObservations()` method of each of its Agents. Your
-The observation must include all the information an agent needs to accomplish
+The observation must include all the information an Agent needs to accomplish
 its task. Without sufficient and relevant information, an agent may learn poorly
 or may not learn at all. A reasonable approach for determining what information
 should be included is to consider what you would need to calculate an analytical
 an agent's observations to a fixed subset. For example, instead of observing
 every enemy agent in an environment, you could only observe the closest five.

-When you set up an Agent's brain in the Unity Editor, set the following
+When you set up an Agent's Brain in the Unity Editor, set the following
 properties to use a continuous vector observation:

 * **Space Size** — The state size must match the length of your feature vector.
 ### Multiple Visual Observations

 Camera observations use rendered textures from one or more cameras in a scene.
-The brain vectorizes the textures into a 3D Tensor which can be fed into a
+The Brain vectorizes the textures into a 3D Tensor which can be fed into a
 convolutional neural network (CNN). For more information on CNNs, see [this
 guide](http://cs231n.github.io/convolutional-networks/). You can use camera
 observations alongside vector observations.
 also typically less efficient and slower to train, and sometimes don't succeed
 at all.  

-To add a visual observation to an agent, click on the `Add Camera` button in the
+To add a visual observation to an Agent, click on the `Add Camera` button in the
-can have more than one camera attached to an agent.
+can have more than one camera attached to an Agent.
-specify the number of Cameras the agent is using for its visual observations.
+specify the number of Cameras the Agent is using for its visual observations.
 For each visual observation, set the width and height of the image (in pixels)
 and whether or not the observation is color or grayscale (when `Black And White`
 is checked).
-An action is an instruction from the brain that the agent carries out. The
-action is passed to the agent as a parameter when the Academy invokes the
+An action is an instruction from the Brain that the agent carries out. The
+action is passed to the Agent as a parameter when the Academy invokes the
-is **Continuous**, the action parameter passed to the agent is an array of
+is **Continuous**, the action parameter passed to the Agent is an array of
 control signals with length equal to the `Vector Action Space Size` property.
 When you specify a **Discrete** vector action space type, the action parameter
 is an array containing integers. Each integer is an index into a list or table
 corresponds to an action table, you can specify the size of each table by
 modifying the `Branches` property. Set the `Vector Action Space Size` and
-`Vector Action Space Type` properties on the Brain object assigned to the agent
+`Vector Action Space Type` properties on the Brain object assigned to the Agent
-many training episodes. Thus, the only place actions are defined for an agent is
+many training episodes. Thus, the only place actions are defined for an Agent is
 in the `AgentAction()` function. You simply specify the type of vector action
 space, and, for the continuous vector action space, the number of values, and
 then apply the received values appropriately (and consistently) in
 either continuous or the discrete vector actions. In the continuous case, you
 would set the vector action size to two (one for each dimension), and the
-agent's brain would create an action with two floating point values. In the
+agent's Brain would create an action with two floating point values. In the
-direction), and the brain would create an action array containing a single
+direction), and the Brain would create an action array containing a single
-movement), and the brain would create an action array containing two elements
+movement), and the Brain would create an action array containing two elements
-test your action logic using a **Player** brain, which lets you map keyboard
+test your action logic using a **Player** Brain, which lets you map keyboard
 commands to actions. See [Brains](Learning-Environment-Design-Brains.md).

 The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and
 ### Continuous Action Space

-When an agent uses a brain set to the **Continuous** vector action space, the
-action parameter passed to the agent's `AgentAction()` function is an array with
+When an Agent uses a Brain set to the **Continuous** vector action space, the
+action parameter passed to the Agent's `AgentAction()` function is an array with
-them. If you assign an element in the array as the speed of an agent, for
-example, the training process learns to control the speed of the agent though
+them. If you assign an element in the array as the speed of an Agent, for
+example, the training process learns to control the speed of the Agent through
 this parameter.

 The [Reacher example](Learning-Environment-Examples.md#reacher) defines a

 ### Discrete Action Space

-When an agent uses a brain set to the **Discrete** vector action space, the
-action parameter passed to the agent's `AgentAction()` function is an array
+When an Agent uses a Brain set to the **Discrete** vector action space, the
+action parameter passed to the Agent's `AgentAction()` function is an array
-For example, if we wanted an agent that can move in an plane and jump, we could
+For example, if we wanted an Agent that can move in a plane and jump, we could
-agent be able to move __and__ jump concurently. We define the first branch to
+agent be able to move __and__ jump concurrently. We define the first branch to
 have 5 possible actions (don't move, go left, go right, go backward, go forward)
 and the second one to have 2 possible actions (don't jump, jump). The
 AgentAction method would look something like:
 // Look up the index in the jump action list:
 if (jump == 1 && IsGrounded()) { directionY = 1; }

-// Apply the action results to move the agent
+// Apply the action results to move the Agent
 gameObject.GetComponent<Rigidbody>().AddForce(
    new Vector3(
        directionX * 40f, directionY * 300f, directionZ * 40f));
 #### Masking Discrete Actions

 When using Discrete Actions, it is possible to specify that some actions are
-impossible for the next decision. Then the agent is controlled by an External or
-Internal Brain, the agent will be unable to perform the specified action. Note
-that when the agent is controlled by a Player or Heuristic Brain, the agent will
+impossible for the next decision. When the Agent is controlled by an External or
+Internal Brain, the Agent will be unable to perform the specified action. Note
+that when the Agent is controlled by a Player or Heuristic Brain, the Agent will
 still be able to decide to perform the masked action. In order to mask an
 action, call the method `SetActionMask` within the `CollectObservations` method:

 * `branch` is the index (starting at 0) of the branch on which you want to mask
  the action
 * `actionIndices` is a list of `int` or a single `int` corresponding to the
-  index of theaction that the agent cannot perform.
+  index of the action that the Agent cannot perform.
-For example, if you have an agent with 2 branches and on the first branch
+For example, if you have an Agent with 2 branches and on the first branch
-and _"change weapon"_. Then with the code bellow, the agent will either _"do
+and _"change weapon"_. Then with the code below, the Agent will either _"do
 nothing"_ or _"change weapon"_ for its next decision (since action indices 1 and 2
 are masked)
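
Separately from the C# call, the effect of a mask on one branch can be sketched in Python. The helper names `allowed_actions` and `sample_action` are hypothetical, and uniform random sampling merely stands in for whatever policy the Brain actually uses; the sketch only shows that masked indices are never selected.

```python
import random


def allowed_actions(branch_size, masked_indices):
    """Return the action indices on one branch that are still selectable."""
    masked = set(masked_indices)
    allowed = [i for i in range(branch_size) if i not in masked]
    if not allowed:
        # Assumption: a branch must always keep at least one usable action.
        raise ValueError("at least one action per branch must stay unmasked")
    return allowed


def sample_action(branch_size, masked_indices, rng=random):
    """Pick one unmasked action; uniform sampling is a stand-in for a policy."""
    return rng.choice(allowed_actions(branch_size, masked_indices))


# Branch with 4 actions where indices 1 and 2 are masked, as in the example above.
choices = allowed_actions(4, [1, 2])
print(choices)
```

Only index 0 (_"do nothing"_) and index 3 (_"change weapon"_) remain available for the next decision.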

 reward over time. The better your reward mechanism, the better your agent will
 learn.

-**Note:** Rewards are not used during inference by a brain using an already
+**Note:** Rewards are not used during inference by a Brain using an already
-to display the cumulative reward received by an agent. You can even use a Player
-brain to control the agent while watching how it accumulates rewards.
+to display the cumulative reward received by an Agent. You can even use a Player
+Brain to control the Agent while watching how it accumulates rewards.
-Allocate rewards to an agent by calling the `AddReward()` method in the
+Allocate rewards to an Agent by calling the `AddReward()` method in the
 `AgentAction()` function. The reward assigned in any step should be in the range
 [-1,1].  Values outside this range can lead to unstable training. The `reward`
 value is reset to zero at every step.
    SetReward(0.1f);
 }

-// When ball falls mark agent as done and give a negative penalty
+// When ball falls mark Agent as done and give a negative penalty
 if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
    Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
    Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)

 ![Agent Inspector](images/agent.png)

-* `Brain` - The brain to register this agent to. Can be dragged into the
+* `Brain` - The Brain to register this Agent to. Can be dragged into the
-  reached, the agent will be reset if `Reset On Done` is checked.
-* `Reset On Done` - Whether the agent's `AgentReset()` function should be called
-  when the agent reaches its `Max Step` count or is marked as done in code.
-* `On Demand Decision` - Whether the agent requests decisions at a fixed step
+  reached, the Agent will be reset if `Reset On Done` is checked.
+* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
+  when the Agent reaches its `Max Step` count or is marked as done in code.
+* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
  interval or explicitly requests decisions by calling `RequestDecision()`.
  * If not checked, the Agent will request a new decision every `Decision
     Frequency` steps and perform an action every step. In the example above,
    * `RequestAction()` Signals that the Agent is requesting an action. The
        action provided to the Agent in this case is the same action that was
        provided the last time it requested a decision.
-* `Decision Frequency` - The number of steps between decision requests. Not used if `On Demand Decision`, is true.
+* `Decision Frequency` - The number of steps between decision requests. Not used
+  if `On Demand Decision` is true.
-Unity environment. While this was built for monitoring an Agent's value function
+Unity environment. While this was built for monitoring an agent's value function
 throughout the training process, we imagine it can be more broadly useful. You
 can learn more [here](Feature-Monitor.md).

 `GameObject.Instantiate()` function. It is typically easiest to instantiate an
 agent from a [Prefab](https://docs.unity3d.com/Manual/Prefabs.html) (otherwise,
-you have to instantiate every GameObject and Component that make up your agent
+you have to instantiate every GameObject and Component that make up your Agent
-following function creates a new agent given a Prefab, Brain instance, location,
+following function creates a new Agent given a Prefab, Brain instance, location,
-private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
+private void CreateAgent(GameObject AgentPrefab, Brain brain, Vector3 position, Quaternion orientation)
-    GameObject agentObj = Instantiate(agentPrefab, position, orientation);
-    Agent agent = agentObj.GetComponent<Agent>();
-    agent.GiveBrain(brain);
-    agent.AgentReset();
+    GameObject AgentObj = Instantiate(AgentPrefab, position, orientation);
+    Agent Agent = AgentObj.GetComponent<Agent>();
+    Agent.GiveBrain(brain);
+    Agent.AgentReset();
 }
 ```

-the next step in the simulation) so that the Brain knows that this agent is no
-longer active. Thus, the best place to destroy an agent is in the
+the next step in the simulation) so that the Brain knows that this Agent is no
+longer active. Thus, the best place to destroy an Agent is in the
 `Agent.AgentOnDone()` function:

 ```csharp
 }
 ```

-Note that in order for `AgentOnDone()` to be called, the agent's `ResetOnDone`
-property must be false. You can set `ResetOnDone` on the agent's Inspector or in
+Note that in order for `AgentOnDone()` to be called, the Agent's `ResetOnDone`
+property must be false. You can set `ResetOnDone` on the Agent's Inspector or in
 code.
--- a/docs/Learning-Environment-Design-Brains.md
+++ b/docs/Learning-Environment-Design-Brains.md

 * [External](Learning-Environment-Design-External-Internal-Brains.md) — The
  **External** and **Internal** types typically work together; set **External**
-  when training your agents. You can also use the **External** brain to
+  when training your Agents. You can also use the **External** Brain to
-  **Heuristic** to hand-code the agent's logic by extending the Decision class.
+  **Heuristic** to hand-code the Agent's logic by extending the Decision class.
-  keyboard keys to agent actions, which can be useful to test your agent code.
+  keyboard keys to Agent actions, which can be useful to test your Agent code.
-During training, set your agent's brain type to **External**. To use the trained
-model, import the model file into the Unity project and change the brain type to
+During training, set your Agent's Brain type to **External**. To use the trained
+model, import the model file into the Unity project and change the Brain type to
-Inspector window. These properties must be appropriate for the agents using the
-brain. For example, the `Vector Observation Space Size` property must match the
-length of the feature vector created by an agent exactly. See
+Inspector window. These properties must be appropriate for the Agents using the
+Brain. For example, the `Vector Observation Space Size` property must match the
+length of the feature vector created by an Agent exactly. See
 [Agents](Learning-Environment-Design-Agents.md) for information about creating
 agents and setting up a Brain instance correctly.

 * `Brain Parameters` - Define vector observations, visual observation, and
  vector actions for the Brain.
  * `Vector Observation`
-    * `Space Size` - Length of vector observation for brain.
+    * `Space Size` - Length of vector observation for Brain.
-      effective size of the vector observation being passed to the brain being:
+      effective size of the vector observation being passed to the Brain being:
      _Space Size_ x _Stacked Vectors_.
  * `Visual Observations` - Describes height, width, and whether to grayscale
    visual observations for the Brain.
-    * `Space Size` (Continuous) - Length of action vector for brain.
-    * `Branches` (Discrete) - An array of integers, defines multiple concurent
+    * `Space Size` (Continuous) - Length of action vector for Brain.
+    * `Branches` (Discrete) - An array of integers, defines multiple concurrent
      discrete actions. The values in the `Branches` array correspond to the
      number of possible discrete values for each action branch.
    * `Action Descriptions` - A list of strings used to name the available

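The sizing rules in the `Brain Parameters` list can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the ML-Agents API; the function names `effective_observation_size` and `is_valid_discrete_action` are hypothetical.

```python
def effective_observation_size(space_size, stacked_vectors):
    """Effective vector observation length: Space Size x Stacked Vectors."""
    return space_size * stacked_vectors


def is_valid_discrete_action(action, branches):
    """Check one discrete action against a Branches array.

    branches[i] is the number of possible values for branch i, so a valid
    action holds one integer per branch, each in the range [0, branches[i]).
    """
    if len(action) != len(branches):
        return False
    return all(0 <= value < size for value, size in zip(action, branches))


print(effective_observation_size(8, 3))                 # 24
print(is_valid_discrete_action([2, 0, 1], [3, 3, 2]))   # True
print(is_valid_discrete_action([3, 0, 1], [3, 3, 2]))   # False
```

For example, a `Space Size` of 8 with 3 stacked vectors means the Brain receives 24 values per decision.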
 ## Using the Broadcast Feature

-The Player, Heuristic and Internal brains have been updated to support
-broadcast. The broadcast feature allows you to collect data from your agents
+The Player, Heuristic and Internal Brains have been updated to support
+broadcast. The broadcast feature allows you to collect data from your Agents
 using a Python program without controlling them.  

 ### How to use: Unity
 ### How to use: Python

 When you launch your Unity Environment from a Python program, you can see what
-the agents connected to non-external brains are doing. When calling `step` or
-`reset` on your environment, you retrieve a dictionary mapping brain names to
+the Agents connected to non-External Brains are doing. When calling `step` or
+`reset` on your environment, you retrieve a dictionary mapping Brain names to
-non-external brain set to broadcast as well as for any external brains.  
+non-External Brain set to broadcast as well as for any External Brains.  
-Just like with an external brain, the `BrainInfo` object contains the fields for
+Just like with an External Brain, the `BrainInfo` object contains the fields for
-were taken by the agents at the previous step, not the current one.  
+were taken by the Agents at the previous step, not the current one.  
-for non-external brains. If there are no external brains in the scene, simply
+for non-External Brains. If there are no External Brains in the scene, simply
-Heuristics or Internal brains game sessions. You can then use this data to train
+Heuristics or Internal Brains game sessions. You can then use this data to train
 an agent in a supervised context.
--- a/docs/Learning-Environment-Design-External-Internal-Brains.md
+++ b/docs/Learning-Environment-Design-External-Internal-Brains.md
 # External and Internal Brains

 The **External** and **Internal** types of Brains work in different phases of
-training. When training your agents, set their brain types to **External**; when
-using the trained models, set their brain types to **Internal**.
+training. When training your Agents, set their Brain types to **External**; when
+using the trained models, set their Brain types to **Internal**.
-training process to collect the observations of agents using that brain and give
-the agents their actions.
+training process to collect the observations of Agents using that Brain and give
+the Agents their actions.
-In addition to using an External brain for training using the ML-Agents learning
-algorithms, you can use an External brain to control agents in a Unity
-environment using an external Python program. See [Python API](../ml-agents/README.md)
+In addition to using an External Brain for training using the ML-Agents learning
+algorithms, you can use an External Brain to control Agents in a Unity
+environment using an external Python program. See [Python API](Python-API.md)
 for more information.

 Unlike the other types, the External Brain has no properties to set in the Unity
 A __model__ is a mathematical relationship mapping an agent's observations to
 its actions. TensorFlow is a software library for performing numerical
 computation through data flow graphs. A TensorFlow model, then, defines the
-mathematical relationship between your agent's observations and its actions
+mathematical relationship between your Agent's observations and its actions
 using a TensorFlow data flow graph.

 ### Creating a graph model
 * `Graph Scope` : If you set a scope while training your TensorFlow model, all
  your placeholder name will have a prefix. You must specify that prefix here.
  Note that if more than one Brain were set to external during training, you
-  must give a `Graph Scope` to the internal Brain corresponding to the name of
+  must give a `Graph Scope` to the Internal Brain corresponding to the name of
-  graph, you must specify the name if the placeholder here. The brain will make
-  the batch size equal to the number of agents connected to the brain
+  graph, you must specify the name of the placeholder here. The Brain will make
+  the batch size equal to the number of Agents connected to the Brain
  automatically.
 * `State Node Name` : If your graph uses the state as an input, you must specify
  the name of the placeholder here.
  if the output placeholder here.
 * `Observation Placeholder Name` : If your graph uses observations as input, you
  must specify it here. Note that the number of observations is equal to the
-  length of `Camera Resolutions` in the brain parameters.
+  length of `Camera Resolutions` in the Brain parameters.
-  actions of the brain in your graph. If the action space type is continuous,
+  actions of the Brain in your graph. If the action space type is continuous,
  the output must be a one dimensional tensor of float of length `Action Space
  Size`, if the action space type is discrete, the output must be a one
  dimensional tensor of int of the same length as the `Branches` array.
--- a/docs/Learning-Environment-Design-Heuristic-Brains.md
+++ b/docs/Learning-Environment-Design-Heuristic-Brains.md
 # Heuristic Brain

-The **Heuristic** brain type allows you to hand code an agent's decision making
-process. A Heuristic brain requires an implementation of the Decision interface
+The **Heuristic** Brain type allows you to hand code an Agent's decision making
+process. A Heuristic Brain requires an implementation of the Decision interface
 to which it delegates the decision making process.

 When you set the **Brain Type** property of a Brain to **Heuristic**, you must

 The Decision interface defines two methods, `Decide()` and `MakeMemory()`.

-The `Decide()` method receives an agents current state, consisting of the
-agent's observations, reward, memory and other aspects of the agent's state, and
-must return an array containing the action that the agent should take. The
+The `Decide()` method receives an Agent's current state, consisting of the
+Agent's observations, reward, memory and other aspects of the Agent's state, and
+must return an array containing the action that the Agent should take. The
 format of the returned action array depends on the **Vector Action Space Type**.
 When using a **Continuous** action space, the action array is just a float array
 with a length equal to the **Vector Action Space Size** setting. When using a
 integers.

 The `MakeMemory()` function allows you to pass data forward to the next
-iteration of an agent's decision making process. The array you return from
+iteration of an Agent's decision making process. The array you return from
-can use the memory to allow the agent's decision process to take past actions
+can use the memory to allow the Agent's decision process to take past actions
 and observations into account when making the current decision. If your
 heuristic logic does not require memory, just return an empty array.
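The shape of a Decision implementation can be sketched as follows. The real interface is implemented in C# inside Unity; this Python class is a hypothetical analogue, and the class name, method names, and parameter lists are assumptions made only to show the contract described above: `Decide()` returns an action array of the configured size, and `MakeMemory()` returns an empty array when no memory is needed.

```python
class FollowTargetDecision:
    """Hypothetical Python analogue of a Decision implementation."""

    def __init__(self, action_size):
        # Corresponds to the Vector Action Space Size setting.
        self.action_size = action_size

    def decide(self, observations, reward, done, memory):
        # Continuous action space: return a float list whose length
        # equals the Vector Action Space Size.
        action = [0.0] * self.action_size
        if observations:
            # Toy rule (an assumption): push against the sign of the
            # first observation value.
            action[0] = -1.0 if observations[0] > 0 else 1.0
        return action

    def make_memory(self, observations, reward, done, memory):
        # This heuristic keeps no state between decisions,
        # so it returns an empty list.
        return []


decision = FollowTargetDecision(action_size=2)
print(decision.decide([0.5, 0.0], reward=0.0, done=False, memory=[]))
print(decision.make_memory([0.5, 0.0], reward=0.0, done=False, memory=[]))
```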
--- a/docs/Learning-Environment-Design-Player-Brains.md
+++ b/docs/Learning-Environment-Design-Player-Brains.md
 # Player Brain

-The **Player** brain type allows you to control an agent using keyboard
-commands. You can use Player brains to control a "teacher" agent that trains
-other agents during [imitation learning](Training-Imitation-Learning.md). You
-can also use Player brains to test your agents and environment before changing
-their brain types to **External** and running the training process.
+The **Player** Brain type allows you to control an Agent using keyboard
+commands. You can use Player Brains to control a "teacher" Agent that trains
+other Agents during [imitation learning](Training-Imitation-Learning.md). You
+can also use Player Brains to test your Agents and environment before changing
+their Brain types to **External** and running the training process.
-The **Player** brain properties allow you to assign one or more keyboard keys to
+The **Player** Brain properties allow you to assign one or more keyboard keys to
-brain uses the discrete action space, you can send one integer value as the
-action per step. In contrast, when a brain uses the continuous action space you
+Brain uses the discrete action space, you can send one integer value as the
+action per step. In contrast, when a Brain uses the continuous action space you
 can send any number of floating point values (up to the **Vector Action Space
 Size** setting).

  action. (If you press both keys at the same time, deterministic results are not guaranteed.)|
 ||**Element 0–N**| The mapping of keys to action values. |
 || **Key** | The key on the keyboard. |
-|| **Index** | The element of the agent's action vector to set when this key is
+|| **Index** | The element of the Agent's action vector to set when this key is
-|| **Value** | The value to send to the agent as its action for the specified
+|| **Value** | The value to send to the Agent as its action for the specified
  index when the mapped key is pressed. All other members of the action vector
  are set to 0. |
 |**Discrete Player Actions**|| The mapping for the discrete vector action space.
 || **Key** | The key on the keyboard. |
-|| **Branch Index** |The element of the agent's action vector to set when this
+|| **Branch Index** |The element of the Agent's action vector to set when this
-|| **Value** | The value to send to the agent as its action when the mapped key
+|| **Value** | The value to send to the Agent as its action when the mapped key
  is pressed. Cannot exceed the max value for the associated branch (minus 1,
  since it is an array index).|

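The key mappings in the table can be illustrated with a short Python sketch. It is not the Player Brain implementation; the function names and the dictionary-based key maps are hypothetical, and defaulting unpressed discrete branches to 0 is an assumption.

```python
def continuous_action(pressed_key, key_map, action_size):
    """Build a continuous action vector from one pressed key.

    key_map maps a key to an (index, value) pair. All other members of
    the action vector are set to 0.
    """
    action = [0.0] * action_size
    if pressed_key in key_map:
        index, value = key_map[pressed_key]
        action[index] = value
    return action


def discrete_action(pressed_key, key_map, branches):
    """Build a discrete action from one pressed key.

    key_map maps a key to a (branch_index, value) pair. The value must not
    exceed the branch size minus 1, since it is an array index.
    """
    action = [0] * len(branches)  # assumption: unpressed branches default to 0
    if pressed_key in key_map:
        branch_index, value = key_map[pressed_key]
        if not 0 <= value < branches[branch_index]:
            raise ValueError("value exceeds the maximum for this branch")
        action[branch_index] = value
    return action


print(continuous_action("D", {"D": (0, 1.0), "A": (0, -1.0)}, action_size=2))
print(discrete_action("W", {"W": (1, 2)}, branches=[3, 3]))
```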
--- a/docs/Learning-Environment-Design.md
+++ b/docs/Learning-Environment-Design.md
 Training and simulation proceed in steps orchestrated by the ML-Agents Academy
 class. The Academy works with Agent and Brain objects in the scene to step
 through the simulation. When either the Academy has reached its maximum number
-of steps or all agents in the scene are _done_, one training episode is
+of steps or all Agents in the scene are _done_, one training episode is
-neural network model. The type of Brain assigned to an agent determines whether
-it participates in training or not. The **External** brain communicates with the
+neural network model. The type of Brain assigned to an Agent determines whether
+it participates in training or not. The **External** Brain communicates with the
-with an **Internal** brain.
+with an **Internal** Brain.
-2. Calls the `AgentReset()` function for each agent in the scene.
-3. Calls the  `CollectObservations()` function for each agent in the scene.
-4. Uses each agent's Brain class to decide on the agent's next action.
+2. Calls the `AgentReset()` function for each Agent in the scene.
+3. Calls the  `CollectObservations()` function for each Agent in the scene.
+4. Uses each Agent's Brain class to decide on the Agent's next action.
-6. Calls the `AgentAction()` function for each agent in the scene, passing in
-   the action chosen by the agent's brain. (This function is not called if the
-   agent is done.)
-7. Calls the agent's `AgentOnDone()` function if the agent has reached its `Max
+6. Calls the `AgentAction()` function for each Agent in the scene, passing in
+   the action chosen by the Agent's Brain. (This function is not called if the
+   Agent is done.)
+7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
-   an agent to restart if it finishes before the end of an episode. In this
+   an Agent to restart if it finishes before the end of an episode. In this
   case, the Academy calls the `AgentReset()` function.
 8. When the Academy reaches its own `Max Step` count, it starts the next episode
   again by calling your Academy subclass's `AcademyReset()` function.
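
The numbered steps above can be sketched as a simplified loop. This is a toy Python model, not the Academy implementation: `ToyAgent`, `run_episode`, and the snake_case method names are hypothetical stand-ins for the C# functions named in the list, and steps not shown in the list are omitted.

```python
class ToyAgent:
    """Hypothetical stand-in for an Agent that finishes after max_step actions."""

    def __init__(self, max_step):
        self.max_step = max_step
        self.steps = 0
        self.done = False
        self.log = []

    def agent_reset(self):
        self.steps = 0
        self.done = False
        self.log.append("reset")

    def collect_observations(self):
        return [float(self.steps)]

    def agent_action(self, action):
        self.steps += 1
        self.log.append("action")
        if self.steps >= self.max_step:
            self.done = True

    def agent_on_done(self):
        self.log.append("on_done")


def run_episode(agents, decide, academy_max_step):
    for agent in agents:
        agent.agent_reset()                       # step 2
    for _ in range(academy_max_step):             # Academy Max Step bound
        if all(agent.done for agent in agents):
            break                                 # all Agents done: episode ends
        observations = [a.collect_observations() for a in agents]  # step 3
        actions = [decide(obs) for obs in observations]            # step 4
        for agent, action in zip(agents, actions):
            if agent.done:
                continue                          # step 6 is skipped for done Agents
            agent.agent_action(action)            # step 6
            if agent.done:
                agent.agent_on_done()             # step 7


agents = [ToyAgent(max_step=2), ToyAgent(max_step=3)]
run_episode(agents, decide=lambda obs: [0.0], academy_max_step=10)
print(agents[0].log)
print(agents[1].log)
```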
 **Note:** The API used by the Python PPO training process to communicate with
 and control the Academy during training can be used for other purposes as well.
 For example, you could use the API to use Unity as the simulation engine for
-your own machine learning algorithms. See [Python API](../ml-agents/README.md) for more
+your own machine learning algorithms. See [Python API](Python-API.md) for more
 information.

 ## Organizing the Unity Scene
 as you need. Any Brain instances in the scene must be attached to GameObjects
 that are children of the Academy in the Unity Scene Hierarchy. Agent instances
-should be attached to the GameObject representing that agent.
+should be attached to the GameObject representing that Agent.
-You must assign a brain to every agent, but you can share brains between
-multiple agents. Each agent will make its own observations and act
+You must assign a Brain to every Agent, but you can share Brains between
+multiple Agents. Each Agent will make its own observations and act
-brains, the same trained TensorFlow model.
+Brains, the same trained TensorFlow model.
-The Academy object orchestrates agents and their decision making processes. Only
+The Academy object orchestrates Agents and their decision making processes. Only
 place a single Academy object in a scene.

 You must create a subclass of the Academy class (since the base class is
 * `InitializeAcademy()` — Prepare the environment the first time it launches.
-* `AcademyReset()` — Prepare the environment and agents for the next training
+* `AcademyReset()` — Prepare the environment and Agents for the next training
-  objects in the scene before the agents take their actions. Note that the
-  agents have already collected their observations and chosen an action before
+  objects in the scene before the Agents take their actions. Note that the
+  Agents have already collected their observations and chosen an action before
  the Academy invokes this method.

 The base Academy classes also defines several important properties that you can
 assigned a Brain, but you can use the same Brain with more than one Agent.

 Use the Brain class directly, rather than a subclass. Brain behavior is
-determined by the brain type. During training, set your agent's brain type to
+determined by the Brain type. During training, set your Agent's Brain type to
-project and change the brain type to **Internal**. See
+project and change the Brain type to **Internal**. See
-different types of brains. You can extend the CoreBrain class to create
-different brain types if the four built-in types don't do what you need.
+different types of Brains. You can extend the CoreBrain class to create
+different Brain types if the four built-in types don't do what you need.
-Inspector window. These properties must be appropriate for the agents using the
-brain. For example, the `Vector Observation Space Size` property must match the
-length of the feature vector created by an agent exactly. See
+Inspector window. These properties must be appropriate for the Agents using the
+Brain. For example, the `Vector Observation Space Size` property must match the
+length of the feature vector created by an Agent exactly. See
 [Agents](Learning-Environment-Design-Agents.md) for information about creating
 agents and setting up a Brain instance correctly.

 in a football game or a car object in a vehicle simulation. Every Agent must be
 assigned a Brain.  

-To create an agent, extend the Agent class and implement the essential
+To create an Agent, extend the Agent class and implement the essential
-* `CollectObservations()` — Collects the agent's observation of its environment.
-* `AgentAction()` — Carries out the action chosen by the agent's brain and
+* `CollectObservations()` — Collects the Agent's observation of its environment.
+* `AgentAction()` — Carries out the action chosen by the Agent's Brain and
-Brain assigned to this agent must be set.
+Brain assigned to this Agent must be set.
-manually set an agent to done in your `AgentAction()` function when the agent
-has finished (or irrevocably failed) its task. You can also set the agent's `Max
-Steps` property to a positive value and the agent will consider itself done
+manually set an Agent to done in your `AgentAction()` function when the Agent
+has finished (or irrevocably failed) its task. You can also set the Agent's `Max
+Steps` property to a positive value and the Agent will consider itself done
-count, it starts the next episode. If you set an agent's `ResetOnDone` property
-to true, then the agent can attempt its task several times in one episode. (Use
-the `Agent.AgentReset()` function to prepare the agent to start again.)
+count, it starts the next episode. If you set an Agent's `ResetOnDone` property
+to true, then the Agent can attempt its task several times in one episode. (Use
+the `Agent.AgentReset()` function to prepare the Agent to start again.)
-about programing your own agents.
+about programming your own Agents.

 ## Environments


 * The training scene must start automatically when your Unity application is
  launched by the training process.
-* The scene must include at least one **External** brain.
+* The scene must include at least one **External** Brain.
-  each agent setting itself to `done`.
+  each Agent setting itself to `done`.
--- a/docs/Learning-Environment-Examples.md
+++ b/docs/Learning-Environment-Examples.md
 * Set-up: A linear movement task where the agent must move left or right to
  rewarding states.
 * Goal: Move to the most reward state.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: One variable corresponding to current state.
  * Vector Action space: (Discrete) Two possible actions (Move left, move
    right).
 * Goal: The agent must balance the platform in order to keep the ball on it for
  as long as possible.
 * Agents: The environment contains 12 agents of the same kind, all linked to a
-  single brain.
+  single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 8 variables corresponding to rotation of platform,
    and position, rotation, and velocity of ball.
  * Vector Observation space (Hard Version): 5 variables corresponding to
  and obstacles.
 * Goal: The agent must navigate the grid to the goal while avoiding the
  obstacles.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: None
  * Vector Action space: (Discrete) Size of 4, corresponding to movement in
    cardinal directions. Note that for this environment, 
  net.
 * Goal: The agents must bounce ball between one another while not dropping or
  sending ball out of bounds.
-* Agents: The environment contains two agent linked to a single brain named
-  TennisBrain. After training you can attach another brain named MyBrain to one
+* Agents: The environment contains two agents linked to a single Brain named
+  TennisBrain. After training you can attach another Brain named MyBrain to one
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 8 variables corresponding to position and velocity
    of ball and racket.
  * Vector Action space: (Continuous) Size of 2, corresponding to movement

 * Set-up: A platforming environment where the agent can push a block around.
 * Goal: The agent must push the block to the goal.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: (Continuous) 70 variables corresponding to 14
    ray-casts each detecting one of three possible objects (wall, goal, or
    block).

 * Set-up: A platforming environment where the agent can jump over a wall.
 * Goal: The agent must use the block to scale the wall and reach the goal.
-* Agents: The environment contains one agent linked to two different brains. The
-  brain the agent is linked to changes depending on the height of the wall.
+* Agents: The environment contains one agent linked to two different Brains. The
+  Brain the agent is linked to changes depending on the height of the wall.
-* Brains: Two brains, each with the following observation/action space.
-  * Vector Observation space: Size of 74, corresponding to 14 raycasts each
+* Brains: Two Brains, each with the following observation/action space.
+  * Vector Observation space: Size of 74, corresponding to 14 ray casts each
-    * Rotation (3 possible acions: Rotate Left, Rotate Right, No Action)
-    * Side Motion (3 possible acions: Left, Right, No Action)
+    * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
+    * Side Motion (3 possible actions: Left, Right, No Action)
    * Jump (2 possible actions: Jump, No Action)
  * Visual Observations: None.
 * Reset Parameters: 4, corresponding to the height of the possible walls.

 * Set-up: Double-jointed arm which can move to target locations.
 * Goal: The agents must move it's hand to the goal location, and keep it there.
-* Agents: The environment contains 10 agent linked to a single brain.
+* Agents: The environment contains 10 agents linked to a single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 26 variables corresponding to position, rotation,
    velocity, and angular velocities of the two arm Rigidbodies.
  * Vector Action space: (Continuous) Size of 4, corresponding to torque
 * Goal: The agents must move its body toward the goal direction without falling.
  * `CrawlerStaticTarget` - Goal direction is always forward.
  * `CrawlerDynamicTarget`- Goal direction is randomized.
-* Agents: The environment contains 3 agent linked to a single brain.
+* Agents: The environment contains 3 agents linked to a single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 117 variables corresponding to position, rotation,
    velocity, and angular velocities of each limb plus the acceleration and
    angular acceleration of the body.
 * Set-up: A multi-agent environment where agents compete to collect bananas.
 * Goal: The agents must learn to move to as many yellow bananas as possible
  while avoiding blue bananas.
-* Agents: The environment contains 5 agents linked to a single brain.
+* Agents: The environment contains 5 agents linked to a single Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 53 corresponding to velocity of agent (2), whether
    agent is frozen and/or shot its laser (2), plus ray-based perception of
    objects around agent's forward direction (49; 7 raycast angles with 7
-    * Side Motion (3 possible acions: Left, Right, No Action)
-    * Rotation (3 possible acions: Rotate Left, Rotate Right, No Action)
+    * Side Motion (3 possible actions: Left, Right, No Action)
+    * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
    * Laser (2 possible actions: Laser, No Action)
  * Visual Observations (Optional): First-person camera per-agent. Use
    `VisualBanana` scene.
  remember it, and use it to move to the correct goal.
 * Goal: Move to the goal which corresponds to the color of the block in the
  room.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
-* Brains: One brain with the following observation/action space:
+* Brains: One Brain with the following observation/action space:
  * Vector Observation space: 30 corresponding to local ray-casts detecting
    objects, goals, and walls.
  * Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent
 * Set-up: Environment where the agent needs on-demand decision making. The agent
  must decide how perform its next bounce only when it touches the ground.
 * Goal: Catch the floating banana. Only has a limited number of jumps.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
-* Brains: One brain with the following observation/action space:
+* Brains: One Brain with the following observation/action space:
  * Vector Observation space: 6 corresponding to local position of agent and
    banana.
  * Vector Action space: (Continuous) 3 corresponding to agent force applied for
 * Goal:
  * Striker: Get the ball into the opponent's goal.
  * Goalie: Prevent the ball from entering its own goal.
-* Agents: The environment contains four agents, with two linked to one brain
+* Agents: The environment contains four agents, with two linked to one Brain
  (strikers) and two linked to another (goalies).
 * Agent Reward Function (dependent):
  * Striker:
    * -1 When ball enters team's goal.
    * +0.1 When ball enters opponents goal.
    * +0.001 Existential bonus.
-* Brains: Two brain with the following observation/action space:
+* Brains: Two Brains with the following observation/action space:
  * Vector Observation space: 112 corresponding to local 14 ray casts, each
    detecting 7 possible object types, along with the object's distance.
    Perception is in 180 degree view from front of agent.

 * Set-up: Physics-based Humanoids agents with 26 degrees of freedom. These DOFs
  correspond to articulation of the following body-parts: hips, chest, spine,
-  head, thighs, shins, feets, arms, forearms and hands.
+  head, thighs, shins, feet, arms, forearms and hands.
-  brain.
+  Brain.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
  * Vector Observation space: 215 variables corresponding to position, rotation,
    velocity, and angular velocities of each limb, along with goal direction.
  * Vector Action space: (Continuous) Size of 39, corresponding to target
  pyramid, then navigate to the pyramid, knock it over, and move to the gold
  brick at the top.
 * Goal: Move to the golden brick on top of the spawned pyramid.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
-* Brains: One brain with the following observation/action space:
+* Brains: One Brain with the following observation/action space:
  * Vector Observation space: 148 corresponding to local ray-casts detecting
    switch, bricks, golden brick, and walls, plus variable indicating switch
    state.
--- a/docs/Learning-Environment-Executable.md
+++ b/docs/Learning-Environment-Executable.md

 Make sure the Brains in the scene have the right type. For example, if you want
 to be able to control your agents from Python, you will need to set the
-corresponding brain to **External**.
+corresponding Brain to **External**.

 1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
   object.

 ## Interacting with the Environment

-If you want to use the [Python API](../ml-agents/README.md) to interact with your
+If you want to use the [Python API](Python-API.md) to interact with your
 executable, you can pass the name of the executable with the argument
 'file_name' of the `UnityEnvironment`. For instance:

 ## Training the Environment

 1. Open a command or terminal window.
-2. Nagivate to the folder where you installed ML-Agents.
-3. Change to the python directory.
-4. Run
+2. Navigate to the folder where you installed the ML-Agents Toolkit. If you
+   followed the default [installation](Installation.md), then navigate to the
+   `ml-agents/` folder.
+3. Run
-   * `<trainer-config-file>` is the filepath of the trainer configuration yaml.
+   * `<trainer-config-file>` is the file path of the trainer configuration yaml
   * `<env_name>` is the name and path to the executable you exported from Unity
     (without extension)
   * `<run-identifier>` is a string used to separate the results of different

 For example, if you are training with a 3DBall executable you exported to the
-ml-agents/python directory, run:
+directory where you installed the ML-Agents Toolkit, run:
-mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
+mlagents-learn ../config/trainer_config.yaml --env=3DBall --run-id=firstRun --train
 ```
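
If you launch training from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (`--env`, `--run-id`, `--train`); the helper name `build_train_command` is hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_train_command(trainer_config, env_name, run_id):
    """Return the mlagents-learn argument list for a training run."""
    return [
        "mlagents-learn",
        trainer_config,
        f"--env={env_name}",
        f"--run-id={run_id}",
        "--train",
    ]


argv = build_train_command("../config/trainer_config.yaml", "3DBall", "firstRun")
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` would start the run, assuming `mlagents-learn` is installed and on your `PATH`.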

 And you should see something like
 You can press Ctrl+C to stop the training, and your trained model will be at
 `models/<run-identifier>/<env_name>_<run-identifier>.bytes`, which corresponds
 to your model's latest checkpoint. You can now embed this trained model into
-your internal brain by following the steps below:
+your Internal Brain by following the steps below:

 1. Move your model file into
   `UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
--- a/docs/Limitations.md
+++ b/docs/Limitations.md

 Currently the speed of the game physics can only be increased to 100x real-time.
 The Academy also moves in time with FixedUpdate() rather than Update(), so game
-behavior implemented in Update() may be out of sync with the Agent decision
+behavior implemented in Update() may be out of sync with the agent decision
 making. See
 [Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
 for more information.

 As of version 0.3, we no longer support Python 2.

-### Tensorflow support
+### TensorFlow support

 Currently the Ml-Agents toolkit uses TensorFlow 1.7.1 due to the version of the
 TensorFlowSharp plugin we are using.
--- a/docs/ML-Agents-Overview.md
+++ b/docs/ML-Agents-Overview.md
       border="10" />
 </p>

-_An example of how a scene containing multiple Agents and Brains might be 
+_An example of how a scene containing multiple Agents and Brains might be
 configured._

 ## Training Modes
 the scene will be controlled within Python.

 We do not currently have a tutorial highlighting this mode, but you can
-learn more about the Python API [here](../ml-agents/README.md).
+learn more about the Python API [here](Python-API.md).

 ### Curriculum Learning

 training intelligent agents, below are a few examples that can serve as
 inspiration:

- Single-Agent. A single Agent linked to a single Brain, with its own reward
+- Single-Agent. A single agent linked to a single Brain, with its own reward
- Simultaneous Single-Agent. Multiple independent Agents with independent reward
+- Simultaneous Single-Agent. Multiple independent agents with independent reward
  signals linked to a single Brain. A parallelized version of the traditional
  training scenario, which can speed-up and stabilize the training process.
  Helpful when you have multiple versions of the same character in an
- Adversarial Self-Play. Two interacting Agents with inverse reward signals
+- Adversarial Self-Play. Two interacting agents with inverse reward signals
- Cooperative Multi-Agent. Multiple interacting Agents with a shared reward
+- Cooperative Multi-Agent. Multiple interacting agents with a shared reward
- Competitive Multi-Agent. Multiple interacting Agents with inverse reward
+- Competitive Multi-Agent. Multiple interacting agents with inverse reward
-  scenario, agents must compete with one another to either win a competition, or
+  scenario, agents must compete with one another to either win a competition, or
- Ecosystem. Multiple interacting Agents with independent reward signals linked
+- Ecosystem. Multiple interacting agents with independent reward signals linked
  to either a single or multiple different Brains. This scenario can be thought
  of as creating a small world in which animals with different goals all
  interact, such as a savanna in which there might be zebras, elephants and
  learn more about enabling LSTM during training [here](Feature-Memory.md).

 - **Monitoring Agent’s Decision Making** - Since communication in ML-Agents is a
-  two-way street, we provide an agent Monitor class in Unity which can display
-  aspects of the trained agent, such as the agents perception on how well it is
+  two-way street, we provide an Agent Monitor class in Unity which can display
+  aspects of the trained Agent, such as the Agent's perception of how well it is
-  real-time, researchers and developers can more easily debug an agent’s
+  real-time, researchers and developers can more easily debug an Agent’s
  behavior. You can learn more about using the Monitor class
  [here](Feature-Monitor.md).

--- a/docs/Migrating.md
+++ b/docs/Migrating.md

 ### Unity API

-* Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can now specify concurrent discrete
-  actions. You will need to update the Brain Parameters in the Brain Inspector
-  in all your environments that use discrete actions. Refer to the [discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space) for more information. 
+* Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can
+  now specify concurrent discrete actions. You will need to update the Brain
+  Parameters in the Brain Inspector in all your environments that use discrete
+  actions. Refer to the
+  [discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space)
+  for more information.
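
To make the idea of branched discrete actions concrete, here is a minimal Python sketch. It is not part of the toolkit: the branch sizes and the `sample_branched_action` helper are hypothetical, chosen only to show that a branched action carries one discrete index per branch, so several discrete choices are made in the same step.

```python
import random

# Hypothetical branch sizes, chosen only for illustration:
# branch 0 = movement (3 options), branch 1 = jump (2 options).
BRANCH_SIZES = [3, 2]


def sample_branched_action(branch_sizes, rng=random):
    """Return one discrete index per branch; all branches act in the same step."""
    return [rng.randrange(size) for size in branch_sizes]


action = sample_branched_action(BRANCH_SIZES)
print(action)  # a list with one index per branch
```

With two branches, the resulting action has length two, rather than being a single index into one flat table of actions.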

 ### Python API


 ### Python API

-* We've changed some of the python packages dependencies in requirement.txt
-  file. Make sure to run `pip install .` within your `ml-agents/python` folder
-  to update your python packages.
+* We've changed some of the Python package dependencies in the
+  `requirements.txt` file. Make sure to run `pip3 install .` within your
+  `ml-agents/python` folder to update your Python packages.

 ## Migrating from ML-Agents toolkit v0.2 to v0.3

  replaced with a single `learn.py` script as the launching point for training
  with ML-Agents. For more information on using `learn.py`, see
  [here](Training-ML-Agents.md#training-with-mlagents-learn).
-* Hyperparameters for training brains are now stored in the
+* Hyperparameters for training Brains are now stored in the
  `trainer_config.yaml` file. For more information on using this file, see
  [here](Training-ML-Agents.md#training-config-file).

 * `AgentStep()` has been replaced by `AgentAction()`.
 * `WaitTime()` has been removed.
 * The `Frame Skip` field of the Academy is replaced by the Agent's `Decision
-  Frequency` field, enabling agent to make decisions at different frequencies.
+  Frequency` field, enabling the Agent to make decisions at different frequencies.
 * The names of the inputs in the Internal Brain have been changed. You must
  replace `state` with `vector_observation` and `observation` with
  `visual_observation`. In addition, you must remove the `epsilon` placeholder.
--- a/docs/Readme.md
+++ b/docs/Readme.md
 ## API Docs

 * [API Reference](API-Reference.md)
-* [How to use the Python API](../ml-agents/README.md)
-* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
+* [How to use the Python API](Python-API.md)
+* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
--- a/docs/Training-Curriculum-Learning.md
+++ b/docs/Training-Curriculum-Learning.md

 Each Brain in an environment can have a corresponding curriculum. These
 curriculums are held in what we call a metacurriculum. A metacurriculum allows
-different brains to follow different curriculums within the same environment.
+different Brains to follow different curriculums within the same environment.

 ### Specifying a Metacurriculum

  measure by previous values.
  * If `true`, weighting will be 0.75 (new) 0.25 (old).
 * `parameters` (dictionary of key:string, value:float array) - Corresponds to
-  academy reset parameters to control. Length of each array should be one
+  Academy reset parameters to control. Length of each array should be one
-and modify the environment from the agent's `AgentReset()` function. See
+and modify the environment from the Agent's `AgentReset()` function. See
 [WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/UnitySDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
 for an example. Note that if the Academy's __Max Steps__ is not set to some
 positive number the environment will never be reset. The Academy must reset
 corresponding Brain. For example, in the Wall Jump environment, there are two
-brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
+Brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
 the BigWallBrain, we will save `BigWallBrain.json` into
 `curricula/wall-jump/`.
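
The smoothing rule mentioned above (a weighting of 0.75 for the new measure and 0.25 for the old one) can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: the function name `smooth_measure` and its arguments are hypothetical.

```python
def smooth_measure(previous, new, signal_smoothing=True):
    """Blend the new progress measure with the previous one.

    Weights follow the text: 0.75 for the new value, 0.25 for the old.
    With smoothing disabled, the new value is used unchanged.
    """
    if not signal_smoothing:
        return new
    return 0.75 * new + 0.25 * previous


# Example: a previous measure of 0.2 and a new measure of 0.6.
print(smooth_measure(previous=0.2, new=0.6))
```

Smoothing of this kind damps sudden jumps in the measure, so a single unusually good or bad evaluation is less likely to trigger a lesson change on its own.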

--- a/docs/Training-Imitation-Learning.md
+++ b/docs/Training-Imitation-Learning.md

 1. In order to use imitation learning in a scene, the first thing you will need
   is to create two Brains, one which will be the "Teacher," and the other which
-   will be the "Student." We will assume that the names of the brain
+   will be the "Student." We will assume that the names of the Brain
-2. Set the "Teacher" brain to Player mode, and properly configure the inputs to
+2. Set the "Teacher" Brain to Player mode, and properly configure the inputs to
-3. Set the "Student" brain to External mode.
-4. Link the brains to the desired agents (one agent as the teacher and at least
-   one agent as a student).
-5. In `config/trainer_config.yaml`, add an entry for the "Student" brain. Set
+3. Set the "Student" Brain to External mode.
+4. Link the Brains to the desired Agents (one Agent as the teacher and at least
+   one Agent as a student).
+5. In `config/trainer_config.yaml`, add an entry for the "Student" Brain. Set
-   `brain_to_imitate` parameter to the name of the teacher brain: "Teacher".
+   `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
-   the agents for a longer period of time.
+   the Agents for a longer period of time.
-7. From the Unity window, control the agent with the Teacher brain by providing
+7. From the Unity window, control the Agent with the Teacher Brain by providing
-8. Watch as the agent(s) with the student brain attached begin to behave
+8. Watch as the Agent(s) with the student Brain attached begin to behave
-9. Once the Student agents are exhibiting the desired behavior, end the training
+9. Once the Student Agents are exhibiting the desired behavior, end the training
-    with `Internal` brain.
+    with `Internal` Brain.

 ### BC Teacher Helper

--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
 `ml-agents/mlagents/trainers/learn.py`. The [configuration file](#training-config-file),
 `config/trainer_config.yaml` specifies the hyperparameters used during training.
 You can edit this file with a text editor to add a specific configuration for
-each brain.
+each Brain.

 For a broader overview of reinforcement learning, imitation learning and the
 ML-Agents training process, see [ML-Agents Toolkit

 where

-* `<trainer-config-file>` is the filepath of the trainer configuration yaml.
+* `<trainer-config-file>` is the file path of the trainer configuration YAML file.
 * `<env_name>`__(Optional)__ is the name (including path) of your Unity
  executable containing the agents to be trained. If `<env_name>` is not passed,
  the training will happen in the Editor. Press the :arrow_forward: button in
 1. [Build the project](Learning-Environment-Executable.md), making sure that you
   only include the training scene.
 2. Open a terminal or console window.
-3. Navigate to the ml-agents `python` folder.
+3. Navigate to the directory where you installed the ML-Agents Toolkit.
 4. Run the following to launch the training process using the path to the Unity
   environment you built in step 1:

 regular intervals (specified by the `summary_freq` option). The saved statistics
 are grouped by the `run-id` value so you should assign a unique id to each
 training run if you plan to view the statistics. You can view these statistics
-using TensorBoard during or after training by running the following command
-(from the ML-Agents python directory):
+using TensorBoard during or after training by running the following command:

 ```sh
 tensorboard --logdir=summaries
 settings. (This GameObject will be a child of the Academy in your scene.)
 Sections for the example environments are included in the provided config file.

-| ** Setting ** | **Description** | **Applies To Trainer**|
-| :--           | :--             | :--                   |
+| **Setting** | **Description** | **Applies To Trainer**|
+| :--         | :--             | :--                   |
 | batch_size | The number of experiences in each iteration of gradient descent.| PPO, BC |
 | batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model.| BC |
 | beta | The strength of entropy regularization.| PPO, BC |
--- a/docs/Training-PPO.md
+++ b/docs/Training-PPO.md

 ### Entropy

-This corresponds to how random the decisions of a brain are. This should
+This corresponds to how random the decisions of a Brain are. This should
 consistently decrease during training. If it decreases too soon or not at all,
 `beta` should be adjusted (when using discrete action space).

--- a/docs/Training-on-Amazon-Web-Service.md
+++ b/docs/Training-on-Amazon-Web-Service.md
    source activate python3
    ```

-2. Clone the ML-Agents repo and install the required python packages
+2. Clone the ML-Agents repo and install the required Python packages

    ```sh
    git clone https://github.com/Unity-Technologies/ml-agents.git
--- a/docs/Training-on-Microsoft-Azure.md
+++ b/docs/Training-on-Microsoft-Azure.md
 following command to complete dependency installation:

 ```sh
-pip install docopt
+pip3 install docopt
 ```

 Note that, if you choose to deploy the image to an

 1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp)
    your built Unity application to your Virtual Machine.
-2. Set the `ml-agents` sub-folder of the ml-agents repo to your working
-   directory.
+2. Set your working directory to the directory where the ML-Agents Toolkit was
+   installed.
 3. Run the following command:

 ```sh

 2. Unless you started the training as a background process, connect to your VM
   from another terminal instance.
-3. Set the `python` folder in ml-agents to your current working directory.
-4. Run the following command from your `tensorboard --logdir=summaries --host
-   0.0.0.0`
-5. You should now be able to open a browser and navigate to
+3. Run the following command from your terminal:
+   `tensorboard --logdir=summaries --host 0.0.0.0`
+4. You should now be able to open a browser and navigate to
    `<Your_VM_IP_Address>:6006` to view the TensorBoard report.

 ## Running on Azure Container Instances
 it isn't needed.  You can read more about
 [The ML-Agents toolkit support for Docker containers here](Using-Docker.md).
 Using ACI enables you to offload training of your models without needing to
-install Python and Tensorflow on your own computer.  You can find instructions,
+install Python and TensorFlow on your own computer.  You can find instructions,
 including a pre-deployed image in DockerHub for you to use, available
 [here](https://github.com/druttka/unity-ml-on-azure).
--- a/docs/Using-TensorFlow-Sharp-in-Unity.md
+++ b/docs/Using-TensorFlow-Sharp-in-Unity.md
 placed in placeholders of dimension 1 and size 1. (Be sure to name them.)

 It is important that the inputs and outputs of the graph are exactly the ones
-you receive and return when training your model with an `External` brain. This
+you receive and return when training your model with an `External` Brain. This
 means you cannot have any operations such as reshaping outside of the graph. The
 object you get by calling `step` or `reset` has fields `vector_observations`,
 `visual_observations` and `memories` which must correspond to the placeholders
 .bytes file so Unity can load it.

 In the Unity Editor, you must specify the names of the nodes used by your graph
-in the **Internal** brain Inspector window. If you used a scope when defining
+in the **Internal** Brain Inspector window. If you used a scope when defining
 your graph, specify it in the `Graph Scope` field.

 ![Internal Brain Inspector](images/internal_brain.png)
 for more information about using Internal Brains.

-If you followed these instructions well, the agents in your environment that use
-this brain will use your fully trained network to make decisions.
+If you followed these instructions well, the Agents in your environment that use
+this Brain will use your fully trained network to make decisions.

 ## iOS additional instructions for building

--- a/docs/Using-Tensorboard.md
+++ b/docs/Using-Tensorboard.md
 start TensorBoard:

 1. Open a terminal or console window:
-2. Navigate to the ml-agents/python folder.
+2. Navigate to the directory where the ML-Agents Toolkit is installed.
-        tensorboard --logdir=summaries
+      ```sh
+      tensorboard --logdir=summaries
+      ```

 4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).


 ## The ML-Agents toolkit training statistics

-The ML-agents training program saves the following statistics:
+The ML-Agents training program saves the following statistics:

 ![Example TensorBoard Run](images/mlagents-TensorBoard.png)

--- a/docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md
 
 在打开 3D Balance Ball 场景后，您可能会首先注意到它包含的
 不是一个平台，而是多个平台。场景中的每个平台都是
-独立的 agent，但它们全部共享同一个 brain。3D Balance Ball 通过
+独立的 agent，但它们全部共享同一个 Brain。3D Balance Ball 通过
 这种方式可以加快训练速度，因为所有 12 个 agent 可以并行参与训练任务。

 ### Academy
 Brain 不存储关于 agent 的任何信息，
 只是将 agent 收集的观测结果发送到决策过程，
 然后将所选的动作返回给 agent。因此，所有 agent 可共享
-同一个 brain，但会独立行动。Brain 设置可以提供很多
+同一个 Brain，但会独立行动。Brain 设置可以提供很多
-**Heuristic** brain 允许您通过扩展 Decision 类来对 agent 的逻辑进行
-手动编码。最后，**Player** brain 可让您将键盘命令
+**Heuristic** Brain 允许您通过扩展 Decision 类来对 agent 的逻辑进行
+手动编码。最后，**Player** Brain 可让您将键盘命令
-会非常有用。如果这些类型的 brain 都不能满足您的需求，您可以
+会非常有用。如果这些类型的 Brain 都不能满足您的需求，您可以
 实现自己的 CoreBrain 来创建自有的类型。

 在本教程中，进行训练时，需要将 **Brain Type** 设置为 **External**；

 **向量运动空间**

-brain 以*动作*的形式向 agent 提供指令。与状态
+Brain 以*动作*的形式向 agent 提供指令。与状态
-`RigidBody` 上的力或扭矩。**Discrete** 向量运动空间将其动作
+`Rigidbody` 上的力或扭矩。**Discrete** 向量运动空间将其动作
 定义为一个表。提供给 agent 的具体动作是这个表的
 索引。

 平台游戏对象上。基础 Agent 对象有一些影响其行为的
 属性：

-* **Brain** — 每个 Agent 必须有一个 Brain。brain 决定了 agent 如何
+* **Brain** — 每个 Agent 必须有一个 Brain。Brain 决定了 agent 如何
-brain。
+Brain。
 * **Visual Observations** — 定义 agent 用来观测其环境的
 任何 Camera 对象。3D Balance Ball 不使用摄像机观测。
 * **Max Step** — 定义在 agent 决定自己完成之前可以发生多少个
 agent 的 Brain 实例设置为状态大小为 8 的连续向量观测空间，
 因此 `CollectObservations()` 必须调用 8 次 
 `AddVectorObs`。
-* Agent.AgentAction() — 在每个模拟步骤调用。接收 brain 选择的
+* Agent.AgentAction() — 在每个模拟步骤调用。接收 Brain 选择的
 动作。Ball3DAgent 示例可以处理连续和离散
 运动空间类型。在此环境中，两种状态类型之间实际上
 没有太大的差别：这两种向量运动空间在每一步都会
 ![3DBall 场景](images/mlagents-Open3DBall.png)

 由于我们要建立此环境来进行训练，因此我们需要
-将 agent 使用的 brain 设置为 **External**。这样 agent 在
+将 agent 使用的 Brain 设置为 **External**。这样 agent 在
 进行决策时能够与外部训练过程进行通信。

 1. 在 **Scene** 窗口中，单击 Ball3DAcademy 对象旁边的三角形

 一旦训练过程完成，并且训练过程保存了模型
 （通过 `Saved Model` 消息可看出），您便可以将该模型添加到 Unity 项目中，
-然后将其用于 brain 类型为 **Internal** 的 agent。
+然后将其用于 Brain 类型为 **Internal** 的 agent。

 ### 设置 TensorFlowSharp 支持


 1. 确保 TensorFlowSharp 插件位于 `Assets` 文件夹中。
 可在
-[此处](https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage)下载一个包含 TF# 的 Plugins 文件夹。
+[此处](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)下载一个包含 TF# 的 Plugins 文件夹。
 下载后，双击并将其导入。您可以在 Project 选项卡中
 （位于 `Assets` > `ML-Agents` > `Plugins` > `Computer` 下）
 检查 TensorFlow 的相关文件来查看是否安装成功
--- a/docs/localized/zh-CN/docs/Installation.md
+++ b/docs/localized/zh-CN/docs/Installation.md

 ### Mac 和 Unix 用户

-如果您的 Python 环境不包括 `pip`，请参阅这些
+如果您的 Python 环境不包括 `pip3`，请参阅这些
 [说明](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
 以了解其安装方法。


 ## Unity 包

-您可以通过 Unity 包的形式下载TensorFlowSharp 插件（[AWS S3链接](https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage)，[百度盘链接](https://pan.baidu.com/s/1s0mJN8lvuxTcYbs2kL2FqA)）
+您可以通过 Unity 包的形式下载TensorFlowSharp 插件（[AWS S3链接](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)，[百度盘链接](https://pan.baidu.com/s/1s0mJN8lvuxTcYbs2kL2FqA)）

 ## 帮助

--- a/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md
+++ b/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md

 **动作**

-Brain 的决策以动作数组的形式传递给 `AgentAction()` 函数。此数组中的元素数量由 agent 的 Brain 的 `Vector Action Space Type` 和 `Vector Action Space Size` 设置确定。RollerAgent 使用连续向量运动空间，并需要 brain 提供的两个连续控制信号。因此，我们要将 Brain `Vector Action Size` 设置为 2。第一个元素 `action[0]` 确定沿 x 轴施加的力；`action[1]` 确定沿 z 轴施加的力。（如果我们允许 agent 以三维方式移动，那么我们需要将 `Vector Action Size` 设置为 3。）注意，Brain 并不知道动作数组中的值是什么意思。训练过程只是根据观测输入来调整动作值，然后看看会得到什么样的奖励。
+Brain 的决策以动作数组的形式传递给 `AgentAction()` 函数。此数组中的元素数量由 agent 的 Brain 的 `Vector Action Space Type` 和 `Vector Action Space Size` 设置确定。RollerAgent 使用连续向量运动空间，并需要 Brain 提供的两个连续控制信号。因此，我们要将 Brain `Vector Action Size` 设置为 2。第一个元素 `action[0]` 确定沿 x 轴施加的力；`action[1]` 确定沿 z 轴施加的力。（如果我们允许 agent 以三维方式移动，那么我们需要将 `Vector Action Size` 设置为 3。）注意，Brain 并不知道动作数组中的值是什么意思。训练过程只是根据观测输入来调整动作值，然后看看会得到什么样的奖励。

 RollerAgent 使用 `Rigidbody.AddForce` 函数将 action[] 数组中的值应用到其 Rigidbody 组件 `rBody`：


 1. 选择 Brain 游戏对象以便在 Inspector 中查看该对象的属性。
 2. 将 **Brain Type** 设置为 **Player**。
-3. 展开 **Continuous Player Actions**（仅在使用 **Player* brain 时可见）。
+3. 展开 **Continuous Player Actions**（仅在使用 **Player** Brain 时可见）。
 4. 将 **Size** 设置为 4。
 5. 设置以下映射：

--- a/docs/localized/zh-CN/docs/Learning-Environment-Design.md
+++ b/docs/localized/zh-CN/docs/Learning-Environment-Design.md

 训练和模拟过程以 ML-Agents Academy 类编排的步骤进行。Academy 与场景中的 Agent 和 Brain 对象一起协作逐步完成模拟。当 Academy 已达到其最大步数或场景中的所有 agent 均_完成_时，一个训练场景即完成。

-在训练期间，处于外部的 Python 进程会在训练过程中与 Academy 不断进行通信以便运行一系列场景，同时会收集数据并优化其神经网络模型。分配给 agent 的 Brain 类型决定了我们是否进行训练。**External** brain 会与外部过程进行通信以训练 TensorFlow 模型。成功完成训练后，您可以将经过训练的模型文件添加到您的 Unity 项目中，以便提供给 **Internal** brain 来控制agent的行为。
+在训练期间，处于外部的 Python 进程会在训练过程中与 Academy 不断进行通信以便运行一系列场景，同时会收集数据并优化其神经网络模型。分配给 agent 的 Brain 类型决定了我们是否进行训练。**External** Brain 会与外部过程进行通信以训练 TensorFlow 模型。成功完成训练后，您可以将经过训练的模型文件添加到您的 Unity 项目中，以便提供给 **Internal** Brain 来控制agent的行为。

 ML-Agents Academy 类按如下方式编排 agent 模拟循环：

 4. 使用每个 agent 的 Brain 类来决定 agent 的下一动作。
 5. 调用您的子类的 `AcademyAct()` 函数。
-6. 对场景中的每个 agent 调用 `AgentAction()` 函数，传入由 agent 的 brain 选择的动作。（如果 agent 已完成，则不调用此函数。）
+6. 对场景中的每个 agent 调用 `AgentAction()` 函数，传入由 agent 的 Brain 选择的动作。（如果 agent 已完成，则不调用此函数。）
 7. 如果 agent 已达到其 `Max Step` 计数或者已将其自身标记为 `done`，则调用 agent 的 `AgentOnDone()` 函数。或者，如果某个 agent 在场景结束之前已完成，您可以将其设置为重新开始。在这种情况下，Academy 会调用 `AgentReset()` 函数。
 8. 当 Academy 达到其自身的 `Max Step` 计数时，它会通过调用您的 Academy 子类的 `AcademyReset()` 函数来再次开始下一场景。


 [Screenshot of scene hierarchy]

-您必须为每个 agent 分配一个 brain，但可以在多个 agent 之间共享 brain。每个 agent 都将进行自己的观测并独立行动，但会使用相同的决策逻辑，而对于 **Internal** brain，则会使用相同的经过训练的 TensorFlow 模型。
+您必须为每个 agent 分配一个 Brain，但可以在多个 agent 之间共享 Brain。每个 agent 都将进行自己的观测并独立行动，但会使用相同的决策逻辑，而对于 **Internal** Brain，则会使用相同的经过训练的 TensorFlow 模型。

 ### Academy

 
 Brain 内部封装了决策过程。Brain 对象必须放在 Hierarchy 视图中的 Academy 的子级。我们必须为每个 Agent 分配一个 Brain，但可以在多个 Agent 之间共享同一个 Brain。

-当我们使用 Brain 类的时候不需要使用其子类，而应该直接使用 Brain 这个类。Brain 的行为取决于 brain 的类型。在训练期间，应将 agent 上连接的 Brain 的 Brain Type 设置为 **External**。要使用经过训练的模型，请将模型文件导入 Unity 项目，并将对应 Brain 的 Brain  Type 更改为 **Internal**。请参阅 [Brain](/docs/Learning-Environment-Design-Brains.md) 以了解有关使用不同类型的 Brain 的详细信息。如果四种内置的类型不能满足您的需求，您可以扩展 CoreBrain 类以创建其它的 Brain 类型。
+当我们使用 Brain 类的时候不需要使用其子类，而应该直接使用 Brain 这个类。Brain 的行为取决于 Brain 的类型。在训练期间，应将 agent 上连接的 Brain 的 Brain Type 设置为 **External**。要使用经过训练的模型，请将模型文件导入 Unity 项目，并将对应 Brain 的 Brain  Type 更改为 **Internal**。请参阅 [Brain](/docs/Learning-Environment-Design-Brains.md) 以了解有关使用不同类型的 Brain 的详细信息。如果四种内置的类型不能满足您的需求，您可以扩展 CoreBrain 类以创建其它的 Brain 类型。
-Brain 类有若干可以使用 Inspector 窗口进行设置的重要属性。对于使用 brain 的 agent，这些属性必须恰当。例如，`Vector Observation Space Size` 属性必须与 agent 创建的特征向量的长度完全匹配。请参阅 [Agent](/docs/Learning-Environment-Design-Agents.md) 以获取有关创建 agent 和正确设置 Brain 实例的信息。
+Brain 类有若干可以使用 Inspector 窗口进行设置的重要属性。对于使用 Brain 的 agent，这些属性必须恰当。例如，`Vector Observation Space Size` 属性必须与 agent 创建的特征向量的长度完全匹配。请参阅 [Agent](/docs/Learning-Environment-Design-Agents.md) 以获取有关创建 agent 和正确设置 Brain 实例的信息。

 请参阅 [Brain](/docs/Learning-Environment-Design-Brains.md) 以查看 Brain 属性的完整列表。

 要创建 agent，请扩展 Agent 类并实现基本的 `CollectObservations()` 和 `AgentAction()` 方法：

 * `CollectObservations()` — 收集 agent 对其环境的观测结果。
-* `AgentAction()` — 执行由 agent 的 brain 选择的动作，并为当前状态分配奖励。
+* `AgentAction()` — 执行由 agent 的 Brain 选择的动作，并为当前状态分配奖励。

 这些函数的实现决定了分配给此 agent 的 Brain 的属性要如何设置。
 
 在 Unity 中创建训练环境时，必须设置场景以便可以通过外部训练过程来控制场景。注意以下几点：

 * 在训练程序启动后，Unity 可执行文件会被自动打开，然后训练场景会自动开始训练。
-* 场景中至少须包括一个 **External** brain。
+* 场景中至少须包括一个 **External** Brain。
 * Academy 必须在每一轮训练后将场景重置为有效的初始状态。
 * 训练场景必须有明确的结束状态，为此需要使用 `Max Steps`，或让每个 agent 将自身设置为 `done`。

--- a/docs/localized/zh-CN/docs/Learning-Environment-Examples.md
+++ b/docs/localized/zh-CN/docs/Learning-Environment-Examples.md

 * 训练环境：一种线性移动任务，在此任务中 agent 必须向左或向右移动到奖励状态。
 * 目标：移动到最高奖励状态。
-* Agent设置：环境包含一个 agent，上面附带了单个 brain。
+* Agent设置：环境包含一个 agent，上面附带了单个 Brain。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（离散变量）一个变量，对应于当前状态。
    * 向量运动空间：（离散变量）两个可能的动作（向左移动、向右移动）。
    * 视觉观测：0

 * 训练环境：一种平衡球任务，在此任务中 agent 需要控制平台。
 * 目标：agent 必须平衡平台，以尽可能长时间在平台上保持球不掉落。
-* Agent设置：环境包含 12 个全部链接到单个 brain 的同类 agent。
+* Agent设置：环境包含 12 个全部链接到单个 Brain 的同类 agent。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）8 个，对应于平台的旋转以及球的位置、旋转和速度。
    * 向量观测空间（困难版本，因为观测到的信息减少了）：（连续变量）5 个变量，对应于平台的旋转以及球的位置和旋转。
    * 向量运动空间：（连续变量）2 个，其中一个值对应于 X 旋转，而另一个值对应于 Z 旋转。

 * 训练环境：某一个典型版本的的grid-world任务。场景包含 agent、目标和障碍。
 * 目标：agent 必须在网格中避开障碍的同时移动到目标。
-* Agent设置：环境包含一个链接到单个 brain 的 agent。
+* Agent设置：环境包含一个链接到单个 Brain 的 agent。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：无
    * 向量运动空间：（离散变量）4 个，对应于基本方向的移动。
    * 视觉观测：一个对应于 GridWorld 自上而下的视图。

 * 训练环境：agent 控制球拍将球弹过球网的双人游戏。
 * 目标：agent 必须在彼此之间弹起网球，同时不能丢球或击球出界。
-* Agent设置：环境包含两个链接到单个 brain（名为 TennisBrain）的 agent。在训练之后，您可以将另一个名为 MyBrain 的 brain 附加到其中一个 agent，从而与经过训练的模型进行游戏比赛。
+* Agent设置：环境包含两个链接到单个 Brain（名为 TennisBrain）的 agent。在训练之后，您可以将另一个名为 MyBrain 的 Brain 附加到其中一个 agent，从而与经过训练的模型进行游戏比赛。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）8 个，分别对应于球和球拍的位置和速度。
    * 向量运动空间：（连续变量）2 个，分别对应于朝向球网或远离球网的运动，以及上下的运动。
    * 视觉观测：无

 * 训练环境：一个平台，agent 可以在该平台上推动方块。
 * 目标：agent 必须将方块推向目标。
-* Agent设置：环境包含一个链接到单个 brain 的 agent。
+* Agent设置：环境包含一个链接到单个 Brain 的 agent。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）15 个，分别对应于 agent、方块和目标的位置和速度。
    * 向量运动空间：（连续变量）2 个，分别对应于 X 和 Z 方向的移动。
    * 视觉观测：无。

 * 训练环境：一个平台环境，agent 可以在该环境中跳过墙。
 * 目标：agent 必须利用一个方块越过墙并到达目标。
-* Agent设置：环境包含一个链接到两个不同 brain 的 agent。agent 链接到的 brain 根据墙的高度而变化。
+* Agent设置：环境包含一个链接到两个不同 Brain 的 agent。agent 链接到的 Brain 根据墙的高度而变化。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）16 个，分别对应于 agent、方块和目标的位置和速度以及墙的高度。
    * 向量运动空间：（离散变量）74 个，分别对应于 14 个射线投射，每个射线投射可检测 4 个可能的物体，加上 agent 的全局位置以及 agent 是否落地。
    * 视觉观测：无。

 * 训练环境：可以移动到目标位置的双关节臂。
 * 目标：agent 必须将手移动到目标位置，并保持在此处。
-* Agent设置：环境包含 32 个链接到单个 brain 的 agent。
+* Agent设置：环境包含 32 个链接到单个 Brain 的 agent。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）26 个，对应于两个机械臂 Rigidbody 的位置、旋转、速度和角速度。
    * 向量运动空间：（连续变量）4 个，对应于两个关节的两个方向上的转动。
    * 视觉观测：无

 * 训练环境：一种有 4 个手臂的生物，每个手臂分两节
 * 目标：agent 必须沿 x 轴移动其身体，并且保持不跌倒。
-* Agent设置：环境包含 3 个链接到单个 brain 的 agent。
+* Agent设置：环境包含 3 个链接到单个 Brain 的 agent。
 * Agent 奖励函数设置（agent互相之间独立）：
    * +1 乘以 x 方向的速度
    * 跌倒时 -1。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）117 个，对应于每个肢体的位置、旋转、速度和角速度以及身体的加速度和角速度。
    * 向量运动空间：（连续变量）12 个，对应于适用于 12 个关节的扭矩。
    * 视觉观测：无

 * 训练环境：一个包含多个 agent 的环境，这些 agent 争相收集香蕉。
 * 目标：agent 必须学习尽可能接近更多的黄色香蕉，同时避开红色香蕉。
-* Agent设置：环境包含 10 个链接到单个 brain 的 agent。
+* Agent设置：环境包含 10 个链接到单个 Brain 的 agent。
-* Brain 设置：一个有以下观测/运动空间的 brain。
+* Brain 设置：一个有以下观测/运动空间的 Brain。
    * 向量观测空间：（连续变量）51 个，对应于 agent 的速度， agent 前进方向，以及 agent 对周围物体进行基于射线的感知。
    * 向量运动空间：（连续变量）3 个，对应于向前移动，绕 y 轴旋转，以及是否使用激光使其他 agent 瘫痪。
    * 视觉观测（可选）：每个 agent 的第一人称视图。

 * 训练环境：在一个环境中，agent 需要在房间内查找信息、记住信息并使用信息移动到正确目标。
 * 目标：移动到与房间内的方块的颜色相对应的目标。
-* Agent设置：环境包含一个链接到单个 brain 的 agent。
+* Agent设置：环境包含一个链接到单个 Brain 的 agent。
 * Agent 奖励函数设置（agent互相之间独立）：
    * 移动到正确目标时 +1。
    * 移动到错误目标时 -0.1。

 * 训练环境：在一个环境中，agent 需要按需决策。agent 必须决定在接触地面时如何进行下一次弹跳。
 * 目标：抓住漂浮的香蕉。跳跃次数有限。
-* Agent设置：环境包含一个链接到单个 brain 的 agent。
+* Agent设置：环境包含一个链接到单个 Brain 的 agent。
 * Agent 奖励函数设置（agent互相之间独立）：
    * 抓住香蕉时 +1。
    * 弹跳出界时 -1。
 * 目标：
    * 前锋：让球进入对手的球门。
    * 守门员：防止球进入自己的球门。
-* Agent设置：环境包含四个 agent，其中两个链接到一个 brain（前锋），两个链接到另一个 brain（守门员）。
+* Agent设置：环境包含四个 agent，其中两个链接到一个 Brain（前锋），两个链接到另一个 Brain（守门员）。
 * Agent 奖励函数设置（agent互相之间非独立）：
    * 前锋：
        * 球进入对手球门时 +1。
--- a/docs/localized/zh-CN/docs/ML-Agents-Overview.md
+++ b/docs/localized/zh-CN/docs/ML-Agents-Overview.md
 我们将 Brain 类型切换为 Internal，并加入从训练阶段
 生成的 TensorFlow 模型。现在，在预测阶段，军医
 仍然继续生成他们的观测结果，但不再将结果发送到 
-Python API，而是送入他们的嵌入了的 Tensorflow 模型，
+Python API，而是送入他们的嵌入了的 TensorFlow 模型，
 以便生成每个军医在每个时间点上要采取的_最佳_动作。

 总结一下：我们的实现是基于 TensorFlow 的，因此，
--- a/ml-agents/README.md
+++ b/ml-agents/README.md
-# Unity ml-agents interface and trainers
+# Unity ML-Agents Python Interface and Trainers
-The `mlagents` package contains two components : The low level API which allows
-you to interact directly with a Unity Environment and a training component whcih
-allows you to train agents in Unity Environments using our implementations of
-reinforcement learning or imitation learning.
+The `mlagents` Python package is part of the
+[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
+`mlagents` provides a Python API that allows direct interaction with the Unity
+game engine as well as a collection of trainers and algorithms to train agents
+in Unity environments.
+
+The `mlagents` Python package contains two components: a low-level API
+(`mlagents.envs`) that lets you interact directly with a Unity environment, and
+an entry point (`mlagents-learn`) that trains agents in Unity environments using
+our implementations of reinforcement learning or imitation learning.
-The `ml-agents` package can be installed using:
+Install `mlagents` with:
-or by running the following from the `ml-agents` directory of the repository:
-
-```sh
-pip install .
-```
-
-## `mlagents.envs`
-
-The ML-Agents toolkit provides a Python API for controlling the agent simulation
-loop of a environment or game built with Unity. This API is used by the ML-Agent
-training algorithms (run with `mlagents-learn`), but you can also write your
-Python programs using this API.
-
-The key objects in the Python API include:
-
- **UnityEnvironment** — the main interface between the Unity application and
-  your code. Use UnityEnvironment to start and control a simulation or training
-  session.
- **BrainInfo** — contains all the data from agents in the simulation, such as
-  observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
-  example, provides the array length of an observation in BrainInfo.
-
-These classes are all defined in the `ml-agents/mlagents/envs` folder of
-the ML-Agents SDK.
-
-To communicate with an agent in a Unity environment from a Python program, the
-agent must either use an **External** brain or use a brain that is broadcasting
-(has its **Broadcast** property set to true). Your code is expected to return
-actions for agents with external brains, but can only observe broadcasting
-brains (the information you receive for an agent is the same in both cases).
-
-_Notice: Currently communication between Unity and Python takes place over an
-open socket without authentication. As such, please make sure that the network
-where training takes place is secure. This will be addressed in a future
-release._
-
-### Loading a Unity Environment
-
-Python-side communication happens through `UnityEnvironment` which is located in
-`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
-file, put the file in the same directory as `envs`. For example, if the filename
-of your Unity environment is 3DBall.app, in python, run:
-
-```python
-from mlagents.env import UnityEnvironment
-env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
-```
-
- `file_name` is the name of the environment binary (located in the root
-  directory of the python project).
- `worker_id` indicates which port to use for communication with the
-  environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
-  training process. In environments which do not involve physics calculations,
-  setting the seed enables reproducible experimentation by ensuring that the
-  environment and trainers utilize the same random seed.
+## Usage & More Information
-If you want to directly interact with the Editor, you need to use
-`file_name=None`, then press the :arrow_forward: button in the Editor when the
-message _"Start training by pressing the Play button in the Unity Editor"_ is
-displayed on the screen
-
-### Interacting with a Unity Environment
-
-A BrainInfo object contains the following fields:
-
- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
-  the list corresponds to the n<sup>th</sup> observation of the brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
-  size, vector observation size)`.
- **`text_observations`** : A list of string corresponding to the agents text
-  observations.
- **`memories`** : A two dimensional numpy array of dimension `(batch size,
-  memory size)` which corresponds to the memories sent at the previous step.
- **`rewards`** : A list as long as the number of agents using the brain
-  containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of agents using the brain
-  containing  `done` flags (whether or not the agent is done).
- **`max_reached`** : A list as long as the number of agents using the brain
-  containing true if the agents reached their max steps.
- **`agents`** : A list of the unique ids of the agents using the brain.
- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
-  size, vector action size)` if the vector action space is continuous and
-  `(batch size, number of branches)` if the vector action space is discrete.
-
-Once loaded, you can use your UnityEnvironment object, which referenced by a
-variable named `env` in this example, can be used in the following way:  
-
- **Print : `print(str(env))`**  
-  Prints all parameters relevant to the loaded environment and the external
-  brains.  
- **Reset : `env.reset(train_model=True, config=None)`**  
-  Send a reset signal to the environment, and provides a dictionary mapping
-  brain names to BrainInfo objects.  
-  - `train_model` indicates whether to run the environment in train (`True`) or
-    test (`False`) mode.
-  - `config` is an optional dictionary of configuration flags specific to the
-    environment. For generic environments, `config` can be ignored. `config` is
-    a dictionary of strings to floats where the keys are the names of the
-    `resetParameters` and the values are their corresponding float values.
-    Define the reset parameters on the Academy Inspector window in the Unity
-    Editor.
- **Step : `env.step(action, memory=None, text_action=None)`**  
-  Sends a step signal to the environment using the actions. For each brain :
-  - `action` can be one dimensional arrays or two dimensional arrays if you have
-    multiple agents per brains.
-  - `memory` is an optional input that can be used to send a list of floats per
-    agents to be retrieved at the next step.
-  - `text_action` is an optional input that be used to send a single string per
-    agent.
-
-    Returns a dictionary mapping brain names to BrainInfo objects.
-
-    For example, to access the BrainInfo belonging to a brain called
-    'brain_name', and the BrainInfo field 'vector_observations':
-
-    ```python
-    info = env.step()
-    brainInfo = info['brain_name']
-    observations = brainInfo.vector_observations
-    ```
-
-    Note that if you have more than one external brain in the environment, you
-    must provide dictionaries from brain names to arrays for `action`, `memory`
-    and `value`. For example: If you have two external brains named `brain1` and
-    `brain2` each with one agent taking two continuous actions, then you can
-    have:
-
-    ```python
-    action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
-    ```
-
-    Returns a dictionary mapping brain names to BrainInfo objects.  
- **Close : `env.close()`**
-  Sends a shutdown signal to the environment and closes the communication
-  socket.
-
-## `mlagents.trainers`
-
-1. Open a command or terminal window.
-2. Run
-
-```sh
-mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train <environment-name>
-```
-
-Where:
-
- `<trainer-config-path>` is the relative or absolute filepath of the trainer
-  configuration. The defaults used by environments in the ML-Agents SDK can be
-  found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
-  training runs
- The `--train` flag tells `mlagents-learn` to run a training session (rather
-  than inference)
- `<environment-name>` __(Optional)__ is the path to the Unity executable you
-  want to train. __Note:__ If this argument is not passed, the training
-  will be made through the editor.
-
-For more detailled documentation, check out the
-[ML-Agents toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)
+For more detailed documentation, check out the
+[ML-Agents Toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)
--- a/docs/Python-API.md
+++ b/docs/Python-API.md
+# Unity ML-Agents Python Interface and Trainers
+
+The `mlagents` Python package is part of the [ML-Agents
+Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents` provides a
+Python API that allows direct interaction with the Unity game engine as well as
+a collection of trainers and algorithms to train agents in Unity environments.
+
+The `mlagents` Python package contains two components: a low level API which
+allows you to interact directly with a Unity Environment (`mlagents.envs`) and
+an entry point to train (`mlagents-learn`) which allows you to train agents in
+Unity Environments using our implementations of reinforcement learning or
+imitation learning.
+
+## mlagents.envs
+
+The ML-Agents Toolkit provides a Python API for controlling the Agent simulation
+loop of an environment or game built with Unity. This API is used by the
+training algorithms inside the ML-Agents Toolkit, but you can also write your own
+Python programs using this API.
+
+The key objects in the Python API include:
+
+- **UnityEnvironment** — the main interface between the Unity application and
+  your code. Use UnityEnvironment to start and control a simulation or training
+  session.
+- **BrainInfo** — contains all the data from Agents in the simulation, such as
+  observations and rewards.
+- **BrainParameters** — describes the data elements in a BrainInfo object. For
+  example, provides the array length of an observation in BrainInfo.
+
+These classes are all defined in the `ml-agents/mlagents/envs` folder of
+the ML-Agents SDK.
+
+To communicate with an Agent in a Unity environment from a Python program, the
+Agent must either use an **External** Brain or use a Brain that is broadcasting
+(has its **Broadcast** property set to true). Your code is expected to return
+actions for Agents with external Brains, but can only observe broadcasting
+Brains (the information you receive for an Agent is the same in both cases).
+
+_Notice: Currently communication between Unity and Python takes place over an
+open socket without authentication. As such, please make sure that the network
+where training takes place is secure. This will be addressed in a future
+release._
+
+### Loading a Unity Environment
+
+Python-side communication happens through `UnityEnvironment` which is located in
+`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
+file, put the file in the same directory as `envs`. For example, if the filename
+of your Unity environment is `3DBall.app`, run the following in Python:
+
+```python
+from mlagents.envs import UnityEnvironment
+env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
+```
+
+- `file_name` is the name of the environment binary (located in the root
+  directory of the python project).
+- `worker_id` indicates which port to use for communication with the
+  environment. For use in parallel training regimes such as A3C.
+- `seed` indicates the seed to use when generating random numbers during the
+  training process. In environments which do not involve physics calculations,
+  setting the seed enables reproducible experimentation by ensuring that the
+  environment and trainers utilize the same random seed.
+
+If you want to directly interact with the Editor, you need to use
+`file_name=None`, then press the :arrow_forward: button in the Editor when the
+message _"Start training by pressing the Play button in the Unity Editor"_ is
+displayed on the screen.
+
+### Interacting with a Unity Environment
+
+A BrainInfo object contains the following fields:
+
+- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
+  the list corresponds to the n<sup>th</sup> observation of the Brain.
+- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
+  size, vector observation size)`.
+- **`text_observations`** : A list of strings corresponding to the Agents' text
+  observations.
+- **`memories`** : A two dimensional numpy array of dimension `(batch size,
+  memory size)` which corresponds to the memories sent at the previous step.
+- **`rewards`** : A list as long as the number of Agents using the Brain
+  containing the rewards they each obtained at the previous step.
+- **`local_done`** : A list as long as the number of Agents using the Brain
+  containing  `done` flags (whether or not the Agent is done).
+- **`max_reached`** : A list as long as the number of Agents using the Brain
+  containing true if the Agents reached their max steps.
+- **`agents`** : A list of the unique ids of the Agents using the Brain.
+- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
+  size, vector action size)` if the vector action space is continuous and
+  `(batch size, number of branches)` if the vector action space is discrete.
+
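The fields above are all sized by the number of Agents using the Brain, so they can be read side by side. The following is a minimal sketch, assuming the entries are index-aligned per Agent; `FakeBrainInfo` and `summarize` are hypothetical names used only for illustration and are not part of `mlagents.envs`.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors a few BrainInfo field names from the list
# above; the real class lives in mlagents.envs and is not imported here.
FakeBrainInfo = namedtuple(
    "FakeBrainInfo", ["vector_observations", "rewards", "local_done", "agents"]
)


def summarize(info):
    """Return {agent_id: (observation, reward, done)} for one Brain.

    Assumes entry i of every field belongs to the same Agent.
    """
    return {
        agent_id: (obs, reward, done)
        for agent_id, obs, reward, done in zip(
            info.agents, info.vector_observations, info.rewards, info.local_done
        )
    }


# Two Agents, each with a vector observation of size 3.
info = FakeBrainInfo(
    vector_observations=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    rewards=[1.0, -0.5],
    local_done=[False, True],
    agents=[101, 102],
)
per_agent = summarize(info)
print(per_agent[102])
```

With a real BrainInfo the observations would be rows of a numpy array rather than plain lists, but the same pairing would apply under the alignment assumption.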
+Once loaded, your UnityEnvironment object, referenced by a variable named `env`
+in this example, can be used in the following way:
+
+- **Print : `print(str(env))`**  
+  Prints all parameters relevant to the loaded environment and the external
+  Brains.  
+- **Reset : `env.reset(train_model=True, config=None)`**  
+  Sends a reset signal to the environment, and provides a dictionary mapping
+  Brain names to BrainInfo objects.  
+  - `train_model` indicates whether to run the environment in train (`True`) or
+    test (`False`) mode.
+  - `config` is an optional dictionary of configuration flags specific to the
+    environment. For generic environments, `config` can be ignored. `config` is
+    a dictionary of strings to floats where the keys are the names of the
+    `resetParameters` and the values are their corresponding float values.
+    Define the reset parameters on the Academy Inspector window in the Unity
+    Editor.
+- **Step : `env.step(action, memory=None, text_action=None)`**  
+  Sends a step signal to the environment using the actions. For each Brain :
+  - `action` can be one dimensional arrays or two dimensional arrays if you have
+    multiple Agents per Brain.
+  - `memory` is an optional input that can be used to send a list of floats per
+    Agent to be retrieved at the next step.
+  - `text_action` is an optional input that can be used to send a single string
+    per Agent.
+
+    Returns a dictionary mapping Brain names to BrainInfo objects.
+
+    For example, to access the BrainInfo belonging to a Brain called
+    'brain_name', and the BrainInfo field 'vector_observations':
+
+    ```python
+    info = env.step()
+    brainInfo = info['brain_name']
+    observations = brainInfo.vector_observations
+    ```
+
+    Note that if you have more than one external Brain in the environment, you
+    must provide dictionaries from Brain names to arrays for `action`, `memory`
+    and `value`. For example: If you have two external Brains named `brain1` and
+    `brain2` each with one Agent taking two continuous actions, then you can
+    have:
+
+    ```python
+    action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
+    ```
+
+- **Close : `env.close()`**
+  Sends a shutdown signal to the environment and closes the communication
+  socket.
+
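When several external Brains are present, the `action` argument described above becomes a dictionary keyed by Brain name. The sketch below shows one way such a dictionary could be assembled, with one row of continuous actions per Agent; `build_actions` and its `agents_per_brain` input are hypothetical helpers, not part of the `mlagents` package.

```python
def build_actions(agents_per_brain, action_size, fill=0.0):
    """Map each Brain name to one action row per Agent.

    agents_per_brain: dict of Brain name -> number of Agents (hypothetical input).
    action_size: number of continuous actions each Agent takes.
    fill: placeholder value; a real policy would supply the numbers.
    """
    return {
        brain: [[fill] * action_size for _ in range(count)]
        for brain, count in agents_per_brain.items()
    }


# brain1 has one Agent, brain2 has three; every Agent takes two actions.
actions = build_actions({"brain1": 1, "brain2": 3}, action_size=2)
print(actions["brain2"])
```

The resulting dictionary has the shape that would be passed as `action` to `env.step`, assuming each Brain expects a two dimensional array when it controls more than one Agent.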
+## mlagents-learn
+
+For more detailed documentation on using `mlagents-learn`, check out
+[Training ML-Agents](Training-ML-Agents.md).
--- a/gym-unity/README.md
+++ b/gym-unity/README.md
+# Unity ML-Agents Gym Wrapper
+
+A common way in which machine learning researchers interact with simulation
+environments is via a wrapper provided by OpenAI called `gym`. For more
+information on the gym interface, see [here](https://github.com/openai/gym).
+
+We provide a gym wrapper, and instructions for using it with existing machine
+learning algorithms which utilize gyms. The wrapper provides an interface on top
+of our `UnityEnvironment` class, which is the default way of interfacing with a
+Unity environment via Python.
+
+## Installation
+
+The gym wrapper can be installed using:
+
+```sh
+pip install gym_unity
+```
+
+or by running the following from the `/gym-unity` directory of the repository:
+
+```sh
+pip install .
+```
+
+## Using the Gym Wrapper
+
+The gym interface is available from `gym_unity.envs`. To launch an environment
+from the root of the project repository use:
+
+```python
+from gym_unity.envs import UnityEnv
+
+env = UnityEnv(environment_filename, worker_id, use_visual, multiagent)
+```
+
+* `environment_filename` refers to the path to the Unity environment.
+* `worker_id` refers to the port to use for communication with the environment.
+  Defaults to `0`.
+* `use_visual` refers to whether to use visual observations (True) or vector
+  observations (False) as the default observation provided by the `reset` and
+  `step` functions. Defaults to `False`.
+* `multiagent` refers to whether you intend to launch an environment which
+  contains more than one agent. Defaults to `False`.
+
+The returned environment `env` will function as a gym.
+
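Functioning as a gym means the environment is driven through `reset` and `step`. The following is a minimal sketch of that loop, assuming the classic gym return shape `(observation, reward, done, info)` and the single-agent case; `CountdownEnv` and `run_episode` are hypothetical stand-ins so the example runs without a Unity binary.

```python
class CountdownEnv:
    """Hypothetical stand-in with gym-style reset/step; not part of gym_unity."""

    def __init__(self, steps=3):
        self.steps = steps
        self.remaining = steps

    def reset(self):
        self.remaining = self.steps
        return self.remaining  # initial observation

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        # (observation, reward, done, info), the classic gym return shape
        return self.remaining, 1.0, done, {}


def run_episode(env, policy):
    """Run one episode and return the summed reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total


total = run_episode(CountdownEnv(steps=3), policy=lambda obs: 0)
print(total)
```

With a `UnityEnv` instance in place of the stand-in, the same loop would be expected to apply, and the `info` value returned by `step` is where the `BrainInfo` output can be found.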
+For more on using the gym interface, see our
+[Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
+
+## Limitations
+
+* It is only possible to use an environment with a single Brain.
+* By default the first visual observation is provided as the `observation`, if
+  present. Otherwise vector observations are provided.
+* All `BrainInfo` output from the environment can still be accessed from the
+  `info` provided by `env.step(action)`.
+* Stacked vector observations are not supported.
+* Environment registration for use with `gym.make()` is currently not supported.
+
+## Running OpenAI Baselines Algorithms
+
+OpenAI provides a set of open-source, maintained, and tested reinforcement
+learning algorithms called [Baselines](https://github.com/openai/baselines).
+
+Using the provided Gym wrapper, it is possible to train ML-Agents environments
+using these algorithms. This requires the creation of custom training scripts to
+launch each algorithm. In most cases these scripts can be created by making
+slight modifications to the ones provided for Atari and Mujoco environments.
+
+### Example - DQN Baseline
+
+In order to train an agent to play the `GridWorld` environment using the
+Baselines DQN algorithm, create a file called `train_unity.py` within the
+`baselines/deepq/experiments` subfolder of the baselines repository. This file
+will be a modification of the `run_atari.py` file within the same folder. Then
+create an `/envs/` directory within the repository, and build the GridWorld
+environment to that directory. For more information on building Unity
+environments, see [here](../docs/Learning-Environment-Executable.md). Add the
+following code to the `train_unity.py` file:
+
+```python
+import gym
+
+from baselines import deepq
+from gym_unity.envs import UnityEnv
+
+def main():
+    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
+    model = deepq.models.cnn_to_mlp(
+        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
+        hiddens=[256],
+        dueling=True,
+    )
+    act = deepq.learn(
+        env,
+        q_func=model,
+        lr=1e-3,
+        max_timesteps=100000,
+        buffer_size=50000,
+        exploration_fraction=0.1,
+        exploration_final_eps=0.02,
+        print_freq=10,
+    )
+    print("Saving model to unity_model.pkl")
+    act.save("unity_model.pkl")
+
+
+if __name__ == '__main__':
+    main()
+```
+
+To start the training process, run the following from the root of the baselines
+repository:
+
+```sh
+python -m baselines.deepq.experiments.train_unity
+```
+
+### Other Algorithms
+
+Other algorithms in the Baselines repository can be run using scripts similar to
+the example provided above. In most cases, the primary changes needed to use a
+Unity environment are to import `UnityEnv`, and to replace the environment
+creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)`
+passing the environment binary path.
+
+A typical rule of thumb is that for vision-based environments, modification
+should be done to Atari training scripts, and for vector observation
+environments, modification should be done to Mujoco scripts.
+
+Some algorithms will make use of `make_atari_env()` or `make_mujoco_env()`
+functions. These are defined in `baselines/common/cmd_util.py`. In order to use
+Unity environments for these algorithms, add the following import statement and
+function to `cmd_util.py`:
+
+```python
+from gym_unity.envs import UnityEnv
+
+def make_unity_env(env_directory, num_env, visual, start_index=0):
+    """
+    Create a wrapped, monitored Unity environment.
+    """
+    def make_env(rank): # pylint: disable=C0111
+        def _thunk():
+            env = UnityEnv(env_directory, rank, use_visual=True)
+            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
+            return env
+        return _thunk
+    if visual:
+        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
+    else:
+        rank = MPI.COMM_WORLD.Get_rank()
+        env = UnityEnv(env_directory, rank, use_visual=False)
+        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
+        return env
+
+```
--- a/MLAgentsSDK/README.md
+++ b/MLAgentsSDK/README.md
+# ML-Agents SDK
--- a/gym-unity/Readme.md
+++ b/gym-unity/Readme.md
-# Unity ML-Agents Gym Wrapper
-
-A common way in which machine learning researchers interact with simulation environments is via a wrapper provided by OpenAI called `gym`. For more information on the gym interface, see [here](https://github.com/openai/gym). 
-
-We provide a a gym wrapper, and instructions for using it with existing machine learning algorithms which utilize gyms. Both wrappers provide interfaces on top of our `UnityEnvironment` class, which is the default way of interfacing with a Unity environment via Python.
-
-## Installation
-
-The gym wrapper can be installed using:
-
-```
-pip install gym_unity
-```
-
-or by running the following from the `/gym-unity` directory of the repository:
-
-```
-pip install .
-```
-
-
-## Using the Gym Wrapper
-The gym interface is available from `gym_unity.envs`. To launch an environmnent from the root of the project repository use:
-
-```python
-from gym_unity.envs import UnityEnv
-
-env = UnityEnv(environment_filename, worker_id, default_visual, multiagent)
-```
-
-* `environment_filename` refers to the path to the Unity environment.
-* `worker_id` refers to the port to use for communication with the environment. Defaults to `0`.
-* `use_visual` refers to whether to use visual observations (True) or vector observations (False) as the default observation provided by the `reset` and `step` functions. Defaults to `False`.
-* `multiagent` refers to whether you intent to launch an environment which contains more than one agent. Defaults to `False`.
-
-The returned environment `env` will function as a gym.
-
-For more on using the gym interface, see our [Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
-
-## Limitation
-
-* It is only possible to use an environment with a single Brain.
-* By default the first visual observation is provided as the `observation`, if
-  present. Otherwise vector observations are provided. 
-* All `BrainInfo` output from the environment can still be accessed from the
-  `info` provided by `env.step(action)`.
-* Stacked vector observations are not supported.
-* Environment registration for use with `gym.make()` is currently not supported.
-
-## Running OpenAI Baselines Algorithms
-
-OpenAI provides a set of open-source maintained and tested Reinforcement Learning algorithms called the [Baselines](https://github.com/openai/baselines). 
-
-Using the provided Gym wrapper, it is possible to train ML-Agents environments using these algorithms. This requires the creation of custom training scripts to launch each algorithm. In most cases these scripts can be created by making slightly modifications to the ones provided for Atari and Mujoco environments.
-
-### Example - DQN Baseline
-
-In order to train an agent to play the `GridWorld` environment using the Baselines DQN algorithm, create a file called `train_unity.py` within the `baselines/deepq/experiments` subfolder of the baselines repository. This file will be a modification of the `run_atari.py` file within the same folder. Then create and `/envs/` directory within the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). Add the following code to the `train_unity.py` file:
-
-```
-import gym
-
-from baselines import deepq
-from gym_unity.envs import UnityEnv
-
-def main():
-    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
-    model = deepq.models.cnn_to_mlp(
-        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
-        hiddens=[256],
-        dueling=True,
-    )
-    act = deepq.learn(
-        env,
-        q_func=model,
-        lr=1e-3,
-        max_timesteps=100000,
-        buffer_size=50000,
-        exploration_fraction=0.1,
-        exploration_final_eps=0.02,
-        print_freq=10,
-    )
-    print("Saving model to unity_model.pkl")
-    act.save("unity_model.pkl")
-
-
-if __name__ == '__main__':
-    main()
-```
-
-
-To start the training process, run the following from the root of the baselines repository:
-
-```
-python -m baselines.deepq.experiments.train_unity
-```
-
-### Other Algorithms
-
-Other algorithms in the Baselines repository can be run using scripts similar to the example provided above. In most cases, the primary changes needed to use a Unity environment are to import `UnityEnv`, and to replace the environment creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)` passing the environment binary path. 
-
-A typical rule of thumb is that for vision-based environments, modification should be done to Atari training scripts, and for vector observation environments, modification should be done to Mujoco scripts.
-
-Some algorithms will make use of `make_atari_env()` or `make_mujoco_env()` functions. These are defined in `baselines/common/cmd_util.py`. In order to use Unity environments for these algorithms, add the following import statement and function to `cmd_utils.py`:
-
-```python
-from gym_unity.envs import UnityEnv
-
-def make_unity_env(env_directory, num_env, visual, start_index=0):
-    """
-    Create a wrapped, monitored Unity environment.
-    """
-    def make_env(rank): # pylint: disable=C0111
-        def _thunk():
-            env = UnityEnv(env_directory, rank, use_visual=True)
-            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
-            return env
-        return _thunk
-    if visual:
-        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
-    else:
-        rank = MPI.COMM_WORLD.Get_rank()
-        env = UnityEnv(env_directory, rank, use_visual=False)
-        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
-        return env
-
-```