
Merge pull request #461 from Unity-Technologies/dev-doc-fixes

Several documentation enhancements
/develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit aff0ba28
21 files changed, with 172 additions and 3,180 deletions
  1. README.md (4 changes)
  2. docs/Feature-Memory.md (2 changes)
  3. docs/Getting-Started-with-Balance-Ball.md (4 changes)
  4. docs/Installation.md (12 changes)
  5. docs/Learning-Environment-Create-New.md (23 changes)
  6. docs/Learning-Environment-Design-Agents.md (35 changes)
  7. docs/Learning-Environment-Design-Brains.md (19 changes)
  8. docs/Limitations-and-Common-Issues.md (2 changes)
  9. docs/ML-Agents-Overview.md (18 changes)
  10. docs/Python-API.md (5 changes)
  11. docs/Readme.md (7 changes)
  12. docs/Training-Imitation-Learning.md (2 changes)
  13. docs/Training-ML-Agents.md (4 changes)
  14. docs/Training-on-Amazon-Web-Service.md (2 changes)
  15. docs/images/agent.png (136 changes)
  16. docs/images/mlagents-3DBall.png (1001 changes)
  17. docs/images/mlagents-Scene.png (1001 changes)
  18. docs/images/agents_diagram.png (841 changes)
  19. docs/images/ml-agents-ODD.png (176 changes)
  20. docs/Feature-Broadcasting.md (19 changes)
  21. docs/Feature-On-Demand-Decisions.md (39 changes)

README.md (4 changes)


## Documentation and References
**For more information, in addition to installation and usage
instructions, see our [documentation home](docs/README.md).**
instructions, see our [documentation home](docs/README.md).** If you have
used a version of ML-Agents prior to v0.3, we strongly recommend
our [guide on migrating to v0.3](docs/Migrating-v0.3.md).
We have also published a series of blog posts that are relevant for ML-Agents:
- Overviewing reinforcement learning concepts

docs/Feature-Memory.md (2 changes)


## Limitations
* LSTM does not work well with continuous vector action spaces.
Please use a discrete vector action space for better results.
* Since the memories must be sent back and forth between python
* Since the memories must be sent back and forth between Python
and Unity, using too large a `memory_size` will slow down training.
* Adding a recurrent layer increases the complexity of the neural
network; it is recommended to decrease `num_layers` when using recurrent layers.

docs/Getting-Started-with-Balance-Ball.md (4 changes)


To train the agents within the Ball Balance environment, we will be using the python
package. We have provided a convenient python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
package. We have provided a convenient Python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
We will pass to this script the path of the environment executable that we just built. (Optionally) We can

```
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
python3 python/learn.py <env_file_path> --run-id=<run-identifier> --train
```

docs/Installation.md (12 changes)


## Install Python (with Dependencies)
In order to use ML-Agents, you need Python (2 or 3; 64 bit required) along with
In order to use ML-Agents, you need Python 3 along with
We **strongly** recommend using Python 3 as we do not guarantee supporting Python 2
in future releases. In all of our subsequent instructions, we use `python`
to refer to either Python 2 or 3, depending on your installation.
### Windows Users

on installing it.
To install dependencies, go into the `python` subdirectory of the repository,
and run (depending on your Python version) from the command line:
pip install .
or
and run from the command line:
pip3 install .

docs/Learning-Environment-Create-New.md (23 changes)


Press **Play** to run the scene and use the WASD keys to move the agent around the platform. Make sure that there are no errors displayed in the Unity editor Console window and that the agent resets when it reaches its target or falls from the platform. Note that for more involved debugging, the ML-Agents SDK includes a convenient Monitor class that you can use to easily display agent status information in the Game window.
One additional test you can perform is to first ensure that your environment
and the Python API work as expected using the `python/Basics`
[Jupyter notebook](Background-Jupyter.md). Within `Basics`, be sure to set
`env_name` to the name of the environment file you specify when building
this environment.
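As a rough sketch of that check (the environment name `"MyNewEnvironment"` is a placeholder for whatever you named your build), loading the environment from the `Basics` notebook amounts to something like:

```python
from unityagents import UnityEnvironment

# Hypothetical name of the executable you built for this environment.
env_name = "MyNewEnvironment"

env = UnityEnvironment(file_name=env_name)
print(str(env))  # summarizes the brains and their observation/action spaces
env.close()
```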
## Review: Scene Layout
This section briefly reviews how to organize your scene when using
Agents in your Unity environment.
There are three kinds of game objects you need to include in your scene in order to use Unity ML-Agents:
* Academy
* Brain
* Agents
Keep in mind:
* There can only be one Academy game object in a scene.
* You can have multiple Brain game objects but they must be children of the Academy game object.
Here is an example of what your scene hierarchy should look like:
![Scene Hierarchy](images/scene-hierarchy.png)

docs/Learning-Environment-Design-Agents.md (35 changes)


To control the frequency of step-based decision making, set the **Decision Frequency** value for the Agent object in the Unity Inspector window. Agents using the same Brain instance can use a different frequency. During simulation steps in which no decision is requested, the agent receives the same action chosen by the previous decision.
When you turn on **On Demand Decisions** for an agent, your agent code must call the `Agent.RequestDecision()` function. This function call starts one iteration of the observation-decision-action-reward cycle. The Brain invokes the agent's `CollectObservations()` method, makes a decision and returns it by calling the `AgentAction()` method. The Brain waits for the agent to request the next decision before starting another iteration.
### On Demand Decision Making
See [On Demand Decision Making](Feature-On-Demand-Decision.md).
On demand decision making allows agents to request decisions from their
brains only when needed instead of receiving decisions at a fixed
frequency. This is useful when the agents commit to an action for a
variable number of steps or when the agents cannot make decisions
at the same time. This is typically the case for turn-based games, games
where agents must react to events or games where agents can take
actions of variable duration.
When you turn on **On Demand Decisions** for an agent, your agent code must call the `Agent.RequestDecision()` function. This function call starts one iteration of the observation-decision-action-reward cycle. The Brain invokes the agent's `CollectObservations()` method, makes a decision and returns it by calling the `AgentAction()` method. The Brain waits for the agent to request the next decision before starting another iteration.
## Observations

* `Max Step` - The per-agent maximum number of steps. Once this number is reached, the agent will be reset if `Reset On Done` is checked.
* `Reset On Done` - Whether the agent's `AgentReset()` function should be called when the agent reaches its `Max Step` count or is marked as done in code.
* `On Demand Decision` - Whether the agent requests decisions at a fixed step interval or explicitly requests decisions by calling `RequestDecision()`.
* If not checked, the Agent will request a new
decision every `Decision Frequency` steps and
perform an action every step. In the example above,
`CollectObservations()` will be called every 5 steps and
`AgentAction()` will be called at every step. This means that the
Agent will reuse the decision the Brain has given it.
* If checked, the Agent controls when it receives
decisions and takes actions. To do so, the Agent may use one or both of the following methods:
* `RequestDecision()` Signals that the Agent is requesting a decision.
This causes the Agent to collect its observations and ask the Brain for a
decision at the next step of the simulation. Note that when an Agent
requests a decision, it also requests an action.
This is to ensure that all decisions lead to an action during training.
* `RequestAction()` Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.
## Monitoring Agents
We created a helpful `Monitor` class that enables visualizing variables within
a Unity environment. While this was built for monitoring an Agent's value
function throughout the training process, we imagine it can be more broadly
useful. You can learn more [here](Feature-Monitor.md).
## Instantiating an Agent at Runtime

docs/Learning-Environment-Design-Brains.md (19 changes)


* `Player` - Actions are decided using keyboard input mappings.
* `Heuristic` - Actions are decided using a custom `Decision` script, which must be attached to the Brain game object.
## Using the Broadcast Feature
The Player, Heuristic and Internal brains have been updated to support broadcast. The broadcast feature allows you to collect data from your agents using a Python program without controlling them.
### How to use: Unity
To turn it on in Unity, simply check the `Broadcast` box as shown below:
![Broadcast](images/broadcast.png)
### How to use: Python
When you launch your Unity Environment from a Python program, you can see what the agents connected to non-external brains are doing. When calling `step` or `reset` on your environment, you retrieve a dictionary mapping brain names to `BrainInfo` objects. The dictionary contains a `BrainInfo` object for each non-external brain set to broadcast as well as for any external brains.
Just like with an external brain, the `BrainInfo` object contains the fields for `visual_observations`, `vector_observations`, `text_observations`, `memories`, `rewards`, `local_done`, `max_reached`, `agents` and `previous_actions`. Note that `previous_actions` corresponds to the actions that were taken by the agents at the previous step, not the current one.
Note that when you do a `step` on the environment, you cannot provide actions for non-external brains. If there are no external brains in the scene, simply call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player, Heuristic, or Internal brain game sessions. You can then use this data to train an agent in a supervised context.
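As a loose illustration (a sketch, not the documented walkthrough; the environment name is a placeholder and every brain in the scene is assumed to be either broadcasting or external), reading the broadcast data from Python looks something like this:

```python
from unityagents import UnityEnvironment

# Hypothetical environment binary; replace with your own build.
env = UnityEnvironment(file_name="3DBall", worker_id=0)

# reset/step return a dictionary mapping brain names to BrainInfo objects,
# one entry per external brain and per broadcasting brain.
info = env.reset(train_mode=False)
for _ in range(100):
    # With no external brains in the scene, step() takes no actions.
    info = env.step()
    for brain_name, brain_info in info.items():
        observations = brain_info.vector_observations
        previous_actions = brain_info.previous_actions

env.close()
```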

docs/Limitations-and-Common-Issues.md (2 changes)


If you receive an exception `"Couldn't launch new environment because
communication port {} is still in use. "`, you can change the worker
number in the python script when calling
number in the Python script when calling
`UnityEnvironment(file_name=filename, worker_id=X)`
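For instance (a sketch with a placeholder file name), two environments running side by side simply need distinct worker numbers:

```python
from unityagents import UnityEnvironment

# Each concurrent environment needs its own worker_id so it binds a different port.
env_a = UnityEnvironment(file_name="3DBall", worker_id=0)
env_b = UnityEnvironment(file_name="3DBall", worker_id=1)
```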

docs/ML-Agents-Overview.md (18 changes)


must react to events or games where agents can take actions of variable
duration. Switching between making decisions at every step and
on-demand decision making is one button click away. You can learn more about the
on-demand-decision feature [here](Feature-On-Demand-Decisions.md).
on-demand-decision feature
[here](Learning-Environment-Design-Agents.md#on-demand-decision-making).
* **Memory-enhanced Agents** - In some scenarios, agents must learn to
remember the past in order to take the

Player Brain are used to learn the policies of an agent through demonstration.
However, this could also be helpful for the Heuristic and Internal Brains,
particularly when debugging agent behaviors. You can learn more about using
the broadcasting feature [here](Feature-Broadcasting.md).
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
* **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents
without installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how
to create and run a Docker container. Due to limitations on rendering visual
observations, this feature is marked experimental.
* **Cloud Training on AWS** - To facilitate using ML-Agents on
Amazon Web Services (AWS) machines, we provide a
[guide](Training-on-Amazon-Web-Service.md)
on how to set up EC2 instances in addition to a public pre-configured Amazon
Machine Image (AMI).
## Summary and Next Steps

docs/Python-API.md (5 changes)


These classes are all defined in the `python/unityagents` folder of the ML-Agents SDK.
To communicate with an agent in a Unity environment from a Python program, the agent must either use an **External** brain or use a brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for agents with external brains, but can only observe broadcasting brains (the information you receive for an agent is the same in both cases). See [Using the Broadcast Feature](Feature-Broadcast.md).
To communicate with an agent in a Unity environment from a Python program, the agent must either use an **External** brain or use a brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for agents with external brains, but can only observe broadcasting brains (the information you receive for an agent is the same in both cases). See [Using the Broadcast Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook, which opens an environment, runs a few simulation steps taking random actions, and closes the environment.

```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0)
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
```
* `seed` indicates the seed to use when generating random numbers during the training process. In environments which do not involve physics calculations, setting the seed enables reproducible experimentation by ensuring that the environment and trainers utilize the same random seed.
## Interacting with a Unity Environment
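A minimal interaction loop in the spirit of the Basics notebook described above (a sketch, not the documented walkthrough; it assumes a single external brain with a continuous action space):

```python
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)

# Assume exactly one external brain; take its name from the environment.
brain_name = env.external_brain_names[0]
brain = env.brains[brain_name]

info = env.reset(train_mode=False)[brain_name]
for _ in range(10):
    # Random continuous actions for every agent using this brain.
    actions = np.random.randn(len(info.agents), brain.vector_action_space_size)
    info = env.step({brain_name: actions})[brain_name]

env.close()
```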

docs/Readme.md (7 changes)


* [Designing a Learning Environment](Learning-Environment-Design.md)
* [Agents](Learning-Environment-Design-Agents.md)
* [Academy](Learning-Environment-Design-Academy.md)
* [Brains](Learning-Environment-Design-Brains.md)
* [Brains](Learning-Environment-Design-Brains.md): [Player](Learning-Environment-Design-Player-Brains.md), [Heuristic](Learning-Environment-Design-Heuristic-Brains.md), [Internal & External](Learning-Environment-Design-External-Internal-Brains.md)
* [Using the Monitor](Feature-Monitor.md)
* [TensorFlowSharp in Unity (Experimental)](Using-TensorFlow-Sharp-in-Unity.md)
## Training

* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
## Help
* [Migrating to ML-Agents v0.3](Migrating-v0.3.md)
* Brain
* CoreBrain
* Decision
* Monitor

docs/Training-Imitation-Learning.md (2 changes)


4. Link the brains to the desired agents (one agent as the teacher and at least one agent as a student).
5. Build the Unity executable for your desired platform.
6. In `trainer_config.yaml`, add an entry for the "Student" brain. Set the `trainer` parameter of this entry to `imitation`, and the `brain_to_imitate` parameter to the name of the teacher brain: "Teacher". Additionally, set `batches_per_epoch`, which controls how much training to do each moment. Increase the `max_steps` option if you'd like to keep training the agents for a longer period of time.
7. Launch the training process with `python python/learn.py <env_name> --train --slow`, where `<env_name>` is the path to your built Unity executable.
7. Launch the training process with `python3 python/learn.py <env_name> --train --slow`, where `<env_name>` is the path to your built Unity executable.
8. From the Unity window, control the agent with the Teacher brain by providing "teacher demonstrations" of the behavior you would like to see.
9. Watch as the agent(s) with the student brain attached begin to behave similarly to the demonstrations.
10. Once the Student agents are exhibiting the desired behavior, end the training process with `CTRL+C` from the command line.

docs/Training-ML-Agents.md (4 changes)


The basic command for training is:
python learn.py <env_file_path> --run-id=<run-identifier> --train
python3 learn.py <env_file_path> --run-id=<run-identifier> --train
where `<env_file_path>` is the path to your Unity executable containing the agents to be trained and `<run-identifier>` is an optional identifier you can use to identify the results of individual training runs.

3. Navigate to the ml-agents `python` folder.
4. Run the following to launch the training process using the path to the Unity environment you built in step 1:
python learn.py ../../projects/Cats/CatsOnBicycles.app --run-id=cob_1 --train
python3 learn.py ../../projects/Cats/CatsOnBicycles.app --run-id=cob_1 --train
During a training session, the training program prints out and saves updates at regular intervals (specified by the `summary_freq` option). The saved statistics are grouped by the `run-id` value so you should assign a unique id to each training run if you plan to view the statistics. You can view these statistics using TensorBoard during or after training by running the following command (from the ML-Agents python directory):

docs/Training-on-Amazon-Web-Service.md (2 changes)


## Testing
If all steps worked correctly, upload an example binary built for Linux to the instance, and test it from python with:
If all steps worked correctly, upload an example binary built for Linux to the instance, and test it from Python with:
```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="<path-to-linux-binary>")  # placeholder path to the uploaded binary
```

docs/images/agent.png (136 changes)

Width: 526 | Height: 160 | Size: 20 KiB

docs/images/mlagents-3DBall.png (1001 changes)
File diff too large to display.

docs/images/mlagents-Scene.png (1001 changes)
File diff too large to display.

docs/images/agents_diagram.png (841 changes)

docs/images/ml-agents-ODD.png (176 changes)

docs/Feature-Broadcasting.md (19 changes)


# Using the Broadcast Feature
The Player, Heuristic and Internal brains have been updated to support broadcast. The broadcast feature allows you to collect data from your agents using a Python program without controlling them.
## How to use: Unity
To turn it on in Unity, simply check the `Broadcast` box as shown below:
![Broadcast](images/broadcast.png)
## How to use: Python
When you launch your Unity Environment from a Python program, you can see what the agents connected to non-external brains are doing. When calling `step` or `reset` on your environment, you retrieve a dictionary mapping brain names to `BrainInfo` objects. The dictionary contains a `BrainInfo` object for each non-external brain set to broadcast as well as for any external brains.
Just like with an external brain, the `BrainInfo` object contains the fields for `visual_observations`, `vector_observations`, `text_observations`, `memories`, `rewards`, `local_done`, `max_reached`, `agents` and `previous_actions`. Note that `previous_actions` corresponds to the actions that were taken by the agents at the previous step, not the current one.
Note that when you do a `step` on the environment, you cannot provide actions for non-external brains. If there are no external brains in the scene, simply call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player, Heuristic, or Internal brain game sessions. You can then use this data to train an agent in a supervised context.

docs/Feature-On-Demand-Decisions.md (39 changes)


# On Demand Decision Making
## Description
On demand decision making allows agents to request decisions from their
brains only when needed instead of receiving decisions at a fixed
frequency. This is useful when the agents commit to an action for a
variable number of steps or when the agents cannot make decisions
at the same time. This is typically the case for turn-based games, games
where agents must react to events or games where agents can take
actions of variable duration.
## How to use
To enable or disable on demand decision making, use the checkbox called
`On Demand Decisions` in the Agent Inspector.
<p align="center">
<img src="images/ml-agents-ODD.png"
alt="On Demand Decision"
width="500" border="10" />
</p>
* If `On Demand Decisions` is not checked, the Agent will request a new
decision every `Decision Frequency` steps and
perform an action every step. In the example above,
`CollectObservations()` will be called every 5 steps and
`AgentAction()` will be called at every step. This means that the
Agent will reuse the decision the Brain has given it.
* If `On Demand Decisions` is checked, the Agent controls when it receives
decisions and takes actions. To do so, the Agent may use one or both of the following methods:
* `RequestDecision()` Signals that the Agent is requesting a decision.
This causes the Agent to collect its observations and ask the Brain for a
decision at the next step of the simulation. Note that when an Agent
requests a decision, it also requests an action.
This is to ensure that all decisions lead to an action during training.
* `RequestAction()` Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.