
Merge branch 'development-0.3' into hotfix/issue#333

/develop-generalizationTraining-TrainerController
Vincent Gao, 7 years ago
Current commit
1bc43933
27 files changed, with 1577 insertions and 380 deletions
  1. docs/Getting-Started-with-Balance-Ball.md (47 changes)
  2. docs/Installation.md (13 changes)
  3. docs/Python-API.md (7 changes)
  4. docs/Training-PPO.md (72 changes)
  5. docs/Using-Docker.md (77 changes)
  6. docs/images/docker_build_settings.png (355 changes)
  7. python/learn.py (4 changes)
  8. python/unityagents/environment.py (2 changes)
  9. python/unitytrainers/trainer_controller.py (1 change)
  10. unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs (205 changes)
  11. unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs (7 changes)
  12. unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs (6 changes)
  13. unity-environment/Assets/ML-Agents/Examples/Area/Scripts/AreaAgent.cs (6 changes)
  14. unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Push/PushAgent.cs (6 changes)
  15. unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Wall/WallAgent.cs (6 changes)
  16. unity-environment/Assets/ML-Agents/Examples/Banana/Scripts/BananaAgent.cs (4 changes)
  17. unity-environment/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs (12 changes)
  18. unity-environment/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs (8 changes)
  19. unity-environment/Assets/ML-Agents/Examples/Crawler/Scripts/CrawlerAgentConfigurable.cs (45 changes)
  20. unity-environment/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs (4 changes)
  21. unity-environment/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs (4 changes)
  22. unity-environment/Assets/ML-Agents/Examples/Reacher/Scripts/ReacherAgent.cs (12 changes)
  23. unity-environment/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs (6 changes)
  24. unity-environment/Assets/ML-Agents/Scripts/Agent.cs (39 changes)
  25. unity-environment/Assets/ML-Agents/Scripts/ExternalCommunicator.cs (4 changes)
  26. unity-environment/Assets/ML-Agents/Template/Scripts/TemplateAgent.cs (4 changes)
  27. docs/images/unity_linux_build_support.png (1001 changes)

docs/Getting-Started-with-Balance-Ball.md (47 changes)


![Balance Ball](images/balance.png)
This walkthrough uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball contains
a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
horizontally or vertically. In this environment, a platform is an **agent**
that receives a reward for every step that it balances the ball. An agent is

training process just learns what values are better given particular state
observations based on the rewards received when it tries different values).
For example, an element might represent a force or torque applied to a
Rigidbody in the agent. The **Discrete** action vector space defines its
`Rigidbody` in the agent. The **Discrete** action vector space defines its
actions as a table. A specific action given to the agent is an index into
this table.
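As a rough illustration of the two spaces (the values and table entries below are made up for this sketch, not taken from any environment):
```
# Illustrative only: the shape of each action vector space described above.
# A Continuous brain emits an array of floats (here, two values that could be
# torques applied to a platform); a Discrete brain emits an index into a table.
continuous_action = [0.8, -0.3]                           # two float-valued controls
action_table = ["forward", "backward", "left", "right"]   # hypothetical action table
discrete_action = 2                                       # index into the table
print(action_table[discrete_action])                      # -> "left"
```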

OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
In order to train the agents within the Ball Balance environment:
1. Open `python/PPO.ipynb` notebook from Jupyter.
2. Set `env_name` to the name of your environment file earlier.
3. (optional) In order to get the best results quickly, set `max_steps` to
50000, set `buffer_size` to 5000, and set `batch_size` to 512. For this
exercise, this will train the model in approximately ~5-10 minutes.
4. (optional) Set `run_path` directory to your choice. When using TensorBoard
to observe the training statistics, it helps to set this to a sequential value
To train the agents within the Ball Balance environment, we will be using the python
package. We have provided a convenient python wrapper script called `learn.py` which accepts arguments used to configure both training and inference phases.
We will pass to this script the path of the environment executable that we just built. (Optionally) We can
use `run_id` to identify the experiment and create a folder where the model and summary statistics are stored. When
using TensorBoard to observe the training statistics, it helps to set this to a sequential value
5. Run all cells of notebook with the exception of the last one under "Export
the trained Tensorflow graph."
To summarize, go to your command line, enter the `ml-agents` directory and type:
```
python python/learn.py <env_file_path> --run-id=<run-identifier> --train
```
The `--train` flag tells ML-Agents to run in training mode. `env_file_path` should be the path to the Unity executable that was just created.
In order to observe the training process in more detail, you can use
TensorBoard. In your command line, enter into `python` directory and then run :
Once you start training using `learn.py` in the way described in the previous section, the `ml-agents` folder will
contain a `summaries` directory. In order to observe the training process
in more detail, you can use TensorBoard. From the command line, run:
`tensorboard --logdir=summaries`

### Embedding the trained model into Unity
1. Run the final cell of the notebook under "Export the trained TensorFlow
graph" to produce an `<env_name >.bytes` file.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
1. The trained model is stored in `models/<run-identifier>` in the `ml-agents` folder. Once the
training is complete, there will be a `<env_name>.bytes` file in that location where `<env_name>` is the name
of the executable used during training.
2. Move `<env_name>.bytes` from `python/models/ppo/` into
`unity-environment/Assets/ML-Agents/Examples/3DBall/TFModels/`.
3. Open the Unity Editor, and select the `3DBall` scene as described above.
4. Select the `Ball3DBrain` object from the Scene hierarchy.

docs/Installation.md (13 changes)


## Install **Unity 2017.1** or Later
[Download](https://store.unity.com/download) and install Unity.
[Download](https://store.unity.com/download) and install Unity. If you would
like to use our Docker set-up (introduced later), make sure to select the
_Linux Build Support_ component when installing Unity.
<p align="center">
<img src="images/unity_linux_build_support.png"
alt="Linux Build Support"
width="500" border="10" />
</p>
## Clone the ml-agents Repository

pip3 install .
## Docker-based Installation _[Experimental]_
## Docker-based Installation (Experimental)
If you'd like to use Docker for ML-Agents, please follow
[this guide](Using-Docker.md).

[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
make sure to cite relevant information on OS, Python version, and exact error
message (whenever possible).

docs/Python-API.md (7 changes)


## Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in `python/unityagents`. To load a Unity environment from a built binary file, put the file in the same directory as `unityagents`. In python, run:
Python-side communication happens through `UnityEnvironment` which is located in `python/unityagents`. To load a Unity environment from a built binary file, put the file in the same directory as `unityagents`. If your filename is 3DBall.app, in python, run:
env = UnityEnvironment(file_name=filename, worker_id=0)
env = UnityEnvironment(file_name="3DBall", worker_id=0)
* `file_name` is the name of the environment binary (located in the root directory of the python project).
* `worker_id` indicates which port to use for communication with the environment. For use in parallel training regimes such as A3C.
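As a minimal sketch of driving such an environment from Python, assuming the `unityagents` package described above (the reset/step loop and the `brain_names`/`agents` attribute names are illustrative assumptions based on the 0.3-era API, not an excerpt from these docs):
```
from unityagents import UnityEnvironment

# Load a built 3DBall binary placed next to the `unityagents` package.
env = UnityEnvironment(file_name="3DBall", worker_id=0)

brain_name = env.brain_names[0]       # e.g. "Ball3DBrain"
info = env.reset(train_mode=True)     # dict of brain name -> per-brain info

for _ in range(100):
    # 3DBall platforms take a continuous action vector of size 2;
    # here we simply send zeros to every agent controlled by this brain.
    num_agents = len(info[brain_name].agents)
    actions = [[0.0, 0.0] for _ in range(num_agents)]
    info = env.step(actions)

env.close()
```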
## Interacting with a Unity Environment

docs/Training-PPO.md (72 changes)


# Training with Proximal Policy Optimization
This document is still to be written. Refer to [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md) for a walk-through of the PPO training process.
This section is still to be written. Refer to [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md) for a walk-through of the PPO training process.
## Best Practices when training with PPO

### Hyperparameters
#### Batch Size
`batch_size` corresponds to how many experiences are used for each gradient descent update. This should always be a fraction
of the `buffer_size`. If you are using a continuous action space, this value should be large (in 1000s). If you are using a discrete action space, this value should be smaller (in 10s).
Typical Range (Continuous): `512` - `5120`
Typical Range (Discrete): `32` - `512`
#### Buffer Size
`buffer_size` corresponds to how many experiences (agent observations, actions and rewards obtained) should be collected before we do any
learning or updating of the model. **This should be a multiple of `batch_size`**. Typically larger `buffer_size` correspond to more stable training updates.
#### Beta (Used only in Discrete Control)
`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that discrete action space agents properly explore during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical Range: `1e-4` - `1e-2`
Typical Range: `2048` - `409600`
#### Buffer Size
#### Batch Size
`buffer_size` corresponds to how many experiences should be collected before gradient descent is performed on them all.
This should be a multiple of `batch_size`. Typically larger buffer sizes correspond to more stable training updates.
`batch_size` is the number of experiences used for one iteration of a gradient descent update. **This should always be a fraction of the
`buffer_size`**. If you are using a continuous action space, this value should be large (in the order of 1000s). If you are using a discrete action space, this value
should be smaller (in order of 10s).
Typical Range: `2048` - `409600`
Typical Range (Continuous): `512` - `5120`
#### Epsilon
Typical Range (Discrete): `32` - `512`
`epsilon` corresponds to the acceptable threshold of divergence between the old and new policies during gradient descent updating. Setting this value small will result in more stable updates, but will also slow the training process.
Typical Range: `0.1` - `0.3`
#### Number of Epochs
#### Hidden Units
`num_epoch` is the number of passes through the experience buffer during gradient descent. The larger the `batch_size`, the
larger it is acceptable to make this. Decreasing this will ensure more stable updates, at the cost of slower learning.
`hidden_units` correspond to how many units are in each fully connected layer of the neural network. For simple problems
where the correct action is a straightforward combination of the observation inputs, this should be small. For problems where
the action is a very complex interaction between the observation variables, this should be larger.
Typical Range: `3` - `10`
Typical Range: `32` - `512`
#### Learning Rate

Typical Range: `1e-5` - `1e-3`
#### Number of Epochs
`num_epoch` is the number of passes through the experience buffer during gradient descent. The larger the batch size, the
larger it is acceptable to make this. Decreasing this will ensure more stable updates, at the cost of slower learning.
Typical Range: `3` - `10`
#### Time Horizon

#### Max Steps
`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run durring the training process. This value should be increased for more complex problems.
`max_steps` corresponds to how many steps of the simulation (multiplied by frame-skip) are run during the training process. This value should be increased for more complex problems.
Typical Range: `5e5` - `1e7`
Typical Range: `5e5 - 1e7`
#### Beta
`beta` corresponds to the strength of the entropy regularization, which makes the policy "more random." This ensures that agents properly explore the action space during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.
Typical Range: `1e-4` - `1e-2`
#### Epsilon
`epsilon` corresponds to the acceptable threshold of divergence between the old and new policies during gradient descent updating. Setting this value small will result in more stable updates, but will also slow the training process.
Typical Range: `0.1` - `0.3`
#### Normalize

fewer layers are likely to train faster and more efficiently. More layers may be necessary for more complex control problems.
Typical range: `1` - `3`
#### Hidden Units
`hidden_units` correspond to how many units are in each fully connected layer of the neural network. For simple problems
where the correct action is a straightforward combination of the observation inputs, this should be small. For problems where
the action is a very complex interaction between the observation variables, this should be larger.
Typical Range: `32` - `512`
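Putting the ranges above together, here is a sketch of one plausible set of values for a continuous-control task, collected in a plain Python dict (illustrative choices only, not recommended defaults and not the on-disk configuration format):
```
# Illustrative hyperparameter choices drawn from the typical ranges above
# for a continuous action space. Not recommended defaults.
ppo_hyperparameters = {
    "buffer_size": 10240,    # experiences collected before an update; a multiple of batch_size
    "batch_size": 1024,      # experiences per gradient descent update; a fraction of buffer_size
    "beta": 5e-3,            # entropy regularization strength
    "epsilon": 0.2,          # allowed divergence between old and new policies
    "num_epoch": 5,          # passes through the experience buffer per update
    "learning_rate": 3e-4,   # within the 1e-5 to 1e-3 range
    "max_steps": 5e5,        # simulation steps (times frame-skip) to train for
    "num_layers": 2,         # fewer layers train faster; more for complex control
    "hidden_units": 128,     # width of each fully connected layer
}
```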
### Training Statistics

docs/Using-Docker.md (77 changes)


# Using Docker For ML Agents (Experimental)
# Using Docker For ML-Agents (Experimental)
We are currently offering an experimental solution for Windows and Mac users who would like to do training or inference using Docker. This option may be appealing to users who would like to avoid dealing with Python and TensorFlow installation on their host machines. This setup currently forces both TensorFlow and Unity to rely on _only_ the CPU for computation purposes. As such, we currently only support training using environments that only contain agents which use vector observations, rather than camera-based visual observations. For example, the [GridWorld](Example-Environments.md#gridworld) environment which use visual observations for training is not supported.
We currently offer an experimental solution for Windows and Mac users who would like to do training or inference using Docker. This option may be appealing to those who would like to avoid installing Python and TensorFlow themselves. The current setup forces both TensorFlow and Unity to _only_ rely on the CPU for computations. Consequently, our Docker support is limited to environments whose agents **do not** use camera-based visual observations. For example, the [GridWorld](Learning-Environment-Examples.md#gridworld) environment is **not** supported.
- Unity Linux Standalone Player ([Link](https://unity3d.com/get-unity/download?ref=professional&_ga=2.161111422.259506921.1519336396-1357272041.1488299149))
- Docker ([Link](https://www.docker.com/community-edition#/download))
- Unity _Linux Build Support_ Component
- [Docker](https://www.docker.com)
- Install Docker (see link above) if you don't have it setup on your machine.
- [Download](https://unity3d.com/get-unity/download) the Unity Installer and
add the _Linux Build Support_ Component
- Since Docker runs a container in an environment that is isolated from the host machine, we will be using a mounted directory, e.g. `unity-volume` in your host machine in order to share data, e.g. the Unity executable, curriculum files and tensorflow graph.
- [Download](https://www.docker.com/community-edition#/download) and
install Docker if you don't have it setup on your machine.
- Since Docker runs a container in an environment that is isolated from the host machine, a mounted directory in your host machine is used to share data, e.g. the Unity executable, curriculum files and tensorflow graph. For convenience, we created an empty `unity-volume` directory at the root of the repository for this purpose, but feel free to use any other directory. The remainder of this guide assumes that the `unity-volume` directory is the one used.
- Docker typically runs a container sharing a (linux) kernel with the host machine, this means that the
Unity environment **has** to be built for the **linux platform**. From the Build Settings Window, please select the architecture to be `x86_64` and choose the build to be `headless` (_This is important because we are running it in a container that does not have graphics drivers installed_).
Save the generated environment in the directory to be mounted (e.g. we have conveniently created an empty directory called at the top level `unity-volume`).
Using Docker for ML-Agents involves three steps: building the Unity environment with specific flags, building a Docker container and, finally, running the container. If you are not familiar with building a Unity environment for ML-Agents, please read through our [Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md) guide first.
### Build the Environment
Since Docker typically runs a container sharing a (linux) kernel with the host machine, the
Unity environment **has** to be built for the **linux platform**. When building a Unity environment, please select the following options from the Build Settings window:
- Set the _Target Platform_ to `Linux`
- Set the _Architecture_ to `x86_64`
- `Uncheck` the _Development Build_ option
- `Check` the _Headless Mode_ option. (_This is required because the Unity binary will run in a container that does not have graphics drivers installed_.)
- Ensure that `unity-volume/<environment-name>.x86_64` and `unity-volume/environment-name_Data`. So for example, `<environment_name>` might be `3Dball` and you might want to ensure that `unity-volume/3Dball.x86_64` and `unity-volume/3Dball_Data` are both present in the directory `unity-volume`.
Then click `Build`, pick an environment name (e.g. `3DBall`) and set the output directory to `unity-volume`. After building, ensure that the file `<environment-name>.x86_64` and subdirectory `<environment-name>_Data/` are created under `unity-volume`.
### Build the Docker Container
- Make sure the docker engine is running on your machine, then build the docker container by running `docker build -t <image_name> .` . in the top level of the source directory. Replace `<image_name>` by the name of the image that you want to use, e.g. `balance.ball.v0.1`.
First, make sure the Docker engine is running on your machine. Then build the Docker container by calling the following command at the top-level of the repository:
```
docker build -t <image-name> .
```
Replace `<image-name>` with a name for the Docker image, e.g. `balance.ball.v0.1`.
- Run the container:
### Run the Docker Container
Run the Docker container by calling the following command at the top-level of the repository:
<image-name>:latest <environment-name> \
--docker-target-name=unity-volume \
--train --run-id=<run-id>
For the `3DBall` environment, for example this would be:
Notes on argument values:
- `<image-name>` and `<environment-name>`: References the image and environment names, respectively.
- `source`: Reference to the path in your host OS where you will store the Unity executable.
- `target`: Tells Docker to mount the `source` path as a disk with this name.
- `docker-target-name`: Tells the ML-Agents Python package the name of the disk where it can read the Unity executable and store the graph. **This should therefore be identical to `target`.**
- `train`, `run-id`: ML-Agents arguments passed to `learn.py`. `train` trains the algorithm, `run-id` is used to tag each experiment with a unique identifier.
- Run the container:
For the `3DBall` environment, for example this would be:
balance.ball.v0.1:latest 3Dball \
--docker-target-name=unity-volume \
--train --run-id=<run-id>
balance.ball.v0.1:latest 3Dball \
--docker-target-name=unity-volume \
--train --run-id=3dball_first_trial
**Notes on argument values**
- `source` : Reference to the path in your host OS where you will store the Unity executable.
- `target`: Tells docker to mount the `source` path as a disk with this name.
- `docker-target-name`: Tells the ML-Agents python package what the name of the disk where it can read the Unity executable and store the graph.**This should therefore be identical to the `target`.**
- `train`, `run-id`: ML-Agents arguments passed to `learn.py`. `train` trains the algorithm, `run-id` is used to tag each experiment with a unique id.
For more details on docker mounts, look at [these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.
For more detail on Docker mounts, check out [these](https://docs.docker.com/storage/bind-mounts/) docs from Docker.

docs/images/docker_build_settings.png (355 changes)

Width: 631 | Height: 597 | Size: 67 KiB

python/learn.py (4 changes)


logger = logging.getLogger("unityagents")
_USAGE = '''
Usage:
learn (<env>) [options]
learn (<env>) [options]
learn --help
--help Show this message.
--curriculum=<file> Curriculum json file for environment [default: None].
--keep-checkpoints=<n> How many model checkpoints to keep [default: 5].
--lesson=<n> Start learning from this lesson [default: 0].
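The usage string above follows the docopt convention; a minimal sketch of how options declared this way are typically parsed (assuming the `docopt` package, which this usage format suggests; the snippet is illustrative rather than an excerpt from `learn.py`):
```
from docopt import docopt

_USAGE = '''
Usage:
  learn (<env>) [options]
  learn --help

Options:
  --curriculum=<file>      Curriculum json file for environment [default: None].
  --keep-checkpoints=<n>   How many model checkpoints to keep [default: 5].
  --lesson=<n>             Start learning from this lesson [default: 0].
'''

options = docopt(_USAGE)           # parses sys.argv against the usage pattern
env_path = options['<env>']        # positional environment argument
lesson = int(options['--lesson'])  # docopt returns strings; cast where needed
```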

python/unityagents/environment.py (2 changes)


for i in range(self._num_brains):
self._brains[self._brain_names[i]] = BrainParameters(self._brain_names[i], p["brainParameters"][i])
self._loaded = True
logger.info("\n'{}' started successfully!".format(self._academy_name))
logger.info("\n'{0}' started successfully!\n{1}".format(self._academy_name, str(self)))
if self._num_external_brains == 0:
logger.warning(" No External Brains found in the Unity Environment. "
"You will not be able to pass actions to your agent(s).")

python/unitytrainers/trainer_controller.py (1 change)


tf.set_random_seed(self.seed)
self.env = UnityEnvironment(file_name=env_path, worker_id=self.worker_id,
curriculum=self.curriculum_file, seed=self.seed)
self.logger.info(str(self.env))
self.env_name = os.path.basename(os.path.normpath(env_path)) # Extract out name of environment
def _get_progress(self):
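The `os.path` expression shown above derives the environment name from the path passed on the command line; a quick illustration of what that combination of calls does (the path shown is hypothetical, POSIX-style):
```
import os

env_path = "unity-volume/3DBall.x86_64/"             # hypothetical CLI argument
print(os.path.normpath(env_path))                    # unity-volume/3DBall.x86_64
print(os.path.basename(os.path.normpath(env_path)))  # 3DBall.x86_64
```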

unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs (205 changes)


collectObservationsCalls += 1;
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
{
agentActionCalls += 1;
AddReward(0.1f);

}
}
// This is an empty class for testing the behavior of agents and academy
// It is left empty because we are not testing any brain behavior
// TODO : Mock a brain
}

BindingFlags.Instance | BindingFlags.NonPublic);
AcademyInitializeMethod.Invoke(aca, new object[] { });
Assert.AreEqual(1, aca.initializeAcademyCalls);
Assert.AreEqual(1, aca.episodeCount);
Assert.AreEqual(0, aca.episodeCount);
Assert.AreEqual(1, aca.academyResetCalls);
Assert.AreEqual(0, aca.academyResetCalls);
Assert.AreEqual(0, aca.AcademyStepCalls);
}

Assert.AreEqual(false, agent1.IsDone());
Assert.AreEqual(false, agent2.IsDone());
// agent1 was not enabled when the academy started
// The agents have been initialized
Assert.AreEqual(1, agent2.agentResetCalls);
Assert.AreEqual(0, agent2.agentResetCalls);
Assert.AreEqual(1, agent1.initializeAgentCalls);
Assert.AreEqual(1, agent2.initializeAgentCalls);
Assert.AreEqual(0, agent1.agentActionCalls);

MethodInfo AcademyStepMethod = typeof(Academy).GetMethod("_AcademyStep",
BindingFlags.Instance | BindingFlags.NonPublic);
int numberReset = 0;
Assert.AreEqual(1, aca.episodeCount);
Assert.AreEqual(numberReset, aca.episodeCount);
Assert.AreEqual(1, aca.academyResetCalls);
Assert.AreEqual(numberReset, aca.academyResetCalls);
// The reset happens at the begining of the first step
if (i == 0)
{
numberReset += 1;
}
}
}

AgentEnableMethod.Invoke(agent1, new object[] { aca });
AcademyInitializeMethod.Invoke(aca, new object[] { });
AgentEnableMethod.Invoke(agent2, new object[] { aca });
int numberAgent1Reset = 0;
int numberAgent2Initialization = 0;
Assert.AreEqual(1, agent1.agentResetCalls);
Assert.AreEqual(0, agent2.agentResetCalls);
Assert.AreEqual(numberAgent1Reset, agent1.agentResetCalls);
// Agent2 is never reset since intialized after academy
Assert.AreEqual(0, agent2.agentResetCalls);
Assert.AreEqual(1, agent2.initializeAgentCalls);
Assert.AreEqual(numberAgent2Initialization, agent2.initializeAgentCalls);
if (i % 3 == 0)
// Agent 1 resets at the first step
if (i == 0)
numberAgent1Reset += 1;
}
//Agent 2 is only initialized at step 2
if (i == 2)
{
AgentEnableMethod.Invoke(agent2, new object[] { aca });
numberAgent2Initialization += 1;
}
// We are testing request decision and request actions when called
// at different intervals
if ((i % 3 == 0) && (i > 2))
{
//Every 3 steps after agent 2 is initialized, request decision
else if (i % 5 == 0)
else if ((i % 5 == 0) && (i > 2))
// Every 5 steps after agent 2 is initialized, request action
requestAction += 1;
agent2.RequestAction();
}

MethodInfo AcademyStepMethod = typeof(Academy).GetMethod("_AcademyStep",
BindingFlags.Instance | BindingFlags.NonPublic);
int numberReset = 1;
int numberReset = 0;
int stepsSinceReset = 0;
for (int i = 0; i < 50; i++)
{

Assert.AreEqual(false, aca.IsDone());
Assert.AreEqual(numberReset, aca.academyResetCalls);
Assert.AreEqual(i, aca.AcademyStepCalls);
// Academy resets at the first step
if (i == 0)
{
numberReset += 1;
}
if (i % 5 == 3)
// Regularly set the academy to done to check behavior
if (i % 5 == 3)
{
aca.Done();
numberReset += 1;

AgentEnableMethod.Invoke(agent2, new object[] { aca });
AcademyInitializeMethod.Invoke(aca, new object[] { });
AgentEnableMethod.Invoke(agent1, new object[] { aca });
int numberAgent1Reset = 0; // Agent1 was not enabled at Academy start
int numberAgent2Reset = 1;
int numberAcaReset = 1;
int numberAgent1Reset = 0;
int numberAgent2Reset = 0;
int numberAcaReset = 0;
int acaStepsSinceReset = 0;
int agent1StepSinceReset =0;
int agent2StepSinceReset=0;

Assert.AreEqual(numberAgent1Reset, agent1.agentResetCalls);
Assert.AreEqual(numberAgent2Reset, agent2.agentResetCalls);
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
// Agent 2 and academy reset at the first step
if (i == 0)
{
numberAcaReset += 1;
numberAgent2Reset += 1;
}
//Agent 1 is only initialized at step 2
if (i == 2)
{
AgentEnableMethod.Invoke(agent1, new object[] { aca });
if (i % 100 == 3)
}
// Reset Academy every 100 steps
if (i % 100 == 3)
acaStepsSinceReset = 1;
acaStepsSinceReset = 0;
if (i % 11 == 5)
// Set agent 1 to done every 11 steps to test behavior
if (i % 11 == 5)
if (i % 13 == 3)
// Reseting agent 2 regularly
if (i % 13 == 3)
{
if (!(agent2.IsDone()||aca.IsDone()))
{

numberAgent2Reset += 1;
agent2StepSinceReset = 1;
agent2StepSinceReset = 0;
if (i % 3 == 2)
// Request a decision for agent 2 regularly
if (i % 3 == 2)
else if (i % 5 == 1)
else if (i % 5 == 1)
// Request an action without decision regularly
if (agent1.IsDone() && (((acaStepsSinceReset+1) % agent1.agentParameters.numberOfActionsBetweenDecisions==0)) || aca.IsDone())
if (agent1.IsDone() && (((acaStepsSinceReset) % agent1.agentParameters.numberOfActionsBetweenDecisions==0)) || aca.IsDone())
agent1StepSinceReset = 1;
agent1StepSinceReset = 0;
agent2StepSinceReset = 1;
agent2StepSinceReset = 0;
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
//Agent 1 is only initialized at step 2
if (i < 2)
{
agent1StepSinceReset = 0;
}
AcademyStepMethod.Invoke((object)aca, new object[] { });

FieldInfo maxStep = typeof(Academy).GetField("maxSteps", BindingFlags.Instance | BindingFlags.NonPublic);
maxStep.SetValue((object)aca, 20);
int numberReset = 1;
int numberReset = 0;
Assert.AreEqual(false, aca.IsDone());
Assert.AreEqual(i, aca.AcademyStepCalls);
Assert.AreEqual(false, aca.IsDone());
Assert.AreEqual(i, aca.AcademyStepCalls);
if ((i % 20 == 0) && (i>0))
// Make sure max step is reached every 20 steps
if (i % 20 == 0)
{
numberReset += 1;
stepsSinceReset = 1;

AgentEnableMethod.Invoke(agent2, new object[] { aca });
AcademyInitializeMethod.Invoke(aca, new object[] { });
AgentEnableMethod.Invoke(agent1, new object[] { aca });
int numberAgent1Reset = 0; // Agent1 was not enabled at Academy start
int numberAgent2Reset = 1;
int numberAcaReset = 1;
int numberAgent1Reset = 0;
int numberAgent2Reset = 0;
int numberAcaReset = 0;
int acaStepsSinceReset = 0;
int agent1StepSinceReset = 0;
int agent2StepSinceReset = 0;

Assert.AreEqual(acaStepsSinceReset, aca.stepsSinceReset);
Assert.AreEqual(1, aca.initializeAcademyCalls);
Assert.AreEqual(numberAcaReset, aca.episodeCount);
Assert.AreEqual(numberAcaReset, aca.academyResetCalls);
Assert.AreEqual(numberAcaReset, aca.episodeCount);
Assert.AreEqual(numberAcaReset, aca.academyResetCalls);
agent2.RequestDecision(); // we request a decision at each step
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
//At the first step, Academy and agent 2 reset
if (i == 0)
{
numberAcaReset += 1;
numberAgent2Reset += 1;
}
//Agent 1 is only initialized at step 2
if (i == 2)
{
AgentEnableMethod.Invoke(agent1, new object[] { aca });
}
// we request a decision at each step
agent2.RequestDecision();
if (i % 100 == 0)
// Make sure the academy max steps at 100
if (i % 100 == 0)
acaStepsSinceReset = 1;
agent1StepSinceReset = 1;
agent2StepSinceReset = 1;
acaStepsSinceReset = 0;
agent1StepSinceReset = 0;
agent2StepSinceReset = 0;
numberAcaReset += 1;
numberAgent1Reset += 1;
numberAgent2Reset += 1;

if ((i % 100) % 21 == 0)
//Make sure the agents reset when their max steps is reached
if (agent1StepSinceReset % 21 == 0)
agent1StepSinceReset = 1;
agent1StepSinceReset = 0;
if ((i % 100) % 31 == 0)
if (agent2StepSinceReset % 31 == 0)
agent2StepSinceReset = 1;
agent2StepSinceReset = 0;
acaStepsSinceReset += 1;
agent1StepSinceReset += 1;
agent2StepSinceReset += 1;
//Agent 1 is only initialized at step 2
if (i < 2)
{
agent1StepSinceReset = 0;
}
}
}

brain.brainParameters = new BrainParameters();
// We use event based so the agent will now try to send anything to the brain
agent1.agentParameters.onDemandDecision = false;
// agent1 will take an action at every step and request a decision every steps
// agent1 will take an action at every step and request a decision every 2 steps
agent2.agentParameters.onDemandDecision = true;
agent2.agentParameters.onDemandDecision = true;
//Here we specify that the agent does not reset when done
agent2.agentParameters.resetOnDone = false; // Here we specify that the agent does not reset when done
agent2.agentParameters.resetOnDone = false;
brain.brainParameters.vectorObservationSize = 0;
brain.brainParameters.cameraResolutions = new resolution[0];
agent1.GiveBrain(brain);

Assert.AreEqual(agent1ResetOnDone, agent1.agentOnDoneCalls);
Assert.AreEqual(agent2ResetOnDone, agent2.agentOnDoneCalls);
agent2.RequestDecision(); // we request a decision at each step
// we request a decision at each step
agent2.RequestDecision();
acaStepsSinceReset += 1;
if (agent1ResetOnDone ==0)
agent1StepSinceReset += 1;

unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs (7 changes)


}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
float action_z = 2f * Mathf.Clamp(act[0], -1f, 1f);
float action_z = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
float action_x = 2f * Mathf.Clamp(act[1], -1f, 1f);
float action_x = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
if ((gameObject.transform.rotation.x < 0.25f && action_x > 0f) ||
(gameObject.transform.rotation.x > -0.25f && action_x < 0f))
{

Done();
SetReward(-1f);
}
}

unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs (6 changes)


AddVectorObs((ball.transform.position.z - gameObject.transform.position.z));
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
float action_z = 2f * Mathf.Clamp(act[0], -1f, 1f);
float action_z = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
float action_x = 2f * Mathf.Clamp(act[1], -1f, 1f);
float action_x = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
if ((gameObject.transform.rotation.x < 0.25f && action_x > 0f) ||
(gameObject.transform.rotation.x > -0.25f && action_x < 0f))
{

unity-environment/Assets/ML-Agents/Examples/Area/Scripts/AreaAgent.cs (6 changes)


}
}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(act);
MoveAgent(vectorAction);
if (gameObject.transform.position.y < 0.0f || Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||
Mathf.Abs(gameObject.transform.position.z + 5 - area.transform.position.z) > 8)

unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Push/PushAgent.cs (6 changes)


}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(act);
MoveAgent(vectorAction);
if (gameObject.transform.position.y < 0.0f || Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||
Mathf.Abs(gameObject.transform.position.z + 5 - area.transform.position.z) > 8)

unity-environment/Assets/ML-Agents/Examples/Area/Scripts/Wall/WallAgent.cs (6 changes)


AddVectorObs(blockVelocity.z);
}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
MoveAgent(act);
MoveAgent(vectorAction);
if (gameObject.transform.position.y < 0.0f ||
Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f ||

unity-environment/Assets/ML-Agents/Examples/Banana/Scripts/BananaAgent.cs (4 changes)


public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
MoveAgent(act);
MoveAgent(vectorAction);
}
public override void AgentReset()

unity-environment/Assets/ML-Agents/Examples/Basic/Scripts/BasicAgent.cs (12 changes)


AddVectorObs(position);
}
public override void AgentAction(float[] act)
{
float movement = act[0];
int direction = 0;
if (movement == 0) { direction = -1; }
if (movement == 1) { direction = 1; }
public override void AgentAction(float[] vectorAction, string textAction)
{
float movement = vectorAction[0];
int direction = 0;
if (movement == 0) { direction = -1; }
if (movement == 1) { direction = 1; }
position += direction;
if (position < minPosition) { position = minPosition; }

unity-environment/Assets/ML-Agents/Examples/Bouncer/Scripts/BouncerAgent.cs (8 changes)


AddVectorObs(banana.transform.position.z / 25f);
}
public override void AgentAction(float[] act)
{
float x = Mathf.Clamp(act[0], -1, 1);
float z = Mathf.Clamp(act[1], -1, 1);
public override void AgentAction(float[] vectorAction, string textAction)
{
float x = Mathf.Clamp(vectorAction[0], -1, 1);
float z = Mathf.Clamp(vectorAction[1], -1, 1);
rb.velocity = new Vector3(x, 0, z) ;
if (rb.velocity.magnitude < 0.01f){
AddReward(-1);

unity-environment/Assets/ML-Agents/Examples/Crawler/Scripts/CrawlerAgentConfigurable.cs (45 changes)


}
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
for (int k = 0; k < act.Length; k++)
for (int k = 0; k < vectorAction.Length; k++)
act[k] = Mathf.Clamp(act[k], -1f, 1f);
vectorAction[k] = Mathf.Clamp(vectorAction[k], -1f, 1f);
limbRBs[0].AddTorque(-limbs[0].transform.right * strength * act[0]);
limbRBs[1].AddTorque(-limbs[1].transform.right * strength * act[1]);
limbRBs[2].AddTorque(-limbs[2].transform.right * strength * act[2]);
limbRBs[3].AddTorque(-limbs[3].transform.right * strength * act[3]);
limbRBs[0].AddTorque(-body.transform.up * strength * act[4]);
limbRBs[1].AddTorque(-body.transform.up * strength * act[5]);
limbRBs[2].AddTorque(-body.transform.up * strength * act[6]);
limbRBs[3].AddTorque(-body.transform.up * strength * act[7]);
limbRBs[4].AddTorque(-limbs[4].transform.right * strength * act[8]);
limbRBs[5].AddTorque(-limbs[5].transform.right * strength * act[9]);
limbRBs[6].AddTorque(-limbs[6].transform.right * strength * act[10]);
limbRBs[7].AddTorque(-limbs[7].transform.right * strength * act[11]);
limbRBs[0].AddTorque(-limbs[0].transform.right * strength * vectorAction[0]);
limbRBs[1].AddTorque(-limbs[1].transform.right * strength * vectorAction[1]);
limbRBs[2].AddTorque(-limbs[2].transform.right * strength * vectorAction[2]);
limbRBs[3].AddTorque(-limbs[3].transform.right * strength * vectorAction[3]);
limbRBs[0].AddTorque(-body.transform.up * strength * vectorAction[4]);
limbRBs[1].AddTorque(-body.transform.up * strength * vectorAction[5]);
limbRBs[2].AddTorque(-body.transform.up * strength * vectorAction[6]);
limbRBs[3].AddTorque(-body.transform.up * strength * vectorAction[7]);
limbRBs[4].AddTorque(-limbs[4].transform.right * strength * vectorAction[8]);
limbRBs[5].AddTorque(-limbs[5].transform.right * strength * vectorAction[9]);
limbRBs[6].AddTorque(-limbs[6].transform.right * strength * vectorAction[10]);
limbRBs[7].AddTorque(-limbs[7].transform.right * strength * vectorAction[11]);
float torque_penalty = act[0] * act[0] + act[1] * act[1] + act[2] * act[2] + act[3] * act[3]
+ act[4] * act[4] + act[5] * act[5] + act[6] * act[6] + act[7] * act[7]
+ act[8] * act[8] + act[9] * act[9] + act[10] * act[10] + act[11] * act[11];
float torque_penalty = vectorAction[0] * vectorAction[0] +
vectorAction[1] * vectorAction[1] +
vectorAction[2] * vectorAction[2] +
vectorAction[3] * vectorAction[3] +
vectorAction[4] * vectorAction[4] +
vectorAction[5] * vectorAction[5] +
vectorAction[6] * vectorAction[6] +
vectorAction[7] * vectorAction[7] +
vectorAction[8] * vectorAction[8] +
vectorAction[9] * vectorAction[9] +
vectorAction[10] * vectorAction[10] +
vectorAction[11] * vectorAction[11];
if (!IsDone())
{

unity-environment/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs (4 changes)


}
// to be implemented by the developer
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
int action = Mathf.FloorToInt(act[0]);
int action = Mathf.FloorToInt(vectorAction[0]);
// 0 - Forward, 1 - Backward, 2 - Left, 3 - Right
Vector3 targetPos = transform.position;

unity-environment/Assets/ML-Agents/Examples/Hallway/Scripts/HallwayAgent.cs (4 changes)


agentRB.AddForce(dirToGo * academy.agentRunSpeed, ForceMode.VelocityChange); // GO
}
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
MoveAgent(act); // perform agent actions
MoveAgent(vectorAction); //perform agent actions
bool fail = false; // did the agent or block get pushed off the edge?
if (!Physics.Raycast(agentRB.position, Vector3.down, 20)) // if the agent has gone over the edge, we done.

unity-environment/Assets/ML-Agents/Examples/Reacher/Scripts/ReacherAgent.cs (12 changes)


/// <summary>
/// The agent's four actions correspond to torques on each of the two joints.
/// </summary>
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
float torque_x = Mathf.Clamp(act[0], -1, 1) * 100f;
float torque_z = Mathf.Clamp(act[1], -1, 1) * 100f;
float torque_x = Mathf.Clamp(vectorAction[0], -1, 1) * 100f;
float torque_z = Mathf.Clamp(vectorAction[1], -1, 1) * 100f;
torque_x = Mathf.Clamp(act[2], -1, 1) * 100f;
torque_z = Mathf.Clamp(act[3], -1, 1) * 100f;
torque_x = Mathf.Clamp(vectorAction[2], -1, 1) * 100f;
torque_z = Mathf.Clamp(vectorAction[3], -1, 1) * 100f;
rbB.AddTorque(new Vector3(torque_x, 0f, torque_z));
}

unity-environment/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs (6 changes)


}
// to be implemented by the developer
public override void AgentAction(float[] act)
public override void AgentAction(float[] vectorAction, string textAction)
moveX = 0.25f * Mathf.Clamp(act[0], -1f, 1f) * invertMult;
if (Mathf.Clamp(act[1], -1f, 1f) > 0f && gameObject.transform.position.y - transform.parent.transform.position.y < -1.5f)
moveX = 0.25f * Mathf.Clamp(vectorAction[0], -1f, 1f) * invertMult;
if (Mathf.Clamp(vectorAction[1], -1f, 1f) > 0f && gameObject.transform.position.y - transform.parent.transform.position.y < -1.5f)
{
moveY = 0.5f;
gameObject.GetComponent<Rigidbody>().velocity = new Vector3(GetComponent<Rigidbody>().velocity.x, moveY * 12f, 0f);

unity-environment/Assets/ML-Agents/Scripts/Agent.cs (39 changes)


}
/// <summary>
/// Adds a vector observation.
/// Note that the number of vector observation to add
/// Appends float values to the vector observation.
/// Note that the total number of vector observation added
/// <param name="observation">The float value to add to
/// <param name="observation">The value to add to
internal void AddVectorObs(int observation)
{
_info.vectorObservation.Add((float)observation);
}
internal void AddVectorObs(Vector3 observation)
{
_info.vectorObservation.Add(observation.x);
_info.vectorObservation.Add(observation.y);
_info.vectorObservation.Add(observation.z);
}
internal void AddVectorObs(Vector2 observation)
{
_info.vectorObservation.Add(observation.x);
_info.vectorObservation.Add(observation.y);
}
internal void AddVectorObs(float[] observation)
{
_info.vectorObservation.AddRange(observation);
}
internal void AddVectorObs(List<float> observation)
{
_info.vectorObservation.AddRange(observation);
}
/// <summary>
/// Sets the text observation.
/// </summary>
/// <param name="s">The string the text observation must be set to.</param>
internal void SetTextObs(object s)
{
_info.textObservation = s.ToString();

/// </summary>
/// <param name="action">The action the agent receives
/// from the brain.</param>
public virtual void AgentAction(float[] action)
public virtual void AgentAction(float[] vectorAction, string textAction)
{
}

if ((requestAction) && (brain != null))
{
requestAction = false;
AgentAction(_action.vectorActions);
AgentAction(_action.vectorActions, _action.textActions);
}
if ((stepCounter >= agentParameters.maxStep)

unity-environment/Assets/ML-Agents/Scripts/ExternalCommunicator.cs (4 changes)


{
var brainName = brain.gameObject.name;
if (current_agents[brainName].Count() == 0)
{
continue;
}
var memorySize = rMessage.memory[brainName].Count() / current_agents[brainName].Count();
for (int i = 0; i < current_agents[brainName].Count(); i++)

unity-environment/Assets/ML-Agents/Template/Scripts/TemplateAgent.cs (4 changes)


}
public override void AgentAction(float[] act)
{
public override void AgentAction(float[] vectorAction, string textAction)
{
}

docs/images/unity_linux_build_support.png (1001 changes)
The file diff is too large to display.
