Optional gym wrapper (#1007)

Adds an optional gym wrapper, UnityEnv, to use as a Python interface to Unity environments. (branch: /develop-generalizationTraining-TrainerController)

GitHub · 6 years ago · commit c600a706

19 files changed, with 692 insertions and 23 deletions
Changed files (lines changed per file):

- .gitignore (3)
- docs/Background-Jupyter.md (2)
- docs/Python-API.md (2)
- docs/Readme.md (1)
- python/tests/mock_communicator.py (18)
- python/notebooks/getting-started.ipynb (23)
- gym-unity/Readme.md (127)
- gym-unity/gym_unity/__init__.py (1)
- gym-unity/gym_unity/envs/__init__.py (1)
- gym-unity/gym_unity/envs/unity_env.py (207)
- gym-unity/setup.py (14)
- gym-unity/tests/test_gym.py (46)
- python/notebooks/getting-started-gym.ipynb (270)
- /python/notebooks/getting-started.ipynb (0)
gym-unity/Readme.md:

# Unity ML-Agents Gym Wrapper

A common way in which machine learning researchers interact with simulation environments is via a wrapper provided by OpenAI called `gym`. For more information on the gym interface, see [here](https://github.com/openai/gym).

We provide a gym wrapper, along with instructions for using it with existing machine learning algorithms which utilize gym. The wrapper provides an interface on top of our `UnityEnvironment` class, which is the default way of interfacing with a Unity environment via Python.
## Installation

The gym wrapper can be installed using:

```
pip install gym_unity
```

or by running the following from the `/gym-unity` directory of the repository:

```
pip install .
```
## Using the Gym Wrapper

The gym interface is available from `gym_unity.envs`. To launch an environment from the root of the project repository use:

```python
from gym_unity.envs import UnityEnv

env = UnityEnv(environment_filename, worker_id, use_visual, multiagent)
```

* `environment_filename` refers to the path to the Unity environment.
* `worker_id` refers to the port to use for communication with the environment. Defaults to `0`.
* `use_visual` refers to whether to use visual observations (True) or vector observations (False) as the default observation provided by the `reset` and `step` functions. Defaults to `False`.
* `multiagent` refers to whether you intend to launch an environment which contains more than one agent. Defaults to `False`.

The returned environment `env` will function as a gym.
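As a quick illustration (this sketch is not part of the original instructions, and the environment path is a placeholder), a standard gym interaction loop works unchanged against the wrapped environment:

```python
from gym_unity.envs import UnityEnv

# Illustrative only: substitute the path to any built Unity environment binary.
env = UnityEnv("./envs/YourEnvironment", worker_id=0)

obs = env.reset()
done = False
total_reward = 0
while not done:
    # Sample a random action from the wrapper's gym action space and step once.
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
env.close()
```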
For more on using the gym interface, see our [Jupyter Notebook tutorial](../python/notebooks/getting-started-gym.ipynb).

## Limitations

* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the `info` provided by `env.step(action)`, as shown in the sketch after this list.
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.
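A minimal sketch of accessing the underlying `BrainInfo` (continuing from an `env` created as above; the `info` keys come from the wrapper's `_single_step` implementation in this change):

```python
obs, reward, done, info = env.step(env.action_space.sample())

# `info` is a dict populated by the wrapper: "brain_info" holds the raw BrainInfo
# object for this step, and "text_observation" the agent's text observation.
brain_info = info["brain_info"]
text_obs = info["text_observation"]
print(brain_info.rewards, brain_info.local_done)
```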
## Running OpenAI Baselines Algorithms

OpenAI provides a set of open-source, maintained and tested Reinforcement Learning algorithms called the [Baselines](https://github.com/openai/baselines).

Using the provided Gym wrapper, it is possible to train ML-Agents environments using these algorithms. This requires the creation of custom training scripts to launch each algorithm. In most cases these scripts can be created by making slight modifications to the ones provided for Atari and Mujoco environments.

### Example - DQN Baseline

In order to train an agent to play the `GridWorld` environment using the Baselines DQN algorithm, create a file called `train_unity.py` within the `baselines/deepq/experiments` subfolder of the baselines repository. This file will be a modification of the `run_atari.py` file within the same folder. Then create an `/envs/` directory within the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). Add the following code to the `train_unity.py` file:
```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv


def main():
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[256],
        dueling=True,
    )
    act = deepq.learn(
        env,
        q_func=model,
        lr=1e-3,
        max_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()
```

To start the training process, run the following from the root of the baselines repository:

```
python -m baselines.deepq.experiments.train_unity
```
### Other Algorithms

Other algorithms in the Baselines repository can be run using scripts similar to the example provided above. In most cases, the primary changes needed to use a Unity environment are to import `UnityEnv`, and to replace the environment creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)` passing the environment binary path, as sketched below.
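As a rough illustration of that replacement (the gym environment id and Unity binary path below are placeholders, not taken from any particular Baselines script):

```python
from gym_unity.envs import UnityEnv

# Before: a typical Baselines script creates its environment with gym.make().
# env = gym.make("PongNoFrameskip-v4")

# After: point the wrapper at a built Unity environment binary instead.
env = UnityEnv("./envs/GridWorld", worker_id=0, use_visual=True)
```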
A typical rule of thumb is that for vision-based environments, modification should be done to Atari training scripts, and for vector observation environments, modification should be done to Mujoco scripts.

Some algorithms will make use of `make_atari_env()` or `make_mujoco_env()` functions. These are defined in `baselines/common/cmd_util.py`. In order to use Unity environments for these algorithms, add the following import statement and function to `cmd_util.py`:
```python
from gym_unity.envs import UnityEnv

# Note: this function is meant to live inside cmd_util.py, which already imports the
# os, Monitor, logger, SubprocVecEnv and MPI names used below; add them if missing.
def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank):  # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        rank = MPI.COMM_WORLD.Get_rank()
        env = UnityEnv(env_directory, rank, use_visual=False)
        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
        return env
```
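A hedged usage sketch (the path and `num_env` value are illustrative, not from the Baselines scripts): once `make_unity_env` is in place, a training script can build a vectorized Unity environment much as the Atari scripts use `make_atari_env`:

```python
# Illustrative only: builds four visual GridWorld workers wrapped in a SubprocVecEnv,
# which vectorized Baselines algorithms can consume in place of make_atari_env output.
env = make_unity_env('./envs/GridWorld', num_env=4, visual=True)
```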
The two package `__init__.py` files (gym-unity/gym_unity/__init__.py and gym-unity/gym_unity/envs/__init__.py) each contain a single line:

from gym.envs.registration import register

from gym_unity.envs.unity_env import UnityEnv, UnityGymException
gym-unity/gym_unity/envs/unity_env.py:

import gym
import numpy as np
from unityagents import UnityEnvironment
from gym import error, spaces, logger


class UnityGymException(error.Error):
    """
    Any error related to the gym wrapper of ml-agents.
    """
    pass


class UnityEnv(gym.Env):
    """
    Provides Gym wrapper for Unity Learning Environments.
    Multi-agent environments use lists for object types, as done here:
    https://github.com/openai/multiagent-particle-envs
    """

    def __init__(self, environment_filename: str, worker_id=0, use_visual=False, multiagent=False):
        """
        Environment initialization
        :param environment_filename: The UnityEnvironment path or file to be wrapped in the gym.
        :param worker_id: Worker number for environment.
        :param use_visual: Whether to use visual observation or vector observation.
        :param multiagent: Whether to run in multi-agent mode (lists of obs, reward, done).
        """
        self._env = UnityEnvironment(environment_filename, worker_id)
        self.name = self._env.academy_name
        self.visual_obs = None
        self._current_state = None
        self._n_agents = None
        self._multiagent = multiagent

        # Check brain configuration
        if len(self._env.brains) != 1:
            raise UnityGymException(
                "There can only be one brain in a UnityEnvironment "
                "if it is wrapped in a gym.")
        self.brain_name = self._env.external_brain_names[0]
        brain = self._env.brains[self.brain_name]

        if use_visual and brain.number_visual_observations == 0:
            raise UnityGymException("`use_visual` was set to True, however there are no"
                                    " visual observations as part of this environment.")
        self.use_visual = brain.number_visual_observations == 1 and use_visual

        if brain.num_stacked_vector_observations != 1:
            raise UnityGymException(
                "There can only be one stacked vector observation in a UnityEnvironment "
                "if it is wrapped in a gym.")

        # Check for number of agents in scene.
        initial_info = self._env.reset()[self.brain_name]
        self._check_agents(len(initial_info.agents))

        # Set observation and action spaces
        if brain.vector_action_space_type == "discrete":
            if len(brain.vector_action_space_size) == 1:
                self._action_space = spaces.Discrete(brain.vector_action_space_size[0])
            else:
                self._action_space = spaces.MultiDiscrete(brain.vector_action_space_size)
        else:
            high = np.array([1] * brain.vector_action_space_size[0])
            self._action_space = spaces.Box(-high, high, dtype=np.float32)
        high = np.array([np.inf] * brain.vector_observation_space_size)
        self.action_meanings = brain.vector_action_descriptions
        if self.use_visual:
            if brain.camera_resolutions[0]["blackAndWhite"]:
                depth = 1
            else:
                depth = 3
            self._observation_space = spaces.Box(0, 1, dtype=np.float32,
                                                 shape=(brain.camera_resolutions[0]["height"],
                                                        brain.camera_resolutions[0]["width"],
                                                        depth))
        else:
            self._observation_space = spaces.Box(-high, high, dtype=np.float32)

    def reset(self):
        """Resets the state of the environment and returns an initial observation.
        In the case of multi-agent environments, this is a list.
        Returns: observation (object/list): the initial observation of the
            space.
        """
        info = self._env.reset()[self.brain_name]
        n_agents = len(info.agents)
        self._check_agents(n_agents)

        if not self._multiagent:
            obs, reward, done, info = self._single_step(info)
        else:
            obs, reward, done, info = self._multi_step(info)
        return obs

    def step(self, action):
        """Run one timestep of the environment's dynamics. When end of
        episode is reached, you are responsible for calling `reset()`
        to reset this environment's state.
        Accepts an action and returns a tuple (observation, reward, done, info).
        In the case of multi-agent environments, these are lists.
        Args:
            action (object/list): an action provided by the agent
        Returns:
            observation (object/list): agent's observation of the current environment
            reward (float/list) : amount of reward returned after previous action
            done (boolean/list): whether the episode has ended.
            info (dict): contains auxiliary diagnostic information, including BrainInfo.
        """

        # Validate the action shape before stepping the environment.
        if self._multiagent:
            if not isinstance(action, list):
                raise UnityGymException("The environment was expecting `action` to be a list.")
            if len(action) != self._n_agents:
                raise UnityGymException(
                    "The environment was expecting a list of {} actions.".format(self._n_agents))
        else:
            action = np.array(action)

        info = self._env.step(action)[self.brain_name]
        n_agents = len(info.agents)
        self._check_agents(n_agents)
        self._current_state = info

        if not self._multiagent:
            obs, reward, done, info = self._single_step(info)
        else:
            obs, reward, done, info = self._multi_step(info)
        return obs, reward, done, info

    def _single_step(self, info):
        if self.use_visual:
            self.visual_obs = info.visual_observations[0][0, :, :, :]
            default_observation = self.visual_obs
        else:
            default_observation = info.vector_observations[0, :]

        return default_observation, info.rewards[0], info.local_done[0], {
            "text_observation": info.text_observations[0],
            "brain_info": info}

    def _multi_step(self, info):
        if self.use_visual:
            self.visual_obs = info.visual_observations
            default_observation = self.visual_obs
        else:
            default_observation = info.vector_observations
        return list(default_observation), info.rewards, info.local_done, {
            "text_observation": info.text_observations,
            "brain_info": info}

    def render(self, mode='rgb_array'):
        return self.visual_obs

    def close(self):
        """Override _close in your subclass to perform any necessary cleanup.
        Environments will automatically close() themselves when
        garbage collected or when the program exits.
        """
        self._env.close()

    def get_action_meanings(self):
        return self.action_meanings

    def seed(self, seed=None):
        """Sets the seed for this env's random number generator(s).
        Currently not implemented.
        """
        logger.warn("Could not seed environment %s", self.name)
        return

    def _check_agents(self, n_agents):
        if not self._multiagent and n_agents > 1:
            raise UnityGymException("The environment was launched as a single-agent environment, "
                                    "however there is more than one agent in the scene.")
        elif self._multiagent and n_agents <= 1:
            raise UnityGymException("The environment was launched as a multi-agent environment, "
                                    "however there is only one agent in the scene.")
        if self._n_agents is None:
            self._n_agents = n_agents
            logger.info("{} agents within environment.".format(n_agents))
        elif self._n_agents != n_agents:
            raise UnityGymException("The number of agents in the environment has changed since "
                                    "initialization. This is not supported.")

    @property
    def metadata(self):
        return {'render.modes': ['rgb_array']}

    @property
    def reward_range(self):
        return -float('inf'), float('inf')

    @property
    def spec(self):
        return None

    @property
    def action_space(self):
        return self._action_space

    @property
    def observation_space(self):
        return self._observation_space

    @property
    def number_agents(self):
        return self._n_agents
gym-unity/setup.py:

#!/usr/bin/env python

from setuptools import setup, Command, find_packages

setup(name='gym_unity',
      version='0.1.0',
      description='Unity Machine Learning Agents Gym Interface',
      license='Apache License 2.0',
      author='Unity Technologies',
      author_email='ML-Agents@unity3d.com',
      url='https://github.com/Unity-Technologies/ml-agents',
      packages=find_packages(),
      install_requires=['gym', 'unityagents']
      )
gym-unity/tests/test_gym.py:

import unittest.mock as mock
import pytest
import numpy as np

from gym_unity.envs import UnityEnv, UnityGymException
from tests.mock_communicator import MockCommunicator


@mock.patch('unityagents.UnityEnvironment.executable_launcher')
@mock.patch('unityagents.UnityEnvironment.get_communicator')
def test_gym_wrapper(mock_communicator, mock_launcher):
    mock_communicator.return_value = MockCommunicator(
        discrete_action=False, visual_inputs=0, stack=False, num_agents=1)

    # Test for incorrect number of agents.
    with pytest.raises(UnityGymException):
        UnityEnv(' ', use_visual=False, multiagent=True)

    env = UnityEnv(' ', use_visual=False)
    assert isinstance(env, UnityEnv)
    assert isinstance(env.reset(), np.ndarray)
    actions = env.action_space.sample()
    assert actions.shape[0] == 2
    obs, rew, done, info = env.step(actions)
    assert isinstance(obs, np.ndarray)
    assert isinstance(rew, float)
    assert isinstance(done, bool)
    assert isinstance(info, dict)


@mock.patch('unityagents.UnityEnvironment.executable_launcher')
@mock.patch('unityagents.UnityEnvironment.get_communicator')
def test_multi_agent(mock_communicator, mock_launcher):
    mock_communicator.return_value = MockCommunicator(
        discrete_action=False, visual_inputs=0, stack=False, num_agents=2)

    # Test for incorrect number of agents.
    with pytest.raises(UnityGymException):
        UnityEnv(' ', multiagent=False)

    env = UnityEnv(' ', use_visual=False, multiagent=True)
    assert isinstance(env.reset(), list)
    actions = [env.action_space.sample() for i in range(env.number_agents)]
    obs, rew, done, info = env.step(actions)
    assert isinstance(obs, list)
    assert isinstance(rew, list)
    assert isinstance(done, list)
    assert isinstance(info, dict)
python/notebooks/getting-started-gym.ipynb:

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Unity ML-Agents Toolkit\n",
        "## Gym Wrapper Basics\n",
        "This notebook contains a walkthrough of the basic functions of the Python Gym Wrapper for the Unity ML-Agents toolkit. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Single-Agent Environments\n",
        "\n",
        "The first five steps show how to use the `UnityEnv` wrapper with single-agent environments. See below step five for how to use with multi-agent environments."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 1. Load dependencies\n",
        "\n",
        "The following loads the necessary dependencies and checks the Python version (at runtime). ML-Agents Toolkit (v0.3 onwards) requires Python 3."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import sys\n",
        "\n",
        "from gym_unity.envs import UnityEnv\n",
        "\n",
        "%matplotlib inline\n",
        "\n",
        "print(\"Python version:\")\n",
        "print(sys.version)\n",
        "\n",
        "# check Python version\n",
        "if (sys.version_info[0] < 3):\n",
        "    raise Exception(\"ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 2. Start the environment\n",
        "`UnityEnv` launches and begins communication with the environment when instantiated. We will be using the `GridWorld` environment. You will need to create an `envs` directory within the `/python` subfolder of the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "env_name = \"../envs/GridWorld\"  # Name of the Unity environment binary to launch\n",
        "env = UnityEnv(env_name, worker_id=0, use_visual=True)\n",
        "\n",
        "# Examine environment parameters\n",
        "print(str(env))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 3. Examine the observation and state spaces\n",
        "We can reset the environment to be provided with an initial observation of the environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Reset the environment\n",
        "initial_observation = env.reset()\n",
        "\n",
        "if len(env.observation_space.shape) == 1:\n",
        "    # Examine the initial vector observation\n",
        "    print(\"Agent state looks like: \\n{}\".format(initial_observation))\n",
        "else:\n",
        "    # Examine the initial visual observation\n",
        "    print(\"Agent observations look like:\")\n",
        "    if env.observation_space.shape[2] == 3:\n",
        "        plt.imshow(initial_observation[:,:,:])\n",
        "    else:\n",
        "        plt.imshow(initial_observation[:,:,0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 4. Take random actions in the environment\n",
        "Once we restart an environment, we can step the environment forward and provide actions to all of the agents within the environment. Here we simply choose random actions using the `env.action_space.sample()` function.\n",
        "\n",
        "Once this cell is executed, 10 messages will be printed that detail how much reward will be accumulated for the next 10 episodes. The Unity environment will then pause, waiting for further signals telling it what to do next. Thus, not seeing any animation is expected when running this cell."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "for episode in range(10):\n",
        "    initial_observation = env.reset()\n",
        "    done = False\n",
        "    episode_rewards = 0\n",
        "    while not done:\n",
        "        observation, reward, done, info = env.step(env.action_space.sample())\n",
        "        episode_rewards += reward\n",
        "    print(\"Total reward this episode: {}\".format(episode_rewards))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 5. Close the environment when finished\n",
        "When we are finished using an environment, we can close it with the function below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "env.close()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Multi-Agent Environments\n",
        "\n",
        "It is also possible to use the gym wrapper with multi-agent environments. For these environments, observations, rewards, and done flags will be provided in a list. Likewise, the environment will expect a list of actions when calling `step(action)`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 1. Start the environment\n",
        "\n",
        "We will use the `3DBall` environment for this walkthrough. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). We will launch it from the `python/envs` sub-directory of the repo. Please create an `envs` folder if one does not already exist."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Name of the Unity environment binary to launch\n",
        "multi_env_name = \"../envs/3DBall\"\n",
        "multi_env = UnityEnv(multi_env_name, worker_id=1,\n",
        "                     use_visual=False, multiagent=True)\n",
        "\n",
        "# Examine environment parameters\n",
        "print(str(multi_env))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 2. Examine the observation space"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Reset the environment\n",
        "initial_observations = multi_env.reset()\n",
        "\n",
        "if len(multi_env.observation_space.shape) == 1:\n",
        "    # Examine the initial vector observation\n",
        "    print(\"Agent observations look like: \\n{}\".format(initial_observations[0]))\n",
        "else:\n",
        "    # Examine the initial visual observation\n",
        "    print(\"Agent observations look like:\")\n",
        "    if multi_env.observation_space.shape[2] == 3:\n",
        "        plt.imshow(initial_observations[0][:,:,:])\n",
        "    else:\n",
        "        plt.imshow(initial_observations[0][:,:,0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 3. Take random steps in the environment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "for episode in range(10):\n",
        "    initial_observation = multi_env.reset()\n",
        "    done = False\n",
        "    episode_rewards = 0\n",
        "    while not done:\n",
        "        actions = [multi_env.action_space.sample() for agent in range(multi_env.number_agents)]\n",
        "        observations, rewards, dones, info = multi_env.step(actions)\n",
        "        episode_rewards += np.mean(rewards)\n",
        "        done = dones[0]\n",
        "    print(\"Total reward this episode: {}\".format(episode_rewards))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### 4. Close the environment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "multi_env.close()"
      ]
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}