
Optional gym wrapper (#1007)

Adds an optional gym wrapper, `UnityEnv`, that serves as a Python interface to Unity environments.
Branch: /develop-generalizationTraining-TrainerController
GitHub · 6 years ago
Current commit: c600a706
19 files changed, with 692 additions and 23 deletions
  1. .gitignore (3)
  2. docs/Background-Jupyter.md (2)
  3. docs/Python-API.md (2)
  4. docs/Readme.md (1)
  5. python/tests/mock_communicator.py (18)
  6. python/notebooks/getting-started.ipynb (23)
  7. gym-unity/Readme.md (127)
  8. gym-unity/gym_unity/__init__.py (1)
  9. gym-unity/gym_unity/envs/__init__.py (1)
  10. gym-unity/gym_unity/envs/unity_env.py (207)
  11. gym-unity/setup.py (14)
  12. gym-unity/tests/test_gym.py (46)
  13. python/notebooks/getting-started-gym.ipynb (270)
  14. python/Basics.ipynb → python/notebooks/getting-started.ipynb (0, renamed)

3
.gitignore


# VSCode hidden files
*.vscode/
.DS_Store
.ipynb_checkpoints
# pytest cache
*.pytest_cache/

2
docs/Background-Jupyter.md


# Background: Jupyter
[Jupyter](https://jupyter.org) is a fantastic tool for writing code with
- embedded visualizations. We provide one such notebook, `python/Basics.ipynb`,
+ embedded visualizations. We provide one such notebook, `python/notebooks/getting-started.ipynb`,
for testing the Python control interface to a Unity build. This notebook is
introduced in the
[Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)

2
docs/Python-API.md


To communicate with an agent in a Unity environment from a Python program, the agent must either use an **External** brain or use a brain that is broadcasting (has its **Broadcast** property set to true). Your code is expected to return actions for agents with external brains, but can only observe broadcasting brains (the information you receive for an agent is the same in both cases). See [Using the Broadcast Feature](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
- For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook (`python/Basics.ipynb`), which opens an environment, runs a few simulation steps taking random actions, and closes the environment.
+ For a simple example of using the Python API to interact with a Unity environment, see the Basic [Jupyter](Background-Jupyter.md) notebook (`python/notebooks/getting-started.ipynb`), which opens an environment, runs a few simulation steps taking random actions, and closes the environment.
_Notice: Currently communication between Unity and Python takes place over an open socket without authentication. As such, please make sure that the network where training takes place is secure. This will be addressed in a future release._
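As a minimal sketch of the loop described above, assuming the reset/step pattern that `unity_env.py` later in this change relies on (a dict of `BrainInfo` keyed by brain name) and the list-style `vector_action_space_size` it uses; the environment path, episode length, and random continuous actions are illustrative:

```python
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="../envs/3DBall")   # illustrative path to a built environment
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

info = env.reset(train_mode=True)[brain_name]        # BrainInfo for the chosen brain
for _ in range(10):
    # One random continuous action per agent in the scene.
    action = np.random.randn(len(info.agents), brain.vector_action_space_size[0])
    info = env.step(action)[brain_name]
env.close()
```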

1
docs/Readme.md


## API Docs
* [API Reference](API-Reference.md)
* [How to use the Python API](Python-API.md)
* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)

18
python/tests/mock_communicator.py


class MockCommunicator(Communicator):
-    def __init__(self, discrete_action=False, visual_inputs=0):
+    def __init__(self, discrete_action=False, visual_inputs=0, stack=True, num_agents=3):
        """
        Python side of the grpc communication. Python is the client and Unity the server

        self.steps = 0
        self.visual_inputs = visual_inputs
        self.has_been_closed = False
+       self.num_agents = num_agents
+       if stack:
+           self.num_stacks = 2
+       else:
+           self.num_stacks = 1

    def initialize(self, inputs: UnityInput) -> UnityOutput:
        resolutions = [ResolutionProto(

        bp = BrainParametersProto(
            vector_observation_size=3,
-           num_stacked_vector_observations=2,
+           num_stacked_vector_observations=self.num_stacks,
            vector_action_size=[2],
            camera_resolutions=resolutions,
            vector_action_descriptions=["", ""],

        else:
            vector_action = [1, 2]
        list_agent_info = []
-       for i in range(3):
+       if self.num_stacks == 1:
+           observation = [1, 2, 3]
+       else:
+           observation = [1, 2, 3, 1, 2, 3]
+       for i in range(self.num_agents):
-               stacked_vector_observation=[1, 2, 3, 1, 2, 3],
+               stacked_vector_observation=observation,
                reward=1,
                stored_vector_actions=vector_action,
                stored_text_actions="",

23
python/notebooks/getting-started.ipynb


},
"outputs": [],
"source": [
"env_name = \"3DBall\" # Name of the Unity environment binary to launch\n",
"env_name = \"../envs/3DBall\" # Name of the Unity environment binary to launch\n",
"train_mode = True # Whether to run the environment in training or inference mode"
]
},

{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",

{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"# Examine environment parameters\n",
"print(str(env))\n",
"\n",
"# Set the default brain to work with\n",
"default_brain = env.brain_names[0]\n",
"brain = env.brains[default_brain]"

{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"# Reset the environment\n",

{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"for episode in range(10):\n",

"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
"version": "3.6.2"
}
},
"nbformat": 4,

127
gym-unity/Readme.md


# Unity ML-Agents Gym Wrapper
A common way in which machine learning researchers interact with simulation environments is via a wrapper provided by OpenAI called `gym`. For more information on the gym interface, see [here](https://github.com/openai/gym).
We provide a gym wrapper, along with instructions for using it with existing machine learning algorithms which utilize gym. The wrapper provides an interface on top of our `UnityEnvironment` class, which is the default way of interfacing with a Unity environment via Python.
## Installation
The gym wrapper can be installed using:
```
pip install gym_unity
```
or by running the following from the `/gym-unity` directory of the repository:
```
pip install .
```
## Using the Gym Wrapper
The gym interface is available from `gym_unity.envs`. To launch an environment from the root of the project repository use:
```python
from gym_unity.envs import UnityEnv
env = UnityEnv(environment_filename, worker_id, use_visual, multiagent)
```
* `environment_filename` refers to the path to the Unity environment.
* `worker_id` refers to the port to use for communication with the environment. Defaults to `0`.
* `use_visual` refers to whether to use visual observations (True) or vector observations (False) as the default observation provided by the `reset` and `step` functions. Defaults to `False`.
* `multiagent` refers to whether you intend to launch an environment which contains more than one agent. Defaults to `False`.
The returned environment `env` will function as a gym.
For more on using the gym interface, see our [Jupyter Notebook tutorial](../python/notebooks/getting-started-gym.ipynb).
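The returned object behaves like any other gym environment. A minimal interaction loop is sketched below; the `GridWorld` path is illustrative, and the reset/step/close calls mirror the wrapper code in `unity_env.py` further down in this change:

```python
from gym_unity.envs import UnityEnv

env = UnityEnv("./envs/GridWorld", worker_id=0, use_visual=True)  # illustrative path

obs = env.reset()                     # single-agent mode: a single observation
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```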
## Limitations
* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the `info` provided by `env.step(action)` (see the snippet after this list).
* Stacked vector observations are not supported.
* Environment registration for use with `gym.make()` is currently not supported.
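For example, the `BrainInfo` mentioned in the list above can be pulled back out of the step's `info` dict; the key names below are taken from `unity_env.py` in this change, and the random action is just a placeholder:

```python
obs, reward, done, info = env.step(env.action_space.sample())
brain_info = info["brain_info"]        # full BrainInfo object for this step
text_obs = info["text_observation"]    # text observation, if the environment provides one
```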
## Running OpenAI Baselines Algorithms
OpenAI provides a set of open-source maintained and tested Reinforcement Learning algorithms called the [Baselines](https://github.com/openai/baselines).
Using the provided Gym wrapper, it is possible to train ML-Agents environments using these algorithms. This requires the creation of custom training scripts to launch each algorithm. In most cases these scripts can be created by making slight modifications to the ones provided for Atari and Mujoco environments.
### Example - DQN Baseline
In order to train an agent to play the `GridWorld` environment using the Baselines DQN algorithm, create a file called `train_unity.py` within the `baselines/deepq/experiments` subfolder of the baselines repository. This file will be a modification of the `run_atari.py` file within the same folder. Then create an `/envs/` directory within the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). Add the following code to the `train_unity.py` file:
```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv


def main():
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[256],
        dueling=True,
    )
    act = deepq.learn(
        env,
        q_func=model,
        lr=1e-3,
        max_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()
```
To start the training process, run the following from the root of the baselines repository:
```
python -m baselines.deepq.experiments.train_unity
```
### Other Algorithms
Other algorithms in the Baselines repository can be run using scripts similar to the example provided above. In most cases, the primary changes needed to use a Unity environment are to import `UnityEnv`, and to replace the environment creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)` passing the environment binary path.
A typical rule of thumb is that for vision-based environments, modification should be done to Atari training scripts, and for vector observation environments, modification should be done to Mujoco scripts.
Some algorithms will make use of the `make_atari_env()` or `make_mujoco_env()` functions. These are defined in `baselines/common/cmd_util.py`. In order to use Unity environments for these algorithms, add the following import statement and function to `cmd_util.py`:
```python
from gym_unity.envs import UnityEnv

def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank):  # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        rank = MPI.COMM_WORLD.Get_rank()
        env = UnityEnv(env_directory, rank, use_visual=False)
        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
        return env
```
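Once the function above is added to `cmd_util.py`, a training script can call it the same way the Atari and Mujoco scripts call `make_atari_env()` / `make_mujoco_env()`. A hypothetical call site (the path and `num_env` value are illustrative):

```python
# Vectorized, visual Unity environment for algorithms that expect a VecEnv.
env = make_unity_env('./envs/GridWorld', num_env=4, visual=True)
```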

1
gym-unity/gym_unity/__init__.py


from gym.envs.registration import register

1
gym-unity/gym_unity/envs/__init__.py


from gym_unity.envs.unity_env import UnityEnv, UnityGymException

207
gym-unity/gym_unity/envs/unity_env.py


import gym
import numpy as np
from unityagents import UnityEnvironment
from gym import error, spaces, logger


class UnityGymException(error.Error):
    """
    Any error related to the gym wrapper of ml-agents.
    """
    pass


class UnityEnv(gym.Env):
    """
    Provides Gym wrapper for Unity Learning Environments.
    Multi-agent environments use lists for object types, as done here:
    https://github.com/openai/multiagent-particle-envs
    """
    def __init__(self, environment_filename: str, worker_id=0, use_visual=False, multiagent=False):
        """
        Environment initialization
        :param environment_filename: The UnityEnvironment path or file to be wrapped in the gym.
        :param worker_id: Worker number for environment.
        :param use_visual: Whether to use visual observation or vector observation.
        :param multiagent: Whether to run in multi-agent mode (lists of obs, reward, done).
        """
        self._env = UnityEnvironment(environment_filename, worker_id)
        self.name = self._env.academy_name
        self.visual_obs = None
        self._current_state = None
        self._n_agents = None
        self._multiagent = multiagent

        # Check brain configuration
        if len(self._env.brains) != 1:
            raise UnityGymException(
                "There can only be one brain in a UnityEnvironment "
                "if it is wrapped in a gym.")
        self.brain_name = self._env.external_brain_names[0]
        brain = self._env.brains[self.brain_name]

        if use_visual and brain.number_visual_observations == 0:
            raise UnityGymException("`use_visual` was set to True, however there are no"
                                    " visual observations as part of this environment.")
        self.use_visual = brain.number_visual_observations == 1 and use_visual

        if brain.num_stacked_vector_observations != 1:
            raise UnityGymException(
                "There can only be one stacked vector observation in a UnityEnvironment "
                "if it is wrapped in a gym.")

        # Check for number of agents in scene.
        initial_info = self._env.reset()[self.brain_name]
        self._check_agents(len(initial_info.agents))

        # Set observation and action spaces
        if brain.vector_action_space_type == "discrete":
            if len(brain.vector_action_space_size) == 1:
                self._action_space = spaces.Discrete(brain.vector_action_space_size[0])
            else:
                self._action_space = spaces.MultiDiscrete(brain.vector_action_space_size)
        else:
            high = np.array([1] * brain.vector_action_space_size[0])
            self._action_space = spaces.Box(-high, high, dtype=np.float32)
        high = np.array([np.inf] * brain.vector_observation_space_size)
        self.action_meanings = brain.vector_action_descriptions
        if self.use_visual:
            if brain.camera_resolutions[0]["blackAndWhite"]:
                depth = 1
            else:
                depth = 3
            self._observation_space = spaces.Box(0, 1, dtype=np.float32,
                                                 shape=(brain.camera_resolutions[0]["height"],
                                                        brain.camera_resolutions[0]["width"],
                                                        depth))
        else:
            self._observation_space = spaces.Box(-high, high, dtype=np.float32)

    def reset(self):
        """Resets the state of the environment and returns an initial observation.
        In the case of multi-agent environments, this is a list.
        Returns: observation (object/list): the initial observation of the
        space.
        """
        info = self._env.reset()[self.brain_name]
        n_agents = len(info.agents)
        self._check_agents(n_agents)

        if not self._multiagent:
            obs, reward, done, info = self._single_step(info)
        else:
            obs, reward, done, info = self._multi_step(info)
        return obs

    def step(self, action):
        """Run one timestep of the environment's dynamics. When end of
        episode is reached, you are responsible for calling `reset()`
        to reset this environment's state.
        Accepts an action and returns a tuple (observation, reward, done, info).
        In the case of multi-agent environments, these are lists.
        Args:
            action (object/list): an action provided by the agent
        Returns:
            observation (object/list): agent's observation of the current environment
            reward (float/list) : amount of reward returned after previous action
            done (boolean/list): whether the episode has ended.
            info (dict): contains auxiliary diagnostic information, including BrainInfo.
        """
        # Validate the provided action(s) against the environment's multi-agent setting.
        if self._multiagent:
            if not isinstance(action, list):
                raise UnityGymException("The environment was expecting `action` to be a list.")
            if len(action) != self._n_agents:
                raise UnityGymException("The environment was expecting a list of {} actions.".format(self._n_agents))
        else:
            action = np.array(action)

        info = self._env.step(action)[self.brain_name]
        n_agents = len(info.agents)
        self._check_agents(n_agents)
        self._current_state = info

        if not self._multiagent:
            obs, reward, done, info = self._single_step(info)
        else:
            obs, reward, done, info = self._multi_step(info)
        return obs, reward, done, info

    def _single_step(self, info):
        if self.use_visual:
            self.visual_obs = info.visual_observations[0][0, :, :, :]
            default_observation = self.visual_obs
        else:
            default_observation = info.vector_observations[0, :]

        return default_observation, info.rewards[0], info.local_done[0], {"text_observation": info.text_observations[0],
                                                                          "brain_info": info}

    def _multi_step(self, info):
        if self.use_visual:
            self.visual_obs = info.visual_observations
            default_observation = self.visual_obs
        else:
            default_observation = info.vector_observations
        return list(default_observation), info.rewards, info.local_done, {"text_observation": info.text_observations,
                                                                          "brain_info": info}

    def render(self, mode='rgb_array'):
        return self.visual_obs

    def close(self):
        """Override _close in your subclass to perform any necessary cleanup.
        Environments will automatically close() themselves when
        garbage collected or when the program exits.
        """
        self._env.close()

    def get_action_meanings(self):
        return self.action_meanings

    def seed(self, seed=None):
        """Sets the seed for this env's random number generator(s).
        Currently not implemented.
        """
        logger.warn("Could not seed environment %s", self.name)
        return

    def _check_agents(self, n_agents):
        if not self._multiagent and n_agents > 1:
            raise UnityGymException("The environment was launched as a single-agent environment, however "
                                    "there is more than one agent in the scene.")
        elif self._multiagent and n_agents <= 1:
            raise UnityGymException("The environment was launched as a multi-agent environment, however "
                                    "there is only one agent in the scene.")
        if self._n_agents is None:
            self._n_agents = n_agents
            logger.info("{} agents within environment.".format(n_agents))
        elif self._n_agents != n_agents:
            raise UnityGymException("The number of agents in the environment has changed since "
                                    "initialization. This is not supported.")

    @property
    def metadata(self):
        return {'render.modes': ['rgb_array']}

    @property
    def reward_range(self):
        return -float('inf'), float('inf')

    @property
    def spec(self):
        return None

    @property
    def action_space(self):
        return self._action_space

    @property
    def observation_space(self):
        return self._observation_space

    @property
    def number_agents(self):
        return self._n_agents
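A short usage sketch of the multi-agent mode described in the class docstring; the environment path is illustrative, and the list-based returns follow `_multi_step` above:

```python
from gym_unity.envs import UnityEnv

env = UnityEnv("./envs/3DBall", worker_id=1, use_visual=False, multiagent=True)  # illustrative path
obs_list = env.reset()                                       # one observation per agent
actions = [env.action_space.sample() for _ in range(env.number_agents)]
obs_list, rewards, dones, info = env.step(actions)           # lists, plus a shared info dict
env.close()
```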

14
gym-unity/setup.py


#!/usr/bin/env python

from setuptools import setup, Command, find_packages

setup(name='gym_unity',
      version='0.1.0',
      description='Unity Machine Learning Agents Gym Interface',
      license='Apache License 2.0',
      author='Unity Technologies',
      author_email='ML-Agents@unity3d.com',
      url='https://github.com/Unity-Technologies/ml-agents',
      packages=find_packages(),
      install_requires=['gym', 'unityagents'])

46
gym-unity/tests/test_gym.py


import unittest.mock as mock
import pytest
import numpy as np

from gym_unity.envs import UnityEnv, UnityGymException
from tests.mock_communicator import MockCommunicator


@mock.patch('unityagents.UnityEnvironment.executable_launcher')
@mock.patch('unityagents.UnityEnvironment.get_communicator')
def test_gym_wrapper(mock_communicator, mock_launcher):
    mock_communicator.return_value = MockCommunicator(
        discrete_action=False, visual_inputs=0, stack=False, num_agents=1)

    # Test for incorrect number of agents.
    with pytest.raises(UnityGymException):
        UnityEnv(' ', use_visual=False, multiagent=True)

    env = UnityEnv(' ', use_visual=False)
    assert isinstance(env, UnityEnv)
    assert isinstance(env.reset(), np.ndarray)
    actions = env.action_space.sample()
    assert actions.shape[0] == 2
    obs, rew, done, info = env.step(actions)
    assert isinstance(obs, np.ndarray)
    assert isinstance(rew, float)
    assert isinstance(done, bool)
    assert isinstance(info, dict)


@mock.patch('unityagents.UnityEnvironment.executable_launcher')
@mock.patch('unityagents.UnityEnvironment.get_communicator')
def test_multi_agent(mock_communicator, mock_launcher):
    mock_communicator.return_value = MockCommunicator(
        discrete_action=False, visual_inputs=0, stack=False, num_agents=2)

    # Test for incorrect number of agents.
    with pytest.raises(UnityGymException):
        UnityEnv(' ', multiagent=False)

    env = UnityEnv(' ', use_visual=False, multiagent=True)
    assert isinstance(env.reset(), list)
    actions = [env.action_space.sample() for i in range(env.number_agents)]
    obs, rew, done, info = env.step(actions)
    assert isinstance(obs, list)
    assert isinstance(rew, list)
    assert isinstance(done, list)
    assert isinstance(info, dict)

270
python/notebooks/getting-started-gym.ipynb


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unity ML-Agents Toolkit\n",
"## Gym Wrapper Basics\n",
"This notebook contains a walkthrough of the basic functions of the Python Gym Wrapper for the Unity ML-Agents toolkit. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-Agent Environments\n",
"\n",
"The first five steps show how to use the `UnityEnv` wrapper with single-agent environments. See below step five for how to use with multi-agent environments."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Load dependencies\n",
"\n",
"The following loads the necessary dependencies and checks the Python version (at runtime). ML-Agents Toolkit (v0.3 onwards) requires Python 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import sys\n",
"\n",
"from gym_unity.envs import UnityEnv\n",
"\n",
"%matplotlib inline\n",
"\n",
"print(\"Python version:\")\n",
"print(sys.version)\n",
"\n",
"# check Python version\n",
"if (sys.version_info[0] < 3):\n",
" raise Exception(\"ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Start the environment\n",
"`UnityEnv` launches and begins communication with the environment when instantiated. We will be using the `GridWorld` environment. You will need to create an `envs` directory within the `/python` subfolder of the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"env_name = \"../envs/GridWorld\" # Name of the Unity environment binary to launch\n",
"env = UnityEnv(env_name, worker_id=0, use_visual=True)\n",
"\n",
"# Examine environment parameters\n",
"print(str(env))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Examine the observation and state spaces\n",
"We can reset the environment to be provided with an initial observation of the environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reset the environment\n",
"initial_observation = env.reset()\n",
"\n",
"if len(env.observation_space.shape) == 1:\n",
" # Examine the initial vector observation\n",
" print(\"Agent state looks like: \\n{}\".format(initial_observation))\n",
"else:\n",
" # Examine the initial visual observation\n",
" print(\"Agent observations look like:\")\n",
" if env.observation_space.shape[2] == 3:\n",
" plt.imshow(initial_observation[:,:,:])\n",
" else:\n",
" plt.imshow(initial_observation[:,:,0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Take random actions in the environment\n",
"Once we restart an environment, we can step the environment forward and provide actions to all of the agents within the environment. Here we simply choose random actions using the `env.action_space.sample()` function.\n",
"\n",
"Once this cell is executed, 10 messages will be printed that detail how much reward will be accumulated for the next 10 episodes. The Unity environment will then pause, waiting for further signals telling it what to do next. Thus, not seeing any animation is expected when running this cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for episode in range(10):\n",
" initial_observation = env.reset()\n",
" done = False\n",
" episode_rewards = 0\n",
" while not done:\n",
" observation, reward, done, info = env.step(env.action_space.sample())\n",
" episode_rewards += reward\n",
" print(\"Total reward this episode: {}\".format(episode_rewards))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Close the environment when finished\n",
"When we are finished using an environment, we can close it with the function below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"env.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Agent Environments\n",
"\n",
"It is also possible to use the gym wrapper with multi-agent environments. For these environments, observations, rewards, and done flags will be provided in a list. Likewise, the environment will expect a list of actions when calling `step(action)`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Start the environment\n",
"\n",
"We will use the `3DBall` environment for this walkthrough. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). We will launch it from the `python/envs` sub-directory of the repo. Please create an `envs` folder if one does not already exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Name of the Unity environment binary to launch\n",
"multi_env_name = \"../envs/3DBall\" \n",
"multi_env = UnityEnv(multi_env_name, worker_id=1, \n",
" use_visual=False, multiagent=True)\n",
"\n",
"# Examine environment parameters\n",
"print(str(multi_env))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Examine the observation space "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reset the environment\n",
"initial_observations = multi_env.reset()\n",
"\n",
"if len(multi_env.observation_space.shape) == 1:\n",
" # Examine the initial vector observation\n",
" print(\"Agent observations look like: \\n{}\".format(initial_observations[0]))\n",
"else:\n",
" # Examine the initial visual observation\n",
" print(\"Agent observations look like:\")\n",
" if multi_env.observation_space.shape[2] == 3:\n",
" plt.imshow(initial_observations[0][:,:,:])\n",
" else:\n",
" plt.imshow(initial_observations[0][:,:,0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Take random steps in the environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for episode in range(10):\n",
" initial_observation = multi_env.reset()\n",
" done = False\n",
" episode_rewards = 0\n",
" while not done:\n",
" actions = [multi_env.action_space.sample() for agent in range(multi_env.number_agents)]\n",
" observations, rewards, dones, info = multi_env.step(actions)\n",
" episode_rewards += np.mean(rewards)\n",
" done = dones[0]\n",
" print(\"Total reward this episode: {}\".format(episode_rewards))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Close the environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"multi_env.close()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

/python/Basics.ipynb → /python/notebooks/getting-started.ipynb
