Unity ML-Agents Gym Wrapper

A common way in which machine learning researchers interact with simulation environments is via a wrapper provided by OpenAI called gym. For more information on the gym interface, see here.

We provide a a gym wrapper, and instructions for using it with existing machine learning algorithms which utilize gyms. Both wrappers provide interfaces on top of our UnityEnvironment class, which is the default way of interfacing with a Unity environment via Python.

Installation

The gym wrapper can be installed using:

pip install gym_unity

or by running the following from the /gym-unity directory of the repository:

pip install .

Using the Gym Wrapper

The gym interface is available from gym_unity.envs. To launch an environmnent from the root of the project repository use:

from gym_unity.envs import UnityEnv

env = UnityEnv(environment_filename, worker_id, default_visual, multiagent)

environment_filename refers to the path to the Unity environment.
worker_id refers to the port to use for communication with the environment. Defaults to 0.
use_visual refers to whether to use visual observations (True) or vector observations (False) as the default observation provided by the reset and step functions. Defaults to False.
multiagent refers to whether you intent to launch an environment which contains more than one agent. Defaults to False.

The returned environment env will function as a gym.

For more on using the gym interface, see our Jupyter Notebook tutorial.

Limitation

It is only possible to use an environment with a single Brain.
By default the first visual observation is provided as the observation, if present. Otherwise vector observations are provided.
All BrainInfo output from the environment can still be accessed from the info provided by env.step(action).
Stacked vector observations are not supported.
Environment registration for use with gym.make() is currently not supported.

Running OpenAI Baselines Algorithms

OpenAI provides a set of open-source maintained and tested Reinforcement Learning algorithms called the Baselines.

Using the provided Gym wrapper, it is possible to train ML-Agents environments using these algorithms. This requires the creation of custom training scripts to launch each algorithm. In most cases these scripts can be created by making slightly modifications to the ones provided for Atari and Mujoco environments.

Example - DQN Baseline

In order to train an agent to play the GridWorld environment using the Baselines DQN algorithm, create a file called train_unity.py within the baselines/deepq/experiments subfolder of the baselines repository. This file will be a modification of the run_atari.py file within the same folder. Then create and /envs/ directory within the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see here. Add the following code to the train_unity.py file:

import gym

from baselines import deepq
from gym_unity.envs import UnityEnv

def main():
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[256],
        dueling=True,
    )
    act = deepq.learn(
        env,
        q_func=model,
        lr=1e-3,
        max_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()

To start the training process, run the following from the root of the baselines repository:

python -m baselines.deepq.experiments.train_unity

Other Algorithms

Other algorithms in the Baselines repository can be run using scripts similar to the example provided above. In most cases, the primary changes needed to use a Unity environment are to import UnityEnv, and to replace the environment creation code, typically gym.make(), with a call to UnityEnv(env_path) passing the environment binary path.

A typical rule of thumb is that for vision-based environments, modification should be done to Atari training scripts, and for vector observation environments, modification should be done to Mujoco scripts.

Some algorithms will make use of make_atari_env() or make_mujoco_env() functions. These are defined in baselines/common/cmd_util.py. In order to use Unity environments for these algorithms, add the following import statement and function to cmd_utils.py:

from gym_unity.envs import UnityEnv

def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank): # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        rank = MPI.COMM_WORLD.Get_rank()
        env = UnityEnv(env_directory, rank, use_visual=False)
        env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
        return env

5.5 KiB 原始文件 文件历史