
Update gym_unity baselines example, add PPO2 example

We have an example of how to use our gym wrapper with OpenAI baselines,
but it was out of date with the latest updates to the baselines library.
This updates the instructions in the gym_unity README and adds an example
of using PPO2 with a Unity environment.
Branch: develop-generalizationTraining-TrainerController
Jonathan Harper · 5 years ago
Commit 3bc092d2
1 file changed, 47 insertions(+), 28 deletions(-)

gym-unity/README.md (+47 −28)

### Example - DQN Baseline
In order to train an agent to play the `GridWorld` environment using the
Baselines DQN algorithm, you first need to install the baselines package using
pip:
```
pip install git+git://github.com/openai/baselines
```
Next, create a file called `train_unity.py`. Then create an `/envs/` directory
and build the GridWorld environment to that directory. For more information on
building Unity environments, see
[here](../docs/Learning-Environment-Executable.md). Add the following code to
the `train_unity.py` file:
```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv


def main():
    # GridWorld build placed in the ./envs/ directory created above
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    act = deepq.learn(
        env,
        "mlp",
        total_timesteps=100000,
        print_freq=10
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()
```

To start the training process, run the following from the directory containing
`train_unity.py`:

```sh
python -m train_unity
```

### Other Algorithms

Other algorithms in the Baselines repository can be run using scripts similar to
the examples from the baselines package. In most cases, the primary changes needed
to use a Unity environment are to import `UnityEnv`, and to replace the environment
creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)`
passing the environment binary path.
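
For example, a script that builds its environment with `gym.make()` can usually
be adapted with a change like the following. This is a minimal sketch: the
`./envs/GridWorld` binary path and the `use_visual` setting are assumptions
carried over from the DQN example above.

```python
from gym_unity.envs import UnityEnv

# Instead of:  env = gym.make("SomeEnv-v0")
# create the environment from the Unity binary built earlier:
env = UnityEnv("./envs/GridWorld", 0, use_visual=True)

# The wrapper follows the standard gym API
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```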

Some algorithms will make use of `make_env()` or `make_mujoco_env()`
functions. You can define a similar function for Unity environments. An example
of such a function, used with the PPO2 baseline, is given below:

```python
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.bench import Monitor
from baselines import logger
import baselines.ppo2.ppo2 as ppo2
from gym_unity.envs import UnityEnv

import os

try:
    from mpi4py import MPI
except ImportError:
    MPI = None


def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    Create a wrapped, monitored Unity environment.
    """
    def make_env(rank, use_visual=True): # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=use_visual)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        # One Unity worker per parallel environment, run in subprocesses
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        # Rank by MPI rank if MPI is available; wrap the single env in a vec env
        rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
        return DummyVecEnv([make_env(rank, use_visual=False)])


def main():
    env = make_unity_env('./envs/GridWorld', 4, True)
    ppo2.learn(
        network="mlp",
        env=env,
        total_timesteps=100000,
        lr=1e-3,
    )


if __name__ == '__main__':
    main()
```
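
To start training with this script, run it the same way as the DQN example. The
file name below is only a placeholder for wherever you saved the code above:

```sh
python train_unity_ppo2.py
```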