
Update gym_unity baselines example, add PPO2 example

We have an example of how to use our gym wrapper with OpenAI baselines,
but it was out of date with the latest updates to the baselines library.
This updates the instructions in the gym_unity README and adds an example
of using PPO2 with a Unity environment.
Branch: /develop-generalizationTraining-TrainerController
Jonathan Harper, 6 years ago
Commit 3bc092d2
1 changed file: gym-unity/README.md (47 additions, 28 deletions)

### Example - DQN Baseline
In order to train an agent to play the `GridWorld` environment using the
Baselines DQN algorithm, you first need to install the baselines package using
pip:

```
pip install git+git://github.com/openai/baselines
```

Next, create a file called `train_unity.py`. Then create an `/envs/` directory
and build the GridWorld environment to that directory. For more information on
building Unity environments, see
[here](../docs/Learning-Environment-Executable.md). Add the following code to
the `train_unity.py` file:
```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv


def main():
    # Point this at the GridWorld binary built into the envs/ directory.
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    act = deepq.learn(
        env,
        "mlp",
        total_timesteps=100000,
        print_freq=10,
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == '__main__':
    main()
```

To start the training process, run the following from the directory containing
`train_unity.py`:

```sh
python -m train_unity
```
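
Once training finishes, you can check the saved policy. Below is a minimal evaluation sketch adapted from the baselines `enjoy_cartpole.py` pattern; it assumes the same GridWorld build and the `unity_model.pkl` file produced above.

```python
import gym

from baselines import deepq
from gym_unity.envs import UnityEnv


def main():
    env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
    # total_timesteps=0 with load_path restores the trained network
    # without doing any further training.
    act = deepq.learn(env, "mlp", total_timesteps=0, load_path="unity_model.pkl")

    obs, done = env.reset(), False
    episode_rew = 0
    while not done:
        obs, rew, done, _ = env.step(act(obs[None])[0])
        episode_rew += rew
    print("Episode reward", episode_rew)


if __name__ == '__main__':
    main()
```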
### Other Algorithms

Other algorithms in the Baselines repository can be run using scripts similar to
the examples from the baselines package. In most cases, the primary changes needed
to use a Unity environment are to import `UnityEnv`, and to replace the environment
creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)`
passing the environment binary path.
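
For example, where a baselines script builds its environment with `gym.make()`, the change is typically just the following (a minimal sketch; the GridWorld path simply mirrors the build directory used above):

```python
from gym_unity.envs import UnityEnv

# Instead of: env = gym.make("CartPole-v0")
# use the path to the Unity environment binary built earlier:
env = UnityEnv("./envs/GridWorld", 0, use_visual=True)
```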

Some algorithms will make use of `make_env()` or `make_mujoco_env()`
functions. You can define a similar function for Unity environments. An example of
such a method using the PPO2 baseline:
```python
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.bench import Monitor
from baselines import logger
import baselines.ppo2.ppo2 as ppo2

from gym_unity.envs import UnityEnv

import os

try:
    from mpi4py import MPI
except ImportError:
    MPI = None


def make_unity_env(env_directory, num_env, visual, start_index=0):
    """Create a wrapped, monitored Unity environment."""
    def make_env(rank, use_visual=True):  # pylint: disable=C0111
        def _thunk():
            env = UnityEnv(env_directory, rank, use_visual=use_visual)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        # Visual observations: run several workers in subprocesses.
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        # Vector observations: a single environment, ranked by MPI if available.
        rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
        return make_env(rank, use_visual=False)()


def main():
    env = make_unity_env('./envs/GridWorld', 4, True)
    ppo2.learn(
        network="mlp",
        env=env,
        total_timesteps=100000,
        lr=1e-3,
    )


if __name__ == '__main__':
    main()
```
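
Assuming the script above is saved as `train_unity.py` next to the `envs/` directory (the same layout as the DQN example), it can be launched the same way:

```sh
python -m train_unity
```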