
[Fix] Edit the gym-unity Readme to fix some issues in the sample code (#5331)

* Edit the gym-unity Readme to fix some issues in the sample code

* Update gym-unity/README.md

Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com>

* addressing comments

* Adding the action_seed parameter to the documentation

Co-authored-by: andrewcoh <54679309+andrewcoh@users.noreply.github.com>
/colab-links
GitHub · 3 years ago
Current commit: d258d17a
1 file changed, 23 insertions and 25 deletions

gym-unity/README.md (48 changed lines)

- `flatten_branched`, if `True`, flattens a branched discrete action space into a Gym
Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
`False`.
- `allow_multiple_obs` will return a list of observations. The first elements
contain the visual observations and the last element contains the array of
vector observations. If `False` the environment returns a single array (containing
a single visual observation, if present, otherwise the vector observation).
Defaults to `False`.
- `action_space_seed` is the optional seed for action sampling. If non-None, it
will be used to set the random seed on created gym.Space instances.
The returned environment `env` will function as a gym.
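
For reference, here is a minimal sketch of how these options can be passed to the
wrapper. The executable path and the option values below are illustrative, and the
keyword names assume they match the parameters documented above:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Illustrative path and option values; adjust them for your own build.
unity_env = UnityEnvironment("<path-to-environment>")
env = UnityToGymWrapper(
    unity_env,
    uint8_visual=True,        # return visual observations as uint8 (0-255)
    flatten_branched=False,   # keep branched discrete actions as MultiDiscrete
    allow_multiple_obs=True,  # return [visual obs..., vector obs] as a list
    action_space_seed=42,     # seed used when sampling from gym.Space instances
)
obs = env.reset()
```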

Next, create a file called `train_unity.py`. Then create an `/envs/` directory
and build the environment to that directory. For more information on building
Unity environments, see [here](../docs/Learning-Environment-Executable.md). Note
that because of limitations of the DQN baseline, the environment must have a
single visual observation, a single discrete action and a single Agent in the
scene. Add the following code to the `train_unity.py` file:
```python
import gym

from baselines import deepq

from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

def main():
    unity_env = UnityEnvironment(<path-to-environment>)
    env = UnityToGymWrapper(unity_env, uint8_visual=True)
    act = deepq.learn(
        env,
        "cnn",  # For visual inputs
        lr=2.5e-4,
        total_timesteps=1000000,
        buffer_size=50000,
        # ... the remaining DQN hyperparameters are omitted in this excerpt
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")

if __name__ == '__main__':
    main()
```

"""
def make_env(rank, use_visual=True): # pylint: disable=C0111
def _thunk():
unity_env = UnityEnvironment(env_directory)
env = UnityToGymWrapper(unity_env, rank, uint8_visual=True)
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
return env
return _thunk

return DummyVecEnv([make_env(rank, use_visual=False)])
def main():
env = make_unity_env('./envs/GridWorld', 4, True)
env = make_unity_env(<path-to-environment>, 4, True)
ppo2.learn(
network="mlp",
env=env,

The Dopamine example creates the Unity environment in the same way:

```python
game_version = 'v0' if sticky_actions else 'v4'
full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version)
unity_env = UnityEnvironment(<path-to-environment>)
# ... (the rest of the example is unchanged and omitted in this excerpt)
```
`<path-to-environment>` is the path to your built Unity executable. For more
information on building Unity environments, see
[here](../docs/Learning-Environment-Executable.md), and note the Limitations
section below.

Since Dopamine is designed around variants of DQN, it is only compatible with
discrete action spaces, and specifically the Discrete Gym space. For
environments that use branched discrete action spaces, you can enable the
`flatten_branched` parameter in `UnityToGymWrapper`, which treats each
combination of branched actions as separate actions.
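
As a rough sketch of what `flatten_branched` changes, assuming the hypothetical
branch sizes below (the wrapper arguments follow the parameters documented above):

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Suppose the agent has two discrete action branches, of sizes 3 and 2.
unity_env = UnityEnvironment("<path-to-environment>")  # illustrative path

env = UnityToGymWrapper(unity_env, uint8_visual=True, flatten_branched=True)
# Without flatten_branched the action space would be MultiDiscrete([3, 2]);
# with it, every combination of branch values becomes one action, giving
# Discrete(3 * 2) = Discrete(6), the single Discrete space DQN variants expect.
print(env.action_space)
```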

The hyperparameters provided by Dopamine are tailored to the Atari games, and
you will likely need to adjust them for ML-Agents environments. Here is a sample
`dopamine/agents/rainbow/configs/rainbow.gin` file that is known to work with
a simple GridWorld.
```python
import dopamine.agents.rainbow.rainbow_agent
# ... (the remaining gin bindings are omitted in this excerpt)
```

![Dopamine on GridWorld](images/dopamine_gridworld_plot.png)

### Example: VisualBanana

As an example of using the `flatten_branched` option, we also used the Rainbow
algorithm to train on the VisualBanana environment, and provide the results
below. The same hyperparameters were used as in the GridWorld case, except that
`replay_history` and `epsilon_decay` were increased to 100000.
![Dopamine on VisualBanana](images/dopamine_visualbanana_plot.png)
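
For reference, a hedged sketch of what the two overrides mentioned above could look
like in the gin file. The binding names below are assumptions based on Dopamine's
Rainbow agent parameters, not lines quoted from the original configuration:

```python
# Assumed gin bindings (names are not taken from the rainbow.gin sample above):
RainbowAgent.min_replay_history = 100000    # "replay_history" in the text above
RainbowAgent.epsilon_decay_period = 100000  # "epsilon_decay" in the text above
```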