External contribution : Allow visual and vector observations at the same time (#3998)

* allow vector observations also when using visual observations

* update changelog

* Update CHANGELOG.md

* Update __init__.py

* remove trailing whitespace

* Fix test case where visual and vector observations are used simultaneously

* fix formatting

* add test for visual and vector observations

* Assert vector action shape

* Fix test environment to return multiple visual observations

* use_visual and allow_multiple_visual_obs are replaced by allow_multiple_obs which allows visual and vector observations to be used simultaneously.

* fixing run_gym.py test

* [ci]

* Added some more tests and made the observation space a tuple when using multiple observations

* Modifying the change log

* Adding to the Migrating doc

* Edits to Migrating.md

* Simplification of the code to generate the observation spaces

* Simplified warning messages

* Adding contr...
/docs-update
GitHub, 5 years ago
Current commit
45737208
6 files changed, with 182 insertions and 71 deletions
  1. com.unity.ml-agents/CHANGELOG.md (3 lines changed)
  2. docs/Migrating.md (8 lines changed)
  3. gym-unity/README.md (21 lines changed)
  4. gym-unity/gym_unity/envs/__init__.py (102 lines changed)
  5. gym-unity/gym_unity/tests/test_gym.py (111 lines changed)
  6. ml-agents/tests/yamato/scripts/run_gym.py (8 lines changed)

3
com.unity.ml-agents/CHANGELOG.md


- `beta` and `epsilon` in `PPO` are no longer decayed by default but follow the same schedule as learning rate. (#3940)
- `get_behavior_names()` and `get_behavior_spec()` on UnityEnvironment were replaced by the `behavior_specs` property. (#3946)
- The first version of the Unity Environment Registry (Experimental) has been released. More information [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Unity-Environment-Registry.md)(#3967)
- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
were replaced by `allow_multiple_obs` which allows one or more visual observations and
vector observations to be used simultaneously. (#3981) Thank you @shakenes !
### Minor Changes
#### com.unity.ml-agents (C#)
- `ObservableAttribute` was added. Adding the attribute to fields or properties on an Agent will allow it to generate

8
docs/Migrating.md


configuration have all been moved to a single YAML file. (#3791)
- `max_step` in the `TerminalStep` and `TerminalSteps` objects was renamed `interrupted`.
- On the UnityEnvironment API, `get_behavior_names()` and `get_behavior_specs()` methods were combined into the property `behavior_specs` that contains a mapping from behavior names to behavior spec.
- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
were replaced by `allow_multiple_obs` which allows one or more visual observations and
vector observations to be used simultaneously.
### Steps to Migrate
- Before upgrading, copy your `Behavior Name` sections from `trainer_config.yaml` into

- If your training uses [parameter randomization](Training-ML-Agents.md#environment-parameter-randomization), move
the contents of the sampler config to `parameter_randomization` in the main trainer configuration.
- If you are using `UnityEnvironment` directly, replace `max_step` with `interrupted`
in the `TerminalStep` and `TerminalSteps` objects.
in the `TerminalStep` and `TerminalSteps` objects.
- If you use the `UnityToGymWrapper`, remove `use_visual` and `allow_multiple_visual_obs`
from the constructor and add `allow_multiple_obs = True` if the environment contains either
both visual and vector observations or multiple visual observations.
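
The migration step above can be sketched as a small keyword-argument rewrite. This is only an illustration: `migrate_wrapper_kwargs`, `n_visual_obs` and `vector_obs_size` are hypothetical names that are not part of `gym_unity`, and the rule encoded is the one stated in the bullet (drop the two old flags, add `allow_multiple_obs=True` when the environment has several visual observations or both visual and vector observations).

```python
from typing import Any, Dict

# Flags removed from the UnityToGymWrapper constructor by this change.
REMOVED_FLAGS = ("use_visual", "allow_multiple_visual_obs")


def migrate_wrapper_kwargs(
    old_kwargs: Dict[str, Any], n_visual_obs: int, vector_obs_size: int
) -> Dict[str, Any]:
    """Hypothetical helper: rewrite old constructor kwargs for the new API."""
    # Drop the flags that no longer exist.
    new_kwargs = {k: v for k, v in old_kwargs.items() if k not in REMOVED_FLAGS}
    has_visual = n_visual_obs >= 1
    has_vector = vector_obs_size > 0
    # Request every observation only when there is more than one to return.
    if n_visual_obs > 1 or (has_visual and has_vector):
        new_kwargs["allow_multiple_obs"] = True
    return new_kwargs
```

The resulting dictionary would then be passed as `UnityToGymWrapper(unity_env, **new_kwargs)`; the counts have to come from your own knowledge of the environment.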
## Migrating from 0.15 to Release 1

21
gym-unity/README.md


```python
from gym_unity.envs import UnityToGymWrapper
env = UnityToGymWrapper(unity_environment, worker_id, use_visual, uint8_visual)
env = UnityToGymWrapper(unity_environment, uint8_visual, allow_multiple_obs)
- `use_visual` refers to whether to use visual observations (True) or vector
observations (False) as the default observation provided by the `reset` and
`step` functions. Defaults to `False`.
- `uint8_visual` refers to whether to output visual observations as `uint8`
values (0-255). Many common Gym environments (e.g. Atari) do this. By default

Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
`False`.
- `allow_multiple_visual_obs` will return a list of observation instead of only
one if disabled. Defaults to `False`.
- `allow_multiple_obs` will return a list of observations. The first elements contain the visual observations and the
last element contains the array of vector observations. If False, the environment returns a single array (containing
a single visual observation, if present, otherwise the vector observation). Defaults to `False`.
The returned environment `env` will function as a gym.

- By default, the first visual observation is provided as the `observation`, if
present. Otherwise, vector observations are provided. You can receive all
visual observations by using the `allow_multiple_visual_obs=True` option in
visual and vector observations by using the `allow_multiple_obs=True` option in
instead of only the first one.
instead of only one.
- The `TerminalSteps` or `DecisionSteps` output from the environment can still
be accessed from the `info` provided by `env.step(action)`.
- Stacked vector observations are not supported.

def main():
unity_env = UnityEnvironment("./envs/GridWorld")
env = UnityToGymWrapper(unity_env, 0, use_visual=True, uint8_visual=True)
env = UnityToGymWrapper(unity_env, 0, uint8_visual=True)
logger.configure('./logs') # Change to log in a different directory
act = deepq.learn(
env,

def make_env(rank, use_visual=True): # pylint: disable=C0111
def _thunk():
unity_env = UnityEnvironment(env_directory)
env = UnityToGymWrapper(unity_env, rank, use_visual=use_visual, uint8_visual=True)
env = UnityToGymWrapper(unity_env, rank, uint8_visual=True)
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
return env
return _thunk

game_version = 'v0' if sticky_actions else 'v4'
full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version)
unity_env = UnityEnvironment('./envs/GridWorld')
env = UnityToGymWrapper(unity_env, use_visual=True, uint8_visual=True)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
return env
```
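
With `allow_multiple_obs=True`, the README text above describes the returned list as visual observations first and the vector observation last. A minimal sketch of unpacking such a list follows. `split_observations` and `has_vector_obs` are hypothetical names, and plain Python values stand in for the `np.ndarray` entries; the caller is assumed to know whether the agent has vector observations.

```python
from typing import Any, List, Optional, Tuple


def split_observations(
    obs: List[Any], has_vector_obs: bool
) -> Tuple[List[Any], Optional[Any]]:
    """Hypothetical helper: split a multi-observation list into
    (visual observations, vector observation or None)."""
    if has_vector_obs:
        # The vector observation, when present, is the last element.
        return list(obs[:-1]), obs[-1]
    return list(obs), None
```

For an environment with two cameras and a vector observation, `split_observations(env.reset(), True)` would yield the two images and the vector separately.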

102
gym-unity/gym_unity/envs/__init__.py


import itertools
import numpy as np
from typing import Any, Dict, List, Optional, Tuple, Union
from typing import Any, Dict, List, Tuple, Union
import gym
from gym import error, spaces

def __init__(
self,
unity_env: BaseEnv,
use_visual: bool = False,
allow_multiple_visual_obs: bool = False,
allow_multiple_obs: bool = False,
:param use_visual: Whether to use visual observation or vector observation.
:param allow_multiple_visual_obs: If True, return a list of visual observations instead of only one.
:param allow_multiple_obs: If True, return a list of np.ndarrays as observations with the first elements
containing the visual observations and the last element containing the array of vector observations.
If False, returns a single np.ndarray containing either only a single visual observation or the array of
vector observations.
"""
self._env = unity_env

self._flattener = None
# Hidden flag used by Atari environments to determine if the game is over
self.game_over = False
self._allow_multiple_visual_obs = allow_multiple_visual_obs
self._allow_multiple_obs = allow_multiple_obs
# Check brain configuration
if len(self._env.behavior_specs) != 1:

self.name = list(self._env.behavior_specs.keys())[0]
self.group_spec = self._env.behavior_specs[self.name]
if use_visual and self._get_n_vis_obs() == 0:
if self._get_n_vis_obs() == 0 and self._get_vec_obs_size() == 0:
"`use_visual` was set to True, however there are no"
" visual observations as part of this environment."
"There are no observations provided by the environment."
self.use_visual = self._get_n_vis_obs() >= 1 and use_visual
if not use_visual and uint8_visual:
if not self._get_n_vis_obs() >= 1 and uint8_visual:
"`uint8_visual was set to true, but visual observations are not in use. "
"uint8_visual was set to true, but visual observations are not in use. "
if self._get_n_vis_obs() > 1 and not self._allow_multiple_visual_obs:
if (
self._get_n_vis_obs() + self._get_vec_obs_size() >= 2
and not self._allow_multiple_obs
):
"The environment contains more than one visual observation. "
"You must define allow_multiple_visual_obs=True to received them all. "
"Otherwise, please note that only the first will be provided in the observation."
"The environment contains multiple observations. "
"You must define allow_multiple_obs=True to receive them all. "
"Otherwise, only the first visual observation (or vector observation if "
"there are no visual observations) will be provided in the observation."
)
# Check for number of agents in scene.

self._previous_decision_step = decision_steps
# Set observation and action spaces
# Set action spaces
if self.group_spec.is_action_discrete():
branches = self.group_spec.discrete_action_branches
if self.group_spec.action_shape == 1:

)
high = np.array([1] * self.group_spec.action_shape)
self._action_space = spaces.Box(-high, high, dtype=np.float32)
high = np.array([np.inf] * self._get_vec_obs_size())
if self.use_visual:
shape = self._get_vis_obs_shape()
# Set observations space
list_spaces: List[gym.Space] = []
shapes = self._get_vis_obs_shape()
for shape in shapes:
self._observation_space = spaces.Box(
0, 255, dtype=np.uint8, shape=shape
)
list_spaces.append(spaces.Box(0, 255, dtype=np.uint8, shape=shape))
self._observation_space = spaces.Box(
0, 1, dtype=np.float32, shape=shape
)
list_spaces.append(spaces.Box(0, 1, dtype=np.float32, shape=shape))
if self._get_vec_obs_size() > 0:
# vector observation is last
high = np.array([np.inf] * self._get_vec_obs_size())
list_spaces.append(spaces.Box(-high, high, dtype=np.float32))
if self._allow_multiple_obs:
self._observation_space = spaces.Tuple(list_spaces)
self._observation_space = spaces.Box(-high, high, dtype=np.float32)
self._observation_space = list_spaces[0] # only return the first one
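
Since the diff above has lost its indentation and change markers, here is a sketch of how the new observation-space assembly appears to work. It is not the wrapper's code: `describe_observation_space` is a hypothetical function, and `(dtype name, shape)` tuples stand in for `gym.spaces.Box` so the example runs without `gym`.

```python
from typing import List, Tuple, Union

SpaceDesc = Tuple[str, Tuple[int, ...]]  # stand-in for spaces.Box: (dtype, shape)


def describe_observation_space(
    visual_shapes: List[Tuple[int, ...]],
    vector_obs_size: int,
    uint8_visual: bool,
    allow_multiple_obs: bool,
) -> Union[SpaceDesc, Tuple[SpaceDesc, ...]]:
    """Hypothetical sketch of the observation-space layout."""
    spaces: List[SpaceDesc] = []
    # One entry per visual observation, in order.
    for shape in visual_shapes:
        spaces.append(("uint8" if uint8_visual else "float32", tuple(shape)))
    # The vector observation, if any, always comes last.
    if vector_obs_size > 0:
        spaces.append(("float32", (vector_obs_size,)))
    if not spaces:
        raise ValueError("There are no observations provided by the environment.")
    # A tuple of all spaces, or only the first one.
    return tuple(spaces) if allow_multiple_obs else spaces[0]
```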
def reset(self) -> Union[List[np.ndarray], np.ndarray]:
"""Resets the state of the environment and returns an initial observation.

return self._single_step(decision_step)
def _single_step(self, info: Union[DecisionSteps, TerminalSteps]) -> GymStepResult:
if self.use_visual:
if self._allow_multiple_obs:
visual_obs_list = []
for obs in visual_obs:
visual_obs_list.append(self._preprocess_single(obs[0]))
default_observation = visual_obs_list
if self._get_vec_obs_size() >= 1:
default_observation.append(self._get_vector_obs(info)[0, :])
else:
if self._get_n_vis_obs() >= 1:
visual_obs = self._get_vis_obs_list(info)
default_observation = self._preprocess_single(visual_obs[0][0])
else:
default_observation = self._get_vector_obs(info)[0, :]
if self._allow_multiple_visual_obs:
visual_obs_list = []
for obs in visual_obs:
visual_obs_list.append(self._preprocess_single(obs[0]))
self.visual_obs = visual_obs_list
else:
self.visual_obs = self._preprocess_single(visual_obs[0][0])
if self._get_n_vis_obs() >= 1:
visual_obs = self._get_vis_obs_list(info)
self.visual_obs = self._preprocess_single(visual_obs[0][0])
default_observation = self.visual_obs
elif self._get_vec_obs_size() > 0:
default_observation = self._get_vector_obs(info)[0, :]
else:
raise UnityGymException(
"The Agent does not have vector observations and the environment was not setup "
+ "to use visual observations."
)
done = isinstance(info, TerminalSteps)
return (default_observation, info.reward[0], done, {"step": info})
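
The selection logic of the new `_single_step` can be read as follows. This is a sketch under stated assumptions: `select_observation` is a hypothetical function, the observations are passed in already preprocessed, and the `ValueError` stands in for the "no observations" check that the diff shows in the constructor.

```python
from typing import Any, List, Sequence, Union


def select_observation(
    visual_obs: List[Any], vector_obs: Sequence[float], allow_multiple_obs: bool
) -> Union[List[Any], Any]:
    """Hypothetical sketch: choose what step() and reset() return."""
    if not visual_obs and len(vector_obs) == 0:
        raise ValueError("There are no observations provided by the environment.")
    if allow_multiple_obs:
        # All visual observations first, then the vector observation last.
        result = list(visual_obs)
        if len(vector_obs) >= 1:
            result.append(vector_obs)
        return result
    # Single-observation mode: first visual observation, else the vector one.
    if visual_obs:
        return visual_obs[0]
    return vector_obs
```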

result += 1
return result
def _get_vis_obs_shape(self) -> Optional[Tuple]:
def _get_vis_obs_shape(self) -> List[Tuple]:
result: List[Tuple] = []
return shape
return None
result.append(shape)
return result
def _get_vis_obs_list(
self, step_result: Union[DecisionSteps, TerminalSteps]

111
gym-unity/gym_unity/tests/test_gym.py


setup_mock_unityenvironment(
mock_env, mock_spec, mock_decision_step, mock_terminal_step
)
env = UnityToGymWrapper(mock_env, use_visual=False)
env = UnityToGymWrapper(mock_env)
assert isinstance(env, UnityToGymWrapper)
assert isinstance(env.reset(), np.ndarray)
actions = env.action_space.sample()

mock_env, mock_spec, mock_decision_step, mock_terminal_step
)
env = UnityToGymWrapper(mock_env, use_visual=False, flatten_branched=True)
env = UnityToGymWrapper(mock_env, flatten_branched=True)
assert isinstance(env.action_space, spaces.Discrete)
assert env.action_space.n == 12
assert env._flattener.lookup_action(0) == [0, 0, 0]

env = UnityToGymWrapper(mock_env, use_visual=False, flatten_branched=False)
env = UnityToGymWrapper(mock_env, flatten_branched=False)
assert isinstance(env.action_space, spaces.MultiDiscrete)

mock_spec = create_mock_group_spec(number_visual_observations=1)
mock_spec = create_mock_group_spec(
number_visual_observations=1, vector_observation_space_size=0
)
mock_decision_step, mock_terminal_step = create_mock_vector_steps(
mock_spec, number_visual_observations=1
)

env = UnityToGymWrapper(mock_env, use_visual=True, uint8_visual=use_uint8)
env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8)
assert isinstance(env.observation_space, spaces.Box)
assert isinstance(env, UnityToGymWrapper)
assert isinstance(env.reset(), np.ndarray)
actions = env.action_space.sample()

assert isinstance(info, dict)
@pytest.mark.parametrize("use_uint8", [True, False], ids=["uint8", "float"])
def test_gym_wrapper_single_visual_and_vector(use_uint8):
mock_env = mock.MagicMock()
mock_spec = create_mock_group_spec(
number_visual_observations=1,
vector_observation_space_size=3,
vector_action_space_size=[2],
)
mock_decision_step, mock_terminal_step = create_mock_vector_steps(
mock_spec, number_visual_observations=1
)
setup_mock_unityenvironment(
mock_env, mock_spec, mock_decision_step, mock_terminal_step
)
env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=True)
assert isinstance(env, UnityToGymWrapper)
assert isinstance(env.observation_space, spaces.Tuple)
assert len(env.observation_space) == 2
reset_obs = env.reset()
assert isinstance(reset_obs, list)
assert len(reset_obs) == 2
assert all(isinstance(ob, np.ndarray) for ob in reset_obs)
assert reset_obs[-1].shape == (3,)
assert len(reset_obs[0].shape) == 3
actions = env.action_space.sample()
assert actions.shape == (2,)
obs, rew, done, info = env.step(actions)
assert isinstance(obs, list)
assert len(obs) == 2
assert all(isinstance(ob, np.ndarray) for ob in obs)
assert obs[-1].shape == (3,)
assert isinstance(rew, float)
assert isinstance(done, (bool, np.bool_))
assert isinstance(info, dict)
# check behaviour for allow_multiple_obs = False
env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=False)
assert isinstance(env, UnityToGymWrapper)
assert isinstance(env.observation_space, spaces.Box)
reset_obs = env.reset()
assert isinstance(reset_obs, np.ndarray)
assert len(reset_obs.shape) == 3
actions = env.action_space.sample()
assert actions.shape == (2,)
obs, rew, done, info = env.step(actions)
assert isinstance(obs, np.ndarray)
@pytest.mark.parametrize("use_uint8", [True, False], ids=["uint8", "float"])
def test_gym_wrapper_multi_visual_and_vector(use_uint8):
mock_env = mock.MagicMock()
mock_spec = create_mock_group_spec(
number_visual_observations=2,
vector_observation_space_size=3,
vector_action_space_size=[2],
)
mock_decision_step, mock_terminal_step = create_mock_vector_steps(
mock_spec, number_visual_observations=2
)
setup_mock_unityenvironment(
mock_env, mock_spec, mock_decision_step, mock_terminal_step
)
env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=True)
assert isinstance(env, UnityToGymWrapper)
assert isinstance(env.observation_space, spaces.Tuple)
assert len(env.observation_space) == 3
reset_obs = env.reset()
assert isinstance(reset_obs, list)
assert len(reset_obs) == 3
assert all(isinstance(ob, np.ndarray) for ob in reset_obs)
assert reset_obs[-1].shape == (3,)
actions = env.action_space.sample()
assert actions.shape == (2,)
obs, rew, done, info = env.step(actions)
assert all(isinstance(ob, np.ndarray) for ob in obs)
assert isinstance(rew, float)
assert isinstance(done, (bool, np.bool_))
assert isinstance(info, dict)
# check behaviour for allow_multiple_obs = False
env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=False)
assert isinstance(env, UnityToGymWrapper)
assert isinstance(env.observation_space, spaces.Box)
reset_obs = env.reset()
assert isinstance(reset_obs, np.ndarray)
assert len(reset_obs.shape) == 3
actions = env.action_space.sample()
assert actions.shape == (2,)
obs, rew, done, info = env.step(actions)
assert isinstance(obs, np.ndarray)
# Helper methods

"""
obs = [np.array([num_agents * [1, 2, 3]]).reshape(num_agents, 3)]
if number_visual_observations:
obs += [np.zeros(shape=(num_agents, 8, 8, 3), dtype=np.float32)]
obs += [
np.zeros(shape=(num_agents, 8, 8, 3), dtype=np.float32)
] * number_visual_observations
rewards = np.array(num_agents * [1.0])
agents = np.array(range(0, num_agents))
return DecisionSteps(obs, rewards, agents, None), TerminalSteps.empty(specs)

8
ml-agents/tests/yamato/scripts/run_gym.py


:param env_name: Name of the Unity environment binary to launch
"""
u_env = UnityEnvironment(env_name, worker_id=1, no_graphics=True)
env = UnityToGymWrapper(u_env, use_visual=False)
env = UnityToGymWrapper(u_env)
try:
# Examine environment parameters

try:
env1 = UnityToGymWrapper(
UnityEnvironment(env_name, worker_id=1, no_graphics=True), use_visual=False
UnityEnvironment(env_name, worker_id=1, no_graphics=True)
UnityEnvironment(env_name, worker_id=1, no_graphics=True), use_visual=False
UnityEnvironment(env_name, worker_id=1, no_graphics=True)
UnityEnvironment(env_name, worker_id=2, no_graphics=True), use_visual=False
UnityEnvironment(env_name, worker_id=2, no_graphics=True)
)
env2.reset()
finally:
