External contribution : Allow visual and vector observations at the same time (#3998)

* allow vector observations also when using visual observations
* update changelog
* Update CHANGELOG.md
* Update __init__.py
* remove trailing whitespace
* Fix test case where visual and vector observations are used simultaneously
* fix formatting
* add test for visual and vector observations
* Assert vector action shape
* Fix test environment to return multiple visual observations
* use_visual and allow_multiple_visual_obs are replaced by allow_multiple_obs which allows visual and vector observations to be used simultaneously.
* fixing run_gym.py test
* [ci]
* Added some more tests and made the observation space a tuple when using multiple observations
* Modifying the change log
* Adding to the Migrating doc
* Edits to Migrating.md
* Simplification of the code to generate the observation spaces
* Simplified warning messages
* Adding contr...
5 years ago · 45737208
--- a/com.unity.ml-agents/CHANGELOG.md
+++ b/com.unity.ml-agents/CHANGELOG.md
 - `beta` and `epsilon` in `PPO` are no longer decayed by default but follow the same schedule as learning rate. (#3940)
 - `get_behavior_names()` and `get_behavior_spec()` on UnityEnvironment were replaced by the `behavior_specs` property. (#3946)
 - The first version of the Unity Environment Registry (Experimental) has been released. More information [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Unity-Environment-Registry.md)(#3967)
+- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
+were replaced by `allow_multiple_obs` which allows one or more visual observations and
+vector observations to be used simultaneously. (#3981) Thank you @shakenes !
 ### Minor Changes
 #### com.unity.ml-agents (C#)
 - `ObservableAttribute` was added. Adding the attribute to fields or properties on an Agent will allow it to generate
--- a/docs/Migrating.md
+++ b/docs/Migrating.md
  configuration have all been moved to a single YAML file. (#3791)
 - `max_step` in the `TerminalStep` and `TerminalSteps` objects was renamed `interrupted`.
 - On the UnityEnvironment API, `get_behavior_names()` and `get_behavior_specs()` methods were combined into the property `behavior_specs` that contains a mapping from behavior names to behavior spec.
+- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
+were replaced by `allow_multiple_obs` which allows one or more visual observations and
+vector observations to be used simultaneously.

 ### Steps to Migrate
 - Before upgrading, copy your `Behavior Name` sections from `trainer_config.yaml` into
  - If your training uses [parameter randomization](Training-ML-Agents.md#environment-parameter-randomization), move
  the contents of the sampler config to `parameter_randomization` in the main trainer configuration.
 - If you are using `UnityEnvironment` directly, replace `max_step` with `interrupted`
-in the `TerminalStep` and `TerminalSteps` objects.
+ in the `TerminalStep` and `TerminalSteps` objects.
+ - If you use the `UnityToGymWrapper`, remove `use_visual` and `allow_multiple_visual_obs`
+ from the constructor and add `allow_multiple_obs=True` if the environment contains either
+ both visual and vector observations or multiple visual observations.
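
The constructor change in the step above can be sketched as a small keyword-argument translation. The helper `migrate_wrapper_kwargs` is hypothetical (it is not part of `gym_unity`) and only illustrates the mapping; it assumes the old call used keyword arguments.

```python
from typing import Any, Dict

# Keyword arguments that the updated UnityToGymWrapper constructor no longer accepts.
REMOVED_KWARGS = ("use_visual", "allow_multiple_visual_obs")


def migrate_wrapper_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate old UnityToGymWrapper keyword arguments (hypothetical helper).

    Drops the removed flags and sets `allow_multiple_obs=True` when the old
    call asked for multiple visual observations.
    """
    new_kwargs = {
        key: value for key, value in old_kwargs.items() if key not in REMOVED_KWARGS
    }
    if old_kwargs.get("allow_multiple_visual_obs", False):
        new_kwargs["allow_multiple_obs"] = True
    return new_kwargs


old = {"use_visual": True, "uint8_visual": True, "allow_multiple_visual_obs": True}
new = migrate_wrapper_kwargs(old)
print(new)
# env = UnityToGymWrapper(unity_env, **new)  # unity_env: an existing UnityEnvironment
```

Note that with `allow_multiple_obs=True` the returned list also carries the vector observations as its last element when the environment has any, so code that previously indexed a list of visual observations only may need adjusting.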

 ## Migrating from 0.15 to Release 1

--- a/gym-unity/README.md
+++ b/gym-unity/README.md
 ```python
 from gym_unity.envs import UnityToGymWrapper

-env = UnityToGymWrapper(unity_environment, worker_id, use_visual, uint8_visual)
+env = UnityToGymWrapper(unity_environment, uint8_visual, allow_multiple_obs)
-
- `use_visual` refers to whether to use visual observations (True) or vector
-  observations (False) as the default observation provided by the `reset` and
-  `step` functions. Defaults to `False`.

 - `uint8_visual` refers to whether to output visual observations as `uint8`
  values (0-255). Many common Gym environments (e.g. Atari) do this. By default
  Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
  `False`.

- `allow_multiple_visual_obs` will return a list of observation instead of only
-  one if disabled. Defaults to `False`.
+- `allow_multiple_obs` will return a list of observations. The first elements contain the visual observations and the
+  last element contains the array of vector observations. If False, the environment returns a single array (containing
+  a single visual observation, if present, otherwise the vector observation).

 The returned environment `env` will function as a gym.

 - By default, the first visual observation is provided as the `observation`, if
  present. Otherwise, vector observations are provided. You can receive all
-  visual observations by using the `allow_multiple_visual_obs=True` option in
+  visual and vector observations by using the `allow_multiple_obs=True` option in
-  instead of only the first one.
+  instead of only one.
 - The `TerminalSteps` or `DecisionSteps` output from the environment can still
  be accessed from the `info` provided by `env.step(action)`.
 - Stacked vector observations are not supported.

 def main():
    unity_env = UnityEnvironment("./envs/GridWorld")
-    env = UnityToGymWrapper(unity_env, 0, use_visual=True, uint8_visual=True)
+    env = UnityToGymWrapper(unity_env, 0, uint8_visual=True)
    logger.configure('./logs') # Change to log in a different directory
    act = deepq.learn(
        env,
    def make_env(rank, use_visual=True): # pylint: disable=C0111
        def _thunk():
            unity_env = UnityEnvironment(env_directory)
-            env = UnityToGymWrapper(unity_env, rank, use_visual=use_visual, uint8_visual=True)
+            env = UnityToGymWrapper(unity_env, rank, uint8_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    game_version = 'v0' if sticky_actions else 'v4'
    full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version)
    unity_env = UnityEnvironment('./envs/GridWorld')
-    env = UnityToGymWrapper(unity_env, use_visual=True, uint8_visual=True)
+    env = UnityToGymWrapper(unity_env, uint8_visual=True)
    return env
 ```
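
As a rough illustration of the list layout described for `allow_multiple_obs=True` (visual observations first, vector observations last), the sketch below separates the two parts. `split_observation` and its `has_vector_obs` flag are hypothetical names, and plain Python values stand in for the `np.ndarray` objects the wrapper would return.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_observation(
    obs: Sequence[T], has_vector_obs: bool
) -> Tuple[List[T], Optional[T]]:
    """Split an observation list into (visual observations, vector observation).

    Assumes the layout described above: visual observations come first and
    the vector observation, if the environment has one, is the last element.
    """
    if has_vector_obs:
        return list(obs[:-1]), obs[-1]
    return list(obs), None


# Stand-in values; a real call would pass the list returned by env.reset()
# or env.step(action) on a wrapper built with allow_multiple_obs=True.
visual, vector = split_observation(["camera_0", "camera_1", [1, 2, 3]], True)
print(len(visual), vector)
```

The caller has to know whether the environment exposes vector observations, since the list alone does not say which kind its last element is.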

--- a/gym-unity/gym_unity/envs/__init__.py
+++ b/gym-unity/gym_unity/envs/__init__.py
 import itertools
 import numpy as np
-from typing import Any, Dict, List, Optional, Tuple, Union
+from typing import Any, Dict, List, Tuple, Union

 import gym
 from gym import error, spaces
    def __init__(
        self,
        unity_env: BaseEnv,
-        use_visual: bool = False,
-        allow_multiple_visual_obs: bool = False,
+        allow_multiple_obs: bool = False,
-        :param use_visual: Whether to use visual observation or vector observation.
-        :param allow_multiple_visual_obs: If True, return a list of visual observations instead of only one.
+        :param allow_multiple_obs: If True, return a list of np.ndarrays as observations with the first elements
+            containing the visual observations and the last element containing the array of vector observations.
+            If False, returns a single np.ndarray containing either only a single visual observation or the array of
+            vector observations.
        """
        self._env = unity_env

        self._flattener = None
        # Hidden flag used by Atari environments to determine if the game is over
        self.game_over = False
-        self._allow_multiple_visual_obs = allow_multiple_visual_obs
+        self._allow_multiple_obs = allow_multiple_obs

        # Check brain configuration
        if len(self._env.behavior_specs) != 1:
        self.name = list(self._env.behavior_specs.keys())[0]
        self.group_spec = self._env.behavior_specs[self.name]

-        if use_visual and self._get_n_vis_obs() == 0:
+        if self._get_n_vis_obs() == 0 and self._get_vec_obs_size() == 0:
-                "`use_visual` was set to True, however there are no"
-                " visual observations as part of this environment."
+                "There are no observations provided by the environment."
-        self.use_visual = self._get_n_vis_obs() >= 1 and use_visual
-        if not use_visual and uint8_visual:
+        if not self._get_n_vis_obs() >= 1 and uint8_visual:
-                "`uint8_visual was set to true, but visual observations are not in use. "
+                "uint8_visual was set to true, but visual observations are not in use. "
-
-        if self._get_n_vis_obs() > 1 and not self._allow_multiple_visual_obs:
+        if (
+            self._get_n_vis_obs() + self._get_vec_obs_size() >= 2
+            and not self._allow_multiple_obs
+        ):
-                "The environment contains more than one visual observation. "
-                "You must define allow_multiple_visual_obs=True to received them all. "
-                "Otherwise, please note that only the first will be provided in the observation."
+                "The environment contains multiple observations. "
+                "You must define allow_multiple_obs=True to receive them all. "
+                "Otherwise, only the first visual observation (or vector observation if "
+                "there are no visual observations) will be provided in the observation."
            )

        # Check for number of agents in scene.
        self._previous_decision_step = decision_steps

-        # Set observation and action spaces
+        # Set action spaces
        if self.group_spec.is_action_discrete():
            branches = self.group_spec.discrete_action_branches
            if self.group_spec.action_shape == 1:
                )
            high = np.array([1] * self.group_spec.action_shape)
            self._action_space = spaces.Box(-high, high, dtype=np.float32)
-        high = np.array([np.inf] * self._get_vec_obs_size())
-        if self.use_visual:
-            shape = self._get_vis_obs_shape()
+
+        # Set observations space
+        list_spaces: List[gym.Space] = []
+        shapes = self._get_vis_obs_shape()
+        for shape in shapes:
-                self._observation_space = spaces.Box(
-                    0, 255, dtype=np.uint8, shape=shape
-                )
+                list_spaces.append(spaces.Box(0, 255, dtype=np.uint8, shape=shape))
-                self._observation_space = spaces.Box(
-                    0, 1, dtype=np.float32, shape=shape
-                )
-
+                list_spaces.append(spaces.Box(0, 1, dtype=np.float32, shape=shape))
+        if self._get_vec_obs_size() > 0:
+            # vector observation is last
+            high = np.array([np.inf] * self._get_vec_obs_size())
+            list_spaces.append(spaces.Box(-high, high, dtype=np.float32))
+        if self._allow_multiple_obs:
+            self._observation_space = spaces.Tuple(list_spaces)
-            self._observation_space = spaces.Box(-high, high, dtype=np.float32)
+            self._observation_space = list_spaces[0]  # only return the first one

    def reset(self) -> Union[List[np.ndarray], np.ndarray]:
        """Resets the state of the environment and returns an initial observation.
            return self._single_step(decision_step)

    def _single_step(self, info: Union[DecisionSteps, TerminalSteps]) -> GymStepResult:
-        if self.use_visual:
+        if self._allow_multiple_obs:
+            visual_obs_list = []
+            for obs in visual_obs:
+                visual_obs_list.append(self._preprocess_single(obs[0]))
+            default_observation = visual_obs_list
+            if self._get_vec_obs_size() >= 1:
+                default_observation.append(self._get_vector_obs(info)[0, :])
+        else:
+            if self._get_n_vis_obs() >= 1:
+                visual_obs = self._get_vis_obs_list(info)
+                default_observation = self._preprocess_single(visual_obs[0][0])
+            else:
+                default_observation = self._get_vector_obs(info)[0, :]
-            if self._allow_multiple_visual_obs:
-                visual_obs_list = []
-                for obs in visual_obs:
-                    visual_obs_list.append(self._preprocess_single(obs[0]))
-                self.visual_obs = visual_obs_list
-            else:
-                self.visual_obs = self._preprocess_single(visual_obs[0][0])
+        if self._get_n_vis_obs() >= 1:
+            visual_obs = self._get_vis_obs_list(info)
+            self.visual_obs = self._preprocess_single(visual_obs[0][0])
-            default_observation = self.visual_obs
-        elif self._get_vec_obs_size() > 0:
-            default_observation = self._get_vector_obs(info)[0, :]
-        else:
-            raise UnityGymException(
-                "The Agent does not have vector observations and the environment was not setup "
-                + "to use visual observations."
-            )
        done = isinstance(info, TerminalSteps)

        return (default_observation, info.reward[0], done, {"step": info})
                result += 1
        return result

-    def _get_vis_obs_shape(self) -> Optional[Tuple]:
+    def _get_vis_obs_shape(self) -> List[Tuple]:
+        result: List[Tuple] = []
-                return shape
-        return None
+                result.append(shape)
+        return result

    def _get_vis_obs_list(
        self, step_result: Union[DecisionSteps, TerminalSteps]
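
The observation-space logic in this file can be summarised with a dependency-free sketch. `BoxSpec` is a stand-in for `gym.spaces.Box`, and `build_observation_spec` is a hypothetical function that mirrors the ordering and selection rules of the diff; it is not the wrapper's actual code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class BoxSpec:
    """Stand-in for gym.spaces.Box: value bounds, array shape and dtype name."""

    low: float
    high: float
    shape: Tuple[int, ...]
    dtype: str


def build_observation_spec(
    visual_shapes: Sequence[Tuple[int, ...]],
    vector_size: int,
    uint8_visual: bool,
    allow_multiple_obs: bool,
) -> Union[BoxSpec, Tuple[BoxSpec, ...]]:
    specs: List[BoxSpec] = []
    # Visual observations come first, one entry per camera.
    for shape in visual_shapes:
        if uint8_visual:
            specs.append(BoxSpec(0.0, 255.0, shape, "uint8"))
        else:
            specs.append(BoxSpec(0.0, 1.0, shape, "float32"))
    # The vector observation, if any, is always last.
    if vector_size > 0:
        specs.append(BoxSpec(float("-inf"), float("inf"), (vector_size,), "float32"))
    if not specs:
        # Assumption: this sketch raises; how the wrapper reports it is not shown here.
        raise ValueError("There are no observations provided by the environment.")
    # Multiple observations are exposed as a tuple, otherwise only the first one.
    return tuple(specs) if allow_multiple_obs else specs[0]


multi = build_observation_spec([(8, 8, 3), (8, 8, 3)], 3, True, True)
single = build_observation_spec([(8, 8, 3)], 3, False, False)
print(len(multi), single.shape)
```

With two cameras and a vector observation of size 3, the tuple has three entries, which matches the `len(env.observation_space) == 3` assertion in the tests that follow.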
--- a/gym-unity/gym_unity/tests/test_gym.py
+++ b/gym-unity/gym_unity/tests/test_gym.py
    setup_mock_unityenvironment(
        mock_env, mock_spec, mock_decision_step, mock_terminal_step
    )
-    env = UnityToGymWrapper(mock_env, use_visual=False)
+    env = UnityToGymWrapper(mock_env)
    assert isinstance(env, UnityToGymWrapper)
    assert isinstance(env.reset(), np.ndarray)
    actions = env.action_space.sample()
        mock_env, mock_spec, mock_decision_step, mock_terminal_step
    )

-    env = UnityToGymWrapper(mock_env, use_visual=False, flatten_branched=True)
+    env = UnityToGymWrapper(mock_env, flatten_branched=True)
    assert isinstance(env.action_space, spaces.Discrete)
    assert env.action_space.n == 12
    assert env._flattener.lookup_action(0) == [0, 0, 0]
-    env = UnityToGymWrapper(mock_env, use_visual=False, flatten_branched=False)
+    env = UnityToGymWrapper(mock_env, flatten_branched=False)
    assert isinstance(env.action_space, spaces.MultiDiscrete)


-    mock_spec = create_mock_group_spec(number_visual_observations=1)
+    mock_spec = create_mock_group_spec(
+        number_visual_observations=1, vector_observation_space_size=0
+    )
    mock_decision_step, mock_terminal_step = create_mock_vector_steps(
        mock_spec, number_visual_observations=1
    )

-    env = UnityToGymWrapper(mock_env, use_visual=True, uint8_visual=use_uint8)
+    env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8)
+    assert isinstance(env.observation_space, spaces.Box)
    assert isinstance(env, UnityToGymWrapper)
    assert isinstance(env.reset(), np.ndarray)
    actions = env.action_space.sample()
    assert isinstance(info, dict)


+@pytest.mark.parametrize("use_uint8", [True, False], ids=["uint8", "float"])
+def test_gym_wrapper_single_visual_and_vector(use_uint8):
+    mock_env = mock.MagicMock()
+    mock_spec = create_mock_group_spec(
+        number_visual_observations=1,
+        vector_observation_space_size=3,
+        vector_action_space_size=[2],
+    )
+    mock_decision_step, mock_terminal_step = create_mock_vector_steps(
+        mock_spec, number_visual_observations=1
+    )
+    setup_mock_unityenvironment(
+        mock_env, mock_spec, mock_decision_step, mock_terminal_step
+    )
+
+    env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=True)
+    assert isinstance(env, UnityToGymWrapper)
+    assert isinstance(env.observation_space, spaces.Tuple)
+    assert len(env.observation_space) == 2
+    reset_obs = env.reset()
+    assert isinstance(reset_obs, list)
+    assert len(reset_obs) == 2
+    assert all(isinstance(ob, np.ndarray) for ob in reset_obs)
+    assert reset_obs[-1].shape == (3,)
+    assert len(reset_obs[0].shape) == 3
+    actions = env.action_space.sample()
+    assert actions.shape == (2,)
+    obs, rew, done, info = env.step(actions)
+    assert isinstance(obs, list)
+    assert len(obs) == 2
+    assert all(isinstance(ob, np.ndarray) for ob in obs)
+    assert obs[-1].shape == (3,)
+    assert isinstance(rew, float)
+    assert isinstance(done, (bool, np.bool_))
+    assert isinstance(info, dict)
+
+    # check behaviour for allow_multiple_obs = False
+    env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=False)
+    assert isinstance(env, UnityToGymWrapper)
+    assert isinstance(env.observation_space, spaces.Box)
+    reset_obs = env.reset()
+    assert isinstance(reset_obs, np.ndarray)
+    assert len(reset_obs.shape) == 3
+    actions = env.action_space.sample()
+    assert actions.shape == (2,)
+    obs, rew, done, info = env.step(actions)
+    assert isinstance(obs, np.ndarray)
+
+
+@pytest.mark.parametrize("use_uint8", [True, False], ids=["uint8", "float"])
+def test_gym_wrapper_multi_visual_and_vector(use_uint8):
+    mock_env = mock.MagicMock()
+    mock_spec = create_mock_group_spec(
+        number_visual_observations=2,
+        vector_observation_space_size=3,
+        vector_action_space_size=[2],
+    )
+    mock_decision_step, mock_terminal_step = create_mock_vector_steps(
+        mock_spec, number_visual_observations=2
+    )
+    setup_mock_unityenvironment(
+        mock_env, mock_spec, mock_decision_step, mock_terminal_step
+    )
+
+    env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=True)
+    assert isinstance(env, UnityToGymWrapper)
+    assert isinstance(env.observation_space, spaces.Tuple)
+    assert len(env.observation_space) == 3
+    reset_obs = env.reset()
+    assert isinstance(reset_obs, list)
+    assert len(reset_obs) == 3
+    assert all(isinstance(ob, np.ndarray) for ob in reset_obs)
+    assert reset_obs[-1].shape == (3,)
+    actions = env.action_space.sample()
+    assert actions.shape == (2,)
+    obs, rew, done, info = env.step(actions)
+    assert all(isinstance(ob, np.ndarray) for ob in obs)
+    assert isinstance(rew, float)
+    assert isinstance(done, (bool, np.bool_))
+    assert isinstance(info, dict)
+
+    # check behaviour for allow_multiple_obs = False
+    env = UnityToGymWrapper(mock_env, uint8_visual=use_uint8, allow_multiple_obs=False)
+    assert isinstance(env, UnityToGymWrapper)
+    assert isinstance(env.observation_space, spaces.Box)
+    reset_obs = env.reset()
+    assert isinstance(reset_obs, np.ndarray)
+    assert len(reset_obs.shape) == 3
+    actions = env.action_space.sample()
+    assert actions.shape == (2,)
+    obs, rew, done, info = env.step(actions)
+    assert isinstance(obs, np.ndarray)
+
+
 # Helper methods


    """
    obs = [np.array([num_agents * [1, 2, 3]]).reshape(num_agents, 3)]
    if number_visual_observations:
-        obs += [np.zeros(shape=(num_agents, 8, 8, 3), dtype=np.float32)]
+        obs += [
+            np.zeros(shape=(num_agents, 8, 8, 3), dtype=np.float32)
+        ] * number_visual_observations
    rewards = np.array(num_agents * [1.0])
    agents = np.array(range(0, num_agents))
    return DecisionSteps(obs, rewards, agents, None), TerminalSteps.empty(specs)
--- a/ml-agents/tests/yamato/scripts/run_gym.py
+++ b/ml-agents/tests/yamato/scripts/run_gym.py
    :param env_name: Name of the Unity environment binary to launch
    """
    u_env = UnityEnvironment(env_name, worker_id=1, no_graphics=True)
-    env = UnityToGymWrapper(u_env, use_visual=False)
+    env = UnityToGymWrapper(u_env)

    try:
        # Examine environment parameters

    try:
        env1 = UnityToGymWrapper(
-            UnityEnvironment(env_name, worker_id=1, no_graphics=True), use_visual=False
+            UnityEnvironment(env_name, worker_id=1, no_graphics=True)
-            UnityEnvironment(env_name, worker_id=1, no_graphics=True), use_visual=False
+            UnityEnvironment(env_name, worker_id=1, no_graphics=True)
-            UnityEnvironment(env_name, worker_id=2, no_graphics=True), use_visual=False
+            UnityEnvironment(env_name, worker_id=2, no_graphics=True)
        )
        env2.reset()
    finally: