
Merge branch 'master' into develop-sampler-refactor

Andrew Cohen, 5 years ago
Current commit: e7750fc9
31 changed files, with 443 additions and 99 deletions
1. .pre-commit-search-and-replace.yaml (4)
2. README.md (4)
3. com.unity.ml-agents/CHANGELOG.md (16)
4. config/sac/Walker.yaml (10)
5. config/sac/WallJump.yaml (8)
6. docs/Background-Machine-Learning.md (2)
7. docs/FAQ.md (14)
8. docs/Installation.md (2)
9. docs/Learning-Environment-Create-New.md (38)
10. docs/Migrating.md (4)
11. docs/Training-ML-Agents.md (8)
12. docs/Training-on-Amazon-Web-Service.md (2)
13. ml-agents-envs/mlagents_envs/rpc_communicator.py (5)
14. ml-agents/README.md (2)
15. ml-agents/mlagents/model_serialization.py (10)
16. ml-agents/mlagents/trainers/cli_utils.py (7)
17. ml-agents/mlagents/trainers/ghost/trainer.py (2)
18. ml-agents/mlagents/trainers/learn.py (22)
19. ml-agents/mlagents/trainers/meta_curriculum.py (25)
20. ml-agents/mlagents/trainers/policy/tf_policy.py (62)
21. ml-agents/mlagents/trainers/sac/trainer.py (22)
22. ml-agents/mlagents/trainers/settings.py (1)
23. ml-agents/mlagents/trainers/tests/test_learn.py (7)
24. ml-agents/mlagents/trainers/tests/test_meta_curriculum.py (23)
25. ml-agents/mlagents/trainers/tests/test_nn_policy.py (21)
26. ml-agents/mlagents/trainers/tests/test_policy.py (8)
27. ml-agents/mlagents/trainers/tests/test_sac.py (16)
28. ml-agents/mlagents/trainers/tests/test_simple_rl.py (18)
29. ml-agents/mlagents/trainers/trainer_controller.py (4)
30. ml-agents/mlagents/trainers/tests/test_training_status.py (60)
31. ml-agents/mlagents/trainers/training_status.py (115)

.pre-commit-search-and-replace.yaml (4)


search: /ML[ -]Agents toolkit/
replacement: ML-Agents Toolkit
insensitive: true
- description: Replace "the the"
search: /the the/
replacement: the
insensitive: true

README.md (4)


[contribution guidelines](com.unity.ml-agents/CONTRIBUTING.md) and
[code of conduct](CODE_OF_CONDUCT.md).
For problems with the installation and setup of the the ML-Agents Toolkit, or
For problems with the installation and setup of the ML-Agents Toolkit, or
using the ML-Agents Toolkit, or have a specific feature requests, please
using the ML-Agents Toolkit or have a specific feature request, please
[submit a GitHub issue](https://github.com/Unity-Technologies/ml-agents/issues).
Your opinion matters a great deal to us. Only by hearing your thoughts on the

com.unity.ml-agents/CHANGELOG.md (16)


- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
were replaced by `allow_multiple_obs` which allows one or more visual observations and
vector observations to be used simultaneously. (#3981) Thank you @shakenes !
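As a rough illustration of the new flag (a minimal sketch, not part of this diff; the `./MyEnv` path and surrounding calls are placeholders), the wrapper might be used like this:

```python
# Minimal sketch of the new allow_multiple_obs flag in gym-unity.
# The environment path "./MyEnv" is a placeholder.
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

unity_env = UnityEnvironment(file_name="./MyEnv")
# allow_multiple_obs=True returns a list of observations per step:
# every visual observation followed by the vector observation (if any).
env = UnityToGymWrapper(unity_env, allow_multiple_obs=True)

obs = env.reset()  # list of observations instead of a single array
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```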
### Minor Changes
#### com.unity.ml-agents (C#)
- `ObservableAttribute` was added. Adding the attribute to fields or properties on an Agent will allow it to generate
observations via reflection. (#3925, #4006)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- Curriculum and Parameter Randomization configurations have been merged
into the main training configuration file. Note that this means training
configuration files are now environment-specific. (#3791)

directory. (#3829)
- When using Curriculum, the current lesson will resume if training is quit and resumed. As such,
the `--lesson` CLI option has been removed. (#4025)
### Minor Changes
#### com.unity.ml-agents (C#)
- `ObservableAttribute` was added. Adding the attribute to fields or properties on an Agent will allow it to generate
observations via reflection. (#3925, #4006)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- When trying to load/resume from a checkpoint created with an earlier version of ML-Agents,
a warning will be thrown. (#4035)
- Fixed an issue where SAC would perform too many model updates when resuming from a
checkpoint, and too few when using `buffer_init_steps`. (#4038)
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)

config/sac/Walker.yaml (10)


hyperparameters:
learning_rate: 0.0003
learning_rate_schedule: constant
batch_size: 256
buffer_size: 500000
batch_size: 1024
buffer_size: 2000000
buffer_init_steps: 0
tau: 0.005
steps_per_update: 30.0

network_settings:
normalize: true
hidden_units: 512
num_layers: 4
hidden_units: 256
num_layers: 3
vis_encode_type: simple
reward_signals:
extrinsic:

keep_checkpoints: 5
max_steps: 20000000
max_steps: 15000000
time_horizon: 1000
summary_freq: 30000
threaded: true

config/sac/WallJump.yaml (8)


learning_rate: 0.0003
learning_rate_schedule: constant
batch_size: 128
buffer_size: 50000
buffer_size: 200000
steps_per_update: 10.0
steps_per_update: 20.0
save_replay_buffer: false
init_entcoef: 0.1
reward_signal_steps_per_update: 10.0

strength: 1.0
output_path: default
keep_checkpoints: 5
max_steps: 20000000
max_steps: 15000000
time_horizon: 128
summary_freq: 20000
threaded: true

buffer_size: 50000
buffer_init_steps: 0
tau: 0.005
steps_per_update: 10.0
steps_per_update: 20.0
save_replay_buffer: false
init_entcoef: 0.1
reward_signal_steps_per_update: 10.0

docs/Background-Machine-Learning.md (2)


water hose and whether the hose is on or off).
The last remaining piece of the reinforcement learning task is the **reward
signal**. When training a robot to be a mean firefighting machine, we provide it
signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it
with rewards (positive and negative) indicating how well it is doing on
completing the task. Note that the robot does not _know_ how to put out fires
before it is trained. It learns the objective because it receives a large

docs/FAQ.md (14)


search the tensorflow github issues for similar problems and solutions before
creating a new issue.
#### Visual C++ Dependency (Windows Users)
When running `mlagents-learn`, if you see a stack trace with a message like this:
```console
ImportError: DLL load failed: The specified module could not be found.
```
then one of the required DLLs, `msvcp140.dll` (older) or `msvcp140_1.dll` (newer), is missing on your machine. The `import tensorflow` command triggers this error.
To solve it, download and install (then reboot) the [Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019](https://support.microsoft.com/en-my/help/2977003/the-latest-supported-visual-c-downloads).
For more details, please see the [TensorFlow 2.1.0 release notes](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0)
and the [TensorFlow github issue](https://github.com/tensorflow/tensorflow/issues/22794#issuecomment-573297027).
## Environment Permission Error
If you directly import your Unity environment without building it in the editor,

docs/Installation.md (2)


order to find it.
**NOTE:** If you do not see the ML-Agents package listed in the Package Manager
please follow the the [advanced installation instructions](#advanced-local-installation-for-development) below.
please follow the [advanced installation instructions](#advanced-local-installation-for-development) below.
#### Advanced: Local Installation for Development

docs/Learning-Environment-Create-New.md (38)


```yml
behaviors:
RollerBall:
trainer: ppo
batch_size: 10
beta: 5.0e-3
buffer_size: 100
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 3.0e-4
learning_rate_schedule: linear
max_steps: 5.0e4
memory_size: 128
normalize: false
num_epoch: 3
num_layers: 2
trainer_type: ppo
hyperparameters:
batch_size: 10
buffer_size: 100
learning_rate: 3.0e-4
beta: 5.0e-4
epsilon: 0.2
lambd: 0.99
num_epoch: 3
learning_rate_schedule: linear
network_settings:
normalize: false
hidden_units: 128
num_layers: 2
reward_signals:
extrinsic:
gamma: 0.99
strength: 1.0
max_steps: 500000
use_recurrent: false
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
```
Since this example creates a very simple training environment with only a few

docs/Migrating.md (4)


- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor
were replaced by `allow_multiple_obs` which allows one or more visual observations and
vector observations to be used simultaneously.
- `--lesson` has been removed from the CLI. Lessons will resume when using `--resume`.
To start at a different lesson, modify your Curriculum configuration.
### Steps to Migrate
- To upgrade your configuration files, an upgrade script has been provided. Run `python config/update_config.py

`RayPerception3d.Perceive()` that was causing the `endOffset` to be used
incorrectly. However this may produce different behavior from previous
versions if you use a non-zero `startOffset`. To reproduce the old behavior,
you should increase the the value of `endOffset` by `startOffset`. You can
you should increase the value of `endOffset` by `startOffset`. You can
verify your raycasts are performing as expected in scene view using the debug
rays.
- If you use RayPerception3D, replace it with RayPerceptionSensorComponent3D

docs/Training-ML-Agents.md (8)


mlagents-learn config/ppo/WallJump_curriculum.yaml --run-id=wall-jump-curriculum
```
We can then keep track of the current lessons and progresses via TensorBoard.
**Note**: If you are resuming a training session that uses curriculum, please
pass the number of the last-reached lesson using the `--lesson` flag when
running `mlagents-learn`.
We can then keep track of the current lessons and progresses via TensorBoard. If you've terminated
the run, you can resume it using `--resume` and lesson progress will start off where it
ended.
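As a rough sketch of how this resume works under the hood (the "RollerBall" behavior name and the results path below are illustrative placeholders; `GlobalTrainingStatus` is added by this change in training_status.py):

```python
# Sketch: how lesson progress survives a restart. Behavior name and paths
# are placeholders for this example.
from mlagents.trainers.training_status import GlobalTrainingStatus, StatusType

# While training, the controller records the current lesson per behavior name:
GlobalTrainingStatus.set_parameter_state("RollerBall", StatusType.LESSON_NUM, 2)
GlobalTrainingStatus.save_state("results/my-run/run_logs/training_status.json")

# On `--resume`, learn.py loads the same file and the MetaCurriculum
# restores each curriculum to the saved lesson:
GlobalTrainingStatus.load_state("results/my-run/run_logs/training_status.json")
lesson = GlobalTrainingStatus.get_parameter_state("RollerBall", StatusType.LESSON_NUM)
assert lesson == 2
```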
### Environment Parameter Randomization

docs/Training-on-Amazon-Web-Service.md (2)


Fatal server error:
(EE) no screens found(EE)
(EE)
Please consult the The X.Org Foundation support
Please consult the X.Org Foundation support
at http://wiki.x.org
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.

ml-agents-envs/mlagents_envs/rpc_communicator.py (5)


import grpc
from typing import Optional
from sys import platform
import socket
from multiprocessing import Pipe
from concurrent.futures import ThreadPoolExecutor

Attempts to bind to the requested communicator port, checking if it is already in use.
"""
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
if platform == "linux" or platform == "linux2":
# On linux, the port remains unusable for TIME_WAIT=60 seconds after closing
# SO_REUSEADDR frees the port right after closing the environment
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
try:
s.bind(("localhost", port))
except socket.error:

ml-agents/README.md (2)


cooperative behavior among different agents is not stable.
- Resuming self-play from a checkpoint resets the reported ELO to the default
value.
- Resuming curriculum learning from a checkpoint requires the last lesson be
specified using the `--lesson` CLI option

ml-agents/mlagents/model_serialization.py (10)


)
MODEL_CONSTANTS = frozenset(
["action_output_shape", "is_continuous_control", "memory_size", "version_number"]
[
"action_output_shape",
"is_continuous_control",
"memory_size",
"version_number",
"trainer_major_version",
"trainer_minor_version",
"trainer_patch_version",
]
)
VISUAL_OBSERVATION_PREFIX = "visual_observation_"

ml-agents/mlagents/trainers/cli_utils.py (7)


action=DetectDefault,
)
argparser.add_argument(
"--lesson",
default=0,
type=int,
help="The lesson to start with when performing curriculum training",
action=DetectDefault,
)
argparser.add_argument(
"--load",
default=False,
dest="load_model",

ml-agents/mlagents/trainers/ghost/trainer.py (2)


)
)
# Counts the The number of steps of the ghost policies. Snapshot swapping
# Counts the number of steps of the ghost policies. Snapshot swapping
# depends on this counter whereas snapshot saving and team switching depends
# on the wrapped. This ensures that all teams train for the same number of trainer
# steps.

ml-agents/mlagents/trainers/learn.py (22)


from mlagents.trainers.cli_utils import parser
from mlagents_envs.environment import UnityEnvironment
from mlagents.trainers.settings import RunOptions
from mlagents.trainers.training_status import GlobalTrainingStatus
from mlagents_envs.base_env import BaseEnv
from mlagents.trainers.subprocess_env_manager import SubprocessEnvManager
from mlagents_envs.side_channel.side_channel import SideChannel

from mlagents_envs import logging_util
logger = logging_util.get_logger(__name__)
TRAINING_STATUS_FILE_NAME = "training_status.json"
def get_version_string() -> str:

)
# Make run logs directory
os.makedirs(run_logs_dir, exist_ok=True)
# Load any needed states
if checkpoint_settings.resume:
GlobalTrainingStatus.load_state(
os.path.join(run_logs_dir, "training_status.json")
)
# Configure CSV, Tensorboard Writers and StatsReporter
# We assume reward and episode length are needed in the CSV.
csv_writer = CSVWriter(

env_factory, engine_config, env_settings.num_envs
)
maybe_meta_curriculum = try_create_meta_curriculum(
options.curriculum, env_manager, checkpoint_settings.lesson
options.curriculum, env_manager, restore=checkpoint_settings.resume
)
maybe_add_samplers(options.parameter_randomization, env_manager)

env_manager.close()
write_run_options(write_path, options)
write_timing_tree(run_logs_dir)
write_training_status(run_logs_dir)
def write_run_options(output_dir: str, run_options: RunOptions) -> None:

)
def write_training_status(output_dir: str) -> None:
GlobalTrainingStatus.save_state(os.path.join(output_dir, TRAINING_STATUS_FILE_NAME))
def write_timing_tree(output_dir: str) -> None:
timing_path = os.path.join(output_dir, "timers.json")
try:

def try_create_meta_curriculum(
curriculum_config: Optional[Dict], env: SubprocessEnvManager, lesson: int
curriculum_config: Optional[Dict], env: SubprocessEnvManager, restore: bool = False
# TODO: Should be able to start learning at different lesson numbers
# for each curriculum.
meta_curriculum.set_all_curricula_to_lesson_num(lesson)
if restore:
meta_curriculum.try_restore_all_curriculum()
return meta_curriculum

ml-agents/mlagents/trainers/meta_curriculum.py (25)


from typing import Dict, Set
from mlagents.trainers.curriculum import Curriculum
from mlagents.trainers.settings import CurriculumSettings
from mlagents.trainers.training_status import GlobalTrainingStatus, StatusType
from mlagents_envs.logging_util import get_logger

)
return ret
def set_all_curricula_to_lesson_num(self, lesson_num):
"""Sets all the curricula in this meta curriculum to a specified
lesson number.
Args:
lesson_num (int): The lesson number which all the curricula will
be set to.
def try_restore_all_curriculum(self):
for _, curriculum in self.brains_to_curricula.items():
curriculum.lesson_num = lesson_num
Tries to restore all the curriculums to what is saved in training_status.json
"""
for brain_name, curriculum in self.brains_to_curricula.items():
lesson_num = GlobalTrainingStatus.get_parameter_state(
brain_name, StatusType.LESSON_NUM
)
if lesson_num is not None:
logger.info(
f"Resuming curriculum for {brain_name} at lesson {lesson_num}."
)
curriculum.lesson_num = lesson_num
else:
curriculum.lesson_num = 0
def get_config(self):
"""Get the combined configuration of all curricula in this

ml-agents/mlagents/trainers/policy/tf_policy.py (62)


from typing import Any, Dict, List, Optional
from typing import Any, Dict, List, Optional, Tuple
from distutils.version import LooseVersion
from mlagents.tf_utils import tf
from mlagents import tf_utils
from mlagents_envs.exception import UnityException

from mlagents.trainers.models import ModelUtils
from mlagents.trainers.settings import TrainerSettings, NetworkSettings
from mlagents.trainers.brain import BrainParameters
from mlagents.trainers import __version__
# This is the version number of the inputs and outputs of the model, and
# determines compatibility with inference in Barracuda.
MODEL_FORMAT_VERSION = 2
class UnityPolicyException(UnityException):

:param brain: The corresponding Brain for this policy.
:param trainer_settings: The trainer parameters.
"""
self._version_number_ = 2
self.m_size = 0
self.trainer_settings = trainer_settings
self.network_settings: NetworkSettings = trainer_settings.network_settings

"""
pass
@staticmethod
def _convert_version_string(version_string: str) -> Tuple[int, ...]:
"""
Converts the version string into a Tuple of ints (major_ver, minor_ver, patch_ver).
:param version_string: The semantic-versioned version string (X.Y.Z).
:return: A Tuple containing (major_ver, minor_ver, patch_ver).
"""
ver = LooseVersion(version_string)
return tuple(map(int, ver.version[0:3]))
def _check_model_version(self, version: str) -> None:
"""
Checks whether the model being loaded was created with the same version of
ML-Agents, and throw a warning if not so.
"""
if self.version_tensors is not None:
loaded_ver = tuple(
num.eval(session=self.sess) for num in self.version_tensors
)
if loaded_ver != TFPolicy._convert_version_string(version):
logger.warning(
f"The model checkpoint you are loading from was saved with ML-Agents version "
f"{loaded_ver[0]}.{loaded_ver[1]}.{loaded_ver[2]} but your current ML-Agents"
f"version is {version}. Model may not behave properly."
)
def _initialize_graph(self):
with self.graph.as_default():
self.saver = tf.train.Saver(max_to_keep=self.keep_checkpoints)

model_path
)
)
self._check_model_version(__version__)
if reset_global_steps:
self._set_step(0)
logger.info(

self.prev_action: Optional[tf.Tensor] = None
self.memory_in: Optional[tf.Tensor] = None
self.memory_out: Optional[tf.Tensor] = None
self.version_tensors: Optional[Tuple[tf.Tensor, tf.Tensor, tf.Tensor]] = None
def create_input_placeholders(self):
with self.graph.as_default():

trainable=False,
dtype=tf.int32,
)
int_version = TFPolicy._convert_version_string(__version__)
major_ver_t = tf.Variable(
int_version[0],
name="trainer_major_version",
trainable=False,
dtype=tf.int32,
)
minor_ver_t = tf.Variable(
int_version[1],
name="trainer_minor_version",
trainable=False,
dtype=tf.int32,
)
patch_ver_t = tf.Variable(
int_version[2],
name="trainer_patch_version",
trainable=False,
dtype=tf.int32,
)
self.version_tensors = (major_ver_t, minor_ver_t, patch_ver_t)
self._version_number_,
MODEL_FORMAT_VERSION,
name="version_number",
trainable=False,
dtype=tf.int32,

ml-agents/mlagents/trainers/sac/trainer.py (22)


:param training: Whether the trainer is set for training.
:param load: Whether the model should be loaded.
:param seed: The seed the model will be initialized with
:param run_id: The The identifier of the current run
:param run_id: The identifier of the current run
"""
super().__init__(
brain_name, trainer_settings, training, run_id, reward_buff_cap

)
self.step = 0
# Don't count buffer_init_steps in steps_per_update ratio, but also don't divide-by-0
self.update_steps = max(1, self.hyperparameters.buffer_init_steps)
self.reward_signal_update_steps = max(1, self.hyperparameters.buffer_init_steps)
# Don't divide by zero
self.update_steps = 1
self.reward_signal_update_steps = 1
self.steps_per_update = self.hyperparameters.steps_per_update
self.reward_signal_steps_per_update = (

)
batch_update_stats: Dict[str, list] = defaultdict(list)
while self.step / self.update_steps > self.steps_per_update:
while (
self.step - self.hyperparameters.buffer_init_steps
) / self.update_steps > self.steps_per_update:
logger.debug("Updating SAC policy at step {}".format(self.step))
buffer = self.update_buffer
if self.update_buffer.num_experiences >= self.hyperparameters.batch_size:

)
batch_update_stats: Dict[str, list] = defaultdict(list)
while (
self.step / self.reward_signal_update_steps
> self.reward_signal_steps_per_update
):
self.step - self.hyperparameters.buffer_init_steps
) / self.reward_signal_update_steps > self.reward_signal_steps_per_update:
# Get minibatches for reward signal update if needed
reward_signal_minibatches = {}
for name, signal in self.optimizer.reward_signals.items():

self.collected_rewards[_reward_signal] = defaultdict(lambda: 0)
# Needed to resume loads properly
self.step = policy.get_current_step()
# Assume steps were updated at the correct ratio before
self.update_steps = int(max(1, self.step / self.steps_per_update))
self.reward_signal_update_steps = int(
max(1, self.step / self.reward_signal_steps_per_update)
)
self.next_summary_step = self._get_next_summary_step()
def get_policy(self, name_behavior_id: str) -> TFPolicy:
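To see why the reworked schedule fixes the CHANGELOG items above, here is some illustrative arithmetic (not code from this diff; the hyperparameter values are made up) mirroring the new while-loop condition and the resume logic:

```python
# Illustrative arithmetic only, mirroring the new SAC update schedule.
# Hyperparameter values below are made up for the example.
steps_per_update = 10.0
buffer_init_steps = 1000

def updates_at(step: int, update_steps: int) -> int:
    """Count policy updates performed at `step`, following the new condition
    (step - buffer_init_steps) / update_steps > steps_per_update."""
    count = 0
    while (step - buffer_init_steps) / update_steps > steps_per_update:
        update_steps += 1
        count += 1
    return count

# Fresh run: warm-up steps no longer count against the ratio, so shortly
# after buffer_init_steps we get a handful of updates, not a large backlog.
print(updates_at(step=1100, update_steps=1))  # 9 (roughly 109 with the old ratio)

# Resume from a checkpoint at step 200000: update_steps is re-initialized to
# step / steps_per_update in add_policy, so there is no catch-up burst.
resumed = int(max(1, 200000 / steps_per_update))
print(updates_at(step=200010, update_steps=resumed))  # 0
```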

ml-agents/mlagents/trainers/settings.py (1)


force: bool = parser.get_default("force")
train_model: bool = parser.get_default("train_model")
inference: bool = parser.get_default("inference")
lesson: int = parser.get_default("lesson")
@attr.s(auto_attribs=True)

ml-agents/mlagents/trainers/tests/test_learn.py (7)


base_port: 4001
seed: 9870
checkpoint_settings:
lesson: 2
run_id: uselessrun
save_freq: 654321
debug: false

assert opt.behaviors == {}
assert opt.env_settings.env_path is None
assert opt.parameter_randomization is None
assert opt.checkpoint_settings.lesson == 0
assert opt.checkpoint_settings.resume is False
assert opt.checkpoint_settings.inference is False
assert opt.checkpoint_settings.run_id == "ppo"

full_args = [
"mytrainerpath",
"--env=./myenvfile",
"--lesson=3",
"--resume",
"--inference",
"--run-id=myawesomerun",

assert opt.behaviors == {}
assert opt.env_settings.env_path == "./myenvfile"
assert opt.parameter_randomization is None
assert opt.checkpoint_settings.lesson == 3
assert opt.checkpoint_settings.run_id == "myawesomerun"
assert opt.checkpoint_settings.save_freq == 123456
assert opt.env_settings.seed == 7890

assert opt.behaviors == {}
assert opt.env_settings.env_path == "./oldenvfile"
assert opt.parameter_randomization is None
assert opt.checkpoint_settings.lesson == 2
assert opt.checkpoint_settings.run_id == "uselessrun"
assert opt.checkpoint_settings.save_freq == 654321
assert opt.env_settings.seed == 9870

full_args = [
"mytrainerpath",
"--env=./myenvfile",
"--lesson=3",
"--resume",
"--inference",
"--run-id=myawesomerun",

assert opt.behaviors == {}
assert opt.env_settings.env_path == "./myenvfile"
assert opt.parameter_randomization is None
assert opt.checkpoint_settings.lesson == 3
assert opt.checkpoint_settings.run_id == "myawesomerun"
assert opt.checkpoint_settings.save_freq == 123456
assert opt.env_settings.seed == 7890

ml-agents/mlagents/trainers/tests/test_meta_curriculum.py (23)


import pytest
from unittest.mock import patch, Mock
from unittest.mock import patch, Mock, call
from mlagents.trainers.meta_curriculum import MetaCurriculum

)
from mlagents.trainers.tests.test_curriculum import dummy_curriculum_config
from mlagents.trainers.settings import CurriculumSettings
from mlagents.trainers.training_status import StatusType
@pytest.fixture

curriculum_b.increment_lesson.assert_not_called()
def test_set_all_curriculums_to_lesson_num():
@patch("mlagents.trainers.meta_curriculum.GlobalTrainingStatus")
def test_restore_curriculums(mock_trainingstatus):
meta_curriculum.set_all_curricula_to_lesson_num(2)
# Test restore to value
mock_trainingstatus.get_parameter_state.return_value = 2
meta_curriculum.try_restore_all_curriculum()
mock_trainingstatus.get_parameter_state.assert_has_calls(
[call("Brain1", StatusType.LESSON_NUM), call("Brain2", StatusType.LESSON_NUM)],
any_order=True,
)
# Test restore to None
mock_trainingstatus.get_parameter_state.return_value = None
meta_curriculum.try_restore_all_curriculum()
assert meta_curriculum.brains_to_curricula["Brain1"].lesson_num == 0
assert meta_curriculum.brains_to_curricula["Brain2"].lesson_num == 0
def test_get_config():

ml-agents/mlagents/trainers/tests/test_nn_policy.py (21)


import pytest
import os
import unittest
import tempfile
import numpy as np
from mlagents.tf_utils import tf

from mlagents.trainers.tests import mock_brain as mb
from mlagents.trainers.settings import TrainerSettings, NetworkSettings
from mlagents.trainers.tests.test_trajectory import make_fake_trajectory
from mlagents.trainers import __version__
VECTOR_ACTION_SPACE = [2]

_compare_two_policies(policy2, policy3)
# Assert that the steps are 0.
assert policy3.get_current_step() == 0
class ModelVersionTest(unittest.TestCase):
def test_version_compare(self):
# Test write_stats
with self.assertLogs("mlagents.trainers", level="WARNING") as cm:
path1 = tempfile.mkdtemp()
trainer_params = TrainerSettings(output_path=path1)
policy = create_policy_mock(trainer_params)
policy.initialize_or_load()
policy._check_model_version(
"0.0.0"
) # This is not the right version for sure
# Assert that 1 warning has been thrown with incorrect version
assert len(cm.output) == 1
policy._check_model_version(__version__) # This should be the right version
# Assert that no additional warnings have been thrown wth correct ver
assert len(cm.output) == 1
def _compare_two_policies(policy1: NNPolicy, policy2: NNPolicy) -> None:

ml-agents/mlagents/trainers/tests/test_policy.py (8)


policy_eval_out["action"], policy_eval_out["value"], policy_eval_out, [0]
)
assert result == expected
def test_convert_version_string():
result = TFPolicy._convert_version_string("200.300.100")
assert result == (200, 300, 100)
# Test dev versions
result = TFPolicy._convert_version_string("200.300.100.dev0")
assert result == (200, 300, 100)

ml-agents/mlagents/trainers/tests/test_sac.py (16)


discrete_action=False, visual_inputs=0, vec_obs_size=6
)
dummy_config.hyperparameters.steps_per_update = 20
dummy_config.hyperparameters.reward_signal_steps_per_update = 20
dummy_config.hyperparameters.buffer_init_steps = 0
trainer = SACTrainer(brain_params, 0, dummy_config, True, False, 0, "0")
policy = trainer.create_policy(brain_params.brain_name, brain_params)

trainer.advance()
with pytest.raises(AgentManagerQueue.Empty):
policy_queue.get_nowait()
# Call add_policy and check that we update the correct number of times.
# This is to emulate a load from checkpoint.
policy = trainer.create_policy(brain_params.brain_name, brain_params)
policy.get_current_step = lambda: 200
trainer.add_policy(brain_params.brain_name, policy)
trainer.optimizer.update = mock.Mock()
trainer.optimizer.update_reward_signals = mock.Mock()
trainer.optimizer.update_reward_signals.return_value = {}
trainer.optimizer.update.return_value = {}
trajectory_queue.put(trajectory)
trainer.advance()
# Make sure we did exactly 1 update
assert trainer.optimizer.update.call_count == 1
assert trainer.optimizer.update_reward_signals.call_count == 1
if __name__ == "__main__":

ml-agents/mlagents/trainers/tests/test_simple_rl.py (18)


@pytest.mark.parametrize("use_discrete", [True, False])
def test_2d_ppo(use_discrete):
env = SimpleEnvironment(
[BRAIN_NAME], use_discrete=use_discrete, action_size=2, step_size=0.5
[BRAIN_NAME], use_discrete=use_discrete, action_size=2, step_size=0.8
)
new_hyperparams = attr.evolve(
PPO_CONFIG.hyperparameters, batch_size=64, buffer_size=640
config = attr.evolve(PPO_CONFIG)
config = attr.evolve(PPO_CONFIG, hyperparameters=new_hyperparams, max_steps=10000)
_check_environment_trains(env, {BRAIN_NAME: config})

@pytest.mark.parametrize("use_discrete", [True, False])
def test_recurrent_sac(use_discrete):
env = MemoryEnvironment([BRAIN_NAME], use_discrete=use_discrete)
step_size = 0.2 if use_discrete else 1.0
env = MemoryEnvironment(
[BRAIN_NAME], use_discrete=use_discrete, step_size=step_size
)
memory=NetworkSettings.MemorySettings(memory_size=16, sequence_length=32),
memory=NetworkSettings.MemorySettings(memory_size=16, sequence_length=16),
batch_size=64,
batch_size=128,
buffer_init_steps=500,
buffer_init_steps=1000,
steps_per_update=2,
)
config = attr.evolve(

ml-agents/mlagents/trainers/trainer_controller.py (4)


from mlagents.trainers.behavior_id_utils import BehaviorIdentifiers
from mlagents.trainers.agent_processor import AgentManager
from mlagents.trainers.settings import CurriculumSettings
from mlagents.trainers.training_status import GlobalTrainingStatus, StatusType
class TrainerController(object):

if brain_name in self.trainers:
self.trainers[brain_name].stats_reporter.set_stat(
"Environment/Lesson", curr.lesson_num
)
GlobalTrainingStatus.set_parameter_state(
brain_name, StatusType.LESSON_NUM, curr.lesson_num
)
for trainer in self.trainers.values():

ml-agents/mlagents/trainers/tests/test_training_status.py (60)


import os
import unittest
import json
from enum import Enum
from mlagents.trainers.training_status import (
StatusType,
StatusMetaData,
GlobalTrainingStatus,
)
def test_globaltrainingstatus(tmpdir):
path_dir = os.path.join(tmpdir, "test.json")
GlobalTrainingStatus.set_parameter_state("Category1", StatusType.LESSON_NUM, 3)
GlobalTrainingStatus.save_state(path_dir)
with open(path_dir, "r") as fp:
test_json = json.load(fp)
assert "Category1" in test_json
assert StatusType.LESSON_NUM.value in test_json["Category1"]
assert test_json["Category1"][StatusType.LESSON_NUM.value] == 3
assert "metadata" in test_json
GlobalTrainingStatus.load_state(path_dir)
restored_val = GlobalTrainingStatus.get_parameter_state(
"Category1", StatusType.LESSON_NUM
)
assert restored_val == 3
# Test unknown categories and status types (keys)
unknown_category = GlobalTrainingStatus.get_parameter_state(
"Category3", StatusType.LESSON_NUM
)
class FakeStatusType(Enum):
NOTAREALKEY = "notarealkey"
unknown_key = GlobalTrainingStatus.get_parameter_state(
"Category1", FakeStatusType.NOTAREALKEY
)
assert unknown_category is None
assert unknown_key is None
class StatsMetaDataTest(unittest.TestCase):
def test_metadata_compare(self):
# Test write_stats
with self.assertLogs("mlagents.trainers", level="WARNING") as cm:
default_metadata = StatusMetaData()
version_statsmetadata = StatusMetaData(mlagents_version="test")
default_metadata.check_compatibility(version_statsmetadata)
tf_version_statsmetadata = StatusMetaData(tensorflow_version="test")
default_metadata.check_compatibility(tf_version_statsmetadata)
# Assert that 2 warnings have been thrown
assert len(cm.output) == 2

ml-agents/mlagents/trainers/training_status.py (115)


from typing import Dict, Any
from enum import Enum
from collections import defaultdict
import json
import attr
import cattr
from mlagents.tf_utils import tf
from mlagents_envs.logging_util import get_logger
from mlagents.trainers import __version__
from mlagents.trainers.exception import TrainerError
logger = get_logger(__name__)
STATUS_FORMAT_VERSION = "0.1.0"
class StatusType(Enum):
LESSON_NUM = "lesson_num"
STATS_METADATA = "metadata"
@attr.s(auto_attribs=True)
class StatusMetaData:
stats_format_version: str = STATUS_FORMAT_VERSION
mlagents_version: str = __version__
tensorflow_version: str = tf.__version__
def to_dict(self) -> Dict[str, str]:
return cattr.unstructure(self)
@staticmethod
def from_dict(import_dict: Dict[str, str]) -> "StatusMetaData":
return cattr.structure(import_dict, StatusMetaData)
def check_compatibility(self, other: "StatusMetaData") -> None:
"""
Check compatibility with a loaded StatsMetaData and warn the user
if versions mismatch. This is used for resuming from old checkpoints.
"""
# This should cover all stats version mismatches as well.
if self.mlagents_version != other.mlagents_version:
logger.warning(
"Checkpoint was loaded from a different version of ML-Agents. Some things may not resume properly."
)
if self.tensorflow_version != other.tensorflow_version:
logger.warning(
"Tensorflow checkpoint was saved with a different version of Tensorflow. Model may not resume properly."
)
class GlobalTrainingStatus:
"""
GlobalTrainingStatus class that contains static methods to save global training status and
load it on a resume. These are values that might be needed for the training resume that
cannot/should not be captured in a model checkpoint, such as curriclum lesson.
"""
saved_state: Dict[str, Dict[str, Any]] = defaultdict(lambda: {})
@staticmethod
def load_state(path: str) -> None:
"""
Load a JSON file that contains saved state.
:param path: Path to the JSON file containing the state.
"""
try:
with open(path, "r") as f:
loaded_dict = json.load(f)
# Compare the metadata
_metadata = loaded_dict[StatusType.STATS_METADATA.value]
StatusMetaData.from_dict(_metadata).check_compatibility(StatusMetaData())
# Update saved state.
GlobalTrainingStatus.saved_state.update(loaded_dict)
except FileNotFoundError:
logger.warning(
"Training status file not found. Not all functions will resume properly."
)
except KeyError:
raise TrainerError(
"Metadata not found, resuming from an incompatible version of ML-Agents."
)
@staticmethod
def save_state(path: str) -> None:
"""
Save a JSON file that contains saved state.
:param path: Path to the JSON file containing the state.
"""
GlobalTrainingStatus.saved_state[
StatusType.STATS_METADATA.value
] = StatusMetaData().to_dict()
with open(path, "w") as f:
json.dump(GlobalTrainingStatus.saved_state, f, indent=4)
@staticmethod
def set_parameter_state(category: str, key: StatusType, value: Any) -> None:
"""
Stores an arbitrary-named parameter in the global saved state.
:param category: The category (usually behavior name) of the parameter.
:param key: The parameter, e.g. lesson number.
:param value: The value.
"""
GlobalTrainingStatus.saved_state[category][key.value] = value
@staticmethod
def get_parameter_state(category: str, key: StatusType) -> Any:
"""
Loads an arbitrary-named parameter from training_status.json.
If not found, returns None.
:param category: The category (usually behavior name) of the parameter.
:param key: The statistic, e.g. lesson number.
:param value: The value.
"""
return GlobalTrainingStatus.saved_state[category].get(key.value, None)