
[refactor] Move output artifacts to a single results/ folder (#3829)

/whitepaper-experiments
GitHub, 4 years ago
Current commit: 232519e4
33 files changed, with 192 insertions and 172 deletions
  1. .gitignore (4)
  2. com.unity.ml-agents/CHANGELOG.md (3)
  3. docs/Getting-Started.md (5)
  4. docs/Learning-Environment-Executable.md (5)
  5. docs/Migrating.md (2)
  6. docs/Training-ML-Agents.md (7)
  7. docs/Training-PPO.md (2)
  8. docs/Training-SAC.md (2)
  9. docs/Using-Tensorboard.md (2)
  10. ml-agents-envs/mlagents_envs/environment.py (30)
  11. ml-agents-envs/mlagents_envs/tests/test_envs.py (12)
  12. ml-agents/mlagents/trainers/learn.py (60)
  13. ml-agents/mlagents/trainers/policy/tf_policy.py (5)
  14. ml-agents/mlagents/trainers/ppo/trainer.py (3)
  15. ml-agents/mlagents/trainers/sac/trainer.py (7)
  16. ml-agents/mlagents/trainers/tests/test_barracuda_converter.py (6)
  17. ml-agents/mlagents/trainers/tests/test_bcmodule.py (2)
  18. ml-agents/mlagents/trainers/tests/test_ghost.py (9)
  19. ml-agents/mlagents/trainers/tests/test_learn.py (17)
  20. ml-agents/mlagents/trainers/tests/test_nn_policy.py (8)
  21. ml-agents/mlagents/trainers/tests/test_policy.py (2)
  22. ml-agents/mlagents/trainers/tests/test_ppo.py (9)
  23. ml-agents/mlagents/trainers/tests/test_reward_signals.py (2)
  24. ml-agents/mlagents/trainers/tests/test_rl_trainer.py (2)
  25. ml-agents/mlagents/trainers/tests/test_sac.py (14)
  26. ml-agents/mlagents/trainers/tests/test_simple_rl.py (6)
  27. ml-agents/mlagents/trainers/tests/test_trainer_controller.py (6)
  28. ml-agents/mlagents/trainers/tests/test_trainer_util.py (60)
  29. ml-agents/mlagents/trainers/trainer/trainer.py (3)
  30. ml-agents/mlagents/trainers/trainer_controller.py (20)
  31. ml-agents/mlagents/trainers/trainer_util.py (30)
  32. ml-agents/tests/yamato/scripts/run_llapi.py (17)
  33. ml-agents/tests/yamato/training_int_tests.py (2)

.gitignore (4 changes)

-# Tensorflow Model Info
+# Output Artifacts (Legacy)
+# Output Artifacts
+/results
 # Training environments
 /envs

com.unity.ml-agents/CHANGELOG.md (3 changes)

 instead of "camelCase"; for example, `Agent.maxStep` was renamed to
 `Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
 - Update Barracuda to 0.7.0-preview which has breaking namespace and assembly name changes.
+- Training artifacts (trained models, summaries) are now found in the `results/`
+  directory. (#3829)
 ### Minor Changes
 - The maximum compatible version of tensorflow was changed to allow tensorflow 2.1 and 2.2. This
   will allow use with python 3.8 using tensorflow 2.2.0rc3.
 - `UnityRLCapabilities` was added to help inform users when RL features are mismatched between C# and Python packages. (#3831)
 - Unity Player logs are now written out to the results directory. (#3877)
 ### Bug Fixes

docs/Getting-Started.md (5 changes)

 sequence_length: 64
 summary_freq: 1000
 use_recurrent: False
-summary_path: ./summaries/first3DBallRun
-model_path: ./models/first3DBallRun/3DBallLearning
+output_path: ./results/first3DBallRun/3DBallLearning
 INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
 INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
 INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.

 mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
 ```
-Your trained model will be at `models/<run-identifier>/<behavior_name>.nn` where
+Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where
 `<behavior_name>` is the name of the `Behavior Name` of the agents corresponding
 to the model. This file corresponds to your model's latest checkpoint. You can
 now embed this trained model into your Agents by following the steps below,
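To make the new location concrete, here is a small helper for locating the exported model; this is not part of the toolkit, just a sketch built from the path pattern in the docs text above:

```python
import os

def trained_model_path(run_id: str, behavior_name: str) -> str:
    # Pattern from the updated docs: results/<run-identifier>/<behavior_name>.nn
    path = os.path.join("results", run_id, f"{behavior_name}.nn")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No trained model found at {path}")
    return path

# e.g. results/first3DBallRun/3DBallLearning.nn
print(trained_model_path("first3DBallRun", "3DBallLearning"))
```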

docs/Learning-Environment-Executable.md (5 changes)

 sequence_length: 64
 summary_freq: 1000
 use_recurrent: False
-summary_path: ./summaries/first-run-0
-model_path: ./models/first-run-0/Ball3DLearning
+output_path: ./results/first-run-0/Ball3DLearning
 INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
 INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
 INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
 ```
 You can press Ctrl+C to stop the training, and your trained model will be at
-`models/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
+`results/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
 latest checkpoint. (**Note:** There is a known bug on Windows that causes the
 saving of the model to fail when you early terminate the training, it's
 recommended to wait until Step has reached the max_steps parameter you set in

docs/Migrating.md (2 changes)

 instead of "camelCase"; for example, `Agent.maxStep` was renamed to
 `Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
 - `WriteAdapter` was renamed to `ObservationWriter`. (#3834)
+- Training artifacts (trained models, summaries) are now found under `results/`
+  instead of `summaries/` and `models/`.
 ### Steps to Migrate

docs/Training-ML-Agents.md (7 changes)

 Regardless of which training methods, configurations or hyperparameters you
 provide, the training process will always generate three artifacts:
-1. Summaries (under the `summaries/` folder): these are training metrics that
+1. Summaries (under the `results/<run-identifier>/<behavior-name>` folder):
+   these are training metrics that
-1. Models (under the `models/` folder): these contain the model checkpoints that
+1. Models (under the `results/<run-identifier>/` folder): these contain the model checkpoints that
-1. Timers file (also under the `summaries/` folder): this contains aggregated
+1. Timers file (also under the `results/<run-identifier>` folder): this contains aggregated
 metrics on your training process, including time spent on specific code
 blocks. See [Profiling in Python](Profiling-Python.md) for more information
 on the timers generated.
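Taken together, the documentation and learn.py changes in this commit imply the layout sketched below; the helper is illustrative only (file names other than `configuration.yaml` and `timers.json` depend on the run) and is not part of the toolkit:

```python
import os

def results_layout(run_id: str, behavior_name: str) -> dict:
    # Everything now lives under a single results/<run-identifier>/ root.
    root = os.path.join("results", run_id)
    return {
        "model": os.path.join(root, f"{behavior_name}.nn"),       # was models/<run-id>/
        "summaries": os.path.join(root, behavior_name),           # was summaries/
        "timers": os.path.join(root, "run_logs", "timers.json"),  # was summaries/<run-id>_timers.json
        "config": os.path.join(root, "configuration.yaml"),       # new in this change
    }

print(results_layout("first3DBallRun", "3DBallLearning"))
```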

docs/Training-PPO.md (2 changes)

 `init_path` can be specified to initialize your model from a previous run before starting.
 Note that the prior run should have used the same trainer configurations as the current run,
 and have been saved with the same version of ML-Agents. You should provide the full path
-to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
+to the folder where the checkpoints were saved, e.g. `./results/{run-id}/{behavior_name}`.
 This option is provided in case you want to initialize different behaviors from different runs;
 in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize

docs/Training-SAC.md (2 changes)

 `init_path` can be specified to initialize your model from a previous run before starting.
 Note that the prior run should have used the same trainer configurations as the current run,
 and have been saved with the same version of ML-Agents. You should provide the full path
-to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
+to the folder where the checkpoints were saved, e.g. `./results/{run-id}/{behavior_name}`.
 This option is provided in case you want to initialize different behaviors from different runs;
 in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize

docs/Using-Tensorboard.md (2 changes)

 1. Open a terminal or console window:
 1. Navigate to the directory where the ML-Agents Toolkit is installed.
-1. From the command line run: `tensorboard --logdir=summaries --port=6006`
+1. From the command line run: `tensorboard --logdir=results --port=6006`
 1. Open a browser window and navigate to
    [localhost:6006](http://localhost:6006).

ml-agents-envs/mlagents_envs/environment.py (30 changes)

     seed: int = 0,
     no_graphics: bool = False,
     timeout_wait: int = 60,
-    args: Optional[List[str]] = None,
+    additional_args: Optional[List[str]] = None,
+    log_folder: Optional[str] = None,
 ):
     """
     Starts a new unity environment and establishes a connection with the environment.

     :int timeout_wait: Time (in seconds) to wait for connection from environment.
     :list args: Additional Unity command line arguments
     :list side_channels: Additional side channel for no-rl communication with Unity
+    :str log_folder: Optional folder to write the Unity Player log file into. Requires absolute path.

-    args = args or []
+    self.additional_args = additional_args or []
+    self.no_graphics = no_graphics
     # If base port is not specified, use BASE_ENVIRONMENT_PORT if we have
     # an environment, otherwise DEFAULT_EDITOR_PORT
     if base_port is None:

         )
     )
     self.side_channels[_sc.channel_id] = _sc
+    self.log_folder = log_folder
     # If the environment name is None, a new environment will not be launched
     # and the communicator will directly try to connect to an existing unity environment.

         "the worker-id must be 0 in order to connect with the Editor."
     )
     if file_name is not None:
-        self.executable_launcher(file_name, no_graphics, args)
+        self.executable_launcher(file_name, no_graphics, additional_args)
     else:
         logger.info(
             f"Listening on port {self.port}. "

     launch_string = candidates[0]
     return launch_string

+def executable_args(self) -> List[str]:
+    args: List[str] = []
+    if self.no_graphics:
+        args += ["-nographics", "-batchmode"]
+    args += [UnityEnvironment.PORT_COMMAND_LINE_ARG, str(self.port)]
+    if self.log_folder:
+        log_file_path = os.path.join(
+            self.log_folder, f"Player-{self.worker_id}.log"
+        )
+        args += ["-logFile", log_file_path]
+    # Add in arguments passed explicitly by the user.
+    args += self.additional_args
+    return args

 def executable_launcher(self, file_name, no_graphics, args):
     launch_string = self.validate_environment_path(file_name)
     if launch_string is None:

     else:
         logger.debug("This is the launch string {}".format(launch_string))
     # Launch Unity environment
-    subprocess_args = [launch_string]
-    if no_graphics:
-        subprocess_args += ["-nographics", "-batchmode"]
-    subprocess_args += [UnityEnvironment.PORT_COMMAND_LINE_ARG, str(self.port)]
-    subprocess_args += args
+    subprocess_args = [launch_string] + self.executable_args()
     try:
         self.proc1 = subprocess.Popen(
             subprocess_args,
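The new `executable_args()` consolidates what used to be assembled inline in `executable_launcher`. Below is a standalone sketch of the same pattern, for illustration only; `PORT_ARG` is an assumed stand-in for `UnityEnvironment.PORT_COMMAND_LINE_ARG`, and the function is not the toolkit's API:

```python
import os
from typing import List, Optional

PORT_ARG = "--mlagents-port"  # assumed value of UnityEnvironment.PORT_COMMAND_LINE_ARG

def build_unity_args(
    port: int,
    worker_id: int,
    no_graphics: bool = False,
    log_folder: Optional[str] = None,
    additional_args: Optional[List[str]] = None,
) -> List[str]:
    # Mirrors executable_args(): engine flags, then the port, then the
    # per-worker Player log location, then any user-supplied arguments.
    args: List[str] = []
    if no_graphics:
        args += ["-nographics", "-batchmode"]
    args += [PORT_ARG, str(port)]
    if log_folder:
        args += ["-logFile", os.path.join(log_folder, f"Player-{worker_id}.log")]
    args += additional_args or []
    return args

# ['-nographics', '-batchmode', '--mlagents-port', '5005', '-logFile',
#  '/abs/results/run-id/run_logs/Player-0.log']
print(build_unity_args(5005, 0, no_graphics=True, log_folder="/abs/results/run-id/run_logs"))
```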

ml-agents-envs/mlagents_envs/tests/test_envs.py (12 changes)

+@mock.patch("mlagents_envs.environment.UnityEnvironment.executable_launcher")
+@mock.patch("mlagents_envs.environment.UnityEnvironment.get_communicator")
+def test_log_file_path_is_set(mock_communicator, mock_launcher):
+    mock_communicator.return_value = MockCommunicator()
+    env = UnityEnvironment(
+        file_name="myfile", worker_id=0, log_folder="./some-log-folder-path"
+    )
+    args = env.executable_args()
+    log_file_index = args.index("-logFile")
+    assert args[log_file_index + 1] == "./some-log-folder-path/Player-0.log"

 @mock.patch("mlagents_envs.environment.UnityEnvironment.executable_launcher")
 @mock.patch("mlagents_envs.environment.UnityEnvironment.get_communicator")
 def test_reset(mock_communicator, mock_launcher):
     mock_communicator.return_value = MockCommunicator(
         discrete_action=False, visual_inputs=0

ml-agents/mlagents/trainers/learn.py (60 changes)

 # # Unity ML-Agents Toolkit
 import argparse
+import yaml
 import os
 import numpy as np

 :param run_options: Command line arguments for training.
 """
 with hierarchical_timer("run_training.setup"):
-    model_path = f"./models/{options.run_id}"
+    base_path = "results"
+    write_path = os.path.join(base_path, options.run_id)
-        f"./models/{options.initialize_from}" if options.initialize_from else None
+        os.path.join(base_path, options.initialize_from) if options.initialize_from else None
-    summaries_dir = "./summaries"
+    run_logs_dir = os.path.join(write_path, "run_logs")
+    # Check if directory exists
+    handle_existing_directories(
+        write_path, options.resume, options.force, maybe_init_path
+    )
+    # Make run logs directory
+    os.makedirs(run_logs_dir, exist_ok=True)

-        summaries_dir,
+        write_path,
-    handle_existing_directories(
-        model_path, summaries_dir, options.resume, options.force, maybe_init_path
-    )
-    tb_writer = TensorboardWriter(summaries_dir, clear_past_data=not options.resume)
+    tb_writer = TensorboardWriter(write_path, clear_past_data=not options.resume)
     gauge_write = GaugeWriter()
     console_writer = ConsoleWriter()
     StatsReporter.add_writer(tb_writer)

     if options.env_path is None:
         port = UnityEnvironment.DEFAULT_EDITOR_PORT
     env_factory = create_environment_factory(
-        options.env_path, options.no_graphics, run_seed, port, options.env_args
+        options.env_path,
+        options.no_graphics,
+        run_seed,
+        port,
+        options.env_args,
+        os.path.abspath(run_logs_dir),  # Unity environment requires absolute path
     )
     engine_config = EngineConfig(
         width=options.width,

     )
     trainer_factory = TrainerFactory(
         options.trainer_config,
-        summaries_dir,
-        model_path,
+        write_path,
         options.keep_checkpoints,
         not options.inference,
         options.resume,

     # Create controller and begin training.
     tc = TrainerController(
         trainer_factory,
-        model_path,
-        summaries_dir,
+        write_path,
         options.run_id,
         options.save_freq,
         maybe_meta_curriculum,

         tc.start_learning(env_manager)
     finally:
         env_manager.close()
-        write_timing_tree(summaries_dir, options.run_id)
+        write_run_options(write_path, options)
+        write_timing_tree(run_logs_dir)

+def write_run_options(output_dir: str, run_options: RunOptions) -> None:
+    run_options_path = os.path.join(output_dir, "configuration.yaml")
+    try:
+        with open(run_options_path, "w") as f:
+            try:
+                yaml.dump(dict(run_options._asdict()), f, sort_keys=False)
+            except TypeError:  # Older versions of pyyaml don't support sort_keys
+                yaml.dump(dict(run_options._asdict()), f)
+    except FileNotFoundError:
+        logger.warning(
+            f"Unable to save configuration to {run_options_path}. Make sure the directory exists"
+        )

-def write_timing_tree(summaries_dir: str, run_id: str) -> None:
-    timing_path = f"{summaries_dir}/{run_id}_timers.json"
+def write_timing_tree(output_dir: str) -> None:
+    timing_path = os.path.join(output_dir, "timers.json")
     try:
         with open(timing_path, "w") as f:
             json.dump(get_timer_tree(), f, indent=4)

     seed: int,
     start_port: int,
     env_args: Optional[List[str]],
+    log_folder: str,
 ) -> Callable[[int, List[SideChannel]], BaseEnv]:
     if env_path is not None:
         launch_string = UnityEnvironment.validate_environment_path(env_path)

         seed=env_seed,
         no_graphics=no_graphics,
         base_port=start_port,
-        args=env_args,
+        additional_args=env_args,
+        log_folder=log_folder,
     )
     return create_unity_environment
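For orientation, this is how the two new helpers would be invoked, assuming they are in scope (learn.py defines them at module level); the `RunOptions` stand-in below is invented for the sketch, so only the path behavior is meaningful:

```python
import os
from typing import NamedTuple

class RunOptions(NamedTuple):
    # Hypothetical stand-in for learn.py's argparse-backed RunOptions;
    # only the fields this sketch touches are modeled.
    run_id: str = "ppo"
    resume: bool = False
    force: bool = False

options = RunOptions()
write_path = os.path.join("results", options.run_id)  # results/ppo
run_logs_dir = os.path.join(write_path, "run_logs")   # results/ppo/run_logs
os.makedirs(run_logs_dir, exist_ok=True)

write_run_options(write_path, options)  # -> results/ppo/configuration.yaml
write_timing_tree(run_logs_dir)         # -> results/ppo/run_logs/timers.json
```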

ml-agents/mlagents/trainers/policy/tf_policy.py (5 changes)

 from typing import Any, Dict, List, Optional
 import abc
+import os
 import numpy as np
 from mlagents.tf_utils import tf
 from mlagents import tf_utils

 self.use_continuous_act = brain.vector_action_space_type == "continuous"
 if self.use_continuous_act:
     self.num_branches = self.brain.vector_action_space_size[0]
-self.model_path = trainer_parameters["model_path"]
+self.model_path = trainer_parameters["output_path"]
 self.initialize_path = trainer_parameters.get("init_path", None)
 self.keep_checkpoints = trainer_parameters.get("keep_checkpoints", 5)
 self.graph = tf.Graph()

 :return:
 """
 with self.graph.as_default():
-    last_checkpoint = self.model_path + "/model-" + str(steps) + ".ckpt"
+    last_checkpoint = os.path.join(self.model_path, f"model-{steps}.ckpt")
     self.saver.save(self.sess, last_checkpoint)
     tf.train.write_graph(
         self.graph, self.model_path, "raw_graph_def.pb", as_text=False
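The switch from string concatenation to `os.path.join` keeps checkpoint paths platform-correct; a quick illustration with invented values:

```python
import os

model_path = os.path.join("results", "first3DBallRun", "3DBallLearning")
steps = 2000
# The old code hard-coded "/" separators; os.path.join uses the OS separator.
print(os.path.join(model_path, f"model-{steps}.ckpt"))
# results/first3DBallRun/3DBallLearning/model-2000.ckpt  (on POSIX)
```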

ml-agents/mlagents/trainers/ppo/trainer.py (3 changes)

     "sequence_length",
     "summary_freq",
     "use_recurrent",
-    "summary_path",
-    "model_path",
+    "output_path",
     "reward_signals",
 ]
 self._check_param_keys()

ml-agents/mlagents/trainers/sac/trainer.py (7 changes)

     "summary_freq",
     "tau",
     "use_recurrent",
-    "summary_path",
-    "model_path",
+    "output_path",
     "reward_signals",
 ]

 Save the training buffer's update buffer to a pickle file.
 """
 filename = os.path.join(
-    self.trainer_parameters["model_path"], "last_replay_buffer.hdf5"
+    self.trainer_parameters["output_path"], "last_replay_buffer.hdf5"
 )
 logger.info("Saving Experience Replay Buffer to {}".format(filename))
 with open(filename, "wb") as file_object:

 Loads the last saved replay buffer from a file.
 """
 filename = os.path.join(
-    self.trainer_parameters["model_path"], "last_replay_buffer.hdf5"
+    self.trainer_parameters["output_path"], "last_replay_buffer.hdf5"
 )
 logger.info("Loading Experience Replay Buffer from {}".format(filename))
 with open(filename, "rb+") as file_object:
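Only the buffer file's location changes here; the trainer hands the open file to the buffer's own serializer. The pickle-based sketch below is a stand-in to show the round trip and file placement, not the toolkit's actual serialization:

```python
import os
import pickle

def save_replay_buffer(output_path: str, buffer: object) -> None:
    # Same location the SAC trainer now uses: <output_path>/last_replay_buffer.hdf5
    filename = os.path.join(output_path, "last_replay_buffer.hdf5")
    with open(filename, "wb") as file_object:
        pickle.dump(buffer, file_object)

def load_replay_buffer(output_path: str) -> object:
    filename = os.path.join(output_path, "last_replay_buffer.hdf5")
    with open(filename, "rb") as file_object:
        return pickle.load(file_object)
```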

ml-agents/mlagents/trainers/tests/test_barracuda_converter.py (6 changes)

 memory_size: 8
 curiosity_strength: 0.0
 curiosity_enc_size: 1
-summary_path: test
-model_path: test
+output_path: test
 reward_signals:
   extrinsic:
     strength: 1.0

 @pytest.mark.parametrize("rnn", [True, False], ids=["rnn", "no_rnn"])
 def test_policy_conversion(dummy_config, tmpdir, rnn, visual, discrete):
     tf.reset_default_graph()
-    dummy_config["summary_path"] = str(tmpdir)
-    dummy_config["model_path"] = os.path.join(tmpdir, "test")
+    dummy_config["output_path"] = os.path.join(tmpdir, "test")
     policy = create_policy_mock(
         dummy_config, use_rnn=rnn, use_discrete=discrete, use_visual=visual
     )

ml-agents/mlagents/trainers/tests/test_bcmodule.py (2 changes)

 def create_bc_module(mock_brain, trainer_config, use_rnn, demo_file, tanhresample):
     # model_path = env.external_brain_names[0]
-    trainer_config["model_path"] = "testpath"
+    trainer_config["output_path"] = "testpath"
     trainer_config["keep_checkpoints"] = 3
     trainer_config["use_recurrent"] = use_rnn
     trainer_config["behavioral_cloning"]["demo_path"] = (

ml-agents/mlagents/trainers/tests/test_ghost.py (9 changes)

 memory_size: 8
 curiosity_strength: 0.0
 curiosity_enc_size: 1
-summary_path: test
-model_path: test
+output_path: test
 reward_signals:
   extrinsic:
     strength: 1.0

 vector_action_descriptions=[],
 vector_action_space_type=0,
 )
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 ppo_trainer = PPOTrainer(brain_name, 0, dummy_config, True, False, 0, "0")
 controller = GhostController(100)
 trainer = GhostTrainer(

 vector_action_descriptions=[],
 vector_action_space_type=0,
 )
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 ppo_trainer = PPOTrainer(brain_name, 0, dummy_config, True, False, 0, "0")
 controller = GhostController(100)
 trainer = GhostTrainer(

ml-agents/mlagents/trainers/tests/test_learn.py (17 changes)

 return parse_command_line(args)

+@patch("mlagents.trainers.learn.write_timing_tree")
+@patch("mlagents.trainers.learn.write_run_options")
 @patch("mlagents.trainers.learn.handle_existing_directories")
 @patch("mlagents.trainers.learn.TrainerFactory")
 @patch("mlagents.trainers.learn.SamplerManager")

     sampler_manager_mock,
     trainer_factory_mock,
     handle_dir_mock,
+    write_run_options_mock,
+    write_timing_tree_mock,
 ):
     mock_env = MagicMock()
     mock_env.external_brain_names = []

     mock_init = MagicMock(return_value=None)
     with patch.object(TrainerController, "__init__", mock_init):
         with patch.object(TrainerController, "start_learning", MagicMock()):
-            learn.run_training(0, basic_options())
+            options = basic_options()
+            learn.run_training(0, options)
-                "./models/ppo",
-                "./summaries",
+                "results/ppo",
                 "ppo",
                 50000,
                 None,

                 None,
             )
-            handle_dir_mock.assert_called_once_with(
-                "./models/ppo", "./summaries", False, False, None
-            )
+            handle_dir_mock.assert_called_once_with("results/ppo", False, False, None)
+            write_timing_tree_mock.assert_called_once_with("results/ppo/run_logs")
+            write_run_options_mock.assert_called_once_with("results/ppo", options)
     StatsReporter.writers.clear()  # make sure there aren't any writers as added by learn.py

     seed=None,
     start_port=8000,
     env_args=None,
+    log_folder="results/log_folder",
 )

ml-agents/mlagents/trainers/tests/test_nn_policy.py (8 changes)

 memory_size: 8
 curiosity_strength: 0.0
 curiosity_enc_size: 1
-summary_path: test
-model_path: test
+output_path: test
 reward_signals:
   extrinsic:
     strength: 1.0

 path1 = os.path.join(tmp_path, "runid1")
 path2 = os.path.join(tmp_path, "runid2")
 trainer_params = dummy_config
-trainer_params["model_path"] = path1
+trainer_params["output_path"] = path1
 policy = create_policy_mock(trainer_params)
 policy.initialize_or_load()
 policy.save_model(2000)

 vector_action_descriptions=[],
 vector_action_space_type=0,
 )
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 time_horizon = 6
 trajectory = make_fake_trajectory(

ml-agents/mlagents/trainers/tests/test_policy.py (2 changes)

 def basic_params():
-    return {"use_recurrent": False, "model_path": "my/path"}
+    return {"use_recurrent": False, "output_path": "my/path"}

 class FakePolicy(TFPolicy):

ml-agents/mlagents/trainers/tests/test_ppo.py (9 changes)

 memory_size: 10
 curiosity_strength: 0.0
 curiosity_enc_size: 1
-summary_path: test
-model_path: test
+output_path: test
 reward_signals:
   extrinsic:
     strength: 1.0

 vector_action_descriptions=[],
 vector_action_space_type=0,
 )
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 trainer = PPOTrainer(brain_params, 0, dummy_config, True, False, 0, "0")
 policy = trainer.create_policy(brain_params.brain_name, brain_params)
 trainer.add_policy(brain_params.brain_name, policy)

 mock_optimizer.reward_signals = {}
 ppo_optimizer.return_value = mock_optimizer
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 trainer = PPOTrainer(brain_params, 0, dummy_config, True, False, 0, "0")
 policy = mock.Mock(spec=NNPolicy)
 policy.get_current_step.return_value = 2000

ml-agents/mlagents/trainers/tests/test_reward_signals.py (2 changes)

 )
 trainer_parameters = trainer_config
 model_path = "testpath"
-trainer_parameters["model_path"] = model_path
+trainer_parameters["output_path"] = model_path
 trainer_parameters["keep_checkpoints"] = 3
 trainer_parameters["reward_signals"].update(reward_signal_config)
 trainer_parameters["use_recurrent"] = use_rnn

ml-agents/mlagents/trainers/tests/test_rl_trainer.py (2 changes)

 def dummy_config():
     return yaml.safe_load(
         """
-        summary_path: "test/"
+        output_path: "test/"
         summary_freq: 1000
         max_steps: 100
         reward_signals:

ml-agents/mlagents/trainers/tests/test_sac.py (14 changes)

 trainer_parameters = dummy_config
 model_path = "testmodel"
-trainer_parameters["model_path"] = model_path
+trainer_parameters["output_path"] = model_path
 trainer_parameters["keep_checkpoints"] = 3
 trainer_parameters["use_recurrent"] = use_rnn
 policy = NNPolicy(

 discrete_action_space=DISCRETE_ACTION_SPACE,
 )
 trainer_params = dummy_config
-trainer_params["summary_path"] = str(tmpdir)
-trainer_params["model_path"] = str(tmpdir)
+trainer_params["output_path"] = str(tmpdir)
 trainer_params["save_replay_buffer"] = True
 trainer = SACTrainer(mock_brain.brain_name, 1, trainer_params, True, False, 0, 0)
 policy = trainer.create_policy(mock_brain.brain_name, mock_brain)

 mock_optimizer.reward_signals = {}
 sac_optimizer.return_value = mock_optimizer
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 trainer = SACTrainer(brain_params, 0, dummy_config, True, False, 0, "0")
 policy = mock.Mock(spec=NNPolicy)
 policy.get_current_step.return_value = 2000

 brain_params = make_brain_parameters(
     discrete_action=False, visual_inputs=0, vec_obs_size=6
 )
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 dummy_config["steps_per_update"] = 20
 trainer = SACTrainer(brain_params, 0, dummy_config, True, False, 0, "0")
 policy = trainer.create_policy(brain_params.brain_name, brain_params)

 dummy_config["sequence_length"] = 64
 dummy_config["batch_size"] = 32
 dummy_config["use_recurrent"] = True
-dummy_config["summary_path"] = "./summaries/test_trainer_summary"
-dummy_config["model_path"] = "./models/test_trainer_models/TestModel"
+dummy_config["output_path"] = "./results/test_trainer_models/TestModel"
 with pytest.raises(UnityTrainerException):
     _ = SACTrainer(brain_params, 0, dummy_config, True, False, 0, "0")

ml-agents/mlagents/trainers/tests/test_simple_rl.py (6 changes)

 env_manager = SimpleEnvManager(env, EnvironmentParametersChannel())
 trainer_factory = TrainerFactory(
     trainer_config=trainer_config,
-    summaries_dir=dir,
-    model_path=dir,
+    output_path=dir,
     keep_checkpoints=1,
     train_model=True,
     load_model=False,

 tc = TrainerController(
     trainer_factory=trainer_factory,
-    summaries_dir=dir,
-    model_path=dir,
+    output_path=dir,
     run_id=run_id,
     meta_curriculum=meta_curriculum,
     train=True,

ml-agents/mlagents/trainers/tests/test_trainer_controller.py (6 changes)

 trainer_factory_mock.ghost_controller = GhostController()
 return TrainerController(
     trainer_factory=trainer_factory_mock,
-    model_path="test_model_path",
-    summaries_dir="test_summaries_dir",
+    output_path="test_model_path",
     run_id="test_run_id",
     save_freq=100,
     meta_curriculum=None,

 trainer_factory_mock.ghost_controller = GhostController()
 TrainerController(
     trainer_factory=trainer_factory_mock,
-    model_path="",
-    summaries_dir="",
+    output_path="",
     run_id="1",
     save_freq=1,
     meta_curriculum=None,

ml-agents/mlagents/trainers/tests/test_trainer_util.py (60 changes)

 def test_initialize_trainer_parameters_override_defaults(
     BrainParametersMock, dummy_config_with_override
 ):
-    summaries_dir = "test_dir"
-    model_path = "model_dir"
+    output_path = "model_dir"
     keep_checkpoints = 1
     train_model = True
     load_model = False

     base_config = dummy_config_with_override
     expected_config = base_config["default"]
-    expected_config["summary_path"] = f"{run_id}_testbrain"
-    expected_config["model_path"] = model_path + "/testbrain"
+    expected_config["output_path"] = output_path + "/testbrain"
     expected_config["keep_checkpoints"] = keep_checkpoints
     # Override value from specific brain config

     with patch.object(PPOTrainer, "__init__", mock_constructor):
         trainer_factory = trainer_util.TrainerFactory(
             trainer_config=base_config,
-            summaries_dir=summaries_dir,
-            model_path=model_path,
+            output_path=output_path,
             keep_checkpoints=keep_checkpoints,
             train_model=train_model,
             load_model=load_model,

 brain_params_mock = BrainParametersMock()
 BrainParametersMock.return_value.brain_name = "testbrain"
 external_brains = {"testbrain": BrainParametersMock()}
-summaries_dir = "test_dir"
-model_path = "model_dir"
+output_path = "results_dir"
 keep_checkpoints = 1
 train_model = True
 load_model = False

 base_config = dummy_config
 expected_config = base_config["default"]
-expected_config["summary_path"] = f"{run_id}_testbrain"
-expected_config["model_path"] = model_path + "/testbrain"
+expected_config["output_path"] = output_path + "/testbrain"
 expected_config["keep_checkpoints"] = keep_checkpoints
 def mock_constructor(

 with patch.object(PPOTrainer, "__init__", mock_constructor):
     trainer_factory = trainer_util.TrainerFactory(
         trainer_config=base_config,
-        summaries_dir=summaries_dir,
-        model_path=model_path,
+        output_path=output_path,
         keep_checkpoints=keep_checkpoints,
         train_model=train_model,
         load_model=load_model,

 def test_initialize_invalid_trainer_raises_exception(
     BrainParametersMock, dummy_bad_config
 ):
-    summaries_dir = "test_dir"
-    model_path = "model_dir"
+    output_path = "results_dir"
     keep_checkpoints = 1
     train_model = True
     load_model = False

 with pytest.raises(TrainerConfigError):
     trainer_factory = trainer_util.TrainerFactory(
         trainer_config=bad_config,
-        summaries_dir=summaries_dir,
-        model_path=model_path,
+        output_path=output_path,
         keep_checkpoints=keep_checkpoints,
         train_model=train_model,
         load_model=load_model,

 with pytest.raises(TrainerConfigError):
     trainer_factory = trainer_util.TrainerFactory(
         trainer_config=bad_config,
-        summaries_dir=summaries_dir,
-        model_path=model_path,
+        output_path=output_path,
         keep_checkpoints=keep_checkpoints,
         train_model=train_model,
         load_model=load_model,

 with pytest.raises(UnityTrainerException):
     trainer_factory = trainer_util.TrainerFactory(
         trainer_config=bad_config,
-        summaries_dir=summaries_dir,
-        model_path=model_path,
+        output_path=output_path,
         keep_checkpoints=keep_checkpoints,
         train_model=train_model,
         load_model=load_model,

 trainer_factory = trainer_util.TrainerFactory(
     trainer_config=no_default_config,
-    summaries_dir="test_dir",
-    model_path="model_dir",
+    output_path="output_path",
     keep_checkpoints=1,
     train_model=True,
     load_model=False,

 trainer_factory = trainer_util.TrainerFactory(
     trainer_config=bad_config,
-    summaries_dir="test_dir",
-    model_path="model_dir",
+    output_path="output_path",
     keep_checkpoints=1,
     train_model=True,
     load_model=False,

 def test_existing_directories(tmp_path):
-    model_path = os.path.join(tmp_path, "runid")
-    # Unused summary path
-    summary_path = os.path.join(tmp_path, "runid")
+    output_path = os.path.join(tmp_path, "runid")
-    trainer_util.handle_existing_directories(model_path, summary_path, False, False)
+    trainer_util.handle_existing_directories(output_path, False, False)
-    trainer_util.handle_existing_directories(model_path, summary_path, True, False)
+    trainer_util.handle_existing_directories(output_path, True, False)
-    os.mkdir(model_path)
+    os.mkdir(output_path)
-    trainer_util.handle_existing_directories(model_path, summary_path, False, False)
+    trainer_util.handle_existing_directories(output_path, False, False)
-    trainer_util.handle_existing_directories(model_path, summary_path, True, False)
+    trainer_util.handle_existing_directories(output_path, True, False)
-    trainer_util.handle_existing_directories(model_path, summary_path, False, True)
+    trainer_util.handle_existing_directories(output_path, False, True)
-    trainer_util.handle_existing_directories(
-        model_path, summary_path, False, True, init_path
-    )
+    trainer_util.handle_existing_directories(output_path, False, True, init_path)
-    trainer_util.handle_existing_directories(
-        model_path, summary_path, False, True, init_path
-    )
+    trainer_util.handle_existing_directories(output_path, False, True, init_path)

ml-agents/mlagents/trainers/trainer/trainer.py (3 changes)

 self.brain_name = brain_name
 self.run_id = run_id
 self.trainer_parameters = trainer_parameters
-self.summary_path = trainer_parameters["summary_path"]
-self._stats_reporter = StatsReporter(self.summary_path)
+self._stats_reporter = StatsReporter(brain_name)
 self.is_training = training
 self._reward_buffer: Deque[float] = deque(maxlen=reward_buff_cap)
 self.policy_queues: List[AgentManagerQueue[Policy]] = []

ml-agents/mlagents/trainers/trainer_controller.py (20 changes)

 def __init__(
     self,
     trainer_factory: TrainerFactory,
-    model_path: str,
-    summaries_dir: str,
+    output_path: str,
     run_id: str,
     save_freq: int,
     meta_curriculum: Optional[MetaCurriculum],

     resampling_interval: Optional[int],
 ):
     """
-    :param model_path: Path to save the model.
+    :param output_path: Path to save the model.
-    :param summaries_dir: Folder to save training summaries.
     :param run_id: The sub-directory name for model and summary statistics
     :param save_freq: Frequency at which to save model

 self.trainers: Dict[str, Trainer] = {}
 self.brain_name_to_identifier: Dict[str, Set] = defaultdict(set)
 self.trainer_factory = trainer_factory
-self.model_path = model_path
-self.summaries_dir = summaries_dir
+self.output_path = output_path
 self.logger = get_logger(__name__)
 self.run_id = run_id
 self.save_freq = save_freq

 self.trainers[brain_name].export_model(name_behavior_id)

 @staticmethod
-def _create_model_path(model_path):
-    if not os.path.exists(model_path):
-        os.makedirs(model_path)
+def _create_output_path(output_path):
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
-            "The folder {} containing the "
+            f"The folder {output_path} containing the "
-            "permissions are set correctly.".format(model_path)
+            "permissions are set correctly."
         )
 @timed

 @timed
 def start_learning(self, env_manager: EnvManager) -> None:
-    self._create_model_path(self.model_path)
+    self._create_output_path(self.output_path)
     tf.reset_default_graph()
     global_step = 0
     last_brain_behavior_ids: Set[str] = set()

ml-agents/mlagents/trainers/trainer_util.py (30 changes)

 def __init__(
     self,
     trainer_config: Any,
-    summaries_dir: str,
-    model_path: str,
+    output_path: str,
     keep_checkpoints: int,
     train_model: bool,
     load_model: bool,

     multi_gpu: bool = False,
 ):
     self.trainer_config = trainer_config
-    self.summaries_dir = summaries_dir
-    self.model_path = model_path
+    self.output_path = output_path
     self.init_path = init_path
     self.keep_checkpoints = keep_checkpoints
     self.train_model = train_model

     return initialize_trainer(
         self.trainer_config,
         brain_name,
-        self.summaries_dir,
-        self.model_path,
+        self.output_path,
         self.keep_checkpoints,
         self.train_model,
         self.load_model,

 def initialize_trainer(
     trainer_config: Any,
     brain_name: str,
-    summaries_dir: str,
-    model_path: str,
+    output_path: str,
     keep_checkpoints: int,
     train_model: bool,
     load_model: bool,

     :param trainer_config: Original trainer configuration loaded from YAML
     :param brain_name: Name of the brain to be associated with trainer
-    :param summaries_dir: Directory to store trainer summary statistics
-    :param model_path: Path to save the model
+    :param output_path: Path to save the model and summary statistics
     :param keep_checkpoints: How many model checkpoints to keep
     :param train_model: Whether to train the model (vs. run inference)
     :param load_model: Whether to load the model or randomly initialize

     )
     trainer_parameters = trainer_config.get("default", {}).copy()
-    trainer_parameters["summary_path"] = str(run_id) + "_" + brain_name
-    trainer_parameters["model_path"] = "{basedir}/{name}".format(
-        basedir=model_path, name=brain_name
-    )
+    trainer_parameters["output_path"] = os.path.join(output_path, brain_name)
-        trainer_parameters["init_path"] = "{basedir}/{name}".format(
-            basedir=init_path, name=brain_name
-        )
+        trainer_parameters["init_path"] = os.path.join(init_path, brain_name)
     trainer_parameters["keep_checkpoints"] = keep_checkpoints
     if brain_name in trainer_config:
         _brain_key: Any = brain_name

 def handle_existing_directories(
-    model_path: str, summary_path: str, resume: bool, force: bool, init_path: str = None
+    output_path: str, resume: bool, force: bool, init_path: str = None
 ) -> None:
     """
     Validates that if the run_id model exists, we do not overwrite it unless --force is specified.

     :param force: Whether or not the --force flag was passed.
     """
-    model_path_exists = os.path.isdir(model_path)
+    output_path_exists = os.path.isdir(output_path)
-    if model_path_exists:
+    if output_path_exists:
         if not resume and not force:
             raise UnityTrainerException(
                 "Previous data from this run ID was found. "

ml-agents/tests/yamato/scripts/run_llapi.py (17 changes)

 file_name=env_name,
 side_channels=[engine_configuration_channel],
 no_graphics=True,
-args=["-logFile", "-"],
+additional_args=["-logFile", "-"],
 )
 try:

 """
 try:
     env1 = UnityEnvironment(
-        file_name=env_name, base_port=5006, no_graphics=True, args=["-logFile", "-"]
+        file_name=env_name,
+        base_port=5006,
+        no_graphics=True,
+        additional_args=["-logFile", "-"],
-        file_name=env_name, base_port=5006, no_graphics=True, args=["-logFile", "-"]
+        file_name=env_name,
+        base_port=5006,
+        no_graphics=True,
+        additional_args=["-logFile", "-"],
-        file_name=env_name, base_port=5007, no_graphics=True, args=["-logFile", "-"]
+        file_name=env_name,
+        base_port=5007,
+        no_graphics=True,
+        additional_args=["-logFile", "-"],
     )
     env2.reset()
 finally:

ml-agents/tests/yamato/training_int_tests.py (2 changes)

 print(
     f"Running training with python={python_version or latest} and c#={csharp_version or latest}"
 )
-nn_file_expected = f"./models/{run_id}/3DBall.nn"
+nn_file_expected = f"./results/{run_id}/3DBall.nn"
 if os.path.exists(nn_file_expected):
     # Should never happen - make sure nothing leftover from an old test.
     print("Artifacts from previous build found!")
