Merge remote-tracking branch 'origin/develop' into enable-flake8
(branch: develop-generalizationTraining-TrainerController)
Chris Elion
6 years ago
Current commit: 5d07ca1f
161 files changed, with 22,523 insertions and 476 deletions
- 5     .gitignore
- 1     README.md
- 10    UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBall.unity
- 10    UnitySDK/Assets/ML-Agents/Examples/3DBall/Scenes/3DBallHard.unity
- 2     UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAcademy.cs
- 19    UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs
- 19    UnitySDK/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DHardAgent.cs
- 6     UnitySDK/Assets/ML-Agents/Examples/Soccer/Scenes/SoccerTwos.unity
- 17    UnitySDK/Assets/ML-Agents/Examples/Soccer/Scripts/AgentSoccer.cs
- 2     UnitySDK/Assets/ML-Agents/Examples/Soccer/Scripts/SoccerAcademy.cs
- 25    UnitySDK/Assets/ML-Agents/Examples/Soccer/Scripts/SoccerFieldArea.cs
- 8     UnitySDK/Assets/ML-Agents/Examples/Tennis/Scenes/Tennis.unity
- 18    UnitySDK/Assets/ML-Agents/Examples/Tennis/Scenes/TennisIL.unity
- 2     UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/HitWall.cs
- 1     UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAcademy.cs
- 40    UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs
- 2     docs/Background-Machine-Learning.md
- 19    docs/Installation.md
- 38    docs/Learning-Environment-Examples.md
- 125   docs/Training-Imitation-Learning.md
- 70    docs/Training-PPO.md
- 99    docs/Training-RewardSignals.md
- 23    docs/Using-Docker.md
- 6     gym-unity/gym_unity/envs/unity_env.py
- 4     ml-agents-envs/mlagents/envs/__init__.py
- 13    ml-agents-envs/mlagents/envs/brain.py
- 12    ml-agents-envs/mlagents/envs/environment.py
- 2     ml-agents-envs/setup.py
- 3     ml-agents/mlagents/trainers/__init__.py
- 4     ml-agents/mlagents/trainers/bc/policy.py
- 14    ml-agents/mlagents/trainers/components/reward_signals/curiosity/model.py
- 32    ml-agents/mlagents/trainers/components/reward_signals/curiosity/signal.py
- 17    ml-agents/mlagents/trainers/components/reward_signals/extrinsic/signal.py
- 22    ml-agents/mlagents/trainers/components/reward_signals/reward_signal.py
- 6     ml-agents/mlagents/trainers/components/reward_signals/reward_signal_factory.py
- 39    ml-agents/mlagents/trainers/demo_loader.py
- 7     ml-agents/mlagents/trainers/learn.py
- 37    ml-agents/mlagents/trainers/models.py
- 39    ml-agents/mlagents/trainers/ppo/policy.py
- 30    ml-agents/mlagents/trainers/ppo/trainer.py
- 49    ml-agents/mlagents/trainers/tests/mock_brain.py
- 13    ml-agents/mlagents/trainers/tests/test_demo_loader.py
- 6     ml-agents/mlagents/trainers/tests/test_learn.py
- 8     ml-agents/mlagents/trainers/tests/test_policy.py
- 76    ml-agents/mlagents/trainers/tests/test_ppo.py
- 154   ml-agents/mlagents/trainers/tests/test_reward_signals.py
- 134   ml-agents/mlagents/trainers/tests/test_trainer_controller.py
- 27    ml-agents/mlagents/trainers/trainer.py
- 187   ml-agents/mlagents/trainers/trainer_controller.py
- 14    ml-agents/mlagents/trainers/tf_policy.py
- 6     setup.cfg
- 92    docs/Training-BehavioralCloning.md
- 80    docs/images/mlagents-ImitationAndRL.png
- 38    ml-agents-envs/mlagents/envs/env_manager.py
- 10    ml-agents-envs/mlagents/envs/policy.py
- 187   ml-agents-envs/mlagents/envs/subprocess_env_manager.py
- 110   ml-agents-envs/mlagents/envs/tests/test_subprocess_env_manager.py
- 96    ml-agents-envs/mlagents/envs/tests/test_timers.py
- 181   ml-agents-envs/mlagents/envs/timers.py
- 158   ml-agents/mlagents/trainers/tests/test_bcmodule.py
- 1001  ml-agents/mlagents/trainers/tests/testdcvis.demo
- 442   demos/Expert3DBall.demo
- 1001  demos/Expert3DBallHard.demo
- 1001  demos/ExpertBanana.demo
- 171   demos/ExpertBasic.demo
- 198   demos/ExpertBouncer.demo
- 1001  demos/ExpertCrawlerSta.demo
- 1001  demos/ExpertGrid.demo
- 1001  demos/ExpertHallway.demo
- 1001  demos/ExpertPush.demo
- 1001  demos/ExpertPyramid.demo
- 1001  demos/ExpertReacher.demo
- 1001  demos/ExpertSoccerGoal.demo
- 1001  demos/ExpertSoccerStri.demo
- 1001  demos/ExpertTennis.demo
- 1001  demos/ExpertWalker.demo
- 0     docs/localized/KR/docs/Migrating.md
- 0     docs/localized/KR/docs/Readme.md
- 115   docs/localized/KR/docs/images/academy.png
- 70    docs/localized/KR/docs/images/agent.png
- 664   docs/localized/KR/docs/images/anaconda_default.PNG
- 635   docs/localized/KR/docs/images/anaconda_install.PNG
- 1001  docs/localized/KR/docs/images/balance.png
- 1001  docs/localized/KR/docs/images/banana.png
- 611   docs/localized/KR/docs/images/banner.png
- 69    docs/localized/KR/docs/images/basic.png
- 51    docs/localized/KR/docs/images/bc_teacher_helper.png
- 955   docs/localized/KR/docs/images/bouncer.png
- 121   docs/localized/KR/docs/images/brain.png
- 139   docs/localized/KR/docs/images/broadcast.png
- 268   docs/localized/KR/docs/images/conda_new.PNG
- 1001  docs/localized/KR/docs/images/crawler.png
ml-agents-envs/mlagents/envs/__init__.py

# The wildcard `from .brain import *` is replaced with explicit imports:
from .brain import AllBrainInfo, BrainInfo, BrainParameters
from .action_info import ActionInfo, ActionInfoOutputs
from .policy import Policy
from .environment import *
from .exception import *
docs/Training-BehavioralCloning.md

# Training with Behavioral Cloning

There are a variety of possible imitation learning algorithms that can be
used; the simplest of them is Behavioral Cloning. It works by collecting
demonstrations from a teacher and then simply using them to directly learn a
policy, in much the same way that supervised learning works for image
classification or other traditional machine learning tasks.
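In other words, behavioral cloning reduces imitation to supervised learning on
recorded (observation, action) pairs. The sketch below makes that concrete; it
is illustrative only, with a linear model and synthetic demonstrations standing
in for a real network and a real `.demo` dataset.

```python
# Minimal sketch of behavioral cloning as supervised learning on
# (observation, action) pairs. The linear "policy" and random teacher are
# stand-ins, not the ML-Agents implementation.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, n_demos = 8, 2, 256

# Pretend these pairs were loaded from a recorded .demo file.
demo_obs = rng.normal(size=(n_demos, obs_dim))
demo_act = demo_obs @ rng.normal(size=(obs_dim, act_dim))  # hidden "teacher"

W = np.zeros((obs_dim, act_dim))  # student policy: action = obs @ W
lr = 0.1
for _ in range(500):
    pred = demo_obs @ W
    W -= lr * demo_obs.T @ (pred - demo_act) / n_demos  # gradient of MSE

print("imitation MSE:", float(np.mean((demo_obs @ W - demo_act) ** 2)))
```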
## Offline Training

With offline behavioral cloning, we can use demonstrations (`.demo` files)
generated using the `Demonstration Recorder` as the dataset with which to
train a behavior.

1. Choose an agent whose behavior you would like to imitate, and gather a set
   of demonstrations for it.
2. Record the demonstrations using the `Demonstration Recorder` (see
   [here](Training-Imitation-Learning.md)). For illustrative purposes we will
   refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
   Control in the Broadcast Hub. For more information on Brains, see
   [here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
   demonstration file recorded in step 2. In our case this is
   `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`. (A sketch of such a
   config entry follows at the end of this section.)
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
   as the config parameter, and include the `--run-id` and `--train` flags as
   usual. Provide your environment as the `--env` parameter if it has been
   compiled as a standalone build, or omit it to train in the Editor.
7. (Optional) Observe training performance using TensorBoard.

This will use the demonstration file to train a neural-network-driven agent
to directly imitate the actions provided in the demonstrations. The
environment will launch and be used to evaluate the agent's performance
during training.
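For reference, a minimal sketch of what such an entry could look like, loaded
the same way the test fixtures later in this diff load their configs. The
brain name `StudentBrain` is a placeholder, and `trainer: offline_bc` plus the
extra keys are assumptions; the shipped `config/offline_bc_config.yaml` is
authoritative.

```python
# Hedged sketch of an offline BC config entry. "StudentBrain" is a placeholder
# brain name; only demo_path is taken verbatim from the steps above, and the
# other keys/values are assumptions about the shipped config file.
import yaml

offline_bc_config = yaml.safe_load(
    """
    StudentBrain:
        trainer: offline_bc
        batches_per_epoch: 5
        max_steps: 5.0e4
        demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
    """
)
print(offline_bc_config["StudentBrain"]["demo_path"])
```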
## Online Training

It is also possible to provide demonstrations in real time during training,
without pre-recording a demonstration file. The steps to do this are as
follows:

1. First create two Brains, one which will be the "Teacher" and the other
   which will be the "Student." We will assume that the names of the Brain
   Assets are "Teacher" and "Student" respectively.
2. The "Teacher" Brain must be a **Player Brain**. You must properly
   configure the inputs to map to the corresponding actions.
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
   compatible with the agent.
5. Drag both the "Teacher" and "Student" Brains into the Academy's `Broadcast
   Hub` and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at
   least one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain.
   Set the `trainer` parameter of this entry to `online_bc`, and the
   `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
   Additionally, set `batches_per_epoch`, which controls how much training is
   done on each update. Increase the `max_steps` option if you'd like to keep
   training the Agents for a longer period of time. (A sketch of such an entry
   follows after this list.)
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
   --train --slow`, and press the :arrow_forward: button in Unity when the
   message _"Start training by pressing the Play button in the Unity Editor"_
   is displayed on the screen.
9. From the Unity window, control the Agent with the Teacher Brain by
   providing "teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the Student Brain attached begin to behave
    similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the
    training process with `CTRL+C` from the command line.
12. Move the resulting `*.nn` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory of your choosing within Assets), and use
    it with a `Learning` Brain.
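The sketch referenced in step 7: the keys `trainer`, `brain_to_imitate`,
`batches_per_epoch`, and `max_steps` come from the step text, while the values
are illustrative placeholders.

```python
# Hedged sketch of the "Student" entry described in step 7. Keys are taken
# from the text above; the values are placeholders.
import yaml

online_bc_config = yaml.safe_load(
    """
    Student:
        trainer: online_bc
        brain_to_imitate: Teacher
        batches_per_epoch: 5
        max_steps: 1.0e5
    """
)
assert online_bc_config["Student"]["brain_to_imitate"] == "Teacher"
```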
**BC Teacher Helper**

We provide a convenience utility, the `BC Teacher Helper` component, that you
can add to the Teacher Agent.

<p align="center">
  <img src="images/bc_teacher_helper.png"
       alt="BC Teacher Helper"
       width="375" border="10" />
</p>

This utility enables you to use keyboard shortcuts to do the following:

1. Start and stop recording experiences. This is useful in case you'd like to
   interact with the game _but not have the agents learn from these
   interactions_. The default command to toggle this is to press `R` on the
   keyboard.

2. Reset the training buffer. This enables you to instruct the agents to
   forget their buffer of recent experiences. This is useful if you'd like to
   get them to quickly learn a new behavior. The default command to reset the
   buffer is to press `C` on the keyboard.
ml-agents-envs/mlagents/envs/env_manager.py

from abc import ABC, abstractmethod
from typing import List, Dict, NamedTuple, Optional
from mlagents.envs import AllBrainInfo, BrainParameters, Policy, ActionInfo


class StepInfo(NamedTuple):
    previous_all_brain_info: Optional[AllBrainInfo]
    current_all_brain_info: AllBrainInfo
    brain_name_to_action_info: Optional[Dict[str, ActionInfo]]


class EnvManager(ABC):
    def __init__(self):
        self.policies: Dict[str, Policy] = {}

    def set_policy(self, brain_name: str, policy: Policy) -> None:
        self.policies[brain_name] = policy

    @abstractmethod
    def step(self) -> List[StepInfo]:
        pass

    @abstractmethod
    def reset(self, config=None, train_mode=True) -> List[StepInfo]:
        pass

    @abstractmethod
    def external_brains(self) -> Dict[str, BrainParameters]:
        pass

    @property
    @abstractmethod
    def reset_parameters(self) -> Dict[str, float]:
        pass

    @abstractmethod
    def close(self):
        pass
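To make the contract above concrete, here is a hedged sketch of a minimal
in-process implementation. `SingleEnvManager` is hypothetical; it assumes an
`env` object with the surface the subprocess worker below relies on
(`reset`/`step`/`external_brains`/`reset_parameters`/`close`), and that
`reset()` is called before the first `step()`.

```python
# Hedged sketch of an in-process EnvManager, showing how the abstract
# contract fits together. SingleEnvManager is hypothetical.
from typing import Dict, List

from mlagents.envs.env_manager import EnvManager, StepInfo


class SingleEnvManager(EnvManager):
    def __init__(self, env):
        super().__init__()
        self.env = env
        self.previous_step = StepInfo(None, {}, None)

    def step(self) -> List[StepInfo]:
        # Ask each brain's registered policy for actions on the latest infos.
        all_action_info = {
            name: self.policies[name].get_action(info)
            for name, info in self.previous_step.current_all_brain_info.items()
        }
        all_brain_info = self.env.step(
            {n: a.action for n, a in all_action_info.items()},
            {n: a.memory for n, a in all_action_info.items()},
            {n: a.text for n, a in all_action_info.items()},
            {n: a.value for n, a in all_action_info.items()},
        )
        step_info = StepInfo(
            self.previous_step.current_all_brain_info, all_brain_info, all_action_info
        )
        self.previous_step = step_info
        return [step_info]

    def reset(self, config=None, train_mode=True) -> List[StepInfo]:
        self.previous_step = StepInfo(None, self.env.reset(config, train_mode), None)
        return [self.previous_step]

    def external_brains(self):
        return self.env.external_brains

    @property
    def reset_parameters(self) -> Dict[str, float]:
        return self.env.reset_parameters

    def close(self):
        self.env.close()
```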
ml-agents-envs/mlagents/envs/policy.py

from abc import ABC, abstractmethod

from mlagents.envs import BrainInfo
from mlagents.envs import ActionInfo


class Policy(ABC):
    @abstractmethod
    def get_action(self, brain_info: BrainInfo) -> ActionInfo:
        pass
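The contract is small enough that a stub implementation fits in a few lines.
The sketch below is standalone, so it runs without the package installed: the
simplified `ActionInfo` keeps only the four fields the subprocess worker below
actually reads, and `FixedPolicy` is hypothetical.

```python
# Standalone sketch of the Policy contract. The simplified ActionInfo keeps
# only the fields the subprocess worker reads (action, memory, text, value);
# the real one lives in mlagents.envs.action_info. FixedPolicy is hypothetical.
from abc import ABC, abstractmethod
from typing import Any, NamedTuple


class ActionInfo(NamedTuple):
    action: Any
    memory: Any = None
    text: Any = None
    value: Any = None


class Policy(ABC):
    @abstractmethod
    def get_action(self, brain_info) -> ActionInfo:
        pass


class FixedPolicy(Policy):
    """Returns the same action for every agent, ignoring observations."""

    def __init__(self, action):
        self.action = action

    def get_action(self, brain_info) -> ActionInfo:
        return ActionInfo(action=self.action)


print(FixedPolicy([0.0, 1.0]).get_action(brain_info=None))
```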
ml-agents-envs/mlagents/envs/subprocess_env_manager.py

from typing import *
import cloudpickle

from mlagents.envs import UnityEnvironment
from multiprocessing import Process, Pipe
from multiprocessing.connection import Connection
from mlagents.envs.base_unity_environment import BaseUnityEnvironment
from mlagents.envs.env_manager import EnvManager, StepInfo
from mlagents.envs.timers import timed, hierarchical_timer
from mlagents.envs import AllBrainInfo, BrainParameters, ActionInfo


class EnvironmentCommand(NamedTuple):
    name: str
    payload: Any = None


class EnvironmentResponse(NamedTuple):
    name: str
    worker_id: int
    payload: Any


class UnityEnvWorker:
    def __init__(self, process: Process, worker_id: int, conn: Connection):
        self.process = process
        self.worker_id = worker_id
        self.conn = conn
        self.previous_step: StepInfo = StepInfo(None, {}, None)
        self.previous_all_action_info: Dict[str, ActionInfo] = {}

    def send(self, name: str, payload=None):
        try:
            cmd = EnvironmentCommand(name, payload)
            self.conn.send(cmd)
        except (BrokenPipeError, EOFError):
            raise KeyboardInterrupt

    def recv(self) -> EnvironmentResponse:
        try:
            response: EnvironmentResponse = self.conn.recv()
            return response
        except (BrokenPipeError, EOFError):
            raise KeyboardInterrupt

    def close(self):
        try:
            self.conn.send(EnvironmentCommand("close"))
        except (BrokenPipeError, EOFError):
            pass
        self.process.join()


def worker(parent_conn: Connection, pickled_env_factory: str, worker_id: int):
    env_factory: Callable[[int], UnityEnvironment] = cloudpickle.loads(
        pickled_env_factory
    )
    env = env_factory(worker_id)

    def _send_response(cmd_name, payload):
        parent_conn.send(EnvironmentResponse(cmd_name, worker_id, payload))

    try:
        while True:
            cmd: EnvironmentCommand = parent_conn.recv()
            if cmd.name == "step":
                all_action_info = cmd.payload
                if env.global_done:
                    all_brain_info = env.reset()
                else:
                    actions = {}
                    memories = {}
                    texts = {}
                    values = {}
                    for brain_name, action_info in all_action_info.items():
                        actions[brain_name] = action_info.action
                        memories[brain_name] = action_info.memory
                        texts[brain_name] = action_info.text
                        values[brain_name] = action_info.value
                    all_brain_info = env.step(actions, memories, texts, values)
                _send_response("step", all_brain_info)
            elif cmd.name == "external_brains":
                _send_response("external_brains", env.external_brains)
            elif cmd.name == "reset_parameters":
                _send_response("reset_parameters", env.reset_parameters)
            elif cmd.name == "reset":
                all_brain_info = env.reset(
                    cmd.payload[0], cmd.payload[1], cmd.payload[2]
                )
                _send_response("reset", all_brain_info)
            elif cmd.name == "global_done":
                _send_response("global_done", env.global_done)
            elif cmd.name == "close":
                break
    except KeyboardInterrupt:
        print("UnityEnvironment worker: keyboard interrupt")
    finally:
        env.close()


class SubprocessEnvManager(EnvManager):
    def __init__(
        self, env_factory: Callable[[int], BaseUnityEnvironment], n_env: int = 1
    ):
        super().__init__()
        self.env_workers: List[UnityEnvWorker] = []
        for worker_idx in range(n_env):
            self.env_workers.append(self.create_worker(worker_idx, env_factory))

    def get_last_steps(self):
        return [ew.previous_step for ew in self.env_workers]

    @staticmethod
    def create_worker(
        worker_id: int, env_factory: Callable[[int], BaseUnityEnvironment]
    ) -> UnityEnvWorker:
        parent_conn, child_conn = Pipe()

        # Need to use cloudpickle for the env factory function since function objects aren't picklable
        # on Windows as of Python 3.6.
        pickled_env_factory = cloudpickle.dumps(env_factory)
        child_process = Process(
            target=worker, args=(child_conn, pickled_env_factory, worker_id)
        )
        child_process.start()
        return UnityEnvWorker(child_process, worker_id, parent_conn)

    def step(self) -> List[StepInfo]:
        for env_worker in self.env_workers:
            all_action_info = self._take_step(env_worker.previous_step)
            env_worker.previous_all_action_info = all_action_info
            env_worker.send("step", all_action_info)

        with hierarchical_timer("recv"):
            step_brain_infos: List[AllBrainInfo] = [
                self.env_workers[i].recv().payload for i in range(len(self.env_workers))
            ]
        steps = []
        for i in range(len(step_brain_infos)):
            env_worker = self.env_workers[i]
            step_info = StepInfo(
                env_worker.previous_step.current_all_brain_info,
                step_brain_infos[i],
                env_worker.previous_all_action_info,
            )
            env_worker.previous_step = step_info
            steps.append(step_info)
        return steps

    def reset(
        self, config=None, train_mode=True, custom_reset_parameters=None
    ) -> List[StepInfo]:
        self._broadcast_message("reset", (config, train_mode, custom_reset_parameters))
        reset_results = [
            self.env_workers[i].recv().payload for i in range(len(self.env_workers))
        ]
        for i in range(len(reset_results)):
            env_worker = self.env_workers[i]
            env_worker.previous_step = StepInfo(None, reset_results[i], None)
        return list(map(lambda ew: ew.previous_step, self.env_workers))

    @property
    def external_brains(self) -> Dict[str, BrainParameters]:
        self.env_workers[0].send("external_brains")
        return self.env_workers[0].recv().payload

    @property
    def reset_parameters(self) -> Dict[str, float]:
        self.env_workers[0].send("reset_parameters")
        return self.env_workers[0].recv().payload

    def close(self):
        for env in self.env_workers:
            env.close()

    def _broadcast_message(self, name: str, payload=None):
        for env in self.env_workers:
            env.send(name, payload)

    @timed
    def _take_step(self, last_step: StepInfo) -> Dict[str, ActionInfo]:
        all_action_info: Dict[str, ActionInfo] = {}
        for brain_name, brain_info in last_step.current_all_brain_info.items():
            all_action_info[brain_name] = self.policies[brain_name].get_action(
                brain_info
            )
        return all_action_info
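Putting the pieces together, a training loop drives the manager roughly as
follows. This is a hedged usage sketch, not code from the diff: it needs a
compiled Unity environment (the `3DBall` file name is a placeholder), and
`make_policy` stands in for however the trainer constructs its `Policy`
objects.

```python
# Hedged usage sketch for SubprocessEnvManager. Requires a compiled Unity
# environment; "3DBall" is a placeholder file name and make_policy is a
# hypothetical stand-in for the trainer's policy construction.
from mlagents.envs import UnityEnvironment
from mlagents.envs.subprocess_env_manager import SubprocessEnvManager


def env_factory(worker_id: int) -> UnityEnvironment:
    # Each subprocess gets a distinct worker_id so the Unity instances
    # can communicate on distinct ports.
    return UnityEnvironment(file_name="3DBall", worker_id=worker_id)


env_manager = SubprocessEnvManager(env_factory, n_env=4)
try:
    for brain_name in env_manager.external_brains:
        env_manager.set_policy(brain_name, make_policy(brain_name))  # hypothetical
    env_manager.reset(train_mode=True)
    for _ in range(1000):
        for step_info in env_manager.step():
            pass  # hand each StepInfo to the corresponding trainer here
finally:
    env_manager.close()
```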
ml-agents-envs/mlagents/envs/tests/test_subprocess_env_manager.py

import unittest.mock as mock
from unittest.mock import Mock, MagicMock
import unittest
import cloudpickle
from mlagents.envs.subprocess_env_manager import StepInfo

from mlagents.envs.subprocess_env_manager import (
    SubprocessEnvManager,
    EnvironmentResponse,
    EnvironmentCommand,
    worker,
)
from mlagents.envs.base_unity_environment import BaseUnityEnvironment


def mock_env_factory(worker_id: int):
    return mock.create_autospec(spec=BaseUnityEnvironment)


class MockEnvWorker:
    def __init__(self, worker_id, resp=None):
        self.worker_id = worker_id
        self.process = None
        self.conn = None
        self.send = Mock()
        self.recv = Mock(return_value=resp)


class SubprocessEnvManagerTest(unittest.TestCase):
    def test_environments_are_created(self):
        SubprocessEnvManager.create_worker = MagicMock()
        env = SubprocessEnvManager(mock_env_factory, 2)
        # Creates two processes
        env.create_worker.assert_has_calls(
            [mock.call(0, mock_env_factory), mock.call(1, mock_env_factory)]
        )
        self.assertEqual(len(env.env_workers), 2)

    def test_worker_step_resets_on_global_done(self):
        env_mock = Mock()
        env_mock.reset = Mock(return_value="reset_data")
        env_mock.global_done = True

        def mock_global_done_env_factory(worker_id: int):
            return env_mock

        mock_parent_connection = Mock()
        step_command = EnvironmentCommand("step", (None, None, None, None))
        close_command = EnvironmentCommand("close")
        mock_parent_connection.recv.side_effect = [step_command, close_command]
        mock_parent_connection.send = Mock()

        worker(
            mock_parent_connection, cloudpickle.dumps(mock_global_done_env_factory), 0
        )

        # recv called twice to get step and close command
        self.assertEqual(mock_parent_connection.recv.call_count, 2)

        # worker returns the data from the reset
        mock_parent_connection.send.assert_called_with(
            EnvironmentResponse("step", 0, "reset_data")
        )

    def test_reset_passes_reset_params(self):
        manager = SubprocessEnvManager(mock_env_factory, 1)
        params = {"test": "params"}
        manager.reset(params, False)
        manager.env_workers[0].send.assert_called_with("reset", (params, False, None))

    def test_reset_collects_results_from_all_envs(self):
        SubprocessEnvManager.create_worker = lambda em, worker_id, env_factory: MockEnvWorker(
            worker_id, EnvironmentResponse("reset", worker_id, worker_id)
        )
        manager = SubprocessEnvManager(mock_env_factory, 4)

        params = {"test": "params"}
        res = manager.reset(params)
        for i, env in enumerate(manager.env_workers):
            env.send.assert_called_with("reset", (params, True, None))
            env.recv.assert_called()
            # Check that the "last steps" are set to the value returned for each step
            self.assertEqual(
                manager.env_workers[i].previous_step.current_all_brain_info, i
            )
        assert res == list(map(lambda ew: ew.previous_step, manager.env_workers))

    def test_step_takes_steps_for_all_envs(self):
        SubprocessEnvManager.create_worker = lambda em, worker_id, env_factory: MockEnvWorker(
            worker_id, EnvironmentResponse("step", worker_id, worker_id)
        )
        manager = SubprocessEnvManager(mock_env_factory, 2)
        step_mock = Mock()
        last_steps = [Mock(), Mock()]
        manager.env_workers[0].previous_step = last_steps[0]
        manager.env_workers[1].previous_step = last_steps[1]
        manager._take_step = Mock(return_value=step_mock)
        res = manager.step()
        for i, env in enumerate(manager.env_workers):
            env.send.assert_called_with("step", step_mock)
            env.recv.assert_called()
            # Check that the "last steps" are set to the value returned for each step
            self.assertEqual(
                manager.env_workers[i].previous_step.current_all_brain_info, i
            )
            self.assertEqual(
                manager.env_workers[i].previous_step.previous_all_brain_info,
                last_steps[i].current_all_brain_info,
            )
        assert res == list(map(lambda ew: ew.previous_step, manager.env_workers))
ml-agents-envs/mlagents/envs/tests/test_timers.py

from unittest import mock

from mlagents.envs import timers


@timers.timed
def decorated_func(x: int = 0, y: float = 1.0) -> str:
    return f"{x} + {y} = {x + y}"


def test_timers() -> None:
    with mock.patch(
        "mlagents.envs.timers._global_timer_stack", new_callable=timers.TimerStack
    ) as test_timer:
        # First, run some simple code
        with timers.hierarchical_timer("top_level"):
            for i in range(3):
                with timers.hierarchical_timer("multiple"):
                    decorated_func()

        raised = False
        try:
            with timers.hierarchical_timer("raises"):
                raise RuntimeError("timeout!")
        except RuntimeError:
            raised = True

        with timers.hierarchical_timer("post_raise"):
            assert raised
            pass

        # We expect the hierarchy to look like
        # (root)
        #   top_level
        #     multiple
        #       decorated_func
        #   raises
        #   post_raise
        root = test_timer.root
        assert root.children.keys() == {"top_level"}

        top_level = root.children["top_level"]
        assert top_level.children.keys() == {"multiple", "raises", "post_raise"}

        # make sure the scope was closed properly when the exception was raised
        raises = top_level.children["raises"]
        assert raises.count == 1

        multiple = top_level.children["multiple"]
        assert multiple.count == 3

        timer_tree = test_timer.get_timing_tree()

        expected_tree = {
            "name": "root",
            "total": mock.ANY,
            "count": 1,
            "self": mock.ANY,
            "children": [
                {
                    "name": "top_level",
                    "total": mock.ANY,
                    "count": 1,
                    "self": mock.ANY,
                    "children": [
                        {
                            "name": "multiple",
                            "total": mock.ANY,
                            "count": 3,
                            "self": mock.ANY,
                            "children": [
                                {
                                    "name": "decorated_func",
                                    "total": mock.ANY,
                                    "count": 3,
                                    "self": mock.ANY,
                                }
                            ],
                        },
                        {
                            "name": "raises",
                            "total": mock.ANY,
                            "count": 1,
                            "self": mock.ANY,
                        },
                        {
                            "name": "post_raise",
                            "total": mock.ANY,
                            "count": 1,
                            "self": mock.ANY,
                        },
                    ],
                }
            ],
        }
        assert timer_tree == expected_tree
ml-agents-envs/mlagents/envs/timers.py

# # Unity ML-Agents Toolkit
from time import perf_counter

from contextlib import contextmanager
from typing import Any, Callable, Dict, Generator, TypeVar

"""
Lightweight, hierarchical timers for profiling sections of code.

Example:

    @timed
    def foo(t):
        time.sleep(t)

    def main():
        for i in range(3):
            foo(i + 1)
        with hierarchical_timer("context"):
            foo(1)

        print(get_timer_tree())

This would produce a timer tree like
    (root)
        "foo"
        "context"
            "foo"

The total time and counts are tracked for each block of code; in this example
"foo" and "context.foo" are considered distinct blocks, and are tracked
separately.

The decorator and contextmanager are equivalent; the context manager may be
more useful if you want more control over the timer name, or are splitting up
multiple sections of a large function.
"""


class TimerNode:
    """
    Represents the time spent in a block of code.
    """

    __slots__ = ["children", "total", "count"]

    def __init__(self):
        # Note that since dictionary keys are the node names, we don't explicitly store the name on the TimerNode.
        self.children: Dict[str, TimerNode] = {}
        self.total: float = 0.0
        self.count: int = 0

    def get_child(self, name: str) -> "TimerNode":
        """
        Get the child node corresponding to the name (and create if it doesn't already exist).
        """
        child = self.children.get(name)
        if child is None:
            child = TimerNode()
            self.children[name] = child
        return child

    def add_time(self, elapsed: float) -> None:
        """
        Accumulate the time spent in the node (and increment the count).
        """
        self.total += elapsed
        self.count += 1


class TimerStack:
    """
    Tracks all the time spent. Users shouldn't use this directly, they should
    use the contextmanager below to make sure that pushes and pops are matched.
    """

    __slots__ = ["root", "stack", "start_time"]

    def __init__(self):
        self.root = TimerNode()
        self.stack = [self.root]
        self.start_time = perf_counter()

    def push(self, name: str) -> TimerNode:
        """
        Called when entering a new block of code that is timed (e.g. with a contextmanager).
        """
        current_node: TimerNode = self.stack[-1]
        next_node = current_node.get_child(name)
        self.stack.append(next_node)
        return next_node

    def pop(self) -> None:
        """
        Called when exiting a block of code that is timed (e.g. with a contextmanager).
        """
        self.stack.pop()

    def get_timing_tree(self, node: TimerNode = None) -> Dict[str, Any]:
        """
        Recursively build a tree of timings, suitable for output/archiving.
        """

        if node is None:
            # Special case the root - total is time since it was created, and count is 1
            node = self.root
            total_elapsed = perf_counter() - self.start_time
            res = {"name": "root", "total": total_elapsed, "count": 1}
        else:
            res = {"total": node.total, "count": node.count}

        child_total = 0.0
        child_list = []
        for child_name, child_node in node.children.items():
            child_res: Dict[str, Any] = {
                "name": child_name,
                **self.get_timing_tree(child_node),
            }
            child_list.append(child_res)
            child_total += child_res["total"]

        # "self" time is total time minus all time spent on children
        res["self"] = max(0.0, node.total - child_total)
        if child_list:
            res["children"] = child_list

        return res


# Global instance of a TimerStack. This is generally all that we need for profiling, but you can potentially
# create multiple instances and pass them to the contextmanager
_global_timer_stack = TimerStack()


@contextmanager
def hierarchical_timer(name: str, timer_stack: TimerStack = None) -> Generator:
    """
    Creates a scoped timer around a block of code. The time spent will
    automatically be accumulated when the context manager exits.
    """
    timer_stack = timer_stack or _global_timer_stack
    timer_node = timer_stack.push(name)
    start_time = perf_counter()

    try:
        # The wrapped code block will run here.
        yield
    finally:
        # This will trigger either when the context manager exits, or an exception is raised.
        # We'll accumulate the time, and the exception (if any) gets raised automatically.
        elapsed = perf_counter() - start_time
        timer_node.add_time(elapsed)
        timer_stack.pop()


# This is used to ensure the signature of the decorated function is preserved
# See also https://github.com/python/mypy/issues/3157
FuncT = TypeVar("FuncT", bound=Callable[..., Any])


def timed(func: FuncT) -> FuncT:
    """
    Decorator for timing a function or method. The name of the timer will be the qualified name of the function.
    Usage:
        @timed
        def my_func(x, y):
            return x + y
    Note that because this doesn't take arguments, the global timer stack is always used.
    """

    def wrapped(*args, **kwargs):
        with hierarchical_timer(func.__qualname__):
            return func(*args, **kwargs)

    return wrapped  # type: ignore


def get_timer_tree(timer_stack: TimerStack = None) -> Dict[str, Any]:
    """
    Return the tree of timings from the TimerStack as a dictionary (or the global stack if none is provided)
    """
    timer_stack = timer_stack or _global_timer_stack
    return timer_stack.get_timing_tree()
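For completeness, here is a runnable version of the example from the module
docstring above; the sleep durations are arbitrary.

```python
# Runnable version of the timers docstring example, assuming the
# mlagents.envs.timers module above is importable.
import time

from mlagents.envs.timers import get_timer_tree, hierarchical_timer, timed


@timed
def foo(t: float) -> None:
    time.sleep(t)


def main() -> None:
    for i in range(3):
        foo(0.01 * (i + 1))
    with hierarchical_timer("context"):
        foo(0.01)

    # Prints a nested dict with "total", "count", "self", and "children",
    # mirroring the expected_tree structure in test_timers.py above.
    print(get_timer_tree())


if __name__ == "__main__":
    main()
```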
ml-agents/mlagents/trainers/tests/test_bcmodule.py

import unittest.mock as mock
import pytest
import mlagents.trainers.tests.mock_brain as mb

import numpy as np
import yaml
import os

from mlagents.trainers.ppo.policy import PPOPolicy


@pytest.fixture
def dummy_config():
    return yaml.safe_load(
        """
        trainer: ppo
        batch_size: 32
        beta: 5.0e-3
        buffer_size: 512
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        max_steps: 5.0e4
        normalize: true
        num_epoch: 5
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 1000
        use_recurrent: false
        memory_size: 8
        pretraining:
          demo_path: ./demos/ExpertPyramid.demo
          strength: 1.0
          steps: 10000000
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
        """
    )


def create_mock_3dball_brain():
    mock_brain = mb.create_mock_brainparams(
        vector_action_space_type="continuous",
        vector_action_space_size=[2],
        vector_observation_space_size=8,
    )
    return mock_brain


def create_mock_banana_brain():
    mock_brain = mb.create_mock_brainparams(
        number_visual_observations=1,
        vector_action_space_type="discrete",
        vector_action_space_size=[3, 3, 3, 2],
        vector_observation_space_size=0,
    )
    return mock_brain


def create_ppo_policy_with_bc_mock(
    mock_env, mock_brain, dummy_config, use_rnn, demo_file
):
    mock_braininfo = mb.create_mock_braininfo(num_agents=12, num_vector_observations=8)
    mb.setup_mock_unityenvironment(mock_env, mock_brain, mock_braininfo)
    env = mock_env()

    trainer_parameters = dummy_config
    model_path = env.brain_names[0]
    trainer_parameters["model_path"] = model_path
    trainer_parameters["keep_checkpoints"] = 3
    trainer_parameters["use_recurrent"] = use_rnn
    trainer_parameters["pretraining"]["demo_path"] = (
        os.path.dirname(os.path.abspath(__file__)) + "/" + demo_file
    )
    policy = PPOPolicy(0, mock_brain, trainer_parameters, False, False)
    return env, policy


# Test default values
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_defaults(mock_env, dummy_config):
    # See if default values match
    mock_brain = create_mock_3dball_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "test.demo"
    )
    assert policy.bc_module.num_epoch == dummy_config["num_epoch"]
    assert policy.bc_module.batch_size == dummy_config["batch_size"]
    env.close()
    # Assign strange values and see if it overrides properly
    dummy_config["pretraining"]["num_epoch"] = 100
    dummy_config["pretraining"]["batch_size"] = 10000
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "test.demo"
    )
    assert policy.bc_module.num_epoch == 100
    assert policy.bc_module.batch_size == 10000
    env.close()


# Test with continuous control env and vector actions
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_update(mock_env, dummy_config):
    mock_brain = create_mock_3dball_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "test.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


# Test with RNN
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_rnn_update(mock_env, dummy_config):
    mock_brain = create_mock_3dball_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, True, "test.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


# Test with discrete control and visual observations
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_dc_visual_update(mock_env, dummy_config):
    mock_brain = create_mock_banana_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, False, "testdcvis.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


# Test with discrete control, visual observations and RNN
@mock.patch("mlagents.envs.UnityEnvironment")
def test_bcmodule_rnn_dc_update(mock_env, dummy_config):
    mock_brain = create_mock_banana_brain()
    env, policy = create_ppo_policy_with_bc_mock(
        mock_env, mock_brain, dummy_config, True, "testdcvis.demo"
    )
    stats = policy.bc_module.update()
    for _, item in stats.items():
        assert isinstance(item, np.float32)
    env.close()


if __name__ == "__main__":
    pytest.main()
ml-agents/mlagents/trainers/tests/testdcvis.demo (1001 changes)

(Binary demonstration file; the diff is too large to display. The remaining
binary .demo payloads, beginning with the "BallDemo"/3DBallBrain recording,
are likewise omitted.)