
Move processing of steps after reset to advance() (#3271)

In the previous PR, steps were processed when the env manager was reset. This was a problem for the very first reset, where we don't yet know which agent groups (and AgentManagers) the steps should be sent to, so those steps were being thrown away.

This PR moves the processing of steps to advance(), so that the steps from the initial reset are simply processed on the next call to advance(). This also removes the need for an additional block of code in TrainerController to handle the initial reset.
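
For readers skimming the diff below, here is a minimal, self-contained sketch of the pattern this PR describes: reset() only stashes the step infos returned by the environment, and advance() processes them once the per-group AgentManagers exist. The names used here (SketchEnvManager, the stand-in EnvironmentStep and AgentManager classes, _reset_env, _process_step_infos) are simplified placeholders for illustration, not the actual ml-agents implementation.

from typing import Dict, List, Optional

# Sketch of the deferred-processing pattern: reset() stashes the first step
# infos, advance() processes them once the per-group AgentManagers exist.
# These classes are simplified stand-ins, not the real ml-agents classes.


class EnvironmentStep:
    def __init__(self, group: str, obs: list) -> None:
        self.group = group
        self.obs = obs


class AgentManager:
    def __init__(self) -> None:
        self.processed: List[EnvironmentStep] = []

    def add_experiences(self, step: EnvironmentStep) -> None:
        self.processed.append(step)


class SketchEnvManager:
    def __init__(self) -> None:
        self.agent_managers: Dict[str, AgentManager] = {}
        self.first_step_infos: Optional[List[EnvironmentStep]] = None

    def _reset_env(self) -> List[EnvironmentStep]:
        # Pretend the environment returns one step per agent group on reset.
        return [EnvironmentStep("GroupA", [0.0]), EnvironmentStep("GroupB", [1.0])]

    def _process_step_infos(self, steps: List[EnvironmentStep]) -> int:
        for step in steps:
            self.agent_managers[step.group].add_experiences(step)
        return len(steps)

    def reset(self) -> int:
        # Do NOT process the steps here: on the very first reset the
        # AgentManagers may not exist yet. Just stash the steps.
        self.first_step_infos = self._reset_env()
        return len(self.first_step_infos)

    def advance(self) -> None:
        # Process the stashed reset steps now that managers are registered.
        if self.first_step_infos is not None:
            self._process_step_infos(self.first_step_infos)
            self.first_step_infos = None
        # ... normal stepping would continue here ...


env = SketchEnvManager()
env.reset()                                    # steps are stashed, not processed
env.agent_managers["GroupA"] = AgentManager()  # managers created after the reset
env.agent_managers["GroupB"] = AgentManager()
env.advance()                                  # stashed reset steps processed here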
/asymm-envs
GitHub · 5 years ago
Commit 65dbe0ec
2 files changed, 12 insertions and 5 deletions
  1. ml-agents/mlagents/trainers/env_manager.py (12 changes)
  2. ml-agents/mlagents/trainers/trainer_controller.py (5 changes)

ml-agents/mlagents/trainers/env_manager.py (12 changes)


 def __init__(self):
     self.policies: Dict[AgentGroup, TFPolicy] = {}
     self.agent_managers: Dict[AgentGroup, AgentManager] = {}
+    self.first_step_infos: List[EnvironmentStep] = None

 def set_policy(self, brain_name: AgentGroup, policy: TFPolicy) -> None:
     self.policies[brain_name] = policy

 def reset(self, config: Dict = None) -> int:
     for manager in self.agent_managers.values():
         manager.end_episode()
-    return self._process_step_infos(self._reset_env(config))
+    # Save the first step infos, after the reset.
+    # They will be processed on the first advance().
+    self.first_step_infos = self._reset_env(config)
+    return len(self.first_step_infos)

 @property
 @abstractmethod

     pass

 def advance(self):
+    # If we had just reset, process the first EnvironmentSteps.
+    # Note that we do it here instead of in reset() so that on the very first reset(),
+    # we can create the needed AgentManagers before calling advance() and processing the EnvironmentSteps.
+    if self.first_step_infos is not None:
+        self._process_step_infos(self.first_step_infos)
+        self.first_step_infos = None
     # Get new policies if found
     for brain_name in self.external_brains:
         try:
ml-agents/mlagents/trainers/trainer_controller.py (5 changes)


global_step = 0
last_brain_behavior_ids: Set[str] = set()
try:
    # Create the initial set of trainers and managers
    initial_brain_behaviors = set(env_manager.external_brains.keys())
    self._create_trainers_and_managers(env_manager, initial_brain_behaviors)
    last_brain_behavior_ids = initial_brain_behaviors
    # Initial reset
    self._reset_env(env_manager)
    while self._not_done_training():
        external_brain_behavior_ids = set(env_manager.external_brains.keys())
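
To tie the two diffs together, here is a hedged sketch of how a driver loop can now use the env manager: reset once, then simply keep calling advance(); the steps from the reset are not lost even if an agent group's manager is registered only after reset(). The run_training function below is hypothetical and reuses the stand-in classes from the sketch above; it is not the real TrainerController.

# Hypothetical driver loop (not the real TrainerController), reusing the
# stand-in SketchEnvManager and AgentManager classes from the earlier sketch.
def run_training(env: SketchEnvManager, num_iterations: int) -> None:
    env.reset()  # the reset step infos are stashed, not yet processed
    # Managers may be created after the reset; nothing has been dropped.
    for group in ("GroupA", "GroupB"):
        env.agent_managers.setdefault(group, AgentManager())
    for _ in range(num_iterations):
        env.advance()  # the first call processes the stashed reset steps


run_training(SketchEnvManager(), num_iterations=3)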
