
Remove some vestigial code

/develop/trainerinterface
Ervin Teng, 5 years ago
Current commit: b3a4e641
3 changed files with 0 additions and 10 deletions
  1. ml-agents/mlagents/trainers/ppo/trainer.py (1 deletion)
  2. ml-agents/mlagents/trainers/rl_trainer.py (7 deletions)
  3. ml-agents/mlagents/trainers/sac/trainer.py (2 deletions)

ml-agents/mlagents/trainers/ppo/trainer.py (1 deletion)


        The reward signal generators must be updated in this method at their own pace.
        """
        buffer_length = self.update_buffer.num_experiences
        self.cumulative_returns_since_policy_update.clear()
        # Make sure batch_size is a multiple of sequence length. During training, we
        # will need to reshape the data into a batch_size x sequence_length tensor.
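For readers skimming this hunk: the comment above is about reshaping recurrent training data. A minimal sketch of that arithmetic, with an illustrative helper name and values that are assumptions, not part of the diff:

def rounded_batch_size(batch_size: int, sequence_length: int) -> int:
    # Round the requested batch size down to a whole number of sequences so
    # the data can be reshaped into a (num_sequences, sequence_length) tensor.
    num_sequences = max(batch_size // sequence_length, 1)
    return num_sequences * sequence_length

print(rounded_batch_size(1024, 64))  # 1024, already a multiple of 64
print(rounded_batch_size(1000, 64))  # 960, i.e. 15 sequences of 64 steps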

ml-agents/mlagents/trainers/rl_trainer.py (7 deletions)


    def __init__(self, *args, **kwargs):
        super(RLTrainer, self).__init__(*args, **kwargs)
        self.param_keys: List[str] = []
        self.cumulative_returns_since_policy_update: List[float] = []
        self.step: int = 0
        self.training_start_time = time.time()
        self.summary_freq = self.trainer_parameters["summary_freq"]
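As context for the fields above (not part of the diff): step, training_start_time, and summary_freq are the kind of state typically used to decide when to emit periodic training summaries. A minimal, self-contained sketch of that pattern; the class and method names here are illustrative assumptions, not the trainer's API:

import time

class SummaryGate:
    def __init__(self, summary_freq: int) -> None:
        self.summary_freq = summary_freq
        self.step = 0
        self.training_start_time = time.time()

    def advance(self) -> None:
        # Count one step and report elapsed time every summary_freq steps.
        self.step += 1
        if self.step % self.summary_freq == 0:
            elapsed = time.time() - self.training_start_time
            print(f"step={self.step} elapsed={elapsed:.1f}s")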

        A signal that the Episode has ended. The buffer must be reset.
        Get only called when the academy resets.
        """
        for agent_id in self.episode_steps:
            self.episode_steps[agent_id] = 0

        self.episode_steps[agent_id] = 0
                self.cumulative_returns_since_policy_update.append(
                    rewards.get(agent_id, 0)
                )
                self.reward_buffer.appendleft(rewards.get(agent_id, 0))
                rewards[agent_id] = 0
            else:
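For orientation (not part of the diff): the hunk above resets per-agent step counts when an episode ends and records each agent's return in cumulative_returns_since_policy_update and reward_buffer. A minimal sketch of that bookkeeping, assuming reward_buffer is a bounded deque and rewards maps agent IDs to accumulated environment reward; the class below is illustrative, not the trainer's API:

from collections import deque
from typing import Deque, Dict, List

class EpisodeStats:
    def __init__(self) -> None:
        self.episode_steps: Dict[str, int] = {}
        self.cumulative_returns_since_policy_update: List[float] = []
        self.reward_buffer: Deque[float] = deque(maxlen=100)

    def end_episode_for_agent(self, agent_id: str, rewards: Dict[str, float]) -> None:
        # Record the finished episode's return, then reset the per-agent counters.
        self.cumulative_returns_since_policy_update.append(rewards.get(agent_id, 0.0))
        self.reward_buffer.appendleft(rewards.get(agent_id, 0.0))
        self.episode_steps[agent_id] = 0
        rewards[agent_id] = 0.0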

ml-agents/mlagents/trainers/sac/trainer.py (2 deletions)


        N times, then the reward signals are updated N times, then reward_signal_updates_per_train
        is greater than 1 and the reward signals are not updated in parallel.
        """
        self.cumulative_returns_since_policy_update.clear()
        n_sequences = max(
            int(self.trainer_parameters["batch_size"] / self.policy.sequence_length), 1
        )
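For context on the last two lines (not part of the diff): n_sequences is the batch size expressed in whole sequences, clamped to at least 1 so a batch can always be drawn. A quick sketch with illustrative hyperparameter values, which are assumptions rather than defaults:

trainer_parameters = {"batch_size": 128}  # illustrative value
sequence_length = 16                      # e.g. when the policy is recurrent
n_sequences = max(int(trainer_parameters["batch_size"] / sequence_length), 1)
print(n_sequences)  # 8 sequences of 16 steps per update batch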
