shared baseline and v

/develop/coma-withq
Andrew Cohen, 4 years ago
Commit f9ff3fef
2 files changed, with 3 insertions and 3 deletions
  1. ml-agents/mlagents/trainers/optimizer/torch_optimizer.py (4 lines changed)
  2. ml-agents/mlagents/trainers/torch/networks.py (2 lines changed)

4
ml-agents/mlagents/trainers/optimizer/torch_optimizer.py


team_act=team_actions,
)
-value_estimates, mem = self.policy.actor_critic.target_critic_value(
+value_estimates, mem = self.policy.actor_critic.critic_value(
current_obs,
memory,
sequence_length=batch.num_experiences,

-boot_value_estimates, mem = self.policy.actor_critic.target_critic_value(
+boot_value_estimates, mem = self.policy.actor_critic.critic_value(
next_obs,
memory,
sequence_length=batch.num_experiences,

2
ml-agents/mlagents/trainers/torch/networks.py


team_obs=team_obs,
team_act=team_act,
)
-value_outputs, _ = self.target_critic_value(inputs, memories=critic_mem, sequence_length=sequence_length, team_obs=team_obs)
+value_outputs, _ = self.critic_value(inputs, memories=critic_mem, sequence_length=sequence_length, team_obs=team_obs)
return log_probs, entropies, q_outputs, baseline_outputs, value_outputs
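
For context, all three changed call sites swap `target_critic_value` for `critic_value`. The commit does not show how either method is implemented, so the following is only a minimal sketch of the general distinction between an online critic and a target critic, assuming the target is a slowly updated (Polyak-averaged) copy of the online one. `TinyCritic` and `soft_update` are hypothetical names for illustration and are not part of ml-agents.

```python
import copy


class TinyCritic:
    """Hypothetical stand-in for a critic: value = weight * observation."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def value(self, obs: float) -> float:
        return self.weight * obs


def soft_update(source: TinyCritic, target: TinyCritic, tau: float) -> None:
    """Move the target a fraction tau toward the source (Polyak averaging)."""
    target.weight = (1.0 - tau) * target.weight + tau * source.weight


critic = TinyCritic(weight=1.0)
target_critic = copy.deepcopy(critic)  # assumption: target starts as a copy

# Simulate one optimizer step that changes only the online critic.
critic.weight = 2.0
soft_update(critic, target_critic, tau=0.1)

obs = 3.0
online_estimate = critic.value(obs)         # reflects the latest weights
target_estimate = target_critic.value(obs)  # lags behind the online critic
```

Under that assumption, reading value estimates from the online critic means they track the most recent update, while a target critic would return smoother but staler estimates.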
